Build your own parser (a C# generator for LALR(1) syntax parsers and miniDFA lexical analyzers)

Drawing on the input formats of lex and yacc and on the algorithms of the Tiger Book (Modern Compiler Implementation in C), without depending on any third-party library, and with heavy integration and optimization, I implemented a C# generator for LALR(1) syntax parsers and miniDFA lexical analyzers (tentatively named bitParser).

The complete parser code generated for ANSI C, GLSL 4.60.8, and various test cases can be downloaded at (https://gitee.com/bitzhuwei/bitParser-demos).

All the algorithms used to implement the lexical-analyzer and syntax-parser generators are listed at (https://www.cnblogs.com/bitzhuwei/p/18544785).

The syntax-tree construction process of 1234+567+89+0+0 ☟

The syntax-tree construction process of (1234+567)/(89-0) ☟

More syntax-tree construction processes can be viewed at (https://www.cnblogs.com/bitzhuwei/p/18679529). (If you would like to see the construction process for an expression of your own, please leave a comment.)

The lexical analyzer generator

  • Generates lexical-analyzer code for both the DFA and the minimized DFA (hereafter miniDFA): state transition tables, reserved words, token types, and so on.

  • Supports the full Unicode character set, the / (trailing context) suffix, the <'Vt'> prefix, and state signals <signal1, signal2, ..>, which makes it easy to recognize cases such as id = refId, struct type_name, <Comment>[^*\n]*, and subroutine(type_name_list). Similar to lex, but not identical.

  • No need to explicitly write out token types, state signals, or reserved words; that is, nothing like lex's %s NUM ID .., %x Comment End Text .., or [(] { return ('('); } is required.

  • Thoroughly commented: every conditional branch of every lexical state is annotated with its regular expression, the token type (and signal) it leads to, and so on.

  • Generates state diagrams (mermaid format) of the ε-NFA, NFA, DFA, and miniDFA, plus a state diagram (mermaid format) for each token type, which helps with learning and debugging.

The syntax parser generator

  • Generates parser code for LL(1), LR(0), SLR(1), LALR(1), and LR(1): parse tables, rule lists, syntax-tree node types, and so on.

  • Supports the precedence directives %nonassoc, %left, %right, and %prec; Shift/Reduce and Reduce/Reduce conflicts are resolved automatically and listed in the comments of the parse-table code.

  • Thoroughly commented: the comments list the LR items and lookaheads of every syntax state, as well as the numbers of conflicts, resolved conflicts, and unresolved conflicts.

  • Generates state diagrams (mermaid format) and state tables (markdown format) for LL(1), LR(0), SLR(1), LALR(1), and LR(1), which helps with learning and debugging.

  • Generates documentation of nullable, the FIRST sets, and the FOLLOW sets.

Other features

  • No need for lex/yacc's %option, %union, %define, %{ %}, %parse-param, %lex-param, %pure-parser, %expect, %token, or %type. The generator produces a class library that is called directly, like this:
csharp
var compiler = new CompilerXxx();
var sourceCode = File.ReadAllText("input.st");
var tokens = compiler.Analyze(sourceCode);
var syntaxTree = compiler.Parse(tokens);
var extractedObj = compiler.Extract(syntaxTree, tokens, sourceCode);
// use extractedObj for user-defined business ..
  • Supports the block-comment directive %blockComment on/off and the inline-comment directive %inlineComment on/off. The default formats are the same as C's /* */ and //, and both can be customized; for example, when parsing VRML files, the inline comment can be defined as running from # to the end of the line:
smalltalk
%%#[^\r\n]*%% 'inlineComment'
  • Generates a framework that traverses the syntax tree to extract semantic information, and provides a source-code formatting algorithm that works for all kinds of languages. This can serve formatting, intermediate-code generation, and other downstream business logic.

  • Heavily optimized: generating all the parser code plus documentation takes only 3 seconds for ANSI C and only 9 seconds for GLSL 4.60.8.

More features

  • Supports the scope directive %validScopeChars and the global directive %validGlobalChars; both default to [\u0001-\uFFFF] (i.e., all Unicode characters except '\0') and can be customized.

  • Supports the %omit directive for specifying which whitespace characters to skip; the default is ' ' (space), '\t', '\n', '\r', and '\0'.

  • Supports %start for specifying the start symbol of the grammar.

Example: Calc.st

The input file Calc.st

A parser that handles addition, subtraction, multiplication, division, and parentheses has the following grammar:

smalltalk
// input file Calc.st
Exp    : Exp '+' Term
       | Exp '-' Term
       | Term ;
Term   : Term '*' Factor
       | Term '/' Factor
       | Factor ;
Factor : '(' Exp ')'
       | 'number' ;

%%[0-9]+%% 'number' // this example handles non-negative integers only
// no need to write %%[+]%% '+' and the like
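
The Exp/Term/Factor layering is what encodes operator precedence and associativity: '*' and '/' sit one level below '+' and '-', so they bind tighter, and the left recursion in Exp : Exp '+' Term groups operands from the left. For example, 1+2*3 can only be derived as

Exp ⇒ Exp '+' Term ⇒ Term '+' Term ⇒ Factor '+' Term ⇒ 'number' '+' Term ⇒ 'number' '+' Term '*' Factor ⇒ 'number' '+' Factor '*' Factor ⇒ 'number' '+' 'number' '*' 'number'

so the 2*3 part necessarily ends up nested under the right operand of '+'. The flat, ambiguous grammar shown later in the precedence-directive example has no such layering, which is why it produces Shift/Reduce conflicts that have to be resolved some other way.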

From this grammar, the following artifacts can be generated:

The generated lexical analyzer

The state diagram of the generated ε-NFA lexical analyzer is as follows:

The state diagram of the generated miniDFA lexical analyzer is as follows:

Generated code for the terminal (Vt) and non-terminal (Vn) symbols

csharp
// delete this array if it is not needed
public static readonly IReadOnlyList<string> stArray = new string[] {
    "'¥'", // @终 = 0;
    "'+'", // @Plus符 = 1; // '+'
    "'-'", // @Dash符 = 2; // '-'
    "'*'", // @Asterisk符 = 3; // '*'
    "'/'", // @Slash符 = 4; // '/'
    "'('", // @LeftParenthesis符 = 5; // '('
    "')'", // @RightParenthesis符 = 6; // ')'
    "'number'", // @number = 7; // 'number'
    // end of 1 + 7 Vt
    "Exp", // Exp枝 = 8; // Exp
    "Term", // Term枝 = 9; // Term
    "Factor", // Factor枝 = 10; // Factor
    // end of 3 Vn
};
/// <summary>
/// Vt types are used both for lexical analysis and syntax parsing.
/// <para>Vn types are only for syntax parsing.</para>
/// <para>Vt is quoted in ''.</para>
/// <para>Vn is not quoted in ''.</para>
/// </summary>
public static class st {
    // Vt
    /// <summary>
    /// Something wrong within the source code.
    /// </summary>
    public const int Error错 = -1; // "'×'";

    /// <summary>
    /// end of token list.
    /// </summary>
    public const int @终 = 0; // "'¥'";
   
    /// <summary>
    /// '+'
    /// </summary>
    public const int @Plus符 = 1; // "'+'"
    /// <summary>
    /// '-'
    /// </summary>
    public const int @Dash符 = 2; // "'-'"
    /// <summary>
    /// '*'
    /// </summary>
    public const int @Asterisk符 = 3; // "'*'"
    /// <summary>
    /// '/'
    /// </summary>
    public const int @Slash符 = 4; // "'/'"
    /// <summary>
    /// '('
    /// </summary>
    public const int @LeftParenthesis符 = 5; // "'('"
    /// <summary>
    /// ')'
    /// </summary>
    public const int @RightParenthesis符 = 6; // "')'"
    /// <summary>
    /// 'number'
    /// </summary>
    public const int @number = 7; // "'number'"
    /// <summary>
    /// count of ('¥' + user-defined Vt)
    /// </summary>
    public const int VtCount = 8;


    // Vn
    /// <summary>
    /// Exp
    /// </summary>
    public const int Exp枝 = 8; // "Exp"
    /// <summary>
    /// Term
    /// </summary>
    public const int Term枝 = 9; // "Term"
    /// <summary>
    /// Factor
    /// </summary>
    public const int Factor枝 = 10; // "Factor"
}

Generated code for reserved words (i.e., the keywords of a language)

csharp
public static class reservedWord {
    /// <summary>
    /// +
    /// </summary>
    public const string @Plus符 = "+";
    /// <summary>
    /// -
    /// </summary>
    public const string @Dash符 = "-";
    /// <summary>
    /// *
    /// </summary>
    public const string @Asterisk符 = "*";
    /// <summary>
    /// /
    /// </summary>
    public const string @Slash符 = "/";
    /// <summary>
    /// (
    /// </summary>
    public const string @LeftParenthesis符 = "(";
    /// <summary>
    /// )
    /// </summary>
    public const string @RightParenthesis符 = ")";
}

/// <summary>
/// if <paramref name="token"/> is a reserved word, assign the corresponding type and return true.
/// <para>otherwise, return false.</para>
/// </summary>
/// <param name="token"></param>
/// <returns></returns>
private static bool CheckReservedWord(AnalyzingToken token) {
    bool isReservedWord = true;
    switch (token.value) {
    case reservedWord.@Plus符: token.type = st.@Plus符; break;
    case reservedWord.@Dash符: token.type = st.@Dash符; break;
    case reservedWord.@Asterisk符: token.type = st.@Asterisk符; break;
    case reservedWord.@Slash符: token.type = st.@Slash符; break;
    case reservedWord.@LeftParenthesis符: token.type = st.@LeftParenthesis符; break;
    case reservedWord.@RightParenthesis符: token.type = st.@RightParenthesis符; break;

    default: isReservedWord = false; break;
    }

    return isReservedWord;
}

Below is the lexical state transition table, implemented with 8 Action<LexicalContext, char, CurrentStateWrapper> delegates.

Generated code for lexical state 0

csharp
// lexicalState0
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState0 =
static (context, c, wrapper) => {
	if (false) { /* for simpler code generation purpose. */ }
	/* [0-9] */
	else if (/* possible Vt : 'number' */
	'0'/*'\u0030'(48)*/ <= c && c <= '9'/*'\u0039'(57)*/) {
		BeginToken(context);
		ExtendToken(context);
		wrapper.currentState = lexicalState1;
	}
	/* \) */
	else if (/* possible Vt : ')' */
	c == ')'/*'\u0029'(41)*/) {
		BeginToken(context);
		ExtendToken(context);
		wrapper.currentState = lexicalState2;
	}
	/* \( */
	else if (/* possible Vt : '(' */
	c == '('/*'\u0028'(40)*/) {
		BeginToken(context);
		ExtendToken(context);
		wrapper.currentState = lexicalState3;
	}
	/* \/ */
	else if (/* possible Vt : '/' */
	c == '/'/*'\u002F'(47)*/) {
		BeginToken(context);
		ExtendToken(context);
		wrapper.currentState = lexicalState4;
	}
	/* \* */
	else if (/* possible Vt : '*' */
	c == '*'/*'\u002A'(42)*/) {
		BeginToken(context);
		ExtendToken(context);
		wrapper.currentState = lexicalState5;
	}
	/* - */
	else if (/* possible Vt : '-' */
	c == '-'/*'\u002D'(45)*/) {
		BeginToken(context);
		ExtendToken(context);
		wrapper.currentState = lexicalState6;
	}
	/* \+ */
	else if (/* possible Vt : '+' */
	c == '+'/*'\u002B'(43)*/) {
		BeginToken(context);
		ExtendToken(context);
		wrapper.currentState = lexicalState7;
	}
	/* deal with everything else. */
	else if (c == ' ' || c == '\r' || c == '\n' || c == '\t' || c == '\0') {
		wrapper.currentState = lexicalState0; // skip them.
	}
	else { // unexpected char.
		BeginToken(context);
		ExtendToken(context);
		AcceptToken(st.Error错, context);
		wrapper.currentState = lexicalState0;
	}
};

Generated code for lexical state 1

csharp
// lexicalState1
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState1 =
static (context, c, wrapper) => {
	if (false) { /* for simpler code generation purpose. */ }
	/* [0-9] */
	else if (/* possible Vt : 'number' */
	'0'/*'\u0030'(48)*/ <= c && c <= '9'/*'\u0039'(57)*/) {
		ExtendToken(context);
		wrapper.currentState = lexicalState1;
	}
	/* deal with everything else. */
	else {
		AcceptToken(st.@number, context);
		wrapper.currentState = lexicalState0;
	}
};

Generated code for lexical state 2

csharp
// lexicalState2
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState2 =
static (context, c, wrapper) => {
	if (false) { /* for simpler code generation purpose. */ }
	/* deal with everything else. */
	else {
		AcceptToken(st.@RightParenthesis符, context);
		wrapper.currentState = lexicalState0;
	}
};

Generated code for lexical state 3

csharp
// lexicalState3
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState3 =
static (context, c, wrapper) => {
	if (false) { /* for simpler code generation purpose. */ }
	/* deal with everything else. */
	else {
		AcceptToken(st.@LeftParenthesis符, context);
		wrapper.currentState = lexicalState0;
	}
};

Generated code for lexical state 4

csharp
// lexicalState4
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState4 =
static (context, c, wrapper) => {
	if (false) { /* for simpler code generation purpose. */ }
	/* deal with everything else. */
	else {
		AcceptToken(st.@Slash符, context);
		wrapper.currentState = lexicalState0;
	}
};

Generated code for lexical state 5

csharp
// lexicalState5
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState5 =
static (context, c, wrapper) => {
	if (false) { /* for simpler code generation purpose. */ }
	/* deal with everything else. */
	else {
		AcceptToken(st.@Asterisk符, context);
		wrapper.currentState = lexicalState0;
	}
};

Generated code for lexical state 6

csharp
// lexicalState6
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState6 =
static (context, c, wrapper) => {
	if (false) { /* for simpler code generation purpose. */ }
	/* deal with everything else. */
	else {
		AcceptToken(st.@Dash符, context);
		wrapper.currentState = lexicalState0;
	}
};

Generated code for lexical state 7

csharp
// lexicalState7
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState7 =
static (context, c, wrapper) => {
	if (false) { /* for simpler code generation purpose. */ }
	/* deal with everything else. */
	else {
		AcceptToken(st.@Plus符, context);
		wrapper.currentState = lexicalState0;
	}
};

It is invoked as follows:

csharp
// analyze the specified sourceCode and return a list of Token.
public TokenList Analyze(string sourceCode) {
    var context = new LexicalContext(sourceCode);
    var wrapper = new CurrentStateWrapper(this.initialState);
    do {
		char currentChar = context.MoveForward();
        wrapper.currentState(context, currentChar, wrapper);
        // wrapper.currentState will be set to next lexi-state.
    } while (!context.EOF);

    return context.result;
}

Below is the same lexical state transition table implemented with a two-dimensional array ElseIf[][]; it takes up less space, and it also runs somewhat faster.

csharp
private static readonly ElseIf[][] lexiStates = new ElseIf[8][];
static void InitializeLexiTable() {
	lexiStates[0] = new ElseIf[] {
	// possible Vt: '('
	/*0*/new('('/*'\u0028'(40)*/, Acts.Begin | Acts.Extend, 3),
	// possible Vt: ')'
	/*1*/new(')'/*'\u0029'(41)*/, Acts.Begin | Acts.Extend, 2),
	// possible Vt: '*'
	/*2*/new('*'/*'\u002A'(42)*/, Acts.Begin | Acts.Extend, 5),
	// possible Vt: '+'
	/*3*/new('+'/*'\u002B'(43)*/, Acts.Begin | Acts.Extend, 7),
	// possible Vt: '-'
	/*4*/new('-'/*'\u002D'(45)*/, Acts.Begin | Acts.Extend, 6),
	// possible Vt: '/'
	/*5*/new('/'/*'\u002F'(47)*/, Acts.Begin | Acts.Extend, 4),
	// possible Vt: 'number'
	/*6*/new('0'/*'\u0030'(48)*/, '9'/*'\u0039'(57)*/, Acts.Begin | Acts.Extend, 1),
	};
	lexiStates[1] = new ElseIf[] {
	// possible Vt: 'number'
	/*0*/new('0'/*'\u0030'(48)*/, '9'/*'\u0039'(57)*/, Acts.Extend, 1),
	// possible Vt: 'number'
	/*1*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@number),
	};
	lexiStates[2] = new ElseIf[] {
	// possible Vt: ')'
	/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@RightParenthesis符),
	};
	lexiStates[3] = new ElseIf[] {
	// possible Vt: '('
	/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@LeftParenthesis符),
	};
	lexiStates[4] = new ElseIf[] {
	// possible Vt: '/'
	/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@Slash符),
	};
	lexiStates[5] = new ElseIf[] {
	// possible Vt: '*'
	/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@Asterisk符),
	};
	lexiStates[6] = new ElseIf[] {
	// possible Vt: '-'
	/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@Dash符),
	};
	lexiStates[7] = new ElseIf[] {
	// possible Vt: '+'
	/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@Plus符),
	};
}

As the name suggests, one ElseIf plays the same role as one else if ('0' <= c && c <= '9') { .. } branch in the delegate version. In this way an ElseIf[] replaces a whole delegate, and when the table is consulted, the matching ElseIf can be located quickly with a binary search. (A sketch of what such an ElseIf might look like follows.)
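
The ElseIf type itself is not shown in this article. The following is only a sketch of a plausible shape for it, inferred from how it is constructed and consumed in the generated code above (a min/max character range, an Acts flags field, the next state id, and the Vt to accept); it is an illustration, not the actual bitParser source, and the ifVts field used by Acts.Accept2 is omitted.

csharp
// A plausible shape for ElseIf, inferred from its usage above (not the actual bitParser type).
[Flags]
public enum Acts { None = 0, Begin = 1, Extend = 2, Accept = 4, Accept2 = 8 }

public sealed class ElseIf {
    public readonly char min;        // lower bound of the matched character range
    public readonly char max;        // upper bound of the matched character range
    public readonly Acts scripts;    // which actions to run (Begin/Extend/Accept ...)
    public readonly int nextStateId; // index of the next lexical state
    public readonly int Vt;          // token type to accept when Acts.Accept is set (-1 = error)

    // single-character branch: else if (c == min) { .. }
    public ElseIf(char c, Acts scripts, int nextState, int Vt = -1)
        : this(c, c, scripts, nextState, Vt) { }

    // range branch: else if (min <= c && c <= max) { .. }
    public ElseIf(char min, char max, Acts scripts, int nextState, int Vt = -1) {
        this.min = min; this.max = max;
        this.scripts = scripts;
        this.nextStateId = nextState;
        this.Vt = Vt;
    }
}
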
Code that drives the array-based lexical analyzer

csharp
// skip '\0' at lexi-state 0
private static readonly ElseIf skipZero = new(
    char.MinValue, char.MaxValue, Acts.None,
    nextState: 0);
// construct an error token at lexi-state 0
private static readonly ElseIf unexpectedChar = new(
    char.MinValue, char.MaxValue, Acts.Begin | Acts.Extend | Acts.Accept,
    nextState: 0, -1);// -1 means error("'×'");
// construct an error token at other lexi-states
private static readonly ElseIf errorToken = new(
    char.MinValue, char.MaxValue, Acts.Extend | Acts.Accept,
    nextState: 0, -1);// -1 means error("'×'");

// analyze the specified sourceCode and return a list of Token.
public TokenList Analyze(string sourceCode) {
    var context = new LexicalContext(sourceCode);
    var currentStateId = 0;
    do {
        // read current char,
        char currentChar = context.MoveForward();
        ElseIf[] lexiState = lexiStates[currentStateId];
        // binary search for the segment( else if (left <= c && c <= right) { ... } )
        var segment = BinarySearch(currentChar, lexiState, currentStateId != 0);
        if (segment is null) {
            if (currentStateId == 0) { // the initial state of lexical analyze.
                segment = BinarySearch(currentChar, this.omitChars, currentStateId != 0);
                if (segment is null) {
                    // '\0' must be skipped.
                    if (currentChar == '\0') { segment = skipZero; }
                    else { segment = unexpectedChar; }
                }
            }
            else { // token with error type
                segment = errorToken;
            }
        }
        // construct the next token,
        var scripts = segment.scripts;
        if (scripts != 0) { // it is 0 in most cases.
            if ((scripts & Acts.Begin) != 0) {
                this.beginToken(context);
            }
            if ((scripts & Acts.Extend) != 0) {
                this.extendToken(context);
            }

            if ((scripts & Acts.Accept) != 0) {
                this.acceptToken(context, segment.Vt);
            }
            else if ((scripts & Acts.Accept2) != 0) {
                this.acceptToken2(context, segment.ifVts);
            }
        }
        // point to next state.
        currentStateId = segment.nextStateId;
    } while (!context.EOF);

    return context.result;
}

private ElseIf? BinarySearch(char currentChar, ElseIf[] lexiState, bool specialLast) {
    var left = 0; var right = lexiState.Length - (specialLast ? 2 : 1);
    if (right < 0) { }
    else {
        var result = -1;
        while (left < right) {
            var mid = (left + right) / 2;
            var current = lexiState[mid];
            if (currentChar < current.min) { result = -1; }
            else if (current.max < currentChar) { result = 1; }
            else { return current; }

            if (result < 0) { right = mid; }
            else { left = mid + 1; }
        }
        {
            var current = lexiState[left];
            if (current.min <= currentChar && currentChar <= current.max) {
                return current;
            }
        }
    }
    if (specialLast) {
        var last = lexiState[lexiState.Length - 1];
        return last;
        // no need to compare, because it's ['\0', '\uFFFF']
        //if (last.min <= currentChar && currentChar <= last.max) {
        //    return last;
        //}
    }

    return null;
}

The generated syntax parser

nullable, FIRST sets, and FOLLOW sets

C
nullable:
[0]: nullable( Exp' ) = False
[1]: nullable( Exp ) = False
[2]: nullable( Term ) = False
[3]: nullable( Factor ) = False
[4]: nullable( '¥' ) = False
[5]: nullable( '+' ) = False
[6]: nullable( '-' ) = False
[7]: nullable( '*' ) = False
[8]: nullable( '/' ) = False
[9]: nullable( '(' ) = False
[10]: nullable( ')' ) = False
[11]: nullable( 'number' ) = False

FIRST sets:
[0]: FIRST( Exp' ) = { '(' 'number' }
[1]: FIRST( Exp ) = { '(' 'number' }
[2]: FIRST( Term ) = { '(' 'number' }
[3]: FIRST( Factor ) = { '(' 'number' }
[4]: FIRST( '¥' ) = { '¥' }
[5]: FIRST( '+' ) = { '+' }
[6]: FIRST( '-' ) = { '-' }
[7]: FIRST( '*' ) = { '*' }
[8]: FIRST( '/' ) = { '/' }
[9]: FIRST( '(' ) = { '(' }
[10]: FIRST( ')' ) = { ')' }
[11]: FIRST( 'number' ) = { 'number' }
[12]: FIRST( Exp '+' Term ) = { '(' 'number' }
[13]: FIRST( Exp '-' Term ) = { '(' 'number' }
[14]: FIRST( Term '*' Factor ) = { '(' 'number' }
[15]: FIRST( Term '/' Factor ) = { '(' 'number' }
[16]: FIRST( '(' Exp ')' ) = { '(' }

FOLLOW sets:
[0]: FOLLOW( Exp' ) = { '¥' }
[1]: FOLLOW( Exp ) = { '-' ')' '+' '¥' }
[2]: FOLLOW( Term ) = { '-' ')' '*' '/' '+' '¥' }
[3]: FOLLOW( Factor ) = { '-' ')' '*' '/' '+' '¥' }
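
As a cross-check of these sets, FOLLOW( Exp ) = { '-' ')' '+' '¥' } can be read straight off the grammar: '+' and '-' follow Exp in Exp : Exp '+' Term and Exp : Exp '-' Term, ')' follows it in Factor : '(' Exp ')', and '¥' (end of input) comes from the augmented start rule Exp' : Exp. FOLLOW( Term ) and FOLLOW( Factor ) additionally contain '*' and '/' because Term is followed by them in Term : Term '*' Factor and Term : Term '/' Factor, and everything in FOLLOW( Term ) flows into FOLLOW( Factor ) because Factor ends the Term productions.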

Generated code for the reduction rules (productions)

csharp
public static IReadOnlyList<Regulation> Regulations => regulations;
private static readonly Regulation[] regulations = new Regulation[] {
	// [0]=Exp : Exp '+' Term ;
	new(0, st.Exp枝, st.Exp枝, st.@Plus符, st.Term枝), 
	// [1]=Exp : Exp '-' Term ;
	new(1, st.Exp枝, st.Exp枝, st.@Dash符, st.Term枝), 
	// [2]=Exp : Term ;
	new(2, st.Exp枝, st.Term枝), 
	// [3]=Term : Term '*' Factor ;
	new(3, st.Term枝, st.Term枝, st.@Asterisk符, st.Factor枝), 
	// [4]=Term : Term '/' Factor ;
	new(4, st.Term枝, st.Term枝, st.@Slash符, st.Factor枝), 
	// [5]=Term : Factor ;
	new(5, st.Term枝, st.Factor枝), 
	// [6]=Factor : '(' Exp ')' ;
	new(6, st.Factor枝, st.@LeftParenthesis符, st.Exp枝, st.@RightParenthesis符), 
	// [7]=Factor : 'number' ;
	new(7, st.Factor枝, st.@number), 
};

Generated code for the LALR(1) parse table

csharp
const int syntaxStateCount = 16;
// LALR(1) syntax parse table
private static readonly Dictionary<string/*LRNode.type*/, LRParseAction>[]
    syntaxStates = new Dictionary<string, LRParseAction>[syntaxStateCount];
private static void InitializeSyntaxStates() {
	var states = CompilerExp.syntaxStates;
	// 78 actions
	// conflicts(0)=not solved(0)+solved(0)(0 warnings)
	#region create objects of syntax states
	states[0] = new(capacity: 5);
	states[1] = new(capacity: 3);
	states[2] = new(capacity: 6);
	states[3] = new(capacity: 6);
	states[4] = new(capacity: 5);
	states[5] = new(capacity: 6);
	states[6] = new(capacity: 4);
	states[7] = new(capacity: 4);
	states[8] = new(capacity: 3);
	states[9] = new(capacity: 3);
	states[10] = new(capacity: 3);
	states[11] = new(capacity: 6);
	states[12] = new(capacity: 6);
	states[13] = new(capacity: 6);
	states[14] = new(capacity: 6);
	states[15] = new(capacity: 6);
	#endregion create objects of syntax states

	#region re-used actions
	LRParseAction aGoto2 = new(LRParseAction.Kind.Goto, states[2]);// referred 2 times
	LRParseAction aGoto3 = new(LRParseAction.Kind.Goto, states[3]);// referred 4 times
	LRParseAction aShift4 = new(LRParseAction.Kind.Shift, states[4]);// referred 6 times
	LRParseAction aShift5 = new(LRParseAction.Kind.Shift, states[5]);// referred 6 times
	LRParseAction aShift6 = new(LRParseAction.Kind.Shift, states[6]);// referred 2 times
	LRParseAction aShift7 = new(LRParseAction.Kind.Shift, states[7]);// referred 2 times
	LRParseAction aShift8 = new(LRParseAction.Kind.Shift, states[8]);// referred 3 times
	LRParseAction aShift9 = new(LRParseAction.Kind.Shift, states[9]);// referred 3 times
	LRParseAction aReduce2 = new(regulations[2]);// referred 4 times
	LRParseAction aReduce5 = new(regulations[5]);// referred 6 times
	LRParseAction aReduce7 = new(regulations[7]);// referred 6 times
	LRParseAction aReduce0 = new(regulations[0]);// referred 4 times
	LRParseAction aReduce1 = new(regulations[1]);// referred 4 times
	LRParseAction aReduce3 = new(regulations[3]);// referred 6 times
	LRParseAction aReduce4 = new(regulations[4]);// referred 6 times
	LRParseAction aReduce6 = new(regulations[6]);// referred 6 times
	#endregion re-used actions

	// 78 actions
	// conflicts(0)=not solved(0)+solved(0)(0 warnings)
	#region init actions of syntax states
	// syntaxStates[0]:
	// [-1] Exp' : ⏳ Exp ;☕ '¥' 
	// [0] Exp : ⏳ Exp '+' Term ;☕ '-' '+' '¥' 
	// [1] Exp : ⏳ Exp '-' Term ;☕ '-' '+' '¥' 
	// [2] Exp : ⏳ Term ;☕ '-' '+' '¥' 
	// [3] Term : ⏳ Term '*' Factor ;☕ '-' '*' '/' '+' '¥' 
	// [4] Term : ⏳ Term '/' Factor ;☕ '-' '*' '/' '+' '¥' 
	// [5] Term : ⏳ Factor ;☕ '-' '*' '/' '+' '¥' 
	// [6] Factor : ⏳ '(' Exp ')' ;☕ '-' '*' '/' '+' '¥' 
	// [7] Factor : ⏳ 'number' ;☕ '-' '*' '/' '+' '¥' 
	/*0*/states[0].Add(st.Exp枝, new(LRParseAction.Kind.Goto, states[1]));
	/*1*/states[0].Add(st.Term枝, aGoto2);
	/*2*/states[0].Add(st.Factor枝, aGoto3);
	/*3*/states[0].Add(st.@LeftParenthesis符, aShift4);
	/*4*/states[0].Add(st.@number, aShift5);
	// syntaxStates[1]:
	// [-1] Exp' : Exp ⏳ ;☕ '¥' 
	// [0] Exp : Exp ⏳ '+' Term ;☕ '-' '+' '¥' 
	// [1] Exp : Exp ⏳ '-' Term ;☕ '-' '+' '¥' 
	/*5*/states[1].Add(st.@Plus符, aShift6);
	/*6*/states[1].Add(st.@Dash符, aShift7);
	/*7*/states[1].Add(st.@终, LRParseAction.accept);
	// syntaxStates[2]:
	// [2] Exp : Term ⏳ ;☕ '-' ')' '+' '¥' 
	// [3] Term : Term ⏳ '*' Factor ;☕ '-' ')' '*' '/' '+' '¥' 
	// [4] Term : Term ⏳ '/' Factor ;☕ '-' ')' '*' '/' '+' '¥' 
	/*8*/states[2].Add(st.@Asterisk符, aShift8);
	/*9*/states[2].Add(st.@Slash符, aShift9);
	/*10*/states[2].Add(st.@Dash符, aReduce2);
	/*11*/states[2].Add(st.@RightParenthesis符, aReduce2);
	/*12*/states[2].Add(st.@Plus符, aReduce2);
	/*13*/states[2].Add(st.@终, aReduce2);
	// syntaxStates[3]:
	// [5] Term : Factor ⏳ ;☕ '-' ')' '*' '/' '+' '¥' 
	/*14*/states[3].Add(st.@Dash符, aReduce5);
	/*15*/states[3].Add(st.@RightParenthesis符, aReduce5);
	/*16*/states[3].Add(st.@Asterisk符, aReduce5);
	/*17*/states[3].Add(st.@Slash符, aReduce5);
	/*18*/states[3].Add(st.@Plus符, aReduce5);
	/*19*/states[3].Add(st.@终, aReduce5);
	// syntaxStates[4]:
	// [6] Factor : '(' ⏳ Exp ')' ;☕ '-' ')' '*' '/' '+' '¥' 
	// [0] Exp : ⏳ Exp '+' Term ;☕ '-' ')' '+' 
	// [1] Exp : ⏳ Exp '-' Term ;☕ '-' ')' '+' 
	// [2] Exp : ⏳ Term ;☕ '-' ')' '+' 
	// [3] Term : ⏳ Term '*' Factor ;☕ '-' ')' '*' '/' '+' 
	// [4] Term : ⏳ Term '/' Factor ;☕ '-' ')' '*' '/' '+' 
	// [5] Term : ⏳ Factor ;☕ '-' ')' '*' '/' '+' 
	// [6] Factor : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' 
	// [7] Factor : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' 
	/*20*/states[4].Add(st.Exp枝, new(LRParseAction.Kind.Goto, states[10]));
	/*21*/states[4].Add(st.Term枝, aGoto2);
	/*22*/states[4].Add(st.Factor枝, aGoto3);
	/*23*/states[4].Add(st.@LeftParenthesis符, aShift4);
	/*24*/states[4].Add(st.@number, aShift5);
	// syntaxStates[5]:
	// [7] Factor : 'number' ⏳ ;☕ '-' ')' '*' '/' '+' '¥' 
	/*25*/states[5].Add(st.@Dash符, aReduce7);
	/*26*/states[5].Add(st.@RightParenthesis符, aReduce7);
	/*27*/states[5].Add(st.@Asterisk符, aReduce7);
	/*28*/states[5].Add(st.@Slash符, aReduce7);
	/*29*/states[5].Add(st.@Plus符, aReduce7);
	/*30*/states[5].Add(st.@终, aReduce7);
	// syntaxStates[6]:
	// [0] Exp : Exp '+' ⏳ Term ;☕ '-' ')' '+' '¥' 
	// [3] Term : ⏳ Term '*' Factor ;☕ '-' ')' '*' '/' '+' '¥' 
	// [4] Term : ⏳ Term '/' Factor ;☕ '-' ')' '*' '/' '+' '¥' 
	// [5] Term : ⏳ Factor ;☕ '-' ')' '*' '/' '+' '¥' 
	// [6] Factor : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥' 
	// [7] Factor : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥' 
	/*31*/states[6].Add(st.Term枝, new(LRParseAction.Kind.Goto, states[11]));
	/*32*/states[6].Add(st.Factor枝, aGoto3);
	/*33*/states[6].Add(st.@LeftParenthesis符, aShift4);
	/*34*/states[6].Add(st.@number, aShift5);
	// syntaxStates[7]:
	// [1] Exp : Exp '-' ⏳ Term ;☕ '-' ')' '+' '¥' 
	// [3] Term : ⏳ Term '*' Factor ;☕ '-' ')' '*' '/' '+' '¥' 
	// [4] Term : ⏳ Term '/' Factor ;☕ '-' ')' '*' '/' '+' '¥' 
	// [5] Term : ⏳ Factor ;☕ '-' ')' '*' '/' '+' '¥' 
	// [6] Factor : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥' 
	// [7] Factor : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥' 
	/*35*/states[7].Add(st.Term枝, new(LRParseAction.Kind.Goto, states[12]));
	/*36*/states[7].Add(st.Factor枝, aGoto3);
	/*37*/states[7].Add(st.@LeftParenthesis符, aShift4);
	/*38*/states[7].Add(st.@number, aShift5);
	// syntaxStates[8]:
	// [3] Term : Term '*' ⏳ Factor ;☕ '-' ')' '*' '/' '+' '¥' 
	// [6] Factor : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥' 
	// [7] Factor : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥' 
	/*39*/states[8].Add(st.Factor枝, new(LRParseAction.Kind.Goto, states[13]));
	/*40*/states[8].Add(st.@LeftParenthesis符, aShift4);
	/*41*/states[8].Add(st.@number, aShift5);
	// syntaxStates[9]:
	// [4] Term : Term '/' ⏳ Factor ;☕ '-' ')' '*' '/' '+' '¥' 
	// [6] Factor : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥' 
	// [7] Factor : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥' 
	/*42*/states[9].Add(st.Factor枝, new(LRParseAction.Kind.Goto, states[14]));
	/*43*/states[9].Add(st.@LeftParenthesis符, aShift4);
	/*44*/states[9].Add(st.@number, aShift5);
	// syntaxStates[10]:
	// [6] Factor : '(' Exp ⏳ ')' ;☕ '-' ')' '*' '/' '+' '¥' 
	// [0] Exp : Exp ⏳ '+' Term ;☕ '-' ')' '+' 
	// [1] Exp : Exp ⏳ '-' Term ;☕ '-' ')' '+' 
	/*45*/states[10].Add(st.@RightParenthesis符, new(LRParseAction.Kind.Shift, states[15]));
	/*46*/states[10].Add(st.@Plus符, aShift6);
	/*47*/states[10].Add(st.@Dash符, aShift7);
	// syntaxStates[11]:
	// [0] Exp : Exp '+' Term ⏳ ;☕ '-' ')' '+' '¥' 
	// [3] Term : Term ⏳ '*' Factor ;☕ '-' ')' '*' '/' '+' '¥' 
	// [4] Term : Term ⏳ '/' Factor ;☕ '-' ')' '*' '/' '+' '¥' 
	/*48*/states[11].Add(st.@Asterisk符, aShift8);
	/*49*/states[11].Add(st.@Slash符, aShift9);
	/*50*/states[11].Add(st.@Dash符, aReduce0);
	/*51*/states[11].Add(st.@RightParenthesis符, aReduce0);
	/*52*/states[11].Add(st.@Plus符, aReduce0);
	/*53*/states[11].Add(st.@终, aReduce0);
	// syntaxStates[12]:
	// [1] Exp : Exp '-' Term ⏳ ;☕ '-' ')' '+' '¥' 
	// [3] Term : Term ⏳ '*' Factor ;☕ '-' ')' '*' '/' '+' '¥' 
	// [4] Term : Term ⏳ '/' Factor ;☕ '-' ')' '*' '/' '+' '¥' 
	/*54*/states[12].Add(st.@Asterisk符, aShift8);
	/*55*/states[12].Add(st.@Slash符, aShift9);
	/*56*/states[12].Add(st.@Dash符, aReduce1);
	/*57*/states[12].Add(st.@RightParenthesis符, aReduce1);
	/*58*/states[12].Add(st.@Plus符, aReduce1);
	/*59*/states[12].Add(st.@终, aReduce1);
	// syntaxStates[13]:
	// [3] Term : Term '*' Factor ⏳ ;☕ '-' ')' '*' '/' '+' '¥' 
	/*60*/states[13].Add(st.@Dash符, aReduce3);
	/*61*/states[13].Add(st.@RightParenthesis符, aReduce3);
	/*62*/states[13].Add(st.@Asterisk符, aReduce3);
	/*63*/states[13].Add(st.@Slash符, aReduce3);
	/*64*/states[13].Add(st.@Plus符, aReduce3);
	/*65*/states[13].Add(st.@终, aReduce3);
	// syntaxStates[14]:
	// [4] Term : Term '/' Factor ⏳ ;☕ '-' ')' '*' '/' '+' '¥' 
	/*66*/states[14].Add(st.@Dash符, aReduce4);
	/*67*/states[14].Add(st.@RightParenthesis符, aReduce4);
	/*68*/states[14].Add(st.@Asterisk符, aReduce4);
	/*69*/states[14].Add(st.@Slash符, aReduce4);
	/*70*/states[14].Add(st.@Plus符, aReduce4);
	/*71*/states[14].Add(st.@终, aReduce4);
	// syntaxStates[15]:
	// [6] Factor : '(' Exp ')' ⏳ ;☕ '-' ')' '*' '/' '+' '¥' 
	/*72*/states[15].Add(st.@Dash符, aReduce6);
	/*73*/states[15].Add(st.@RightParenthesis符, aReduce6);
	/*74*/states[15].Add(st.@Asterisk符, aReduce6);
	/*75*/states[15].Add(st.@Slash符, aReduce6);
	/*76*/states[15].Add(st.@Plus符, aReduce6);
	/*77*/states[15].Add(st.@终, aReduce6);
	#endregion init actions of syntax states
}

The state diagram and state table of the generated LALR(1) parser are shown below:

Because mermaid can only handle a limited number of characters, the diagram usually cannot show all the syntax states of a real language (C, for example) together with their full LR items and lookaheads, so by default the generated state diagram shows only the first 3 LR items (plus lookaheads) of each syntax state. The complete LR items and lookaheads can always be found in the generated parse-table code.

| State | '+' | '-' | '*' | '/' | '(' | ')' | 'number' | '¥' | Exp | Term | Factor |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 |  |  |  |  | S4 |  | S5 |  | G1 | G2 | G3 |
| 1 | S6 | S7 |  |  |  |  |  | ✅ |  |  |  |
| 2 | R[2] | R[2] | S8 | S9 |  | R[2] |  | R[2] |  |  |  |
| 3 | R[5] | R[5] | R[5] | R[5] |  | R[5] |  | R[5] |  |  |  |
| 4 |  |  |  |  | S4 |  | S5 |  | G10 | G2 | G3 |
| 5 | R[7] | R[7] | R[7] | R[7] |  | R[7] |  | R[7] |  |  |  |
| 6 |  |  |  |  | S4 |  | S5 |  |  | G11 | G3 |
| 7 |  |  |  |  | S4 |  | S5 |  |  | G12 | G3 |
| 8 |  |  |  |  | S4 |  | S5 |  |  |  | G13 |
| 9 |  |  |  |  | S4 |  | S5 |  |  |  | G14 |
| 10 | S6 | S7 |  |  |  | S15 |  |  |  |  |  |
| 11 | R[0] | R[0] | S8 | S9 |  | R[0] |  | R[0] |  |  |  |
| 12 | R[1] | R[1] | S8 | S9 |  | R[1] |  | R[1] |  |  |  |
| 13 | R[3] | R[3] | R[3] | R[3] |  | R[3] |  | R[3] |  |  |  |
| 14 | R[4] | R[4] | R[4] | R[4] |  | R[4] |  | R[4] |  |  |  |
| 15 | R[6] | R[6] | R[6] | R[6] |  | R[6] |  | R[6] |  |  |  |

In the table above (a hand-worked parse of 1+2*3 follows this legend):

  • S6 means Shift and go to state 6;

  • R[2] means reduce by regulations[2] = Exp : Term ;

  • G1 means Goto state 1;

  • ✅ means Accept: parsing is finished and the syntax tree has been built successfully;

  • a blank cell means a syntax error was encountered.
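
As a concrete illustration (hand-worked from the table above, not generator output), here is how the token stream of 1+2*3 is driven through the table; the left column shows the state stack before each action:

| State stack | Next token | Action |
|---|---|---|
| 0 | 'number' (1) | S5 |
| 0 5 | '+' | R[7] Factor : 'number' ;, then G3 |
| 0 3 | '+' | R[5] Term : Factor ;, then G2 |
| 0 2 | '+' | R[2] Exp : Term ;, then G1 |
| 0 1 | '+' | S6 |
| 0 1 6 | 'number' (2) | S5 |
| 0 1 6 5 | '*' | R[7], then G3 |
| 0 1 6 3 | '*' | R[5], then G11 |
| 0 1 6 11 | '*' | S8 |
| 0 1 6 11 8 | 'number' (3) | S5 |
| 0 1 6 11 8 5 | '¥' | R[7], then G13 |
| 0 1 6 11 8 13 | '¥' | R[3] Term : Term '*' Factor ;, then G11 |
| 0 1 6 11 | '¥' | R[0] Exp : Exp '+' Term ;, then G1 |
| 0 1 | '¥' | ✅ |

Note that 2*3 is reduced into a Term before the final R[0], which is exactly where the Exp/Term/Factor layering realizes the precedence of '*' over '+'.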

Code that consults the parse table

csharp
public SyntaxTree Parse(TokenList tokenList) {
    var context = new LRSyntaxContext(tokenList, this.initialState, this.EOT, this.stArray);
    var accept = false;
    do {
        Token token = context.CurrentToken;
        int nodeType = token.type;// token.type (an int) doubles as the syntax node type.
        while (nodeType == blockComment || nodeType == inlineComment) {
            // skip comment token
            context.cursor++;
            token = context.CurrentToken;
            nodeType = token.type;// token.type (an int) doubles as the syntax node type.
        }

        Dictionary<int/*Node.type*/, LRParseAction> currentState =
            context.stateStack.Peek();
        if (currentState.TryGetValue(nodeType, out var parseAction)) {
            parseAction.Execute(context);
            accept = parseAction.kind == LRParseAction.Kind.Accept;
        }
        else { // syntax error happened.
            return new SyntaxTree(context);
        }
    } while (!accept);

    var root = context.root;
    Debug.Assert(root != null);
    return new SyntaxTree(root);
}

public class LRParseAction {
    public enum Kind {
        Error,
        Shift,
        Reduce,
        Goto,
        Accept,
    }

    public readonly Kind kind;
    [StructLayout(LayoutKind.Explicit)]
    struct Union {
        [FieldOffset(0)] public Dictionary<int/*Node.type*/, LRParseAction> nextState;
        [FieldOffset(0)] public Regulation regulation;
    }
    private Union union;

    // Execute this parse action.
    public void Execute(LRSyntaxContext context) {
        switch (this.kind) {
        case Kind.Error: { throw new NotImplementedException(); }
        //break;
        case Kind.Shift: {
            var token = context.CurrentToken;
            var leaf = new LRNode(token);
            context.nodeStack.Push(leaf);
            var nextState = this.union.nextState;
            context.stateStack.Push(nextState);
            // prepare for next loop.
            context.cursor++;
        }
        break;
        case Kind.Reduce: {
            var regulation = this.union.regulation;
            int count = regulation.right.Length;
            var children = new LRNode[count];
            var start = Token.empty; LRNode? lastNode = null;
            var first = true;
            for (int i = 0; i < count; i++) {
                var state = context.stateStack.Pop();// popped only; the value is not used.
                var node = context.nodeStack.Pop();
                children[count - i - 1] = node;
                if (node.start.index >= 0) { // this node includes token
                    if (first) { lastNode = node; first = false; }
                    start = node.start;
                }
            }
            int tokenCount = 0;
            if (lastNode is not null) {
                tokenCount =
                    lastNode.start.index   // comment tokens inside of parent are included.
                    - start.index          // comment tokens before parent are excluded.
                    + lastNode.tokenCount; // comment tokens after parent are excluded.
            }
            var parent = new LRNode(regulation, start, tokenCount, children);
            for (var i = 0; i < count; i++) { children[i].parent = parent; }
            context.nodeStack.Push(parent);
            // goto next syntax-state
            Dictionary<int/*Node.type*/, LRParseAction> currentState =
               context.stateStack.Peek();
            var nodeType = regulation.left;
            if (currentState.TryGetValue(nodeType, out var parseAction)) {
                parseAction.Execute(context); // parseAction is supposed to be a Goto action
            }
            Debug.Assert(parseAction != null && parseAction.kind == Kind.Goto);
        }
        break;
        case Kind.Goto: {
            var nextState = this.union.nextState;
            context.stateStack.Push(nextState);
        }
        break;
        case Kind.Accept: {
            var state = context.stateStack.Pop();
            context.root = context.nodeStack.Pop();
            context.Finish(context.root, 40, context.stArray);
        }
        break;
        default: { throw new NotImplementedException(); }
        }
    }
}

The generated semantic extractor

For example, for the input 1234+567+89+0+0, syntax analysis with Calc.st produces the syntax tree shown below (its step-by-step construction is shown at the beginning of this article):

smalltalk
R[0]=Exp : Exp '+' Term ;⛪T[0->8]
 ├─R[0]=Exp : Exp '+' Term ;⛪T[0->6]
 │  ├─R[0]=Exp : Exp '+' Term ;⛪T[0->4]
 │  │  ├─R[0]=Exp : Exp '+' Term ;⛪T[0->2]
 │  │  │  ├─R[2]=Exp : Term ;⛪T[0]
 │  │  │  │  └─R[5]=Term : Factor ;⛪T[0]
 │  │  │  │     └─R[7]=Factor : 'number' ;⛪T[0]
 │  │  │  │        └─T[0]='number' 1234
 │  │  │  ├─T[1]='+' +
 │  │  │  └─R[5]=Term : Factor ;⛪T[2]
 │  │  │     └─R[7]=Factor : 'number' ;⛪T[2]
 │  │  │        └─T[2]='number' 567
 │  │  ├─T[3]='+' +
 │  │  └─R[5]=Term : Factor ;⛪T[4]
 │  │     └─R[7]=Factor : 'number' ;⛪T[4]
 │  │        └─T[4]='number' 89
 │  ├─T[5]='+' +
 │  └─R[5]=Term : Factor ;⛪T[6]
 │     └─R[7]=Factor : 'number' ;⛪T[6]
 │        └─T[6]='number' 0
 ├─T[7]='+' +
 └─R[5]=Term : Factor ;⛪T[8]
    └─R[7]=Factor : 'number' ;⛪T[8]
       └─T[8]='number' 0

From the upper left down to the lower right, four consecutive R[0] nodes noticeably increase the depth of the tree.

The semantic extractor automatically generated by bitParser traverses this syntax tree in post-order and extracts information from its nodes.

Generic code for traversing the syntax tree

csharp
// Extract some data structure from syntax tree.
// <para>post-order traverse <paramref name="root"/> with stack(without recursion).</para>
public T? Extract(LRNode root, TokenList tokens, string sourceCode) {
	var context = new TContext<T>(root, tokens, sourceCode);

	var nodeStack = new Stack<LRNode>(); var indexStack = new Stack<int>();
	// init stack.
	{
		// push nextLeft and its next pending children.
		var nextLeft = root; var index = 0;
		nodeStack.Push(nextLeft); indexStack.Push(index);
		while (nextLeft.children.Count > 0) {
			nextLeft = nextLeft.children[0];
			nodeStack.Push(nextLeft); indexStack.Push(0);
		}
	}

	while (nodeStack.Count > 0) {
		var current = nodeStack.Pop(); var index = indexStack.Pop() + 1;
		if (index < current.children.Count) {
			// push this node back again.
			nodeStack.Push(current); indexStack.Push(index);

			// push nextLeft and its next pending children.
			var nextLeft = current.children[index];
			nodeStack.Push(nextLeft); indexStack.Push(0);
			while (nextLeft.children.Count > 0) {
				nextLeft = nextLeft.children[0];
				nodeStack.Push(nextLeft); indexStack.Push(0);
			}
		}
		else {
			// Visit(current);
			if (extractorDict.TryGetValue(current.type, out Action<LRNode, TContext<T>>? action)) {
				action(current, context);
			}
		}
	}

	{
		var current = this.EOT;
		// Visit(current);
		if (extractorDict.TryGetValue(current.type, out Action<LRNode, TContext<T>>? action)) {
			action(current, context);
		}
	}

	return context.result;
}

Generated code that extracts node information while traversing the syntax tree

csharp
private static readonly Dictionary<int/*LRNode.type*/,
    Action<LRNode, TContext<Exp>>> @expExtractorDict = new();

private static readonly Action<LRNode, TContext<Exp>> VtHandler =
(node, context) => {
    var token = node.start;
    context.objStack.Push(token);
};

// initialize dict for extractor.
private static void InitializeExtractorDict() {
    var extractorDict = @expExtractorDict;
    extractorDict.Add(st.@Plus符, VtHandler);
    extractorDict.Add(st.@Dash符, VtHandler);
    extractorDict.Add(st.@Asterisk符, VtHandler);
    extractorDict.Add(st.@Slash符, VtHandler);
    extractorDict.Add(st.@LeftParenthesis符, VtHandler);
    extractorDict.Add(st.@RightParenthesis符, VtHandler);
    extractorDict.Add(st.@number, VtHandler);
    extractorDict.Add(st.@终,
    static (node, context) => {
        // [-1]=Exp' : Exp ;
        // dumped by ExternalExtractor
        var @final = (VnExp?)context.objStack.Pop();
        var left = new Exp(@final);
        context.result = left; // final step, no need to push into stack.
    }); // end of extractorDict.Add(st.@终, (node, context) => { ... });
    extractorDict.Add(st.Exp枝,
    static (node, context) => {
        switch (node.regulation.index) {
        case 0: { // [0]=Exp : Exp '+' Term ;
            // dumped by ListExtractor 2
            var r0 = (VnTerm?)context.objStack.Pop();
            var r1 = (Token?)context.objStack.Pop();
            var r2 = (VnExp?)context.objStack.Pop();
            var left = r2;
            left.Add(r1, r0);
            context.objStack.Push(left);
        }
        break;
        case 1: { // [1]=Exp : Exp '-' Term ;
            // dumped by ListExtractor 2
            var r0 = (VnTerm?)context.objStack.Pop();
            var r1 = (Token?)context.objStack.Pop();
            var r2 = (VnExp?)context.objStack.Pop();
            var left = r2;
            left.Add(r1, r0);
            context.objStack.Push(left);
        }
        break;
        case 2: { // [2]=Exp : Term ;
            // dumped by ListExtractor 1
            var r0 = (VnTerm?)context.objStack.Pop();
            var left = new VnExp(r0);
            context.objStack.Push(left);
        }
        break;
        default: throw new NotImplementedException();
        }
    }); // end of extractorDict.Add(st.Exp枝, (node, context) => { ... });
    extractorDict.Add(st.Term枝,
    static (node, context) => {
        switch (node.regulation.index) {
        case 3: { // [3]=Term : Term '*' Factor ;
            // dumped by ListExtractor 2
            var r0 = (VnFactor?)context.objStack.Pop();
            var r1 = (Token?)context.objStack.Pop();
            var r2 = (VnTerm?)context.objStack.Pop();
            var left = r2;
            left.Add(r1, r0);
            context.objStack.Push(left);
        }
        break;
        case 4: { // [4]=Term : Term '/' Factor ;
            // dumped by ListExtractor 2
            var r0 = (VnFactor?)context.objStack.Pop();
            var r1 = (Token?)context.objStack.Pop();
            var r2 = (VnTerm?)context.objStack.Pop();
            var left = r2;
            left.Add(r1, r0);
            context.objStack.Push(left);
        }
        break;
        case 5: { // [5]=Term : Factor ;
            // dumped by ListExtractor 1
            var r0 = (VnFactor?)context.objStack.Pop();
            var left = new VnTerm(r0);
            context.objStack.Push(left);
        }
        break;
        default: throw new NotImplementedException();
        }
    }); // end of extractorDict.Add(st.Term枝, (node, context) => { ... });
    extractorDict.Add(st.Factor枝,
    static (node, context) => {
        switch (node.regulation.index) {
        case 6: { // [6]=Factor : '(' Exp ')' ;
            // dumped by DefaultExtractor
            var r0 = (Token?)context.objStack.Pop();
            var r1 = (VnExp?)context.objStack.Pop();
            var r2 = (Token?)context.objStack.Pop();
            var left = new VnFactor(r2, r1, r0);
            context.objStack.Push(left);
        }
        break;
        case 7: { // [7]=Factor : 'number' ;
            // dumped by DefaultExtractor
            var r0 = (Token?)context.objStack.Pop();
            var left = new VnFactor(r0);
            context.objStack.Push(left);
        }
        break;
        default: throw new NotImplementedException();
        }
    }); // end of extractorDict.Add(st.Factor枝, (node, context) => { ... });
}

The extracted node information, displayed as a tree, looks like this:

smalltalk
VnExp⛪T[0->8]
 ├─VnTerm⛪T[0]
 │  └─VnFactor⛪T[0]
 │     └─T[0]='number' 1234
 └─List T[1->8]
    ├─Item T[1->2]
    │  ├─T[1]='+' +
    │  └─VnTerm⛪T[2]
    │     └─VnFactor⛪T[2]
    │        └─T[2]='number' 567
    ├─Item T[3->4]
    │  ├─T[3]='+' +
    │  └─VnTerm⛪T[4]
    │     └─VnFactor⛪T[4]
    │        └─T[4]='number' 89
    ├─Item T[5->6]
    │  ├─T[5]='+' +
    │  └─VnTerm⛪T[6]
    │     └─VnFactor⛪T[6]
    │        └─T[6]='number' 0
    └─Item T[7->8]
       ├─T[7]='+' +
       └─VnTerm⛪T[8]
          └─VnFactor⛪T[8]
             └─T[8]='number' 0

Reduction rules like Exp : Exp '+' Term ; make the syntax tree excessively deep. The purpose of this semantic-extraction step is to "flatten" such tree structures; with the flattened semantic information, it is easy to format the source code.

Formatting code for the VnExp node

csharp
/// <summary>
/// Corresponds to the Vn node Exp in the grammar (Exp).
/// </summary>
internal partial class VnExp : IFullFormatter {
	// [0]=Exp : Exp '+' Term ;
	// [1]=Exp : Exp '-' Term ;
	// [2]=Exp : Term ;

	private readonly VnTerm first0;
	public class PostItem : IFullFormatter {
		private readonly Token r1;
		private readonly VnTerm r0;

		public PostItem(Token r1, VnTerm r0) {
			this.r1 = r1;
			this.r0 = r0;
			this._scope = new TokenRange(r1, r0);
		}
		private readonly TokenRange _scope;
		public TokenRange Scope => _scope;

		public void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {
			context.PrintBlanksAnd(this.r1, preConfig, writer);
			// '+' or '-' is separated from the tokens on either side by 1 space
			var config = new BlankConfig(inlineBlank: 1, forceNewline: false);
            context.PrintCommentsBetween(this.r1, this.r0, config, writer);
			context.PrintBlanksAnd(this.r0, config, writer);
		}
	}
	public class PostItemList : IFullFormatter {
		private readonly List<PostItem> list = new();
		public PostItemList(PostItem item) {
			this.list.Add(item);
			this._scope = new TokenRange(item);
		}
		public void Add(PostItem item) {
			this.list.Add(item);
			this._scope.end = item.Scope.end;
		}
		private readonly TokenRange _scope;
		public TokenRange Scope => _scope;

		public void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {
			// '+' or '-' is separated from the tokens on either side by 1 space
			var config = new BlankConfig(inlineBlank: 1, forceNewline: false);
			for (int i = 0; i < list.Count; i++) {
				if (i == 0) {
					context.PrintBlanksAnd(list[i], preConfig, writer);
				}
				else {
					context.PrintCommentsBetween(list[i - 1], list[i], config, writer);
					context.PrintBlanksAnd(list[i], config, writer);
				}
			}
		}
	}
	private PostItemList? list;

	private readonly TokenRange _scope;
	public TokenRange Scope => _scope;

	internal VnExp(VnTerm first0) {
		this.first0 = first0;
		this._scope = new TokenRange(first0);
	}

	// [0]=Exp : Exp '+' Term ;
	// [1]=Exp : Exp '-' Term ;
	internal void Add(Token r1, VnTerm r0) {
		if (this.list == null) {
			this.list = new PostItemList(new(r1, r0));
		}
		else {
			this.list.Add(new(r1, r0));
		}
		this._scope.end = this.list.Scope.end;
	}

	public void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {
		context.PrintBlanksAnd(this.first0, preConfig, writer);
		if (this.list != null) {
			// '+' or '-' is separated from the tokens on either side by 1 space
			var config = new BlankConfig(inlineBlank: 1, forceNewline: false);
			context.PrintCommentsBetween(this.first0, this.list, config, writer);
			context.PrintBlanksAnd(this.list, config, writer);
		}
	}
}

Formatting code for the VnTerm node

csharp
/// <summary>
/// Corresponds to the Vn node Term in the grammar (Exp).
/// </summary>
internal partial class VnTerm : IFullFormatter {
	// [3]=Term : Term '*' Factor ;
	// [4]=Term : Term '/' Factor ;
	// [5]=Term : Factor ;

	private readonly VnFactor first0;
	public class PostItem : IFullFormatter {
		private readonly Token r1;
		private readonly VnFactor r0;

		public PostItem(Token r1, VnFactor r0) {
			this.r1 = r1;
			this.r0 = r0;
			this._scope = new TokenRange(r1, r0);
		}
		private readonly TokenRange _scope;
		public TokenRange Scope => _scope;

		public void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {
			context.PrintBlanksAnd(this.r1, preConfig, writer);
			// '+' or '-' is separated from the tokens on either side by 1 space
			var config = new BlankConfig(inlineBlank: 1, forceNewline: false);
            context.PrintCommentsBetween(this.r1, this.r0, config, writer);
			context.PrintBlanksAnd(this.r0, config, writer);
		}
	}
	public class PostItemList : IFullFormatter {
		private readonly List<PostItem> list = new();
		public PostItemList(PostItem item) {
			this.list.Add(item);
			this._scope = new TokenRange(item);
		}
		public void Add(PostItem item) {
			this.list.Add(item);
			this._scope.end = item.Scope.end;
		}

		private readonly TokenRange _scope;
		public TokenRange Scope => _scope;

		public void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {
			// '*' or '/' is separated from the tokens on either side by 1 space
			var config = new BlankConfig(inlineBlank: 1, forceNewline: false);
			for (int i = 0; i < list.Count; i++) {
				if (i == 0) {
					context.PrintBlanksAnd(list[i], preConfig, writer);
				}
				else {
					context.PrintCommentsBetween(list[i - 1], list[i], config, writer);
					context.PrintBlanksAnd(list[i], config, writer);
				}
			}
		}
	}
	private PostItemList? list;

	private readonly TokenRange _scope;
	public TokenRange Scope => _scope;

	// [5]=Term : Factor ;
	internal VnTerm(VnFactor first0) {
		this.first0 = first0;
		this._scope = new TokenRange(first0);
	}

	// [3]=Term : Term '*' Factor ;
	// [4]=Term : Term '/' Factor ;
	internal void Add(Token r1, VnFactor r0) {
		if (this.list == null) {
			this.list = new PostItemList(new(r1, r0));
		}
		else {
			this.list.Add(new(r1, r0));
		}
		this._scope.end = this.list.Scope.end;
	}

	public void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {
		context.PrintBlanksAnd(this.first0, preConfig, writer);
		if (this.list != null) {
			// '*' or '/' is separated from the tokens on either side by 1 space
			var config = new BlankConfig(inlineBlank: 1, forceNewline: false);
			context.PrintCommentsBetween(this.first0, this.list, config, writer);
			context.PrintBlanksAnd(this.list, config, writer);
		}
	}
}

Formatting code for the VnFactor node

csharp
/// <summary>
/// Corresponds to the Vn node Factor in the grammar (Exp).
/// </summary>
internal abstract partial class VnFactor : IFullFormatter {
	// [6]=Factor : '(' Exp ')' ;
	// [7]=Factor : 'number' ;

	public class C0 : VnFactor {
		// [6]=Factor : '(' Exp ')' ;
		public C0(Token r2, VnExp r1, Token r0) {
			this.r2 = r2;
			this.r1 = r1;
			this.r0 = r0;
			this._scope = new TokenRange(r2, r0);
		}
		private readonly Token r2;
		private readonly VnExp r1;
		private readonly Token r0;

		private readonly TokenRange _scope;
		public override TokenRange Scope => _scope;

		public override void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {
			context.PrintBlanksAnd(this.r2, preConfig, writer);
			// no spaces between ( Exp )
			var config = new BlankConfig(inlineBlank: 0, forceNewline: false);
			context.PrintCommentsBetween(this.r2, this.r1, config, writer); 
			context.PrintBlanksAnd(this.r1, config, writer);
			context.PrintCommentsBetween(this.r1, this.r0, config, writer); 
			context.PrintBlanksAnd(this.r0, config, writer);
		}
	}
	public class C1 : VnFactor {
		// [7]=Factor : 'number' ;
		public C1(Token r0) {
			this.r0 = r0;
			this._scope = new TokenRange(r0);
		}
		private readonly Token r0;
		private readonly TokenRange _scope;
		public override TokenRange Scope => _scope;

		public override void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {
			// print its single token using the preConfig provided by the parent
			context.PrintBlanksAnd(this.r0, preConfig, writer);
		}
	}

	public abstract TokenRange Scope { get; }
	public abstract void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context);
}

Finally, 1234+567+89+0+0 after formatting looks like this (one space inserted on each side of every operator):

csharp
1234 + 567 + 89 + 0 + 0

Here are more examples:

csharp
// formatting 1-2*3
1 - 2 * 3

// formatting (1+2)/3
(1 + 2) / 3

// formatting (1+2)*(3-4)
(1 + 2) * (3 - 4)

GLSL formatting example 1

GLSL
// Example 1: before formatting
void main() {
    int a=0;
    a++++;
    ++++a;
}
// Example 1: after formatting
void main() {
    int a = 0;
    a++ ++;
    ++ ++a;
}

GLSL formatting example 2

GLSL
// Example 2: before formatting
in vec3  passNormal;
in vec2  passTexCoord;

uniform sampler2D textureMap;
uniform   vec3      lihtDirection=vec3(1,1,1);
uniform   vec3       diffuseColor;
uniform bool     transparent=false;

out vec4 outColor;

void main(  ){
    if (transparent) {
        if (int(gl_FragCoord.x + gl_FragCoord.y) % 2 == 1) discard;}

    if (passTexCoord==vec2(-1,-1)){   // when texture coordinate not exists..
    float diffuse=max(dot(normalize(lihtDirection),normalize(passNormal)),0);
     outColor = vec4(diffuseColor * diffuse, 1.0);
    }
    else {     outColor = texture(textureMap, passTexCoord);}
}
// Example 2: after formatting
in vec3 passNormal;
in vec2 passTexCoord;

uniform sampler2D textureMap;
uniform vec3 lihtDirection = vec3(1, 1, 1);
uniform vec3 diffuseColor;
uniform bool transparent = false;

out vec4 outColor;

void main() {
    if (transparent) {
        if (int(gl_FragCoord.x + gl_FragCoord.y) % 2 == 1) discard;
    }

    if (passTexCoord == vec2(-1, -1)) {   // when texture coordinate not exists..
        float diffuse = max(dot(normalize(lihtDirection), normalize(passNormal)), 0);
        outColor = vec4(diffuseColor * diffuse, 1.0);
    }
    else { outColor = texture(textureMap, passTexCoord); }
}

For a detailed introduction to this formatting algorithm, see my other article (GLSL Shader的格式化算法(LALR解析器), i.e. the formatting algorithm for GLSL shaders based on an LALR parser).

Example: automatically resolving Shift/Reduce conflicts

Both C and GLSL have the dangling-else problem, which is caused by the following productions:

smalltalk
selection_statement :
      'if' '(' expression ')' selection_rest_statement
 ;
selection_rest_statement :
      statement 'else' statement
    | statement
 ;

Now, when the parser reads the 'else' token, should it Shift this 'else', or should it reduce by selection_rest_statement : statement ;? That is a Shift/Reduce conflict.

bitParser automatically chooses to Shift and comments out the entry that would Reduce, as shown in the following code:

csharp
const int syntaxStateCount = 477;
// LALR(1) syntax parse table
private static readonly Dictionary<string/*LRNode.type*/, LRParseAction>[]
    syntaxStates = new Dictionary<string, LRParseAction>[syntaxStateCount];
private static void InitializeSyntaxStates() {
    var states = CompilerGLSL.syntaxStates;
    ...
    // 30814 actions
    // conflicts(1)=not solved(0)+solved(1)(1 warnings)
    #region init actions of syntax states
    ...
    // syntaxStates[454]:
    // [324] selection_rest_statement : statement ⏳ 'else' statement ;☕ '--' '-' ';' '!' '(' '{' '}' '+' ..
    // [325] selection_rest_statement : statement ⏳ ;☕ '--' '-' ';' '!' '(' '{' '}' '+' ..
    // 'else' repeated 2 times
    states[454]/*28145*/.Add(st.@else, new(LRParseAction.Kind.Shift, states[466]));
    // ⚔ PreferShiftToReduce states[454]/*28146*/.Add(st.@else, new(regulations[325]));
    states[454]/*28147*/.Add(st.@Dash符Dash符, new(regulations[325]));
	...
    #endregion init actions of syntax states
}
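
In practice, preferring Shift means that an else always binds to the nearest unmatched if, which is the convention of C, GLSL, and C# alike. A minimal C# illustration (with hypothetical helper names) of what that choice means for the resulting tree:

csharp
// With prefer-shift, the 'else' belongs to the inner 'if (b)', not to 'if (a)'.
static string Classify(bool a, bool b) {
    if (a)
        if (b) return "a && b";
        else return "a && !b"; // reached only when a is true and b is false
    return "!a";
    // Had the generator preferred Reduce, the grouping would have been
    // if (a) { if (b) ... } else ... , attaching the 'else' to the outer 'if'.
}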

Example: the precedence directives %nonassoc, %left, %right, %prec

If we wrote Calc.st in the most intuitive way, it might look like this:

smalltalk
Exp : Exp '+' Exp
    | Exp '-' Exp
    | Exp '*' Exp
    | Exp '/' Exp
    | '(' Exp ')'
    | 'number' ;

%%[0-9]+%% 'number'

The state-transition table code of its LALR(1) parser

csharp
const int syntaxStateCount = 14;
/// <summary>
/// LALR(1) syntax parse table
/// </summary>
private static readonly Dictionary<string/*Node.type*/, LRParseAction>[] syntaxStates =
	new Dictionary<string/*Node.type*/, LRParseAction>[syntaxStateCount];
private static void InitializeSyntaxStates() {
	var states = CompilerExp.syntaxStates;
	// 80 actions
	// conflicts(16)=not solved(0)+solved(16)(16 warnings)
	#region create objects of syntax states
	states[0] = new(capacity: 3);
	states[1] = new(capacity: 5);
	states[2] = new(capacity: 3);
	states[3] = new(capacity: 6);
	states[4] = new(capacity: 3);
	states[5] = new(capacity: 3);
	states[6] = new(capacity: 3);
	states[7] = new(capacity: 3);
	states[8] = new(capacity: 5);
	states[9] = new(capacity: 6);
	states[10] = new(capacity: 6);
	states[11] = new(capacity: 6);
	states[12] = new(capacity: 6);
	states[13] = new(capacity: 6);
	#endregion create objects of syntax states

	#region re-used actions
	LRParseAction aShift2 = new(LRParseAction.Kind.Shift, states[2]);// referred 6 times
	LRParseAction aShift3 = new(LRParseAction.Kind.Shift, states[3]);// referred 6 times
	LRParseAction aShift4 = new(LRParseAction.Kind.Shift, states[4]);// referred 6 times
	LRParseAction aShift5 = new(LRParseAction.Kind.Shift, states[5]);// referred 6 times
	LRParseAction aShift6 = new(LRParseAction.Kind.Shift, states[6]);// referred 6 times
	LRParseAction aShift7 = new(LRParseAction.Kind.Shift, states[7]);// referred 6 times
	LRParseAction aReduce5 = new(regulations[5]);// referred 6 times
	LRParseAction aReduce0 = new(regulations[0]);// referred 6 times
	LRParseAction aReduce1 = new(regulations[1]);// referred 6 times
	LRParseAction aReduce2 = new(regulations[2]);// referred 6 times
	LRParseAction aReduce3 = new(regulations[3]);// referred 6 times
	LRParseAction aReduce4 = new(regulations[4]);// referred 6 times
	#endregion re-used actions

	// 80 actions
	// conflicts(16)=not solved(0)+solved(16)(16 warnings)
	#region init actions of syntax states
	// syntaxStates[0]:
	// [-1] Exp' : ⏳ Exp ;☕ '¥' 
	// [0] Exp : ⏳ Exp '+' Exp ;☕ '-' '*' '/' '+' '¥' 
	// [1] Exp : ⏳ Exp '-' Exp ;☕ '-' '*' '/' '+' '¥' 
	// [2] Exp : ⏳ Exp '*' Exp ;☕ '-' '*' '/' '+' '¥' 
	// [3] Exp : ⏳ Exp '/' Exp ;☕ '-' '*' '/' '+' '¥' 
	// [4] Exp : ⏳ '(' Exp ')' ;☕ '-' '*' '/' '+' '¥' 
	// [5] Exp : ⏳ 'number' ;☕ '-' '*' '/' '+' '¥' 
	states[0]/*0*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[1]));
	states[0]/*1*/.Add(st.@LeftParenthesis符, aShift2);
	states[0]/*2*/.Add(st.@number, aShift3);
	// syntaxStates[1]:
	// [-1] Exp' : Exp ⏳ ;☕ '¥' 
	// [0] Exp : Exp ⏳ '+' Exp ;☕ '-' '*' '/' '+' '¥' 
	// [1] Exp : Exp ⏳ '-' Exp ;☕ '-' '*' '/' '+' '¥' 
	// [2] Exp : Exp ⏳ '*' Exp ;☕ '-' '*' '/' '+' '¥' 
	// [3] Exp : Exp ⏳ '/' Exp ;☕ '-' '*' '/' '+' '¥' 
	states[1]/*3*/.Add(st.@Plus符, aShift4);
	states[1]/*4*/.Add(st.@Dash符, aShift5);
	states[1]/*5*/.Add(st.@Asterisk符, aShift6);
	states[1]/*6*/.Add(st.@Slash符, aShift7);
	states[1]/*7*/.Add(st.@终, LRParseAction.accept);
	// syntaxStates[2]:
	// [4] Exp : '(' ⏳ Exp ')' ;☕ '-' ')' '*' '/' '+' '¥' 
	// [0] Exp : ⏳ Exp '+' Exp ;☕ '-' ')' '*' '/' '+' 
	// [1] Exp : ⏳ Exp '-' Exp ;☕ '-' ')' '*' '/' '+' 
	// [2] Exp : ⏳ Exp '*' Exp ;☕ '-' ')' '*' '/' '+' 
	// [3] Exp : ⏳ Exp '/' Exp ;☕ '-' ')' '*' '/' '+' 
	// [4] Exp : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' 
	// [5] Exp : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' 
	states[2]/*8*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[8]));
	states[2]/*9*/.Add(st.@LeftParenthesis符, aShift2);
	states[2]/*10*/.Add(st.@number, aShift3);
	// syntaxStates[3]:
	// [5] Exp : 'number' ⏳ ;☕ '-' ')' '*' '/' '+' '¥' 
	states[3]/*11*/.Add(st.@Dash符, aReduce5);
	states[3]/*12*/.Add(st.@RightParenthesis符, aReduce5);
	states[3]/*13*/.Add(st.@Asterisk符, aReduce5);
	states[3]/*14*/.Add(st.@Slash符, aReduce5);
	states[3]/*15*/.Add(st.@Plus符, aReduce5);
	states[3]/*16*/.Add(st.@终, aReduce5);
	// syntaxStates[4]:
	// [0] Exp : Exp '+' ⏳ Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [0] Exp : ⏳ Exp '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [1] Exp : ⏳ Exp '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [2] Exp : ⏳ Exp '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [3] Exp : ⏳ Exp '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [4] Exp : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥' 
	// [5] Exp : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥' 
	states[4]/*17*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[9]));
	states[4]/*18*/.Add(st.@LeftParenthesis符, aShift2);
	states[4]/*19*/.Add(st.@number, aShift3);
	// syntaxStates[5]:
	// [1] Exp : Exp '-' ⏳ Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [0] Exp : ⏳ Exp '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [1] Exp : ⏳ Exp '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [2] Exp : ⏳ Exp '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [3] Exp : ⏳ Exp '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [4] Exp : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥' 
	// [5] Exp : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥' 
	states[5]/*20*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[10]));
	states[5]/*21*/.Add(st.@LeftParenthesis符, aShift2);
	states[5]/*22*/.Add(st.@number, aShift3);
	// syntaxStates[6]:
	// [2] Exp : Exp '*' ⏳ Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [0] Exp : ⏳ Exp '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [1] Exp : ⏳ Exp '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [2] Exp : ⏳ Exp '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [3] Exp : ⏳ Exp '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [4] Exp : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥' 
	// [5] Exp : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥' 
	states[6]/*23*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[11]));
	states[6]/*24*/.Add(st.@LeftParenthesis符, aShift2);
	states[6]/*25*/.Add(st.@number, aShift3);
	// syntaxStates[7]:
	// [3] Exp : Exp '/' ⏳ Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [0] Exp : ⏳ Exp '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [1] Exp : ⏳ Exp '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [2] Exp : ⏳ Exp '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [3] Exp : ⏳ Exp '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [4] Exp : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥' 
	// [5] Exp : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥' 
	states[7]/*26*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[12]));
	states[7]/*27*/.Add(st.@LeftParenthesis符, aShift2);
	states[7]/*28*/.Add(st.@number, aShift3);
	// syntaxStates[8]:
	// [4] Exp : '(' Exp ⏳ ')' ;☕ '-' ')' '*' '/' '+' '¥' 
	// [0] Exp : Exp ⏳ '+' Exp ;☕ '-' ')' '*' '/' '+' 
	// [1] Exp : Exp ⏳ '-' Exp ;☕ '-' ')' '*' '/' '+' 
	// [2] Exp : Exp ⏳ '*' Exp ;☕ '-' ')' '*' '/' '+' 
	// [3] Exp : Exp ⏳ '/' Exp ;☕ '-' ')' '*' '/' '+' 
	states[8]/*29*/.Add(st.@RightParenthesis符, new(LRParseAction.Kind.Shift, states[13]));
	states[8]/*30*/.Add(st.@Plus符, aShift4);
	states[8]/*31*/.Add(st.@Dash符, aShift5);
	states[8]/*32*/.Add(st.@Asterisk符, aShift6);
	states[8]/*33*/.Add(st.@Slash符, aShift7);
	// syntaxStates[9]:
	// [0] Exp : Exp '+' Exp ⏳ ;☕ '-' ')' '*' '/' '+' '¥' 
	// [0] Exp : Exp ⏳ '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [1] Exp : Exp ⏳ '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [2] Exp : Exp ⏳ '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [3] Exp : Exp ⏳ '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// '+' repeated 2 times
	states[9]/*34*/.Add(st.@Plus符, aShift4);
	// ⚔ PreferShiftToReduce states[9]/*35*/.Add(st.@Plus符, aReduce0);
	// '-' repeated 2 times
	states[9]/*36*/.Add(st.@Dash符, aShift5);
	// ⚔ PreferShiftToReduce states[9]/*37*/.Add(st.@Dash符, aReduce0);
	// '*' repeated 2 times
	states[9]/*38*/.Add(st.@Asterisk符, aShift6);
	// ⚔ PreferShiftToReduce states[9]/*39*/.Add(st.@Asterisk符, aReduce0);
	// '/' repeated 2 times
	states[9]/*40*/.Add(st.@Slash符, aShift7);
	// ⚔ PreferShiftToReduce states[9]/*41*/.Add(st.@Slash符, aReduce0);
	states[9]/*42*/.Add(st.@RightParenthesis符, aReduce0);
	states[9]/*43*/.Add(st.@终, aReduce0);
	// syntaxStates[10]:
	// [1] Exp : Exp '-' Exp ⏳ ;☕ '-' ')' '*' '/' '+' '¥' 
	// [0] Exp : Exp ⏳ '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [1] Exp : Exp ⏳ '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [2] Exp : Exp ⏳ '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [3] Exp : Exp ⏳ '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// '+' repeated 2 times
	states[10]/*44*/.Add(st.@Plus符, aShift4);
	// ⚔ PreferShiftToReduce states[10]/*45*/.Add(st.@Plus符, aReduce1);
	// '-' repeated 2 times
	states[10]/*46*/.Add(st.@Dash符, aShift5);
	// ⚔ PreferShiftToReduce states[10]/*47*/.Add(st.@Dash符, aReduce1);
	// '*' repeated 2 times
	states[10]/*48*/.Add(st.@Asterisk符, aShift6);
	// ⚔ PreferShiftToReduce states[10]/*49*/.Add(st.@Asterisk符, aReduce1);
	// '/' repeated 2 times
	states[10]/*50*/.Add(st.@Slash符, aShift7);
	// ⚔ PreferShiftToReduce states[10]/*51*/.Add(st.@Slash符, aReduce1);
	states[10]/*52*/.Add(st.@RightParenthesis符, aReduce1);
	states[10]/*53*/.Add(st.@终, aReduce1);
	// syntaxStates[11]:
	// [2] Exp : Exp '*' Exp ⏳ ;☕ '-' ')' '*' '/' '+' '¥' 
	// [0] Exp : Exp ⏳ '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [1] Exp : Exp ⏳ '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [2] Exp : Exp ⏳ '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [3] Exp : Exp ⏳ '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// '+' repeated 2 times
	states[11]/*54*/.Add(st.@Plus符, aShift4);
	// ⚔ PreferShiftToReduce states[11]/*55*/.Add(st.@Plus符, aReduce2);
	// '-' repeated 2 times
	states[11]/*56*/.Add(st.@Dash符, aShift5);
	// ⚔ PreferShiftToReduce states[11]/*57*/.Add(st.@Dash符, aReduce2);
	// '*' repeated 2 times
	states[11]/*58*/.Add(st.@Asterisk符, aShift6);
	// ⚔ PreferShiftToReduce states[11]/*59*/.Add(st.@Asterisk符, aReduce2);
	// '/' repeated 2 times
	states[11]/*60*/.Add(st.@Slash符, aShift7);
	// ⚔ PreferShiftToReduce states[11]/*61*/.Add(st.@Slash符, aReduce2);
	states[11]/*62*/.Add(st.@RightParenthesis符, aReduce2);
	states[11]/*63*/.Add(st.@终, aReduce2);
	// syntaxStates[12]:
	// [3] Exp : Exp '/' Exp ⏳ ;☕ '-' ')' '*' '/' '+' '¥' 
	// [0] Exp : Exp ⏳ '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [1] Exp : Exp ⏳ '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [2] Exp : Exp ⏳ '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [3] Exp : Exp ⏳ '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// '+' repeated 2 times
	states[12]/*64*/.Add(st.@Plus符, aShift4);
	// ⚔ PreferShiftToReduce states[12]/*65*/.Add(st.@Plus符, aReduce3);
	// '-' repeated 2 times
	states[12]/*66*/.Add(st.@Dash符, aShift5);
	// ⚔ PreferShiftToReduce states[12]/*67*/.Add(st.@Dash符, aReduce3);
	// '*' repeated 2 times
	states[12]/*68*/.Add(st.@Asterisk符, aShift6);
	// ⚔ PreferShiftToReduce states[12]/*69*/.Add(st.@Asterisk符, aReduce3);
	// '/' repeated 2 times
	states[12]/*70*/.Add(st.@Slash符, aShift7);
	// ⚔ PreferShiftToReduce states[12]/*71*/.Add(st.@Slash符, aReduce3);
	states[12]/*72*/.Add(st.@RightParenthesis符, aReduce3);
	states[12]/*73*/.Add(st.@终, aReduce3);
	// syntaxStates[13]:
	// [4] Exp : '(' Exp ')' ⏳ ;☕ '-' ')' '*' '/' '+' '¥' 
	states[13]/*74*/.Add(st.@Dash符, aReduce4);
	states[13]/*75*/.Add(st.@RightParenthesis符, aReduce4);
	states[13]/*76*/.Add(st.@Asterisk符, aReduce4);
	states[13]/*77*/.Add(st.@Slash符, aReduce4);
	states[13]/*78*/.Add(st.@Plus符, aReduce4);
	states[13]/*79*/.Add(st.@终, aReduce4);
	#endregion init actions of syntax states
}

When the parser is in syntaxStates[9] and the next Token is '+', it ought to Reduce; but following the default "prefer Shift over Reduce" rule, bitParser chooses Shift. Obviously, this cannot handle the precedence relation between addition/subtraction and multiplication/division correctly.

If you do not want to rewrite the grammar into the layered form shown at the beginning of this article, you can instead declare the precedence of the four operators with precedence directives and thereby obtain a correct parse table.

smalltalk 复制代码
Exp : Exp '+' Exp
    | Exp '-' Exp
    | Exp '*' Exp
    | Exp '/' Exp
    | '(' Exp ')'
    | 'number' ;

%%[0-9]+%% 'number'

%left '+' '-' // '+' and '-' share the same precedence and prefer Reduce (left-associative)
%left '*' '/' // '*' and '/' share the same precedence, higher than '+' '-', and prefer Reduce (left-associative)

Click to view: code of the LALR(1) parser's state-transition table generated with the precedence directives

csharp 复制代码
const int syntaxStateCount = 14;
/// <summary>
/// LALR(1) syntax parse table
/// </summary>
private static readonly Dictionary<string/*Node.type*/, LRParseAction>[] syntaxStates =
	new Dictionary<string/*Node.type*/, LRParseAction>[syntaxStateCount];
private static void InitializeSyntaxStates() {
	var states = CompilerExp.syntaxStates;
	// 80 actions
	// conflicts(16)=not solved(0)+solved(16)(0 warnings)
	#region create objects of syntax states
	states[0] = new(capacity: 3);
	states[1] = new(capacity: 5);
	states[2] = new(capacity: 3);
	states[3] = new(capacity: 6);
	states[4] = new(capacity: 3);
	states[5] = new(capacity: 3);
	states[6] = new(capacity: 3);
	states[7] = new(capacity: 3);
	states[8] = new(capacity: 5);
	states[9] = new(capacity: 6);
	states[10] = new(capacity: 6);
	states[11] = new(capacity: 6);
	states[12] = new(capacity: 6);
	states[13] = new(capacity: 6);
	#endregion create objects of syntax states

	#region re-used actions
	LRParseAction aShift2 = new(LRParseAction.Kind.Shift, states[2]);// refered 6 times
	LRParseAction aShift3 = new(LRParseAction.Kind.Shift, states[3]);// refered 6 times
	LRParseAction aShift4 = new(LRParseAction.Kind.Shift, states[4]);// refered 6 times
	LRParseAction aShift5 = new(LRParseAction.Kind.Shift, states[5]);// refered 6 times
	LRParseAction aShift6 = new(LRParseAction.Kind.Shift, states[6]);// refered 6 times
	LRParseAction aShift7 = new(LRParseAction.Kind.Shift, states[7]);// refered 6 times
	LRParseAction aReduce5 = new(regulations[5]);// refered 6 times
	LRParseAction aReduce0 = new(regulations[0]);// refered 6 times
	LRParseAction aReduce1 = new(regulations[1]);// refered 6 times
	LRParseAction aReduce2 = new(regulations[2]);// refered 6 times
	LRParseAction aReduce3 = new(regulations[3]);// refered 6 times
	LRParseAction aReduce4 = new(regulations[4]);// refered 6 times
	#endregion re-used actions

	// 80 actions
	// conflicts(16)=not solved(0)+solved(16)(0 warnings)
	#region init actions of syntax states
	// syntaxStates[0]:
	// [-1] Exp' : ⏳ Exp ;☕ '¥' 
	// [0] Exp : ⏳ Exp '+' Exp ;☕ '-' '*' '/' '+' '¥' 
	// [1] Exp : ⏳ Exp '-' Exp ;☕ '-' '*' '/' '+' '¥' 
	// [2] Exp : ⏳ Exp '*' Exp ;☕ '-' '*' '/' '+' '¥' 
	// [3] Exp : ⏳ Exp '/' Exp ;☕ '-' '*' '/' '+' '¥' 
	// [4] Exp : ⏳ '(' Exp ')' ;☕ '-' '*' '/' '+' '¥' 
	// [5] Exp : ⏳ 'number' ;☕ '-' '*' '/' '+' '¥' 
	states[0]/*0*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[1]));
	states[0]/*1*/.Add(st.@LeftParenthesis符, aShift2);
	states[0]/*2*/.Add(st.@number, aShift3);
	// syntaxStates[1]:
	// [-1] Exp' : Exp ⏳ ;☕ '¥' 
	// [0] Exp : Exp ⏳ '+' Exp ;☕ '-' '*' '/' '+' '¥' 
	// [1] Exp : Exp ⏳ '-' Exp ;☕ '-' '*' '/' '+' '¥' 
	// [2] Exp : Exp ⏳ '*' Exp ;☕ '-' '*' '/' '+' '¥' 
	// [3] Exp : Exp ⏳ '/' Exp ;☕ '-' '*' '/' '+' '¥' 
	states[1]/*3*/.Add(st.@Plus符, aShift4);
	states[1]/*4*/.Add(st.@Dash符, aShift5);
	states[1]/*5*/.Add(st.@Asterisk符, aShift6);
	states[1]/*6*/.Add(st.@Slash符, aShift7);
	states[1]/*7*/.Add(st.@终, LRParseAction.accept);
	// syntaxStates[2]:
	// [4] Exp : '(' ⏳ Exp ')' ;☕ '-' ')' '*' '/' '+' '¥' 
	// [0] Exp : ⏳ Exp '+' Exp ;☕ '-' ')' '*' '/' '+' 
	// [1] Exp : ⏳ Exp '-' Exp ;☕ '-' ')' '*' '/' '+' 
	// [2] Exp : ⏳ Exp '*' Exp ;☕ '-' ')' '*' '/' '+' 
	// [3] Exp : ⏳ Exp '/' Exp ;☕ '-' ')' '*' '/' '+' 
	// [4] Exp : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' 
	// [5] Exp : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' 
	states[2]/*8*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[8]));
	states[2]/*9*/.Add(st.@LeftParenthesis符, aShift2);
	states[2]/*10*/.Add(st.@number, aShift3);
	// syntaxStates[3]:
	// [5] Exp : 'number' ⏳ ;☕ '-' ')' '*' '/' '+' '¥' 
	states[3]/*11*/.Add(st.@Dash符, aReduce5);
	states[3]/*12*/.Add(st.@RightParenthesis符, aReduce5);
	states[3]/*13*/.Add(st.@Asterisk符, aReduce5);
	states[3]/*14*/.Add(st.@Slash符, aReduce5);
	states[3]/*15*/.Add(st.@Plus符, aReduce5);
	states[3]/*16*/.Add(st.@终, aReduce5);
	// syntaxStates[4]:
	// [0] Exp : Exp '+' ⏳ Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [0] Exp : ⏳ Exp '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [1] Exp : ⏳ Exp '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [2] Exp : ⏳ Exp '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [3] Exp : ⏳ Exp '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [4] Exp : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥' 
	// [5] Exp : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥' 
	states[4]/*17*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[9]));
	states[4]/*18*/.Add(st.@LeftParenthesis符, aShift2);
	states[4]/*19*/.Add(st.@number, aShift3);
	// syntaxStates[5]:
	// [1] Exp : Exp '-' ⏳ Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [0] Exp : ⏳ Exp '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [1] Exp : ⏳ Exp '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [2] Exp : ⏳ Exp '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [3] Exp : ⏳ Exp '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [4] Exp : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥' 
	// [5] Exp : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥' 
	states[5]/*20*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[10]));
	states[5]/*21*/.Add(st.@LeftParenthesis符, aShift2);
	states[5]/*22*/.Add(st.@number, aShift3);
	// syntaxStates[6]:
	// [2] Exp : Exp '*' ⏳ Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [0] Exp : ⏳ Exp '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [1] Exp : ⏳ Exp '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [2] Exp : ⏳ Exp '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [3] Exp : ⏳ Exp '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [4] Exp : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥' 
	// [5] Exp : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥' 
	states[6]/*23*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[11]));
	states[6]/*24*/.Add(st.@LeftParenthesis符, aShift2);
	states[6]/*25*/.Add(st.@number, aShift3);
	// syntaxStates[7]:
	// [3] Exp : Exp '/' ⏳ Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [0] Exp : ⏳ Exp '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [1] Exp : ⏳ Exp '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [2] Exp : ⏳ Exp '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [3] Exp : ⏳ Exp '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [4] Exp : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥' 
	// [5] Exp : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥' 
	states[7]/*26*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[12]));
	states[7]/*27*/.Add(st.@LeftParenthesis符, aShift2);
	states[7]/*28*/.Add(st.@number, aShift3);
	// syntaxStates[8]:
	// [4] Exp : '(' Exp ⏳ ')' ;☕ '-' ')' '*' '/' '+' '¥' 
	// [0] Exp : Exp ⏳ '+' Exp ;☕ '-' ')' '*' '/' '+' 
	// [1] Exp : Exp ⏳ '-' Exp ;☕ '-' ')' '*' '/' '+' 
	// [2] Exp : Exp ⏳ '*' Exp ;☕ '-' ')' '*' '/' '+' 
	// [3] Exp : Exp ⏳ '/' Exp ;☕ '-' ')' '*' '/' '+' 
	states[8]/*29*/.Add(st.@RightParenthesis符, new(LRParseAction.Kind.Shift, states[13]));
	states[8]/*30*/.Add(st.@Plus符, aShift4);
	states[8]/*31*/.Add(st.@Dash符, aShift5);
	states[8]/*32*/.Add(st.@Asterisk符, aShift6);
	states[8]/*33*/.Add(st.@Slash符, aShift7);
	// syntaxStates[9]:
	// [0] Exp : Exp '+' Exp ⏳ ;☕ '-' ')' '*' '/' '+' '¥' 
	// [0] Exp : Exp ⏳ '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [1] Exp : Exp ⏳ '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [2] Exp : Exp ⏳ '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [3] Exp : Exp ⏳ '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// '+' repeated 2 times
	// ⚔ LeftShouldReduce states[9]/*34*/.Add(st.@Plus符, aShift4);
	states[9]/*35*/.Add(st.@Plus符, aReduce0);
	// '-' repeated 2 times
	// ⚔ LeftShouldReduce states[9]/*36*/.Add(st.@Dash符, aShift5);
	states[9]/*37*/.Add(st.@Dash符, aReduce0);
	// '*' repeated 2 times
	states[9]/*38*/.Add(st.@Asterisk符, aShift6);
	// ⚔ LowPrecedence states[9]/*39*/.Add(st.@Asterisk符, aReduce0);
	// '/' repeated 2 times
	states[9]/*40*/.Add(st.@Slash符, aShift7);
	// ⚔ LowPrecedence states[9]/*41*/.Add(st.@Slash符, aReduce0);
	states[9]/*42*/.Add(st.@RightParenthesis符, aReduce0);
	states[9]/*43*/.Add(st.@终, aReduce0);
	// syntaxStates[10]:
	// [1] Exp : Exp '-' Exp ⏳ ;☕ '-' ')' '*' '/' '+' '¥' 
	// [0] Exp : Exp ⏳ '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [1] Exp : Exp ⏳ '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [2] Exp : Exp ⏳ '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [3] Exp : Exp ⏳ '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// '+' repeated 2 times
	// ⚔ LeftShouldReduce states[10]/*44*/.Add(st.@Plus符, aShift4);
	states[10]/*45*/.Add(st.@Plus符, aReduce1);
	// '-' repeated 2 times
	// ⚔ LeftShouldReduce states[10]/*46*/.Add(st.@Dash符, aShift5);
	states[10]/*47*/.Add(st.@Dash符, aReduce1);
	// '*' repeated 2 times
	states[10]/*48*/.Add(st.@Asterisk符, aShift6);
	// ⚔ LowPrecedence states[10]/*49*/.Add(st.@Asterisk符, aReduce1);
	// '/' repeated 2 times
	states[10]/*50*/.Add(st.@Slash符, aShift7);
	// ⚔ LowPrecedence states[10]/*51*/.Add(st.@Slash符, aReduce1);
	states[10]/*52*/.Add(st.@RightParenthesis符, aReduce1);
	states[10]/*53*/.Add(st.@终, aReduce1);
	// syntaxStates[11]:
	// [2] Exp : Exp '*' Exp ⏳ ;☕ '-' ')' '*' '/' '+' '¥' 
	// [0] Exp : Exp ⏳ '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [1] Exp : Exp ⏳ '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [2] Exp : Exp ⏳ '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [3] Exp : Exp ⏳ '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// '+' repeated 2 times
	// ⚔ LowPrecedence states[11]/*54*/.Add(st.@Plus符, aShift4);
	states[11]/*55*/.Add(st.@Plus符, aReduce2);
	// '-' repeated 2 times
	// ⚔ LowPrecedence states[11]/*56*/.Add(st.@Dash符, aShift5);
	states[11]/*57*/.Add(st.@Dash符, aReduce2);
	// '*' repeated 2 times
	// ⚔ LeftShouldReduce states[11]/*58*/.Add(st.@Asterisk符, aShift6);
	states[11]/*59*/.Add(st.@Asterisk符, aReduce2);
	// '/' repeated 2 times
	// ⚔ LeftShouldReduce states[11]/*60*/.Add(st.@Slash符, aShift7);
	states[11]/*61*/.Add(st.@Slash符, aReduce2);
	states[11]/*62*/.Add(st.@RightParenthesis符, aReduce2);
	states[11]/*63*/.Add(st.@终, aReduce2);
	// syntaxStates[12]:
	// [3] Exp : Exp '/' Exp ⏳ ;☕ '-' ')' '*' '/' '+' '¥' 
	// [0] Exp : Exp ⏳ '+' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [1] Exp : Exp ⏳ '-' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [2] Exp : Exp ⏳ '*' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// [3] Exp : Exp ⏳ '/' Exp ;☕ '-' ')' '*' '/' '+' '¥' 
	// '+' repeated 2 times
	// ⚔ LowPrecedence states[12]/*64*/.Add(st.@Plus符, aShift4);
	states[12]/*65*/.Add(st.@Plus符, aReduce3);
	// '-' repeated 2 times
	// ⚔ LowPrecedence states[12]/*66*/.Add(st.@Dash符, aShift5);
	states[12]/*67*/.Add(st.@Dash符, aReduce3);
	// '*' repeated 2 times
	// ⚔ LeftShouldReduce states[12]/*68*/.Add(st.@Asterisk符, aShift6);
	states[12]/*69*/.Add(st.@Asterisk符, aReduce3);
	// '/' repeated 2 times
	// ⚔ LeftShouldReduce states[12]/*70*/.Add(st.@Slash符, aShift7);
	states[12]/*71*/.Add(st.@Slash符, aReduce3);
	states[12]/*72*/.Add(st.@RightParenthesis符, aReduce3);
	states[12]/*73*/.Add(st.@终, aReduce3);
	// syntaxStates[13]:
	// [4] Exp : '(' Exp ')' ⏳ ;☕ '-' ')' '*' '/' '+' '¥' 
	states[13]/*74*/.Add(st.@Dash符, aReduce4);
	states[13]/*75*/.Add(st.@RightParenthesis符, aReduce4);
	states[13]/*76*/.Add(st.@Asterisk符, aReduce4);
	states[13]/*77*/.Add(st.@Slash符, aReduce4);
	states[13]/*78*/.Add(st.@Plus符, aReduce4);
	states[13]/*79*/.Add(st.@终, aReduce4);
	#endregion init actions of syntax states

}

bitParser's precedence directives %nonassoc, %left, %right and %prec behave the same as in yacc (a sketch of the resulting Shift/Reduce decision logic follows this list):

  • the later a directive is written, the higher its precedence;

  • %left prefers Reduce (left-associative);

  • %right prefers Shift (right-associative);

  • %nonassoc marks an operator as non-associative: a conflict between two operators of this precedence level is reported as a syntax error;

  • %prec assigns a rule the precedence of an explicitly named Token type instead of the default one (that of the rightmost Vt in the rule); the named Vt may even be absent from the grammar, i.e. a pure placeholder.
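
To make the list above concrete, here is a minimal sketch of the decision a generator typically makes when a Shift/Reduce conflict carries precedence information. It is an illustration only, not bitParser's actual code, and all type and member names are hypothetical.

csharp 复制代码
enum Assoc { Left, Right, NonAssoc }
enum Decision { Shift, Reduce, SyntaxError, DefaultShift }

static class PrecedenceSketch {
    // rulePrec  : precedence of the reducible rule (by default that of its rightmost Vt,
    //             or of the Vt named by %prec).
    // tokenPrec : precedence of the lookahead Vt; tokenAssoc: its associativity.
    public static Decision Resolve(int? rulePrec, int? tokenPrec, Assoc tokenAssoc) {
        if (rulePrec == null || tokenPrec == null)
            return Decision.DefaultShift;                  // no precedence info: "Shift over Reduce"
        if (tokenPrec > rulePrec) return Decision.Shift;   // lookahead binds tighter
        if (tokenPrec < rulePrec) return Decision.Reduce;  // rule binds tighter
        return tokenAssoc switch {                         // equal precedence: associativity decides
            Assoc.Left  => Decision.Reduce,                // %left
            Assoc.Right => Decision.Shift,                 // %right
            _           => Decision.SyntaxError,           // %nonassoc
        };
    }
}

For example, in syntaxStates[9] above (Exp : Exp '+' Exp ⏳) the rule's precedence is that of '+'. With lookahead '*' the token's precedence is higher, so the resolver shifts (the ⚔ LowPrecedence lines); with lookahead '+' the precedences are equal and %left makes it reduce (the ⚔ LeftShouldReduce lines).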

Example: the / postfix

In a STEP-format file, the 1 in 1=2 should be treated as an 'entityId', while the 2 should be treated as a 'refEntity', i.e. a reference to "entity 2".

How do we tell them apart? When a number is followed by =, it is an 'entityId'; otherwise it is a 'refEntity'. This is exactly what the / postfix is for, as shown below:

smalltalk 复制代码
// postfix.st
// regulations:
Items : Items Item | Item ;
Item : 'entityId' '=' 'refEntity' ;
// lexi statements:
%%[0-9]+/[ \t]*=%% 'entityId' // digits followed by optional blanks and '=' form an 'entityId'
%%[0-9]+%% 'refEntity' 

Click to view: generated code for lexicalState0

csharp 复制代码
// lexicalState0
private static readonly Action<LexicalContext, char> lexicalState0 =
static (context, c) => {
	if (false) { /* for simpler code generation purpose. */ }
	/* = */
	else if (/* possible Vt : '=' */
	c == '='/*'\u003D'(61)*/) {
		BeginToken(context);
		ExtendToken(context);
		context.currentState = lexicalState2;
	}
	/* [0-9] */
	else if (/* possible Vt : 'entityId' 'refEntity' */
	'0'/*'\u0030'(48)*/ <= c && c <= '9'/*'\u0039'(57)*/) {
		BeginToken(context);
		ExtendToken(context);
		context.currentState = lexicalState3;
	}
	/* deal with everything else. */
	else if (c == ' ' || c == '\r' || c == '\n' || c == '\t' || c == '\0') {
		context.currentState = lexicalState0; // skip them.
	}
	else { // unexpected char.
		BeginToken(context);
		ExtendToken(context);
		AcceptToken(st.Error, context);
		context.currentState = lexicalState0;
	}
};

Click to view: generated code for lexicalState1

csharp 复制代码
// lexicalState1
private static readonly Action<LexicalContext, char> lexicalState1 =
static (context, c) => {
	if (false) { /* for simpler code generation purpose. */ }
	/* = */
	else if (/* possible Vt : 'entityId' */
	c == '='/*'\u003D'(61)*/) {
		context.currentState = lexicalState4;
	}
	/* [ \t] */
	else if (/* possible Vt : 'entityId' */
	(c == ' '/*'\u0020'(32)*/)
	|| (c == '\t'/*'\u0009'(9)*/)) {
		context.currentState = lexicalState1;
	}
	/* deal with everything else. */
	else { // token with error type
		ExtendToken(context);
		AcceptToken(st.Error, context);
		context.currentState = lexicalState0;
	}
};

Click to view: generated code for lexicalState2

csharp 复制代码
// lexicalState2
private static readonly Action<LexicalContext, char> lexicalState2 =
static (context, c) => {
	if (false) { /* for simpler code generation purpose. */ }
	/* deal with everything else. */
	else {
		AcceptToken(st.@Equal符, context);
		context.currentState = lexicalState0;
	}
};

Click to view: generated code for lexicalState3

csharp 复制代码
// lexicalState3
private static readonly Action<LexicalContext, char> lexicalState3 =
static (context, c) => {
	if (false) { /* for simpler code generation purpose. */ }
	/* = */
	else if (/* possible Vt : 'entityId' */
	c == '='/*'\u003D'(61)*/) {
		context.currentState = lexicalState4;
	}
	/* [ \t] */
	else if (/* possible Vt : 'entityId' */
	(c == ' '/*'\u0020'(32)*/)
	|| (c == '\t'/*'\u0009'(9)*/)) {
		context.currentState = lexicalState1;
	}
	/* [0-9] */
	else if (/* possible Vt : 'entityId' 'refEntity' */
	'0'/*'\u0030'(48)*/ <= c && c <= '9'/*'\u0039'(57)*/) {
		ExtendToken(context);
		context.currentState = lexicalState3;
	}
	/* deal with everything else. */
	else {
		AcceptToken(st.@refEntity, context);
		context.currentState = lexicalState0;
	}
};

Click to view: generated code for lexicalState4

csharp 复制代码
// lexicalState4
private static readonly Action<LexicalContext, char> lexicalState4 =
static (context, c) => {
	if (false) { /* for simpler code generation purpose. */ }
	/* deal with everything else. */
	else {
		AcceptToken(st.@entityId, context);
		context.currentState = lexicalState0;
	}
};

Below is the state-transition table in two-dimensional-array (ElseIf[][]) form; it reuses some ElseIf objects, which further reduces the memory footprint. A small sketch of how a driver walks such a table follows the generated code.
Click to view: generated code of the two-dimensional-array transition table

csharp 复制代码
private static readonly ElseIf[][] lexiStates = new ElseIf[5][];
static void InitializeLexiTable() {
	ElseIf s9_9_0_1 = new('\t'/*'\u0009'(9)*/, Acts.None, 1);//refered 2 times
	ElseIf s32_32_0_1 = new(' '/*'\u0020'(32)*/, Acts.None, 1);//refered 2 times
	ElseIf s61_61_0_4 = new('='/*'\u003D'(61)*/, Acts.None, 4);//refered 2 times
	lexiStates[0] = new ElseIf[] {
	// possible Vt: 'entityId' 'refEntity'
	/*0*/new('0'/*'\u0030'(48)*/, '9'/*'\u0039'(57)*/, Acts.Begin | Acts.Extend, 3),
	// possible Vt: '='
	/*1*/new('='/*'\u003D'(61)*/, Acts.Begin | Acts.Extend, 2),
	};
	lexiStates[1] = new ElseIf[] {
	// possible Vt: 'entityId'
	/*0*/ //new('\t'/*'\u0009'(9)*/, Acts.None, 1),
	/*0*/s9_9_0_1,
	// possible Vt: 'entityId'
	/*1*///new(' '/*'\u0020'(32)*/, Acts.None, 1),
	/*1*/s32_32_0_1,
	// possible Vt: 'entityId'
	/*2*///new('='/*'\u003D'(61)*/, Acts.None, 4),
	/*2*/s61_61_0_4,
	};
	lexiStates[2] = new ElseIf[] {
	// possible Vt: '='
	/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@Equal符),
	};
	lexiStates[3] = new ElseIf[] {
	// possible Vt: 'entityId'
	/*0*///new('\t'/*'\u0009'(9)*/, Acts.None, 1),
	/*0*/s9_9_0_1,
	// possible Vt: 'entityId'
	/*1*///new(' '/*'\u0020'(32)*/, Acts.None, 1),
	/*1*/s32_32_0_1,
	// possible Vt: 'entityId' 'refEntity'
	/*2*/new('0'/*'\u0030'(48)*/, '9'/*'\u0039'(57)*/, Acts.Extend, 3),
	// possible Vt: 'entityId'
	/*3*///new('='/*'\u003D'(61)*/, Acts.None, 4),
	/*3*/s61_61_0_4,
	// possible Vt: 'refEntity'
	/*4*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@refEntity),
	};
	lexiStates[4] = new ElseIf[] {
	// possible Vt: 'entityId'
	/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@entityId),
	};
}
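
As a reading aid, here is a minimal, self-contained sketch of how a driver could walk such an ElseIf[][] table. The record type and its field names are hypothetical stand-ins for the generated types, used for illustration only.

csharp 复制代码
using System;

[Flags] enum Acts { None = 0, Begin = 1, Extend = 2, Accept = 4 }
record ElseIfSketch(char Min, char Max, Acts Acts, int NextState, int AcceptVt = -1);

static class LexiTableSketch {
    // Scan the current state's entries in order; the first entry whose [Min, Max] range
    // contains c supplies the actions, the next state and (optionally) the Vt to accept.
    public static (Acts acts, int nextState, int acceptVt) Step(ElseIfSketch[][] table, int current, char c) {
        foreach (var e in table[current])
            if (e.Min <= c && c <= e.Max)
                return (e.Acts, e.NextState, e.AcceptVt);
        // no entry matched: the generated code falls back to its default branch
        // (skip blanks in the initial state, or emit an Error token).
        return (Acts.None, 0, -1);
    }
}

Reusing shared ElseIf objects (s9_9_0_1 and friends above) does not change this lookup; it only reduces the number of allocated objects.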

Its miniDFA state diagram is as follows:

Combining the code with the state diagram, it is easy to see:

  • In state miniDFA3, if the next char is a space, \t or =, we can already conclude that the Token being scanned is an 'entityId' (or the error type 'Error错'); otherwise it is a 'refEntity' (and the machine returns to state miniDFA0). This matches our expectation.

  • In state miniDFA0 (the initial state), if the next char is in [0-9], the Token may still turn out to be either an 'entityId' or a 'refEntity'; at this point its type cannot yet be determined uniquely.

In other words, this example also involves converting an NFA into a DFA, because its two Token types 'entityId' and 'refEntity' both begin with the same pattern [0-9]+.

Let us compare its NFA and its DFA.

Below is its NFA state diagram:

Below is its DFA state diagram:

We can see:

  • In the NFA state diagram, on [0-9] the state NFA0-0 has two choices: NFA2-1 and NFA3-1. In the corresponding DFA state diagram, NFA2-1 and NFA3-1 are merged into a single DFA state (DFA2). This is achieved by the subset construction, sketched below.
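
For readers who want to see the subset construction itself, here is a compact, self-contained sketch. The data layout (per-state adjacency maps, DFA states named by their sorted NFA-state set) is chosen for brevity and is not the generated code.

csharp 复制代码
using System.Collections.Generic;
using System.Linq;

// A toy NFA: Move[s] maps a char to the NFA states reachable from s on that char;
// Epsilon[s] lists the states reachable from s on ε.
class NfaSketch {
    public List<Dictionary<char, List<int>>> Move = new();
    public List<List<int>> Epsilon = new();
    public int Start;

    // ε-closure of a set of NFA states.
    HashSet<int> Closure(IEnumerable<int> states) {
        var result = new HashSet<int>(states);
        var stack = new Stack<int>(result);
        while (stack.Count > 0)
            foreach (var t in Epsilon[stack.Pop()])
                if (result.Add(t)) stack.Push(t);
        return result;
    }

    // Subset construction: every DFA state is a set of NFA states,
    // named here by the sorted list of its members.
    public Dictionary<string, Dictionary<char, string>> ToDfa() {
        string Key(HashSet<int> s) => string.Join(",", s.OrderBy(i => i));
        var dfa = new Dictionary<string, Dictionary<char, string>>();
        var start = Closure(new[] { Start });
        var work = new Queue<HashSet<int>>();
        work.Enqueue(start);
        dfa[Key(start)] = new();
        while (work.Count > 0) {
            var current = work.Dequeue();
            // gather all outgoing NFA edges of this subset, grouped by char
            var byChar = new Dictionary<char, HashSet<int>>();
            foreach (var s in current)
                foreach (var pair in Move[s]) {
                    if (!byChar.TryGetValue(pair.Key, out var set)) byChar[pair.Key] = set = new();
                    set.UnionWith(pair.Value);
                }
            foreach (var pair in byChar) {
                var next = Closure(pair.Value);
                var key = Key(next);
                if (!dfa.ContainsKey(key)) { dfa[key] = new(); work.Enqueue(next); }
                // this is where states such as NFA2-1 and NFA3-1 end up in one DFA state
                dfa[Key(current)][pair.Key] = key;
            }
        }
        return dfa;
    }
}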

While we are at it, here is an example of constructing a miniDFA:

smalltalk 复制代码
// regulations:
Left : 'min' ;
// lexical statements
%%[a-zA-Z_][a-zA-Z0-9_]*/,%% 'min' // an identifier followed by ',' is a 'min'

The DFA for this 'min' is as follows:

The miniDFA for this 'min' is as follows:

As we can see, the miniDFA merges states DFA1 and DFA3 of the DFA. The two can be merged because, on every char they read, both jump to the same next state. This is achieved with Hopcroft's algorithm; a simplified sketch of the underlying idea follows.
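
The sketch below shows the underlying partition-refinement idea in its simplest (Moore-style) form. The generator itself uses the worklist-optimized Hopcroft algorithm; the data layout here is hypothetical and assumes a complete DFA.

csharp 复制代码
using System.Collections.Generic;
using System.Linq;

static class MinimizeSketch {
    // delta[state][symbol] = next state; accepting[state] marks accepting states.
    // Start from the partition {accepting, non-accepting}; repeatedly split groups whose
    // members transition into different groups, until the partition stops changing.
    public static int[] Minimize(int[][] delta, bool[] accepting) {
        int n = delta.Length, symbols = delta[0].Length;
        var group = new int[n];
        for (int s = 0; s < n; s++) group[s] = accepting[s] ? 1 : 0;

        while (true) {
            var ids = new Dictionary<string, int>();
            var next = new int[n];
            for (int s = 0; s < n; s++) {
                // signature = own group + the groups of all successors
                var sig = group[s] + "|" + string.Join(",",
                    Enumerable.Range(0, symbols).Select(a => group[delta[s][a]]));
                if (!ids.TryGetValue(sig, out var id)) { id = ids.Count; ids[sig] = id; }
                next[s] = id;
            }
            if (next.SequenceEqual(group)) return group; // stable: states sharing an id are merged
            group = next;
        }
    }
}

In the 'min' example, DFA1 and DFA3 end up with the same group id because they agree on every outgoing transition, which is exactly why the miniDFA merges them.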

Example: the <'Vt'> prefix

In GLSL, to make syntax parsing easier, I need the Point in struct Point { float x; float y; } to be recognized as a Token of type "type_name" rather than of type "identifier". This can be achieved with the prefix feature.

smalltalk 复制代码
// ..
struct_specifier :
      'struct' 'type_name' '{' struct_declaration_list '}' ;
// ..

// lexical statements
%%<'struct'>[a-zA-Z_][a-zA-Z0-9_]*%% 'type_name' // a Token that follows 'struct' should be given the "type_name" type
%%[a-zA-Z_][a-zA-Z0-9_]*%% 'identifier' // otherwise it should be given the "identifier" type

Adding the <'struct'> prefix does not change the lexer's state structure; it only checks, when the Token's type is being assigned, whether the previous Token has the "struct" type: if so, the new Token is given the "type_name" type, otherwise the "identifier" type.

In addition, an extra list is needed to record every "type_name" recognized so far, so that when one of them is met again it can also be given the "type_name" type.
Click to view: code of the part of the state machine affected by the prefix

csharp 复制代码
// lexicalState1
private static readonly Action<LexicalContext, char> lexicalState1 =
static (context, c) => {
	if (false) { /* for simpler code generation purpose. */ }
	/* [a-zA-Z0-9_] */
	else if (/* possible Vt : 'type_name' 'identifier' */
	('a' <= c && c <= 'z')
	|| ('A' <= c && c <= 'Z')
	|| ('0' <= c && c <= '9')
	|| (c == '_')) {
		ExtendToken(context);
		context.currentState = lexicalState3;
	}
	/* deal with everything else. */
	else {
		AcceptToken2(context
		// if the previous Token is of type "struct", the new Token is a type_name
		, new(/*<'Vt'>*/st.@struct, st.@type_name)
		// otherwise the new Token is an identifier
		, new(/*default preVt*/string.Empty, st.@identifier));
		context.currentState = lexicalState0;
	}
};

The companion function AcceptToken2(context, ifVts); is accordingly a bit more involved:
Click to view: code of void AcceptToken2(LexicalContext context, params IfVt[] ifVts);

csharp 复制代码
struct IfVt {
	public readonly string preVt;
	public readonly string Vt;
	public IfVt(string preVt, string Vt) {
		this.preVt = preVt;
		this.Vt = Vt;
	}
}

private static void AcceptToken2(LexicalContext context, params IfVt[] ifVts) {
	var startIndex = context.analyzingToken.start.index;
	var endIndex = context.analyzingToken.end.index;
	context.analyzingToken.value = context.sourceCode.Substring(
		startIndex, endIndex - startIndex + 1);
	var typeSet = false;
	const string key = "type_name"; var hadThisTypeName = false;
	if (!typeSet) {
		if (context.tagDict.TryGetValue(key, out var type_nameList)) {
			var list = type_nameList as List<string>;
			if (list.Contains(context.analyzingToken.value)) {
				// the value is a previously recognized type_name
				context.analyzingToken.type = st.type_name;
				typeSet = true;
				hadThisTypeName = true;
			}
		}
	}
	if (!typeSet) {
		var lastType = string.Empty;
		if (context.lastSyntaxValidToken != null) {
			lastType = context.lastSyntaxValidToken.type;
		}
		for (var i = 0; i < ifVts.Length; i++) {
			var ifVt = ifVts[i];
			if (string.IsNullOrEmpty(ifVt.preVt) // by default there is no prefix
			  || ifVt.preVt == lastType) { // the <'Vt'> prefix matched
				context.analyzingToken.type = ifVt.Vt;
				typeSet = true;
				break;
			}
		}
	}
	if (!typeSet) {
		// we failed to assign type according to lexi statements.
		// this indicates token error in source code or inappropriate lexi statements.
		context.analyzingToken.type = st.Error错;
		context.signalCondition = LexicalContext.defaultSignal;
	}

    // cancel forward steps for post-regex
	var backStep = context.cursor.index - context.analyzingToken.end.index;
	if (backStep > 0) { context.MoveBack(backStep); }
	// next operation: context.MoveForward();

	var token = context.analyzingToken.Dump();
	context.result.Add(token);
	// skip comment Tokens
	if (context.analyzingToken.type != st.blockComment
	 && context.analyzingToken.type != st.inlineComment) {
		context.lastSyntaxValidToken = token;
	}

	if (!hadThisTypeName && context.analyzingToken.type == st.type_name) {
		// add the newly recognized type_name to the list
		if (!context.tagDict.TryGetValue(key, out var type_nameList)) {
			type_nameList = new List<string>();
			context.tagDict.Add(key, type_nameList);
		}
		var list = type_nameList as List<string>;
		list.Add(context.analyzingToken.value);
	}
}

Note that syntax parsing does not need blockComment and inlineComment (Tokens of the "comment" kind); therefore, when recording the previous Token's type, we must skip comments.

Example: state signals <signal1, signal2, ..>

In GLSL, again to make syntax parsing easier, I need the r1 and r2 in subroutine ( r1, r2 ) to be recognized as Tokens of type "type_name" rather than "identifier". This cannot be achieved with a prefix, but it can be achieved with the state signal LexicalContext.signal.

The state signal works as follows (a C# sketch of this logic follows the grammar):

When a Token of type "subroutine" is read, set LexicalContext.signal to subroutine0;

When a Token of type "(" is read, and LexicalContext.signal is subroutine0, set LexicalContext.signal to subroutine1;

When an identifier matching [a-zA-Z_][a-zA-Z0-9_]* is read, and LexicalContext.signal is subroutine1 (meaning the lexer has just read 'subroutine' '(' in a row), recognize it as a "type_name" Token, otherwise as an "identifier" Token;

When a Token of type ")" is read, and LexicalContext.signal is subroutine1, set LexicalContext.signal back to default (the default state), i.e. stop reacting to the signal.

smalltalk 复制代码
storage_qualifier :
    | 'subroutine' '(' type_name_list ')' ;

// lexical statements
%%subroutine%%             'subroutine' subroutine0
<subroutine0>%%[(]%%       '('          subroutine1
<subroutine1>%%[a-zA-Z_][a-zA-Z0-9_]*%% 'type_name'
<subroutine1>%%[,]%%       ','
<subroutine1>%%[)]%%       ')'          default
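
The whole mechanism can be summarized in a few lines of C#. This is an illustrative sketch only; the enum, class and method names are hypothetical and are not the generated API.

csharp 复制代码
// The signal is a tiny second state machine driven by the accepted Tokens.
enum Signal { Default, Subroutine0, Subroutine1 }

class SignalSketch {
    Signal signal = Signal.Default;

    // Called for each accepted Token; returns the (possibly adjusted) Token type
    // and updates the signal as described above.
    public string OnTokenAccepted(string tokenType) {
        switch (tokenType) {
            case "subroutine": signal = Signal.Subroutine0; return tokenType;
            case "(": if (signal == Signal.Subroutine0) signal = Signal.Subroutine1; return tokenType;
            case ")": if (signal == Signal.Subroutine1) signal = Signal.Default; return tokenType;
            case "identifier":
                // between 'subroutine' '(' and ')', identifiers become type_name
                return signal == Signal.Subroutine1 ? "type_name" : "identifier";
            default: return tokenType; // ',' and everything else leave the signal unchanged
        }
    }
}

Feeding the Token stream of subroutine ( r1, r2 ) through this sketch turns r1 and r2 into "type_name", while identifiers elsewhere stay "identifier".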

A state signal is, in effect, a second state machine layered on top of the lexer's own state machine; it is therefore relatively complex to use and easy to get wrong, and should be used sparingly.

Example: Chinese characters

If I want to recognize every Chinese character in a text file (assuming Chinese characters lie in the range \u4E00 to \u9FFF), I can write the following (a quick C# check of this range follows the grammar):

smalltalk 复制代码
Items : Items Item | Item ;
Item : 'chineseChar' | 'other' ;

%%[\u4E00-\u9FFF]%% 'chineseChar'
%%[^\u4E00-\u9FFF]%% 'other'
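
As a sanity check of the range assumed above, a one-line helper is enough to test whether a char falls into \u4E00-\u9FFF. The helper below is purely illustrative and is not part of the generated code.

csharp 复制代码
using System.Linq;

static class CjkSketch {
    // matches the 'chineseChar' rule above: any char in the range \u4E00-\u9FFF
    public static bool IsChineseChar(char c) => '\u4E00' <= c && c <= '\u9FFF';
}
// e.g. "bitParser解析器".Count(CjkSketch.IsChineseChar) evaluates to 3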

End
