Build your own parser (a generator for LALR(1) syntax parsers and miniDFA lexical analyzers, implemented in C#)
Following the input format of lex and yacc and the algorithms in the Tiger Book (Modern Compiler Implementation in C), and relying on no third-party libraries, I implemented a heavily consolidated and optimized C# generator for LALR(1) syntax parsers and miniDFA lexical analyzers, tentatively named bitParser.
The complete generated parser code for ANSI C, GLSL 4.60.8, and a variety of test cases can be downloaded from (https://gitee.com/bitzhuwei/bitParser-demos).
(https://www.cnblogs.com/bitzhuwei/p/18544785) lists all of the algorithms used to implement the lexer generator and the parser generator.
The syntax-tree construction process of 1234+567+89+0+0 ☟
The syntax-tree construction process of (1234+567)/(89-0) ☟
More syntax-tree construction processes can be viewed at (https://www.cnblogs.com/bitzhuwei/p/18679529). (If you would like to see the process for an expression of your own, please leave a comment.)
The lexical analyzer generator
- Generates lexer code (state-transition tables, reserved words, Token types, etc.) for both the DFA and the minimized DFA (hereafter miniDFA).
- Supports the full Unicode character set, the / suffix, the <'Vt'> prefix, and state signals <signal1, signal2, ..>, which makes it easy to recognize cases such as id = refId, struct type_name, <Comment>[^*\n]*, and subroutine(type_name_list). Similar to lex, but not identical.
- No need to explicitly write Token types, state signals, or reserved words; that is, no need for lex's %s NUM ID .., %x Comment End Text .., or [(] { return ('('); }.
- Thorough comments. Every conditional branch of every lexical state documents in a comment its regular expression, the Token type (and signal) it leads to, and so on.
- Generates state diagrams (mermaid format) of the ε-NFA, NFA, DFA, and miniDFA, as well as a state diagram (mermaid format) for each Token type, which is handy for learning and debugging.
The syntax parser generator
- Generates parser code (parse tables, rule lists, syntax-tree node types, etc.) for LL(1), LR(0), SLR(1), LALR(1), and LR(1).
- Supports the precedence directives %nonassoc, %left, %right, and %prec; automatically resolves Shift/Reduce and Reduce/Reduce conflicts and lists them in the comments of the parse-table code.
- Thorough comments. The comments list the LR items and lookaheads of every syntax state, plus the number of conflicts, how many were resolved, how many remain unresolved, and so on.
- Generates state diagrams (mermaid format) and state tables (markdown format) for LL(1), LR(0), SLR(1), LALR(1), and LR(1), which is handy for learning and debugging.
- Generates documentation of nullable, the FIRST sets, and the FOLLOW sets.
Other
- No need for lex/yacc's %option, %union, %define, %{, %}, %parse-param, %lex-param, %pure-parser, %expect, %token, or %type. bitParser is packaged as a class library; simply call it as follows:
csharp
var compiler = new CompilerXxx();
var sourceCode = File.ReadAllText("input.st");
var tokens = compiler.Analyze(sourceCode);
var syntaxTree = compiler.Parse(tokens);
var extractedObj = compiler.Extract(syntaxTree, tokens, sourceCode);
// use extractedObj for user-defined business ..
- Supports the block-comment directive %blockComment on/off and the line-comment directive %inlineComment on/off. The default formats are C's /**/ and //, and they can be customized; for example, when parsing VRML files, a line comment can be defined as running from # to the end of the line:
smalltalk
%%#[^\r\n]*%% 'inlineComment'
- Generates a framework that traverses the syntax tree to extract semantic information, and provides a source-code formatting algorithm applicable to all kinds of languages. This can be used for formatting, for generating intermediate code, and for other downstream business logic.
- Heavily optimized: generating all of the parser code + documentation for ANSI C takes only 3 seconds, and for GLSL 4.60.8 only 9 seconds.
Click to view other features
- Supports the scope directive %validScopeChars and the global directive %validGlobalChars. Both default to [\u0001-\uFFFF] (i.e. every Unicode character except '\0') and can be customized.
- Supports the %omit directive to specify which whitespace characters to ignore. The defaults are ' ', '\t', '\n', '\r', and '\0'.
- Supports %start to specify the start symbol of the grammar.
Example: Calc.st
The input file Calc.st
The grammar of a parser that can handle addition, subtraction, multiplication, division, and parentheses is as follows:
smalltalk
// the input file Calc.st
Exp : Exp '+' Term
| Exp '-' Term
| Term ;
Term : Term '*' Factor
| Term '/' Factor
| Factor ;
Factor : '(' Exp ')'
| 'number' ;
%%[0-9]+%% 'number' // this example only handles non-negative integers
// no need to write %%[+]%% '+' and the like
From this grammar, the following can be generated:
The generated lexical analyzer
The state diagram of the generated ε-NFA lexer is as follows:
The state diagram of the generated miniDFA lexer is as follows:
Click to view the generated code for terminals (Vt) and non-terminals (Vn)
csharp
// delete this array if it is not needed
public static readonly IReadOnlyList<string> stArray = new string[] {
"'¥'", // @终 = 0;
"'+'", // @Plus符 = 1; // '+'
"'-'", // @Dash符 = 2; // '-'
"'*'", // @Asterisk符 = 3; // '*'
"'/'", // @Slash符 = 4; // '/'
"'('", // @LeftParenthesis符 = 5; // '('
"')'", // @RightParenthesis符 = 6; // ')'
"'number'", // @number = 7; // 'number'
// end of 1 + 7 Vt
"Exp", // Exp枝 = 8; // Exp
"Term", // Term枝 = 9; // Term
"Factor", // Factor枝 = 10; // Factor
// end of 3 Vn
};
/// <summary>
/// Vt types are used both for lexical-analyze and syntax-parse.
/// <para>Vn types are only for syntax-parse.</para>
/// <para>Vt is quoted in ''.</para>
/// <para>Vn is not quoted in ''.</para>
/// </summary>
public static class st {
// Vt
/// <summary>
/// Something wrong within the source code.
/// </summary>
public const int Error错 = -1; // "'×'";
/// <summary>
/// end of token list.
/// </summary>
public const int @终 = 0; // "'¥'";
/// <summary>
/// '+'
/// </summary>
public const int @Plus符 = 1; // "'+'"
/// <summary>
/// '-'
/// </summary>
public const int @Dash符 = 2; // "'-'"
/// <summary>
/// '*'
/// </summary>
public const int @Asterisk符 = 3; // "'*'"
/// <summary>
/// '/'
/// </summary>
public const int @Slash符 = 4; // "'/'"
/// <summary>
/// '('
/// </summary>
public const int @LeftParenthesis符 = 5; // "'('"
/// <summary>
/// ')'
/// </summary>
public const int @RightParenthesis符 = 6; // "')'"
/// <summary>
/// 'number'
/// </summary>
public const int @number = 7; // "'number'"
/// <summary>
/// count of ('¥' + user-defined Vt)
/// </summary>
public const int VtCount = 8;
// Vn
/// <summary>
/// Exp
/// </summary>
public const int Exp枝 = 8; // "Exp"
/// <summary>
/// Term
/// </summary>
public const int Term枝 = 9; // "Term"
/// <summary>
/// Factor
/// </summary>
public const int Factor枝 = 10; // "Factor"
}
Click to view the generated code for reserved words (i.e. the keywords of a language)
csharp
public static class reservedWord {
/// <summary>
/// +
/// </summary>
public const string @Plus符 = "+";
/// <summary>
/// -
/// </summary>
public const string @Dash符 = "-";
/// <summary>
/// *
/// </summary>
public const string @Asterisk符 = "*";
/// <summary>
/// /
/// </summary>
public const string @Slash符 = "/";
/// <summary>
/// (
/// </summary>
public const string @LeftParenthesis符 = "(";
/// <summary>
/// )
/// </summary>
public const string @RightParenthesis符 = ")";
}
/// <summary>
/// if <paramref name="token"/> is a reserved word, assign the corresponding type and return true.
/// <para>otherwise, return false.</para>
/// </summary>
/// <param name="token"></param>
/// <returns></returns>
private static bool CheckReservedWord(AnalyzingToken token) {
bool isReservedWord = true;
switch (token.value) {
case reservedWord.@Plus符: token.type = st.@Plus符; break;
case reservedWord.@Dash符: token.type = st.@Dash符; break;
case reservedWord.@Asterisk符: token.type = st.@Asterisk符; break;
case reservedWord.@Slash符: token.type = st.@Slash符; break;
case reservedWord.@LeftParenthesis符: token.type = st.@LeftParenthesis符; break;
case reservedWord.@RightParenthesis符: token.type = st.@RightParenthesis符; break;
default: isReservedWord = false; break;
}
return isReservedWord;
}
Below is the lexical state-transition table implemented with 8 Action<LexicalContext, char, CurrentStateWrapper> delegates.
Click to view the generated code for lexical state 0
csharp
// lexicalState0
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState0 =
static (context, c, wrapper) => {
if (false) { /* for simpler code generation purpose. */ }
/* [0-9] */
else if (/* possible Vt : 'number' */
'0'/*'\u0030'(48)*/ <= c && c <= '9'/*'\u0039'(57)*/) {
BeginToken(context);
ExtendToken(context);
wrapper.currentState = lexicalState1;
}
/* \) */
else if (/* possible Vt : ')' */
c == ')'/*'\u0029'(41)*/) {
BeginToken(context);
ExtendToken(context);
wrapper.currentState = lexicalState2;
}
/* \( */
else if (/* possible Vt : '(' */
c == '('/*'\u0028'(40)*/) {
BeginToken(context);
ExtendToken(context);
wrapper.currentState = lexicalState3;
}
/* \/ */
else if (/* possible Vt : '/' */
c == '/'/*'\u002F'(47)*/) {
BeginToken(context);
ExtendToken(context);
wrapper.currentState = lexicalState4;
}
/* \* */
else if (/* possible Vt : '*' */
c == '*'/*'\u002A'(42)*/) {
BeginToken(context);
ExtendToken(context);
wrapper.currentState = lexicalState5;
}
/* - */
else if (/* possible Vt : '-' */
c == '-'/*'\u002D'(45)*/) {
BeginToken(context);
ExtendToken(context);
wrapper.currentState = lexicalState6;
}
/* \+ */
else if (/* possible Vt : '+' */
c == '+'/*'\u002B'(43)*/) {
BeginToken(context);
ExtendToken(context);
wrapper.currentState = lexicalState7;
}
/* deal with everything else. */
else if (c == ' ' || c == '\r' || c == '\n' || c == '\t' || c == '\0') {
wrapper.currentState = lexicalState0; // skip them.
}
else { // unexpected char.
BeginToken(context);
ExtendToken(context);
AcceptToken(st.Error错, context);
wrapper.currentState = lexicalState0;
}
};
Click to view the generated code for lexical state 1
csharp
// lexicalState1
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState1 =
static (context, c, wrapper) => {
if (false) { /* for simpler code generation purpose. */ }
/* [0-9] */
else if (/* possible Vt : 'number' */
'0'/*'\u0030'(48)*/ <= c && c <= '9'/*'\u0039'(57)*/) {
ExtendToken(context);
wrapper.currentState = lexicalState1;
}
/* deal with everything else. */
else {
AcceptToken(st.@number, context);
wrapper.currentState = lexicalState0;
}
};
Click to view the generated code for lexical state 2
csharp
// lexicalState2
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState2 =
static (context, c, wrapper) => {
if (false) { /* for simpler code generation purpose. */ }
/* deal with everything else. */
else {
AcceptToken(st.@RightParenthesis符, context);
wrapper.currentState = lexicalState0;
}
};
Click to view the generated code for lexical state 3
csharp
// lexicalState3
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState3 =
static (context, c, wrapper) => {
if (false) { /* for simpler code generation purpose. */ }
/* deal with everything else. */
else {
AcceptToken(st.@LeftParenthesis符, context);
wrapper.currentState = lexicalState0;
}
};
Click to view the generated code for lexical state 4
csharp
// lexicalState4
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState4 =
static (context, c, wrapper) => {
if (false) { /* for simpler code generation purpose. */ }
/* deal with everything else. */
else {
AcceptToken(st.@Slash符, context);
wrapper.currentState = lexicalState0;
}
};
Click to view the generated code for lexical state 5
csharp
// lexicalState5
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState5 =
static (context, c, wrapper) => {
if (false) { /* for simpler code generation purpose. */ }
/* deal with everything else. */
else {
AcceptToken(st.@Asterisk符, context);
wrapper.currentState = lexicalState0;
}
};
Click to view the generated code for lexical state 6
csharp
// lexicalState6
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState6 =
static (context, c, wrapper) => {
if (false) { /* for simpler code generation purpose. */ }
/* deal with everything else. */
else {
AcceptToken(st.@Dash符, context);
wrapper.currentState = lexicalState0;
}
};
Click to view the generated code for lexical state 7
csharp
// lexicalState7
private static readonly Action<LexicalContext, char, CurrentStateWrapper> lexicalState7 =
static (context, c, wrapper) => {
if (false) { /* for simpler code generation purpose. */ }
/* deal with everything else. */
else {
AcceptToken(st.@Plus符, context);
wrapper.currentState = lexicalState0;
}
};
They are driven as follows:
csharp
// analyze the specified sourceCode and return a list of Token.
public TokenList Analyze(string sourceCode) {
var context = new LexicalContext(sourceCode);
var wrapper = new CurrentStateWrapper(this.initialState);
do {
char currentChar = context.MoveForward();
wrapper.currentState(context, currentChar, wrapper);
// wrapper.currentState will be set to next lexi-state.
} while (!context.EOF);
return context.result;
}
Below is the lexical state-transition table implemented with a two-dimensional array ElseIf[][]; it occupies less space and also runs somewhat faster.
csharp
private static readonly ElseIf[][] lexiStates = new ElseIf[8][];
static void InitializeLexiTable() {
lexiStates[0] = new ElseIf[] {
// possible Vt: '('
/*0*/new('('/*'\u0028'(40)*/, Acts.Begin | Acts.Extend, 3),
// possible Vt: ')'
/*1*/new(')'/*'\u0029'(41)*/, Acts.Begin | Acts.Extend, 2),
// possible Vt: '*'
/*2*/new('*'/*'\u002A'(42)*/, Acts.Begin | Acts.Extend, 5),
// possible Vt: '+'
/*3*/new('+'/*'\u002B'(43)*/, Acts.Begin | Acts.Extend, 7),
// possible Vt: '-'
/*4*/new('-'/*'\u002D'(45)*/, Acts.Begin | Acts.Extend, 6),
// possible Vt: '/'
/*5*/new('/'/*'\u002F'(47)*/, Acts.Begin | Acts.Extend, 4),
// possible Vt: 'number'
/*6*/new('0'/*'\u0030'(48)*/, '9'/*'\u0039'(57)*/, Acts.Begin | Acts.Extend, 1),
};
lexiStates[1] = new ElseIf[] {
// possible Vt: 'number'
/*0*/new('0'/*'\u0030'(48)*/, '9'/*'\u0039'(57)*/, Acts.Extend, 1),
// possible Vt: 'number'
/*1*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@number),
};
lexiStates[2] = new ElseIf[] {
// possible Vt: ')'
/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@RightParenthesis符),
};
lexiStates[3] = new ElseIf[] {
// possible Vt: '('
/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@LeftParenthesis符),
};
lexiStates[4] = new ElseIf[] {
// possible Vt: '/'
/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@Slash符),
};
lexiStates[5] = new ElseIf[] {
// possible Vt: '*'
/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@Asterisk符),
};
lexiStates[6] = new ElseIf[] {
// possible Vt: '-'
/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@Dash符),
};
lexiStates[7] = new ElseIf[] {
// possible Vt: '+'
/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*�(65535)*/, Acts.Accept, 0, st.@Plus符),
};
}
As the name suggests, one ElseIf plays the same role as one else if ('0' <= c && c <= '9') { .. } branch in the delegate version. In this way an ElseIf[] replaces a delegate, and when the table is consulted, binary search can be used to locate the matching ElseIf quickly.
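The ElseIf type itself does not appear in these listings; the following is only my guess at its minimal shape, inferred from the calls above (the Accept2/ifVts variant is omitted), and is not the generated type:
csharp
using System;

// assumed sketch: an ElseIf stores an inclusive char range, the actions to run,
// the id of the next lexical state, and the Vt to accept when Acts.Accept is set.
[Flags]
public enum Acts { None = 0, Begin = 1, Extend = 2, Accept = 4, Accept2 = 8 }

public sealed class ElseIf {
    public readonly char min;        // lower bound of the char range (inclusive)
    public readonly char max;        // upper bound of the char range (inclusive)
    public readonly Acts scripts;    // which of Begin/Extend/Accept to perform
    public readonly int nextStateId; // index of the next lexical state
    public readonly int Vt;          // Vt to accept when Acts.Accept is set (-1 = none)

    // single-char transition, e.g. new('(', Acts.Begin | Acts.Extend, 3)
    public ElseIf(char c, Acts scripts, int nextState, int vt = -1)
        : this(c, c, scripts, nextState, vt) { }

    // char-range transition, e.g. new('0', '9', Acts.Extend, 1)
    public ElseIf(char min, char max, Acts scripts, int nextState, int vt = -1) {
        this.min = min; this.max = max; this.scripts = scripts;
        this.nextStateId = nextState; this.Vt = vt;
    }
}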
Click to view the code that drives the array-based lexical analyzer
csharp
// skip '\0' at lexi-state 0
private static readonly ElseIf skipZero = new(
char.MinValue, char.MaxValue, Acts.None,
nextState: 0);
// construct an error token at lexi-state 0
private static readonly ElseIf unexpectedChar = new(
char.MinValue, char.MaxValue, Acts.Begin | Acts.Extend | Acts.Accept,
nextState: 0, -1);// -1 means error("'×'");
// construct an error token at the other lexi-states
private static readonly ElseIf errorToken = new(
char.MinValue, char.MaxValue, Acts.Extend | Acts.Accept,
nextState: 0, -1);// -1 means error("'×'");
// analyze the specified sourceCode and return a list of Token.
public TokenList Analyze(string sourceCode) {
var context = new LexicalContext(sourceCode);
var currentStateId = 0;
do {
// read current char,
char currentChar = context.MoveForward();
ElseIf[] lexiState = lexiStates[currentStateId];
// binary search for the segment( else if (left <= c && c <= right) { ... } )
var segment = BinarySearch(currentChar, lexiState, currentStateId != 0);
if (segment is null) {
if (currentStateId == 0) { // the initial state of lexical analyze.
segment = BinarySearch(currentChar, this.omitChars, currentStateId != 0);
if (segment is null) {
// '\0' must be skipped.
if (currentChar == '\0') { segment = skipZero; }
else { segment = unexpectedChar; }
}
}
else { // token with error type
segment = errorToken;
}
}
// construct the next token,
var scripts = segment.scripts;
if (scripts != 0) { // it is 0 in most cases.
if ((scripts & Acts.Begin) != 0) {
this.beginToken(context);
}
if ((scripts & Acts.Extend) != 0) {
this.extendToken(context);
}
if ((scripts & Acts.Accept) != 0) {
this.acceptToken(context, segment.Vt);
}
else if ((scripts & Acts.Accept2) != 0) {
this.acceptToken2(context, segment.ifVts);
}
}
// point to next state.
currentStateId = segment.nextStateId;
} while (!context.EOF);
return context.result;
}
private ElseIf? BinarySearch(char currentChar, ElseIf[] lexiState, bool specialLast) {
var left = 0; var right = lexiState.Length - (specialLast ? 2 : 1);
if (right < 0) { }
else {
var result = -1;
while (left < right) {
var mid = (left + right) / 2;
var current = lexiState[mid];
if (currentChar < current.min) { result = -1; }
else if (current.max < currentChar) { result = 1; }
else { return current; }
if (result < 0) { right = mid; }
else { left = mid + 1; }
}
{
var current = lexiState[left];
if (current.min <= currentChar && currentChar <= current.max) {
return current;
}
}
}
if (specialLast) {
var last = lexiState[lexiState.Length - 1];
return last;
// no need to compare, because it's ['\0', '\uFFFF']
//if (last.min <= currentChar && currentChar <= last.max) {
// return last;
//}
}
return null;
}
The generated syntax parser
Click to view nullable, the FIRST sets, and the FOLLOW sets
C
nullable:
[0]: nullable( Exp' ) = False
[1]: nullable( Exp ) = False
[2]: nullable( Term ) = False
[3]: nullable( Factor ) = False
[4]: nullable( '¥' ) = False
[5]: nullable( '+' ) = False
[6]: nullable( '-' ) = False
[7]: nullable( '*' ) = False
[8]: nullable( '/' ) = False
[9]: nullable( '(' ) = False
[10]: nullable( ')' ) = False
[11]: nullable( 'number' ) = False
FIRST sets:
[0]: FIRST( Exp' ) = { '(' 'number' }
[1]: FIRST( Exp ) = { '(' 'number' }
[2]: FIRST( Term ) = { '(' 'number' }
[3]: FIRST( Factor ) = { '(' 'number' }
[4]: FIRST( '¥' ) = { '¥' }
[5]: FIRST( '+' ) = { '+' }
[6]: FIRST( '-' ) = { '-' }
[7]: FIRST( '*' ) = { '*' }
[8]: FIRST( '/' ) = { '/' }
[9]: FIRST( '(' ) = { '(' }
[10]: FIRST( ')' ) = { ')' }
[11]: FIRST( 'number' ) = { 'number' }
[12]: FIRST( Exp '+' Term ) = { '(' 'number' }
[13]: FIRST( Exp '-' Term ) = { '(' 'number' }
[14]: FIRST( Term '*' Factor ) = { '(' 'number' }
[15]: FIRST( Term '/' Factor ) = { '(' 'number' }
[16]: FIRST( '(' Exp ')' ) = { '(' }
FOLLOW sets:
[0]: FOLLOW( Exp' ) = { '¥' }
[1]: FOLLOW( Exp ) = { '-' ')' '+' '¥' }
[2]: FOLLOW( Term ) = { '-' ')' '*' '/' '+' '¥' }
[3]: FOLLOW( Factor ) = { '-' ')' '*' '/' '+' '¥' }
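These sets come from the standard fixed-point algorithm in the Tiger Book. For readers who want to reproduce the numbers above by hand, here is a compact, self-contained sketch of that computation (my own illustration, not bitParser's code), specialized to the Calc.st grammar:
csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class FirstFollowDemo {
    static void Main() {
        var productions = new (string Left, string[] Right)[] {
            ("Exp'",   new[] { "Exp" }),                      // the augmented start rule
            ("Exp",    new[] { "Exp", "'+'", "Term" }),
            ("Exp",    new[] { "Exp", "'-'", "Term" }),
            ("Exp",    new[] { "Term" }),
            ("Term",   new[] { "Term", "'*'", "Factor" }),
            ("Term",   new[] { "Term", "'/'", "Factor" }),
            ("Term",   new[] { "Factor" }),
            ("Factor", new[] { "'('", "Exp", "')'" }),
            ("Factor", new[] { "'number'" }),
        };
        var vns = productions.Select(p => p.Left).Distinct().ToArray();
        bool IsVt(string s) => !vns.Contains(s);
        var nullable = new HashSet<string>();
        var first = vns.ToDictionary(v => v, _ => new HashSet<string>());
        var follow = vns.ToDictionary(v => v, _ => new HashSet<string>());
        follow["Exp'"].Add("'¥'"); // the end-of-token-list symbol follows the start symbol
        bool changed = true;
        while (changed) { // iterate until nothing changes any more
            changed = false;
            foreach (var (left, right) in productions) {
                // nullable(left) if every symbol on the right is nullable
                if (right.All(nullable.Contains) && nullable.Add(left)) changed = true;
                for (int i = 0; i < right.Length; i++) {
                    // FIRST(left) ⊇ FIRST(right[i]) if right[0..i-1] are all nullable
                    if (right.Take(i).All(nullable.Contains)) {
                        var fi = IsVt(right[i]) ? new[] { right[i] } : first[right[i]].ToArray();
                        foreach (var t in fi) if (first[left].Add(t)) changed = true;
                    }
                    if (IsVt(right[i])) continue;
                    // FOLLOW(right[i]) ⊇ FIRST(right[j]) if right[i+1..j-1] are all nullable
                    for (int j = i + 1; j < right.Length; j++) {
                        if (!right.Skip(i + 1).Take(j - i - 1).All(nullable.Contains)) continue;
                        var fj = IsVt(right[j]) ? new[] { right[j] } : first[right[j]].ToArray();
                        foreach (var t in fj) if (follow[right[i]].Add(t)) changed = true;
                    }
                    // FOLLOW(right[i]) ⊇ FOLLOW(left) if right[i+1..] are all nullable
                    if (right.Skip(i + 1).All(nullable.Contains))
                        foreach (var t in follow[left].ToArray())
                            if (follow[right[i]].Add(t)) changed = true;
                }
            }
        }
        foreach (var v in vns)
            Console.WriteLine($"nullable({v}) = {nullable.Contains(v)}, " +
                $"FIRST = {{ {string.Join(" ", first[v])} }}, FOLLOW = {{ {string.Join(" ", follow[v])} }}");
    }
}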
Click to view the generated code for the reduction rules
csharp
public static IReadOnlyList<Regulation> Regulations => regulations;
private static readonly Regulation[] regulations = new Regulation[] {
// [0]=Exp : Exp '+' Term ;
new(0, st.Exp枝, st.Exp枝, st.@Plus符, st.Term枝),
// [1]=Exp : Exp '-' Term ;
new(1, st.Exp枝, st.Exp枝, st.@Dash符, st.Term枝),
// [2]=Exp : Term ;
new(2, st.Exp枝, st.Term枝),
// [3]=Term : Term '*' Factor ;
new(3, st.Term枝, st.Term枝, st.@Asterisk符, st.Factor枝),
// [4]=Term : Term '/' Factor ;
new(4, st.Term枝, st.Term枝, st.@Slash符, st.Factor枝),
// [5]=Term : Factor ;
new(5, st.Term枝, st.Factor枝),
// [6]=Factor : '(' Exp ')' ;
new(6, st.Factor枝, st.@LeftParenthesis符, st.Exp枝, st.@RightParenthesis符),
// [7]=Factor : 'number' ;
new(7, st.Factor枝, st.@number),
};
Click to view the generated LALR(1) parse-table code
csharp
const int syntaxStateCount = 16;
// LALR(1) syntax parse table
private static readonly Dictionary<int/*LRNode.type*/, LRParseAction>[]
syntaxStates = new Dictionary<int, LRParseAction>[syntaxStateCount];
private static void InitializeSyntaxStates() {
var states = CompilerExp.syntaxStates;
// 78 actions
// conflicts(0)=not sovled(0)+solved(0)(0 warnings)
#region create objects of syntax states
states[0] = new(capacity: 5);
states[1] = new(capacity: 3);
states[2] = new(capacity: 6);
states[3] = new(capacity: 6);
states[4] = new(capacity: 5);
states[5] = new(capacity: 6);
states[6] = new(capacity: 4);
states[7] = new(capacity: 4);
states[8] = new(capacity: 3);
states[9] = new(capacity: 3);
states[10] = new(capacity: 3);
states[11] = new(capacity: 6);
states[12] = new(capacity: 6);
states[13] = new(capacity: 6);
states[14] = new(capacity: 6);
states[15] = new(capacity: 6);
#endregion create objects of syntax states
#region re-used actions
LRParseAction aGoto2 = new(LRParseAction.Kind.Goto, states[2]);// refered 2 times
LRParseAction aGoto3 = new(LRParseAction.Kind.Goto, states[3]);// refered 4 times
LRParseAction aShift4 = new(LRParseAction.Kind.Shift, states[4]);// refered 6 times
LRParseAction aShift5 = new(LRParseAction.Kind.Shift, states[5]);// refered 6 times
LRParseAction aShift6 = new(LRParseAction.Kind.Shift, states[6]);// refered 2 times
LRParseAction aShift7 = new(LRParseAction.Kind.Shift, states[7]);// refered 2 times
LRParseAction aShift8 = new(LRParseAction.Kind.Shift, states[8]);// refered 3 times
LRParseAction aShift9 = new(LRParseAction.Kind.Shift, states[9]);// refered 3 times
LRParseAction aReduce2 = new(regulations[2]);// refered 4 times
LRParseAction aReduce5 = new(regulations[5]);// refered 6 times
LRParseAction aReduce7 = new(regulations[7]);// refered 6 times
LRParseAction aReduce0 = new(regulations[0]);// refered 4 times
LRParseAction aReduce1 = new(regulations[1]);// refered 4 times
LRParseAction aReduce3 = new(regulations[3]);// refered 6 times
LRParseAction aReduce4 = new(regulations[4]);// refered 6 times
LRParseAction aReduce6 = new(regulations[6]);// refered 6 times
#endregion re-used actions
// 78 actions
// conflicts(0)=not sovled(0)+solved(0)(0 warnings)
#region init actions of syntax states
// syntaxStates[0]:
// [-1] Exp' : ⏳ Exp ;☕ '¥'
// [0] Exp : ⏳ Exp '+' Term ;☕ '-' '+' '¥'
// [1] Exp : ⏳ Exp '-' Term ;☕ '-' '+' '¥'
// [2] Exp : ⏳ Term ;☕ '-' '+' '¥'
// [3] Term : ⏳ Term '*' Factor ;☕ '-' '*' '/' '+' '¥'
// [4] Term : ⏳ Term '/' Factor ;☕ '-' '*' '/' '+' '¥'
// [5] Term : ⏳ Factor ;☕ '-' '*' '/' '+' '¥'
// [6] Factor : ⏳ '(' Exp ')' ;☕ '-' '*' '/' '+' '¥'
// [7] Factor : ⏳ 'number' ;☕ '-' '*' '/' '+' '¥'
/*0*/states[0].Add(st.Exp枝, new(LRParseAction.Kind.Goto, states[1]));
/*1*/states[0].Add(st.Term枝, aGoto2);
/*2*/states[0].Add(st.Factor枝, aGoto3);
/*3*/states[0].Add(st.@LeftParenthesis符, aShift4);
/*4*/states[0].Add(st.@number, aShift5);
// syntaxStates[1]:
// [-1] Exp' : Exp ⏳ ;☕ '¥'
// [0] Exp : Exp ⏳ '+' Term ;☕ '-' '+' '¥'
// [1] Exp : Exp ⏳ '-' Term ;☕ '-' '+' '¥'
/*5*/states[1].Add(st.@Plus符, aShift6);
/*6*/states[1].Add(st.@Dash符, aShift7);
/*7*/states[1].Add(st.@终, LRParseAction.accept);
// syntaxStates[2]:
// [2] Exp : Term ⏳ ;☕ '-' ')' '+' '¥'
// [3] Term : Term ⏳ '*' Factor ;☕ '-' ')' '*' '/' '+' '¥'
// [4] Term : Term ⏳ '/' Factor ;☕ '-' ')' '*' '/' '+' '¥'
/*8*/states[2].Add(st.@Asterisk符, aShift8);
/*9*/states[2].Add(st.@Slash符, aShift9);
/*10*/states[2].Add(st.@Dash符, aReduce2);
/*11*/states[2].Add(st.@RightParenthesis符, aReduce2);
/*12*/states[2].Add(st.@Plus符, aReduce2);
/*13*/states[2].Add(st.@终, aReduce2);
// syntaxStates[3]:
// [5] Term : Factor ⏳ ;☕ '-' ')' '*' '/' '+' '¥'
/*14*/states[3].Add(st.@Dash符, aReduce5);
/*15*/states[3].Add(st.@RightParenthesis符, aReduce5);
/*16*/states[3].Add(st.@Asterisk符, aReduce5);
/*17*/states[3].Add(st.@Slash符, aReduce5);
/*18*/states[3].Add(st.@Plus符, aReduce5);
/*19*/states[3].Add(st.@终, aReduce5);
// syntaxStates[4]:
// [6] Factor : '(' ⏳ Exp ')' ;☕ '-' ')' '*' '/' '+' '¥'
// [0] Exp : ⏳ Exp '+' Term ;☕ '-' ')' '+'
// [1] Exp : ⏳ Exp '-' Term ;☕ '-' ')' '+'
// [2] Exp : ⏳ Term ;☕ '-' ')' '+'
// [3] Term : ⏳ Term '*' Factor ;☕ '-' ')' '*' '/' '+'
// [4] Term : ⏳ Term '/' Factor ;☕ '-' ')' '*' '/' '+'
// [5] Term : ⏳ Factor ;☕ '-' ')' '*' '/' '+'
// [6] Factor : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+'
// [7] Factor : ⏳ 'number' ;☕ '-' ')' '*' '/' '+'
/*20*/states[4].Add(st.Exp枝, new(LRParseAction.Kind.Goto, states[10]));
/*21*/states[4].Add(st.Term枝, aGoto2);
/*22*/states[4].Add(st.Factor枝, aGoto3);
/*23*/states[4].Add(st.@LeftParenthesis符, aShift4);
/*24*/states[4].Add(st.@number, aShift5);
// syntaxStates[5]:
// [7] Factor : 'number' ⏳ ;☕ '-' ')' '*' '/' '+' '¥'
/*25*/states[5].Add(st.@Dash符, aReduce7);
/*26*/states[5].Add(st.@RightParenthesis符, aReduce7);
/*27*/states[5].Add(st.@Asterisk符, aReduce7);
/*28*/states[5].Add(st.@Slash符, aReduce7);
/*29*/states[5].Add(st.@Plus符, aReduce7);
/*30*/states[5].Add(st.@终, aReduce7);
// syntaxStates[6]:
// [0] Exp : Exp '+' ⏳ Term ;☕ '-' ')' '+' '¥'
// [3] Term : ⏳ Term '*' Factor ;☕ '-' ')' '*' '/' '+' '¥'
// [4] Term : ⏳ Term '/' Factor ;☕ '-' ')' '*' '/' '+' '¥'
// [5] Term : ⏳ Factor ;☕ '-' ')' '*' '/' '+' '¥'
// [6] Factor : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥'
// [7] Factor : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥'
/*31*/states[6].Add(st.Term枝, new(LRParseAction.Kind.Goto, states[11]));
/*32*/states[6].Add(st.Factor枝, aGoto3);
/*33*/states[6].Add(st.@LeftParenthesis符, aShift4);
/*34*/states[6].Add(st.@number, aShift5);
// syntaxStates[7]:
// [1] Exp : Exp '-' ⏳ Term ;☕ '-' ')' '+' '¥'
// [3] Term : ⏳ Term '*' Factor ;☕ '-' ')' '*' '/' '+' '¥'
// [4] Term : ⏳ Term '/' Factor ;☕ '-' ')' '*' '/' '+' '¥'
// [5] Term : ⏳ Factor ;☕ '-' ')' '*' '/' '+' '¥'
// [6] Factor : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥'
// [7] Factor : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥'
/*35*/states[7].Add(st.Term枝, new(LRParseAction.Kind.Goto, states[12]));
/*36*/states[7].Add(st.Factor枝, aGoto3);
/*37*/states[7].Add(st.@LeftParenthesis符, aShift4);
/*38*/states[7].Add(st.@number, aShift5);
// syntaxStates[8]:
// [3] Term : Term '*' ⏳ Factor ;☕ '-' ')' '*' '/' '+' '¥'
// [6] Factor : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥'
// [7] Factor : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥'
/*39*/states[8].Add(st.Factor枝, new(LRParseAction.Kind.Goto, states[13]));
/*40*/states[8].Add(st.@LeftParenthesis符, aShift4);
/*41*/states[8].Add(st.@number, aShift5);
// syntaxStates[9]:
// [4] Term : Term '/' ⏳ Factor ;☕ '-' ')' '*' '/' '+' '¥'
// [6] Factor : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥'
// [7] Factor : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥'
/*42*/states[9].Add(st.Factor枝, new(LRParseAction.Kind.Goto, states[14]));
/*43*/states[9].Add(st.@LeftParenthesis符, aShift4);
/*44*/states[9].Add(st.@number, aShift5);
// syntaxStates[10]:
// [6] Factor : '(' Exp ⏳ ')' ;☕ '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp ⏳ '+' Term ;☕ '-' ')' '+'
// [1] Exp : Exp ⏳ '-' Term ;☕ '-' ')' '+'
/*45*/states[10].Add(st.@RightParenthesis符, new(LRParseAction.Kind.Shift, states[15]));
/*46*/states[10].Add(st.@Plus符, aShift6);
/*47*/states[10].Add(st.@Dash符, aShift7);
// syntaxStates[11]:
// [0] Exp : Exp '+' Term ⏳ ;☕ '-' ')' '+' '¥'
// [3] Term : Term ⏳ '*' Factor ;☕ '-' ')' '*' '/' '+' '¥'
// [4] Term : Term ⏳ '/' Factor ;☕ '-' ')' '*' '/' '+' '¥'
/*48*/states[11].Add(st.@Asterisk符, aShift8);
/*49*/states[11].Add(st.@Slash符, aShift9);
/*50*/states[11].Add(st.@Dash符, aReduce0);
/*51*/states[11].Add(st.@RightParenthesis符, aReduce0);
/*52*/states[11].Add(st.@Plus符, aReduce0);
/*53*/states[11].Add(st.@终, aReduce0);
// syntaxStates[12]:
// [1] Exp : Exp '-' Term ⏳ ;☕ '-' ')' '+' '¥'
// [3] Term : Term ⏳ '*' Factor ;☕ '-' ')' '*' '/' '+' '¥'
// [4] Term : Term ⏳ '/' Factor ;☕ '-' ')' '*' '/' '+' '¥'
/*54*/states[12].Add(st.@Asterisk符, aShift8);
/*55*/states[12].Add(st.@Slash符, aShift9);
/*56*/states[12].Add(st.@Dash符, aReduce1);
/*57*/states[12].Add(st.@RightParenthesis符, aReduce1);
/*58*/states[12].Add(st.@Plus符, aReduce1);
/*59*/states[12].Add(st.@终, aReduce1);
// syntaxStates[13]:
// [3] Term : Term '*' Factor ⏳ ;☕ '-' ')' '*' '/' '+' '¥'
/*60*/states[13].Add(st.@Dash符, aReduce3);
/*61*/states[13].Add(st.@RightParenthesis符, aReduce3);
/*62*/states[13].Add(st.@Asterisk符, aReduce3);
/*63*/states[13].Add(st.@Slash符, aReduce3);
/*64*/states[13].Add(st.@Plus符, aReduce3);
/*65*/states[13].Add(st.@终, aReduce3);
// syntaxStates[14]:
// [4] Term : Term '/' Factor ⏳ ;☕ '-' ')' '*' '/' '+' '¥'
/*66*/states[14].Add(st.@Dash符, aReduce4);
/*67*/states[14].Add(st.@RightParenthesis符, aReduce4);
/*68*/states[14].Add(st.@Asterisk符, aReduce4);
/*69*/states[14].Add(st.@Slash符, aReduce4);
/*70*/states[14].Add(st.@Plus符, aReduce4);
/*71*/states[14].Add(st.@终, aReduce4);
// syntaxStates[15]:
// [6] Factor : '(' Exp ')' ⏳ ;☕ '-' ')' '*' '/' '+' '¥'
/*72*/states[15].Add(st.@Dash符, aReduce6);
/*73*/states[15].Add(st.@RightParenthesis符, aReduce6);
/*74*/states[15].Add(st.@Asterisk符, aReduce6);
/*75*/states[15].Add(st.@Slash符, aReduce6);
/*76*/states[15].Add(st.@Plus符, aReduce6);
/*77*/states[15].Add(st.@终, aReduce6);
#endregion init actions of syntax states
}
The state diagram and the state table of the generated LALR(1) parser are shown below:
Since mermaid supports only a limited number of characters, a diagram often cannot show all the syntax states of a language (C, for example) together with their full LR items + lookaheads, so when generating the state diagram I show only the first 3 LR items + lookaheads of each syntax state by default. The complete LR items + lookaheads can always be found in the generated parse-table code.
State | '+' | '-' | '*' | '/' | '(' | ')' | 'number' | '¥' | Exp | Term | Factor
---|---|---|---|---|---|---|---|---|---|---|---
0 |  |  |  |  | S4 |  | S5 |  | G1 | G2 | G3
1 | S6 | S7 |  |  |  |  |  | ✅ |  |  |
2 | R[2] | R[2] | S8 | S9 |  | R[2] |  | R[2] |  |  |
3 | R[5] | R[5] | R[5] | R[5] |  | R[5] |  | R[5] |  |  |
4 |  |  |  |  | S4 |  | S5 |  | G10 | G2 | G3
5 | R[7] | R[7] | R[7] | R[7] |  | R[7] |  | R[7] |  |  |
6 |  |  |  |  | S4 |  | S5 |  |  | G11 | G3
7 |  |  |  |  | S4 |  | S5 |  |  | G12 | G3
8 |  |  |  |  | S4 |  | S5 |  |  |  | G13
9 |  |  |  |  | S4 |  | S5 |  |  |  | G14
10 | S6 | S7 |  |  |  | S15 |  |  |  |  |
11 | R[0] | R[0] | S8 | S9 |  | R[0] |  | R[0] |  |  |
12 | R[1] | R[1] | S8 | S9 |  | R[1] |  | R[1] |  |  |
13 | R[3] | R[3] | R[3] | R[3] |  | R[3] |  | R[3] |  |  |
14 | R[4] | R[4] | R[4] | R[4] |  | R[4] |  | R[4] |  |  |
15 | R[6] | R[6] | R[6] | R[6] |  | R[6] |  | R[6] |  |  |
In the table above:
- S6 means shift and go to state 6;
- R[2] means reduce by regulations[2] = Exp : Term ;
- G1 means goto state 1;
- ✅ means Accept, i.e. parsing is finished and the syntax tree has been built successfully;
- a blank cell means a syntax error has been encountered.
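To make the table concrete, here is a hand-worked trace (my own illustration, not generated output) of parsing 1*2+3; each row shows the state stack and the lookahead just before the action is taken:

Step | state stack | lookahead | action
---|---|---|---
1 | 0 | 'number' 1 | S5
2 | 0 5 | '*' | R[7] Factor : 'number' ; then G3 from state 0
3 | 0 3 | '*' | R[5] Term : Factor ; then G2 from state 0
4 | 0 2 | '*' | S8
5 | 0 2 8 | 'number' 2 | S5
6 | 0 2 8 5 | '+' | R[7] Factor : 'number' ; then G13 from state 8
7 | 0 2 8 13 | '+' | R[3] Term : Term '*' Factor ; then G2 from state 0
8 | 0 2 | '+' | R[2] Exp : Term ; then G1 from state 0
9 | 0 1 | '+' | S6
10 | 0 1 6 | 'number' 3 | S5
11 | 0 1 6 5 | '¥' | R[7] Factor : 'number' ; then G3 from state 6
12 | 0 1 6 3 | '¥' | R[5] Term : Factor ; then G11 from state 6
13 | 0 1 6 11 | '¥' | R[0] Exp : Exp '+' Term ; then G1 from state 0
14 | 0 1 | '¥' | ✅ Accept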
Click to view the code that drives the parse table
csharp
public SyntaxTree Parse(TokenList tokenList) {
var context = new LRSyntaxContext(tokenList, this.initialState, this.EOT, this.stArray);
var accept = false;
do {
Token token = context.CurrentToken;
int nodeType = token.type; // the token's type id is also used as the syntax-node type.
while (nodeType == blockComment || nodeType == inlineComment) {
// skip comment token
context.cursor++;
token = context.CurrentToken;
nodeType = token.type;
}
Dictionary<int/*Node.type*/, LRParseAction> currentState =
context.stateStack.Peek();
if (currentState.TryGetValue(nodeType, out var parseAction)) {
parseAction.Execute(context);
accept = parseAction.kind == LRParseAction.Kind.Accept;
}
else { // syntax error happened.
return new SyntaxTree(context);
}
} while (!accept);
var root = context.root;
Debug.Assert(root != null);
return new SyntaxTree(root);
}
public class LRParseAction {
public enum Kind {
Error,
Shift,
Reduce,
Goto,
Accept,
}
public readonly Kind kind;
[StructLayout(LayoutKind.Explicit)]
struct Union {
[FieldOffset(0)] public Dictionary<int/*Node.type*/, LRParseAction> nextState;
[FieldOffset(0)] public Regulation regulation;
}
private Union union;
// execute this parse action.
public void Execute(LRSyntaxContext context) {
switch (this.kind) {
case Kind.Error: { throw new NotImplementedException(); }
//break;
case Kind.Shift: {
var token = context.CurrentToken;
var leaf = new LRNode(token);
context.nodeStack.Push(leaf);
var nextState = this.union.nextState;
context.stateStack.Push(nextState);
// prepare for next loop.
context.cursor++;
}
break;
case Kind.Reduce: {
var regulation = this.union.regulation;
int count = regulation.right.Length;
var children = new LRNode[count];
var start = Token.empty; LRNode? lastNode = null;
var first = true;
for (int i = 0; i < count; i++) {
var state = context.stateStack.Pop();// popped only; not used afterwards.
var node = context.nodeStack.Pop();
children[count - i - 1] = node;
if (node.start.index >= 0) { // this node includes token
if (first) { lastNode = node; first = false; }
start = node.start;
}
}
int tokenCount = 0;
if (lastNode is not null) {
tokenCount =
lastNode.start.index // comment tokens inside of parent are included.
- start.index // comment tokens before parent are excluded.
+ lastNode.tokenCount; // comment tokens after parent are excluded.
}
var parent = new LRNode(regulation, start, tokenCount, children);
for (var i = 0; i < count; i++) { children[i].parent = parent; }
context.nodeStack.Push(parent);
// goto next syntax-state
Dictionary<int/*Node.type*/, LRParseAction> currentState =
context.stateStack.Peek();
var nodeType = regulation.left;
if (currentState.TryGetValue(nodeType, out var parseAction)) {
parseAction.Execute(context); // parseAction is supposed to be a Goto action
}
Debug.Assert(parseAction != null && parseAction.kind == Kind.Goto);
}
break;
case Kind.Goto: {
var nextState = this.union.nextState;
context.stateStack.Push(nextState);
}
break;
case Kind.Accept: {
var state = context.stateStack.Pop();
context.root = context.nodeStack.Pop();
context.Finish(context.root, 40, context.stArray);
}
break;
default: { throw new NotImplementedException(); }
}
}
}
The generated semantic-content extractor
For example, for the input 1234+567+89+0+0, parsing with Calc.st produces the syntax tree shown below (its construction process is shown at the beginning of this article):
smalltalk
R[0]=Exp : Exp '+' Term ;⛪T[0->8]
├─R[0]=Exp : Exp '+' Term ;⛪T[0->6]
│ ├─R[0]=Exp : Exp '+' Term ;⛪T[0->4]
│ │ ├─R[0]=Exp : Exp '+' Term ;⛪T[0->2]
│ │ │ ├─R[2]=Exp : Term ;⛪T[0]
│ │ │ │ └─R[5]=Term : Factor ;⛪T[0]
│ │ │ │ └─R[7]=Factor : 'number' ;⛪T[0]
│ │ │ │ └─T[0]='number' 1234
│ │ │ ├─T[1]='+' +
│ │ │ └─R[5]=Term : Factor ;⛪T[2]
│ │ │ └─R[7]=Factor : 'number' ;⛪T[2]
│ │ │ └─T[2]='number' 567
│ │ ├─T[3]='+' +
│ │ └─R[5]=Term : Factor ;⛪T[4]
│ │ └─R[7]=Factor : 'number' ;⛪T[4]
│ │ └─T[4]='number' 89
│ ├─T[5]='+' +
│ └─R[5]=Term : Factor ;⛪T[6]
│ └─R[7]=Factor : 'number' ;⛪T[6]
│ └─T[6]='number' 0
├─T[7]='+' +
└─R[5]=Term : Factor ;⛪T[8]
└─R[7]=Factor : 'number' ;⛪T[8]
└─T[8]='number' 0
From the upper-left to the lower-right, the 4 consecutive R[0] nodes noticeably increase the depth of the tree.
The semantic-content extractor generated by bitParser traverses this syntax tree in post-order and extracts the node information.
Click to view the general syntax-tree traversal code
csharp
// Extract some data structure from syntax tree.
// <para>post-order traverse <paramref name="root"/> with stack(without recursion).</para>
public T? Extract(LRNode root, TokenList tokens, string sourceCode) {
var context = new TContext<T>(root, tokens, sourceCode);
var nodeStack = new Stack<LRNode>(); var indexStack = new Stack<int>();
// init stack.
{
// push nextLeft and its next pending children.
var nextLeft = root; var index = 0;
nodeStack.Push(nextLeft); indexStack.Push(index);
while (nextLeft.children.Count > 0) {
nextLeft = nextLeft.children[0];
nodeStack.Push(nextLeft); indexStack.Push(0);
}
}
while (nodeStack.Count > 0) {
var current = nodeStack.Pop(); var index = indexStack.Pop() + 1;
if (index < current.children.Count) {
// push this node back again.
nodeStack.Push(current); indexStack.Push(index);
// push nextLeft and its next pending children.
var nextLeft = current.children[index];
nodeStack.Push(nextLeft); indexStack.Push(0);
while (nextLeft.children.Count > 0) {
nextLeft = nextLeft.children[0];
nodeStack.Push(nextLeft); indexStack.Push(0);
}
}
else {
// Visit(current);
if (extractorDict.TryGetValue(current.type, out Action<LRNode, TContext<T>>? action)) {
action(current, context);
}
}
}
{
var current = this.EOT;
// Visit(current);
if (extractorDict.TryGetValue(current.type, out Action<LRNode, TContext<T>>? action)) {
action(current, context);
}
}
return context.result;
}
Click to view the generated code that extracts node information while traversing the syntax tree
csharp
private static readonly Dictionary<int/*LRNode.type*/,
Action<LRNode, TContext<Exp>>> @expExtractorDict = new();
private static readonly Action<LRNode, TContext<Exp>> VtHandler =
(node, context) => {
var token = node.start;
context.objStack.Push(token);
};
// initialize dict for extractor.
private static void InitializeExtractorDict() {
var extractorDict = @expExtractorDict;
extractorDict.Add(st.@Plus符, VtHandler);
extractorDict.Add(st.@Dash符, VtHandler);
extractorDict.Add(st.@Asterisk符, VtHandler);
extractorDict.Add(st.@Slash符, VtHandler);
extractorDict.Add(st.@LeftParenthesis符, VtHandler);
extractorDict.Add(st.@RightParenthesis符, VtHandler);
extractorDict.Add(st.@number, VtHandler);
extractorDict.Add(st.@终,
static (node, context) => {
// [-1]=Exp' : Exp ;
// dumped by ExternalExtractor
var @final = (VnExp?)context.objStack.Pop();
var left = new Exp(@final);
context.result = left; // final step, no need to push into stack.
}); // end of extractorDict.Add(st.@终, (node, context) => { ... });
extractorDict.Add(st.Exp枝,
static (node, context) => {
switch (node.regulation.index) {
case 0: { // [0]=Exp : Exp '+' Term ;
// dumped by ListExtractor 2
var r0 = (VnTerm?)context.objStack.Pop();
var r1 = (Token?)context.objStack.Pop();
var r2 = (VnExp?)context.objStack.Pop();
var left = r2;
left.Add(r1, r0);
context.objStack.Push(left);
}
break;
case 1: { // [1]=Exp : Exp '-' Term ;
// dumped by ListExtractor 2
var r0 = (VnTerm?)context.objStack.Pop();
var r1 = (Token?)context.objStack.Pop();
var r2 = (VnExp?)context.objStack.Pop();
var left = r2;
left.Add(r1, r0);
context.objStack.Push(left);
}
break;
case 2: { // [2]=Exp : Term ;
// dumped by ListExtractor 1
var r0 = (VnTerm?)context.objStack.Pop();
var left = new VnExp(r0);
context.objStack.Push(left);
}
break;
default: throw new NotImplementedException();
}
}); // end of extractorDict.Add(st.Exp枝, (node, context) => { ... });
extractorDict.Add(st.Term枝,
static (node, context) => {
switch (node.regulation.index) {
case 3: { // [3]=Term : Term '*' Factor ;
// dumped by ListExtractor 2
var r0 = (VnFactor?)context.objStack.Pop();
var r1 = (Token?)context.objStack.Pop();
var r2 = (VnTerm?)context.objStack.Pop();
var left = r2;
left.Add(r1, r0);
context.objStack.Push(left);
}
break;
case 4: { // [4]=Term : Term '/' Factor ;
// dumped by ListExtractor 2
var r0 = (VnFactor?)context.objStack.Pop();
var r1 = (Token?)context.objStack.Pop();
var r2 = (VnTerm?)context.objStack.Pop();
var left = r2;
left.Add(r1, r0);
context.objStack.Push(left);
}
break;
case 5: { // [5]=Term : Factor ;
// dumped by ListExtractor 1
var r0 = (VnFactor?)context.objStack.Pop();
var left = new VnTerm(r0);
context.objStack.Push(left);
}
break;
default: throw new NotImplementedException();
}
}); // end of extractorDict.Add(st.Term枝, (node, context) => { ... });
extractorDict.Add(st.Factor枝,
static (node, context) => {
switch (node.regulation.index) {
case 6: { // [6]=Factor : '(' Exp ')' ;
// dumped by DefaultExtractor
var r0 = (Token?)context.objStack.Pop();
var r1 = (VnExp?)context.objStack.Pop();
var r2 = (Token?)context.objStack.Pop();
var left = new VnFactor(r2, r1, r0);
context.objStack.Push(left);
}
break;
case 7: { // [7]=Factor : 'number' ;
// dumped by DefaultExtractor
var r0 = (Token?)context.objStack.Pop();
var left = new VnFactor(r0);
context.objStack.Push(left);
}
break;
default: throw new NotImplementedException();
}
}); // end of extractorDict.Add(st.Factor枝, (node, context) => { ... });
}
The extracted node information, rendered as a tree, looks like this:
smalltalk
VnExp⛪T[0->8]
├─VnTerm⛪T[0]
│ └─VnFactor⛪T[0]
│ └─T[0]='number' 1234
└─List T[1->8]
├─Item T[1->2]
│ ├─T[1]='+' +
│ └─VnTerm⛪T[2]
│ └─VnFactor⛪T[2]
│ └─T[2]='number' 567
├─Item T[3->4]
│ ├─T[3]='+' +
│ └─VnTerm⛪T[4]
│ └─VnFactor⛪T[4]
│ └─T[4]='number' 89
├─Item T[5->6]
│ ├─T[5]='+' +
│ └─VnTerm⛪T[6]
│ └─VnFactor⛪T[6]
│ └─T[6]='number' 0
└─Item T[7->8]
├─T[7]='+' +
└─VnTerm⛪T[8]
└─VnFactor⛪T[8]
└─T[8]='number' 0
Reduction rules like Exp : Exp '+' Term ; make the syntax tree unnecessarily deep. The purpose of this semantic-extraction step is to "flatten" such tree structures; with the flattened semantic information, formatting the source code becomes straightforward.
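The same flattened shape also makes other downstream passes, such as generating intermediate code or evaluating the expression, simple left-to-right loops. As a tiny illustration of my own (not generated code), evaluating a flattened Exp whose items mirror the PostItemList below takes just one loop:
csharp
using System;
using System.Collections.Generic;

static class FlattenedExpDemo {
    // my own illustration: once "Exp : Exp '+' Term" chains are flattened into
    // (first Term, list of (operator, Term)), processing is a plain left-to-right loop.
    static double EvaluateFlattenedExp(double first, IReadOnlyList<(char op, double term)> rest) {
        var result = first;
        foreach (var (op, term) in rest)
            result = (op == '+') ? result + term : result - term;
        return result;
    }
    static void Main() {
        // 1234+567+89+0+0, flattened exactly like the VnExp/PostItemList shape below
        var value = EvaluateFlattenedExp(1234, new[] { ('+', 567.0), ('+', 89.0), ('+', 0.0), ('+', 0.0) });
        Console.WriteLine(value); // 1890
    }
}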
Click to view the formatting code for the VnExp node
csharp
/// <summary>
/// Correspond to the Vn node Exp in the grammar(Exp).
/// </summary>
internal partial class VnExp : IFullFormatter {
// [0]=Exp : Exp '+' Term ;
// [1]=Exp : Exp '-' Term ;
// [2]=Exp : Term ;
private readonly VnTerm first0;
public class PostItem : IFullFormatter {
private readonly Token r1;
private readonly VnTerm r0;
public PostItem(Token r1, VnTerm r0) {
this.r1 = r1;
this.r0 = r0;
this._scope = new TokenRange(r1, r0);
}
private readonly TokenRange _scope;
public TokenRange Scope => _scope;
public void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {
context.PrintBlanksAnd(this.r1, preConfig, writer);
// one space between '+' or '-' and the tokens on either side of it
var config = new BlankConfig(inlineBlank: 1, forceNewline: false);
context.PrintCommentsBetween(this.r1, this.r0, config, writer);
context.PrintBlanksAnd(this.r0, config, writer);
}
}
public class PostItemList : IFullFormatter {
private readonly List<PostItem> list = new();
public PostItemList(PostItem item) {
this.list.Add(item);
this._scope = new TokenRange(item);
}
public void Add(PostItem item) {
this.list.Add(item);
this._scope.end = item.Scope.end;
}
private readonly TokenRange _scope;
public TokenRange Scope => _scope;
public void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {
// one space between '+' or '-' and the tokens on either side of it
var config = new BlankConfig(inlineBlank: 1, forceNewline: false);
for (int i = 0; i < list.Count; i++) {
if (i == 0) {
context.PrintBlanksAnd(list[i], preConfig, writer);
}
else {
context.PrintCommentsBetween(list[i - 1], list[i], config, writer);
context.PrintBlanksAnd(list[i], config, writer);
}
}
}
}
private PostItemList? list;
private readonly TokenRange _scope;
public TokenRange Scope => _scope;
internal VnExp(VnTerm first0) {
this.first0 = first0;
this._scope = new TokenRange(first0);
}
// [0]=Exp : Exp '+' Term ;
// [1]=Exp : Exp '-' Term ;
internal void Add(Token r1, VnTerm r0) {
if (this.list == null) {
this.list = new PostItemList(new(r1, r0));
}
else {
this.list.Add(new(r1, r0));
}
this._scope.end = this.list.Scope.end;
}
public void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {
context.PrintBlanksAnd(this.first0, preConfig, writer);
if (this.list != null) {
// one space between '+' or '-' and the tokens on either side of it
var config = new BlankConfig(inlineBlank: 1, forceNewline: false);
context.PrintCommentsBetween(this.first0, this.list, config, writer);
context.PrintBlanksAnd(this.list, config, writer);
}
}
}
Click to view the formatting code for the VnTerm node
csharp
/// <summary>
/// Correspond to the Vn node Term in the grammar(Exp).
/// </summary>
internal partial class VnTerm : IFullFormatter {
// [3]=Term : Term '*' Factor ;
// [4]=Term : Term '/' Factor ;
// [5]=Term : Factor ;
private readonly VnFactor first0;
public class PostItem : IFullFormatter {
private readonly Token r1;
private readonly VnFactor r0;
public PostItem(Token r1, VnFactor r0) {
this.r1 = r1;
this.r0 = r0;
this._scope = new TokenRange(r1, r0);
}
private readonly TokenRange _scope;
public TokenRange Scope => _scope;
public void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {
context.PrintBlanksAnd(this.r1, preConfig, writer);
// one space between '*' or '/' and the tokens on either side of it
var config = new BlankConfig(inlineBlank: 1, forceNewline: false);
context.PrintCommentsBetween(this.r1, this.r0, config, writer);
context.PrintBlanksAnd(this.r0, config, writer);
}
}
public class PostItemList : IFullFormatter {
private readonly List<PostItem> list = new();
public PostItemList(PostItem item) {
this.list.Add(item);
this._scope = new TokenRange(item);
}
public void Add(PostItem item) {
this.list.Add(item);
this._scope.end = item.Scope.end;
}
private readonly TokenRange _scope;
public TokenRange Scope => _scope;
public void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {
// one space between '*' or '/' and the tokens on either side of it
var config = new BlankConfig(inlineBlank: 1, forceNewline: false);
for (int i = 0; i < list.Count; i++) {
if (i == 0) {
context.PrintBlanksAnd(list[i], preConfig, writer);
}
else {
context.PrintCommentsBetween(list[i - 1], list[i], config, writer);
context.PrintBlanksAnd(list[i], config, writer);
}
}
}
}
private PostItemList? list;
private readonly TokenRange _scope;
public TokenRange Scope => _scope;
// [5]=Term : Factor ;
internal VnTerm(VnFactor first0) {
this.first0 = first0;
this._scope = new TokenRange(first0);
}
// [3]=Term : Term '*' Factor ;
// [4]=Term : Term '/' Factor ;
internal void Add(Token r1, VnFactor r0) {
if (this.list == null) {
this.list = new PostItemList(new(r1, r0));
}
else {
this.list.Add(new(r1, r0));
}
this._scope.end = this.list.Scope.end;
}
public void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {
context.PrintBlanksAnd(this.first0, preConfig, writer);
if (this.list != null) {
// one space between '*' or '/' and the tokens on either side of it
var config = new BlankConfig(inlineBlank: 1, forceNewline: false);
context.PrintCommentsBetween(this.first0, this.list, config, writer);
context.PrintBlanksAnd(this.list, config, writer);
}
}
}
Click to view the formatting code for the VnFactor node
csharp
/// <summary>
/// Correspond to the Vn node Factor in the grammar(Exp).
/// </summary>
internal abstract partial class VnFactor : IFullFormatter {
// [6]=Factor : '(' Exp ')' ;
// [7]=Factor : 'number' ;
public class C0 : VnFactor {
// [6]=Factor : '(' Exp ')' ;
public C0(Token r2, VnExp r1, Token r0) {
this.r2 = r2;
this.r1 = r1;
this.r0 = r0;
this._scope = new TokenRange(r2, r0);
}
private readonly Token r2;
private readonly VnExp r1;
private readonly Token r0;
private readonly TokenRange _scope;
public override TokenRange Scope => _scope;
public override void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {
context.PrintBlanksAnd(this.r2, preConfig, writer);
// no spaces inside '(' Exp ')'
var config = new BlankConfig(inlineBlank: 0, forceNewline: false);
context.PrintCommentsBetween(this.r2, this.r1, config, writer);
context.PrintBlanksAnd(this.r1, config, writer);
context.PrintCommentsBetween(this.r1, this.r0, config, writer);
context.PrintBlanksAnd(this.r0, config, writer);
}
}
public class C1 : VnFactor {
// [7]=Factor : 'number' ;
public C1(Token r0) {
this.r0 = r0;
this._scope = new TokenRange(r0);
}
private readonly Token r0;
private readonly TokenRange _scope;
public override TokenRange Scope => _scope;
public override void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context) {
// print this node's single token using the preConfig passed down from its parent
context.PrintBlanksAnd(this.r0, preConfig, writer);
}
}
public abstract TokenRange Scope { get; }
public abstract void FullFormat(BlankConfig preConfig, TextWriter writer, FormatContext context);
}
Finally, 1234+567+89+0+0 formatted (with one space inserted on each side of every operator) looks like this:
csharp
1234 + 567 + 89 + 0 + 0
Here are more examples:
csharp
// formatting 1-2*3
1 - 2 * 3
// formatting (1+2)/3
(1 + 2) / 3
// formatting (1+2)*(3-4)
(1 + 2) * (3 - 4)
Click to view GLSL formatting example 1
GLSL
// example 1, before formatting
void main() {
int a=0;
a++++;
++++a;
}
// example 1, after formatting
void main() {
int a = 0;
a++ ++;
++ ++a;
}
Click to view GLSL formatting example 2
GLSL
// example 2, before formatting
in vec3 passNormal;
in vec2 passTexCoord;
uniform sampler2D textureMap;
uniform vec3 lihtDirection=vec3(1,1,1);
uniform vec3 diffuseColor;
uniform bool transparent=false;
out vec4 outColor;
void main( ){
if (transparent) {
if (int(gl_FragCoord.x + gl_FragCoord.y) % 2 == 1) discard;}
if (passTexCoord==vec2(-1,-1)){ // when texture coordinate not exists..
float diffuse=max(dot(normalize(lihtDirection),normalize(passNormal)),0);
outColor = vec4(diffuseColor * diffuse, 1.0);
}
else { outColor = texture(textureMap, passTexCoord);}
}
// example 2, after formatting
in vec3 passNormal;
in vec2 passTexCoord;
uniform sampler2D textureMap;
uniform vec3 lihtDirection = vec3(1, 1, 1);
uniform vec3 diffuseColor;
uniform bool transparent = false;
out vec4 outColor;
void main() {
if (transparent) {
if (int(gl_FragCoord.x + gl_FragCoord.y) % 2 == 1) discard;
}
if (passTexCoord == vec2(-1, -1)) { // when texture coordinate not exists..
float diffuse = max(dot(normalize(lihtDirection), normalize(passNormal)), 0);
outColor = vec4(diffuseColor * diffuse, 1.0);
}
else { outColor = texture(textureMap, passTexCoord); }
}
For a detailed introduction to this formatting algorithm, see my other article (GLSL Shader的格式化算法(LALR解析器)).
Example: automatically resolving a Shift/Reduce conflict
Both C and GLSL have the dangling-else problem, which is caused by the following productions:
smalltalk
selection_statement :
'if' '(' expression ')' selection_rest_statement
;
selection_rest_statement :
statement 'else' statement
| statement
;
So when the parser has just read an 'else' Token, should it shift the 'else', or should it reduce by selection_rest_statement : statement ;? This is a Shift/Reduce conflict.
bitParser automatically chooses Shift (which binds the 'else' to the nearest 'if', the conventional behavior in C and GLSL) and comments out the entry that would have Reduced, as shown in the following code:
csharp
const int syntaxStateCount = 477;
// LALR(1) syntax parse table
private static readonly Dictionary<int/*LRNode.type*/, LRParseAction>[]
syntaxStates = new Dictionary<int, LRParseAction>[syntaxStateCount];
private static void InitializeSyntaxStates() {
var states = CompilerGLSL.syntaxStates;
...
// 30814 actions
// conflicts(1)=not sovled(0)+solved(1)(1 warnings)
#region init actions of syntax states
...
// syntaxStates[454]:
// [324] selection_rest_statement : statement ⏳ 'else' statement ;☕ '--' '-' ';' '!' '(' '{' '}' '+' ..
// [325] selection_rest_statement : statement ⏳ ;☕ '--' '-' ';' '!' '(' '{' '}' '+' ..
// 'else' repeated 2 times
states[454]/*28145*/.Add(st.@else, new(LRParseAction.Kind.Shift, states[466]));
// ⚔ PreferShiftToReduce states[454]/*28146*/.Add(st.@else, new(regulations[325]));
states[454]/*28147*/.Add(st.@Dash符Dash符, new(regulations[325]));
...
#endregion init actions of syntax states
}
Example: the precedence directives %nonassoc, %left, %right, and %prec
If we wrote Calc.st in the most intuitive way, it might look like this:
smalltalk
Exp : Exp '+' Exp
| Exp '-' Exp
| Exp '*' Exp
| Exp '/' Exp
| '(' Exp ')'
| 'number' ;
%%[0-9]+%% 'number'
Click to view the LALR(1) parse-table code for this grammar
csharp
const int syntaxStateCount = 14;
/// <summary>
/// LALR(1) syntax parse table
/// </summary>
private static readonly Dictionary<int/*Node.type*/, LRParseAction>[] syntaxStates =
new Dictionary<int/*Node.type*/, LRParseAction>[syntaxStateCount];
private static void InitializeSyntaxStates() {
var states = CompilerExp.syntaxStates;
// 80 actions
// conflicts(16)=not sovled(0)+solved(16)(16 warnings)
#region create objects of syntax states
states[0] = new(capacity: 3);
states[1] = new(capacity: 5);
states[2] = new(capacity: 3);
states[3] = new(capacity: 6);
states[4] = new(capacity: 3);
states[5] = new(capacity: 3);
states[6] = new(capacity: 3);
states[7] = new(capacity: 3);
states[8] = new(capacity: 5);
states[9] = new(capacity: 6);
states[10] = new(capacity: 6);
states[11] = new(capacity: 6);
states[12] = new(capacity: 6);
states[13] = new(capacity: 6);
#endregion create objects of syntax states
#region re-used actions
LRParseAction aShift2 = new(LRParseAction.Kind.Shift, states[2]);// refered 6 times
LRParseAction aShift3 = new(LRParseAction.Kind.Shift, states[3]);// refered 6 times
LRParseAction aShift4 = new(LRParseAction.Kind.Shift, states[4]);// refered 6 times
LRParseAction aShift5 = new(LRParseAction.Kind.Shift, states[5]);// refered 6 times
LRParseAction aShift6 = new(LRParseAction.Kind.Shift, states[6]);// refered 6 times
LRParseAction aShift7 = new(LRParseAction.Kind.Shift, states[7]);// refered 6 times
LRParseAction aReduce5 = new(regulations[5]);// refered 6 times
LRParseAction aReduce0 = new(regulations[0]);// refered 6 times
LRParseAction aReduce1 = new(regulations[1]);// refered 6 times
LRParseAction aReduce2 = new(regulations[2]);// refered 6 times
LRParseAction aReduce3 = new(regulations[3]);// refered 6 times
LRParseAction aReduce4 = new(regulations[4]);// refered 6 times
#endregion re-used actions
// 80 actions
// conflicts(16)=not sovled(0)+solved(16)(16 warnings)
#region init actions of syntax states
// syntaxStates[0]:
// [-1] Exp' : ⏳ Exp ;☕ '¥'
// [0] Exp : ⏳ Exp '+' Exp ;☕ '-' '*' '/' '+' '¥'
// [1] Exp : ⏳ Exp '-' Exp ;☕ '-' '*' '/' '+' '¥'
// [2] Exp : ⏳ Exp '*' Exp ;☕ '-' '*' '/' '+' '¥'
// [3] Exp : ⏳ Exp '/' Exp ;☕ '-' '*' '/' '+' '¥'
// [4] Exp : ⏳ '(' Exp ')' ;☕ '-' '*' '/' '+' '¥'
// [5] Exp : ⏳ 'number' ;☕ '-' '*' '/' '+' '¥'
states[0]/*0*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[1]));
states[0]/*1*/.Add(st.@LeftParenthesis符, aShift2);
states[0]/*2*/.Add(st.@number, aShift3);
// syntaxStates[1]:
// [-1] Exp' : Exp ⏳ ;☕ '¥'
// [0] Exp : Exp ⏳ '+' Exp ;☕ '-' '*' '/' '+' '¥'
// [1] Exp : Exp ⏳ '-' Exp ;☕ '-' '*' '/' '+' '¥'
// [2] Exp : Exp ⏳ '*' Exp ;☕ '-' '*' '/' '+' '¥'
// [3] Exp : Exp ⏳ '/' Exp ;☕ '-' '*' '/' '+' '¥'
states[1]/*3*/.Add(st.@Plus符, aShift4);
states[1]/*4*/.Add(st.@Dash符, aShift5);
states[1]/*5*/.Add(st.@Asterisk符, aShift6);
states[1]/*6*/.Add(st.@Slash符, aShift7);
states[1]/*7*/.Add(st.@终, LRParseAction.accept);
// syntaxStates[2]:
// [4] Exp : '(' ⏳ Exp ')' ;☕ '-' ')' '*' '/' '+' '¥'
// [0] Exp : ⏳ Exp '+' Exp ;☕ '-' ')' '*' '/' '+'
// [1] Exp : ⏳ Exp '-' Exp ;☕ '-' ')' '*' '/' '+'
// [2] Exp : ⏳ Exp '*' Exp ;☕ '-' ')' '*' '/' '+'
// [3] Exp : ⏳ Exp '/' Exp ;☕ '-' ')' '*' '/' '+'
// [4] Exp : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+'
// [5] Exp : ⏳ 'number' ;☕ '-' ')' '*' '/' '+'
states[2]/*8*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[8]));
states[2]/*9*/.Add(st.@LeftParenthesis符, aShift2);
states[2]/*10*/.Add(st.@number, aShift3);
// syntaxStates[3]:
// [5] Exp : 'number' ⏳ ;☕ '-' ')' '*' '/' '+' '¥'
states[3]/*11*/.Add(st.@Dash符, aReduce5);
states[3]/*12*/.Add(st.@RightParenthesis符, aReduce5);
states[3]/*13*/.Add(st.@Asterisk符, aReduce5);
states[3]/*14*/.Add(st.@Slash符, aReduce5);
states[3]/*15*/.Add(st.@Plus符, aReduce5);
states[3]/*16*/.Add(st.@终, aReduce5);
// syntaxStates[4]:
// [0] Exp : Exp '+' ⏳ Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [0] Exp : ⏳ Exp '+' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [1] Exp : ⏳ Exp '-' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [2] Exp : ⏳ Exp '*' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [3] Exp : ⏳ Exp '/' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [4] Exp : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥'
// [5] Exp : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥'
states[4]/*17*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[9]));
states[4]/*18*/.Add(st.@LeftParenthesis符, aShift2);
states[4]/*19*/.Add(st.@number, aShift3);
// syntaxStates[5]:
// [1] Exp : Exp '-' ⏳ Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [0] Exp : ⏳ Exp '+' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [1] Exp : ⏳ Exp '-' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [2] Exp : ⏳ Exp '*' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [3] Exp : ⏳ Exp '/' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [4] Exp : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥'
// [5] Exp : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥'
states[5]/*20*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[10]));
states[5]/*21*/.Add(st.@LeftParenthesis符, aShift2);
states[5]/*22*/.Add(st.@number, aShift3);
// syntaxStates[6]:
// [2] Exp : Exp '*' ⏳ Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [0] Exp : ⏳ Exp '+' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [1] Exp : ⏳ Exp '-' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [2] Exp : ⏳ Exp '*' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [3] Exp : ⏳ Exp '/' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [4] Exp : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥'
// [5] Exp : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥'
states[6]/*23*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[11]));
states[6]/*24*/.Add(st.@LeftParenthesis符, aShift2);
states[6]/*25*/.Add(st.@number, aShift3);
// syntaxStates[7]:
// [3] Exp : Exp '/' ⏳ Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [0] Exp : ⏳ Exp '+' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [1] Exp : ⏳ Exp '-' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [2] Exp : ⏳ Exp '*' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [3] Exp : ⏳ Exp '/' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [4] Exp : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥'
// [5] Exp : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥'
states[7]/*26*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[12]));
states[7]/*27*/.Add(st.@LeftParenthesis符, aShift2);
states[7]/*28*/.Add(st.@number, aShift3);
// syntaxStates[8]:
// [4] Exp : '(' Exp ⏳ ')' ;☕ '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp ⏳ '+' Exp ;☕ '-' ')' '*' '/' '+'
// [1] Exp : Exp ⏳ '-' Exp ;☕ '-' ')' '*' '/' '+'
// [2] Exp : Exp ⏳ '*' Exp ;☕ '-' ')' '*' '/' '+'
// [3] Exp : Exp ⏳ '/' Exp ;☕ '-' ')' '*' '/' '+'
states[8]/*29*/.Add(st.@RightParenthesis符, new(LRParseAction.Kind.Shift, states[13]));
states[8]/*30*/.Add(st.@Plus符, aShift4);
states[8]/*31*/.Add(st.@Dash符, aShift5);
states[8]/*32*/.Add(st.@Asterisk符, aShift6);
states[8]/*33*/.Add(st.@Slash符, aShift7);
// syntaxStates[9]:
// [0] Exp : Exp '+' Exp ⏳ ;☕ '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp ⏳ '+' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [1] Exp : Exp ⏳ '-' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [2] Exp : Exp ⏳ '*' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [3] Exp : Exp ⏳ '/' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// '+' repeated 2 times
states[9]/*34*/.Add(st.@Plus符, aShift4);
// ⚔ PreferShiftToReduce states[9]/*35*/.Add(st.@Plus符, aReduce0);
// '-' repeated 2 times
states[9]/*36*/.Add(st.@Dash符, aShift5);
// ⚔ PreferShiftToReduce states[9]/*37*/.Add(st.@Dash符, aReduce0);
// '*' repeated 2 times
states[9]/*38*/.Add(st.@Asterisk符, aShift6);
// ⚔ PreferShiftToReduce states[9]/*39*/.Add(st.@Asterisk符, aReduce0);
// '/' repeated 2 times
states[9]/*40*/.Add(st.@Slash符, aShift7);
// ⚔ PreferShiftToReduce states[9]/*41*/.Add(st.@Slash符, aReduce0);
states[9]/*42*/.Add(st.@RightParenthesis符, aReduce0);
states[9]/*43*/.Add(st.@终, aReduce0);
// syntaxStates[10]:
// [1] Exp : Exp '-' Exp ⏳ ;☕ '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp ⏳ '+' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [1] Exp : Exp ⏳ '-' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [2] Exp : Exp ⏳ '*' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [3] Exp : Exp ⏳ '/' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// '+' repeated 2 times
states[10]/*44*/.Add(st.@Plus符, aShift4);
// ⚔ PreferShiftToReduce states[10]/*45*/.Add(st.@Plus符, aReduce1);
// '-' repeated 2 times
states[10]/*46*/.Add(st.@Dash符, aShift5);
// ⚔ PreferShiftToReduce states[10]/*47*/.Add(st.@Dash符, aReduce1);
// '*' repeated 2 times
states[10]/*48*/.Add(st.@Asterisk符, aShift6);
// ⚔ PreferShiftToReduce states[10]/*49*/.Add(st.@Asterisk符, aReduce1);
// '/' repeated 2 times
states[10]/*50*/.Add(st.@Slash符, aShift7);
// ⚔ PreferShiftToReduce states[10]/*51*/.Add(st.@Slash符, aReduce1);
states[10]/*52*/.Add(st.@RightParenthesis符, aReduce1);
states[10]/*53*/.Add(st.@终, aReduce1);
// syntaxStates[11]:
// [2] Exp : Exp '*' Exp ⏳ ;☕ '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp ⏳ '+' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [1] Exp : Exp ⏳ '-' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [2] Exp : Exp ⏳ '*' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [3] Exp : Exp ⏳ '/' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// '+' repeated 2 times
states[11]/*54*/.Add(st.@Plus符, aShift4);
// ⚔ PreferShiftToReduce states[11]/*55*/.Add(st.@Plus符, aReduce2);
// '-' repeated 2 times
states[11]/*56*/.Add(st.@Dash符, aShift5);
// ⚔ PreferShiftToReduce states[11]/*57*/.Add(st.@Dash符, aReduce2);
// '*' repeated 2 times
states[11]/*58*/.Add(st.@Asterisk符, aShift6);
// ⚔ PreferShiftToReduce states[11]/*59*/.Add(st.@Asterisk符, aReduce2);
// '/' repeated 2 times
states[11]/*60*/.Add(st.@Slash符, aShift7);
// ⚔ PreferShiftToReduce states[11]/*61*/.Add(st.@Slash符, aReduce2);
states[11]/*62*/.Add(st.@RightParenthesis符, aReduce2);
states[11]/*63*/.Add(st.@终, aReduce2);
// syntaxStates[12]:
// [3] Exp : Exp '/' Exp ⏳ ;☕ '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp ⏳ '+' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [1] Exp : Exp ⏳ '-' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [2] Exp : Exp ⏳ '*' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [3] Exp : Exp ⏳ '/' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// '+' repeated 2 times
states[12]/*64*/.Add(st.@Plus符, aShift4);
// ⚔ PreferShiftToReduce states[12]/*65*/.Add(st.@Plus符, aReduce3);
// '-' repeated 2 times
states[12]/*66*/.Add(st.@Dash符, aShift5);
// ⚔ PreferShiftToReduce states[12]/*67*/.Add(st.@Dash符, aReduce3);
// '*' repeated 2 times
states[12]/*68*/.Add(st.@Asterisk符, aShift6);
// ⚔ PreferShiftToReduce states[12]/*69*/.Add(st.@Asterisk符, aReduce3);
// '/' repeated 2 times
states[12]/*70*/.Add(st.@Slash符, aShift7);
// ⚔ PreferShiftToReduce states[12]/*71*/.Add(st.@Slash符, aReduce3);
states[12]/*72*/.Add(st.@RightParenthesis符, aReduce3);
states[12]/*73*/.Add(st.@终, aReduce3);
// syntaxStates[13]:
// [4] Exp : '(' Exp ')' ⏳ ;☕ '-' ')' '*' '/' '+' '¥'
states[13]/*74*/.Add(st.@Dash符, aReduce4);
states[13]/*75*/.Add(st.@RightParenthesis符, aReduce4);
states[13]/*76*/.Add(st.@Asterisk符, aReduce4);
states[13]/*77*/.Add(st.@Slash符, aReduce4);
states[13]/*78*/.Add(st.@Plus符, aReduce4);
states[13]/*79*/.Add(st.@终, aReduce4);
#endregion init actions of syntax states
}
When the parser is in `syntaxStates[9]` and the incoming token is `'+'`, it ought to Reduce, but bitParser follows its default "prefer Shift over Reduce" rule and chooses Shift. Obviously this cannot handle the precedence relation between addition/subtraction and multiplication/division correctly.
If you do not want to rewrite the grammar into the form shown at the beginning of this article, you can instead declare the precedence of the four operators with precedence directives and obtain a correct parse table:
smalltalk
Exp : Exp '+' Exp
| Exp '-' Exp
| Exp '*' Exp
| Exp '/' Exp
| '(' Exp ')'
| 'number' ;
%%[0-9]+%% 'number'
%left '+' '-' // '+' and '-' have the same precedence and prefer Reduce
%left '*' '/' // '*' and '/' have the same precedence, higher than '+' '-', and prefer Reduce
点击查看 Click to view the generated code of the LALR(1) parse table that uses the precedence directives
csharp
const int syntaxStateCount = 14;
/// <summary>
/// LALR(1) syntax parse table
/// </summary>
private static readonly Dictionary<string/*Node.type*/, LRParseAction>[] syntaxStates =
new Dictionary<string/*Node.type*/, LRParseAction>[syntaxStateCount];
private static void InitializeSyntaxStates() {
var states = CompilerExp.syntaxStates;
// 80 actions
// conflicts(16)=not solved(0)+solved(16)(0 warnings)
#region create objects of syntax states
states[0] = new(capacity: 3);
states[1] = new(capacity: 5);
states[2] = new(capacity: 3);
states[3] = new(capacity: 6);
states[4] = new(capacity: 3);
states[5] = new(capacity: 3);
states[6] = new(capacity: 3);
states[7] = new(capacity: 3);
states[8] = new(capacity: 5);
states[9] = new(capacity: 6);
states[10] = new(capacity: 6);
states[11] = new(capacity: 6);
states[12] = new(capacity: 6);
states[13] = new(capacity: 6);
#endregion create objects of syntax states
#region re-used actions
LRParseAction aShift2 = new(LRParseAction.Kind.Shift, states[2]);// referred 6 times
LRParseAction aShift3 = new(LRParseAction.Kind.Shift, states[3]);// referred 6 times
LRParseAction aShift4 = new(LRParseAction.Kind.Shift, states[4]);// referred 6 times
LRParseAction aShift5 = new(LRParseAction.Kind.Shift, states[5]);// referred 6 times
LRParseAction aShift6 = new(LRParseAction.Kind.Shift, states[6]);// referred 6 times
LRParseAction aShift7 = new(LRParseAction.Kind.Shift, states[7]);// referred 6 times
LRParseAction aReduce5 = new(regulations[5]);// referred 6 times
LRParseAction aReduce0 = new(regulations[0]);// referred 6 times
LRParseAction aReduce1 = new(regulations[1]);// referred 6 times
LRParseAction aReduce2 = new(regulations[2]);// referred 6 times
LRParseAction aReduce3 = new(regulations[3]);// referred 6 times
LRParseAction aReduce4 = new(regulations[4]);// referred 6 times
#endregion re-used actions
// 80 actions
// conflicts(16)=not solved(0)+solved(16)(0 warnings)
#region init actions of syntax states
// syntaxStates[0]:
// [-1] Exp' : ⏳ Exp ;☕ '¥'
// [0] Exp : ⏳ Exp '+' Exp ;☕ '-' '*' '/' '+' '¥'
// [1] Exp : ⏳ Exp '-' Exp ;☕ '-' '*' '/' '+' '¥'
// [2] Exp : ⏳ Exp '*' Exp ;☕ '-' '*' '/' '+' '¥'
// [3] Exp : ⏳ Exp '/' Exp ;☕ '-' '*' '/' '+' '¥'
// [4] Exp : ⏳ '(' Exp ')' ;☕ '-' '*' '/' '+' '¥'
// [5] Exp : ⏳ 'number' ;☕ '-' '*' '/' '+' '¥'
states[0]/*0*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[1]));
states[0]/*1*/.Add(st.@LeftParenthesis符, aShift2);
states[0]/*2*/.Add(st.@number, aShift3);
// syntaxStates[1]:
// [-1] Exp' : Exp ⏳ ;☕ '¥'
// [0] Exp : Exp ⏳ '+' Exp ;☕ '-' '*' '/' '+' '¥'
// [1] Exp : Exp ⏳ '-' Exp ;☕ '-' '*' '/' '+' '¥'
// [2] Exp : Exp ⏳ '*' Exp ;☕ '-' '*' '/' '+' '¥'
// [3] Exp : Exp ⏳ '/' Exp ;☕ '-' '*' '/' '+' '¥'
states[1]/*3*/.Add(st.@Plus符, aShift4);
states[1]/*4*/.Add(st.@Dash符, aShift5);
states[1]/*5*/.Add(st.@Asterisk符, aShift6);
states[1]/*6*/.Add(st.@Slash符, aShift7);
states[1]/*7*/.Add(st.@终, LRParseAction.accept);
// syntaxStates[2]:
// [4] Exp : '(' ⏳ Exp ')' ;☕ '-' ')' '*' '/' '+' '¥'
// [0] Exp : ⏳ Exp '+' Exp ;☕ '-' ')' '*' '/' '+'
// [1] Exp : ⏳ Exp '-' Exp ;☕ '-' ')' '*' '/' '+'
// [2] Exp : ⏳ Exp '*' Exp ;☕ '-' ')' '*' '/' '+'
// [3] Exp : ⏳ Exp '/' Exp ;☕ '-' ')' '*' '/' '+'
// [4] Exp : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+'
// [5] Exp : ⏳ 'number' ;☕ '-' ')' '*' '/' '+'
states[2]/*8*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[8]));
states[2]/*9*/.Add(st.@LeftParenthesis符, aShift2);
states[2]/*10*/.Add(st.@number, aShift3);
// syntaxStates[3]:
// [5] Exp : 'number' ⏳ ;☕ '-' ')' '*' '/' '+' '¥'
states[3]/*11*/.Add(st.@Dash符, aReduce5);
states[3]/*12*/.Add(st.@RightParenthesis符, aReduce5);
states[3]/*13*/.Add(st.@Asterisk符, aReduce5);
states[3]/*14*/.Add(st.@Slash符, aReduce5);
states[3]/*15*/.Add(st.@Plus符, aReduce5);
states[3]/*16*/.Add(st.@终, aReduce5);
// syntaxStates[4]:
// [0] Exp : Exp '+' ⏳ Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [0] Exp : ⏳ Exp '+' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [1] Exp : ⏳ Exp '-' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [2] Exp : ⏳ Exp '*' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [3] Exp : ⏳ Exp '/' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [4] Exp : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥'
// [5] Exp : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥'
states[4]/*17*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[9]));
states[4]/*18*/.Add(st.@LeftParenthesis符, aShift2);
states[4]/*19*/.Add(st.@number, aShift3);
// syntaxStates[5]:
// [1] Exp : Exp '-' ⏳ Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [0] Exp : ⏳ Exp '+' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [1] Exp : ⏳ Exp '-' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [2] Exp : ⏳ Exp '*' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [3] Exp : ⏳ Exp '/' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [4] Exp : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥'
// [5] Exp : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥'
states[5]/*20*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[10]));
states[5]/*21*/.Add(st.@LeftParenthesis符, aShift2);
states[5]/*22*/.Add(st.@number, aShift3);
// syntaxStates[6]:
// [2] Exp : Exp '*' ⏳ Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [0] Exp : ⏳ Exp '+' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [1] Exp : ⏳ Exp '-' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [2] Exp : ⏳ Exp '*' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [3] Exp : ⏳ Exp '/' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [4] Exp : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥'
// [5] Exp : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥'
states[6]/*23*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[11]));
states[6]/*24*/.Add(st.@LeftParenthesis符, aShift2);
states[6]/*25*/.Add(st.@number, aShift3);
// syntaxStates[7]:
// [3] Exp : Exp '/' ⏳ Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [0] Exp : ⏳ Exp '+' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [1] Exp : ⏳ Exp '-' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [2] Exp : ⏳ Exp '*' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [3] Exp : ⏳ Exp '/' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [4] Exp : ⏳ '(' Exp ')' ;☕ '-' ')' '*' '/' '+' '¥'
// [5] Exp : ⏳ 'number' ;☕ '-' ')' '*' '/' '+' '¥'
states[7]/*26*/.Add(st.@vnExp, new(LRParseAction.Kind.Goto, states[12]));
states[7]/*27*/.Add(st.@LeftParenthesis符, aShift2);
states[7]/*28*/.Add(st.@number, aShift3);
// syntaxStates[8]:
// [4] Exp : '(' Exp ⏳ ')' ;☕ '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp ⏳ '+' Exp ;☕ '-' ')' '*' '/' '+'
// [1] Exp : Exp ⏳ '-' Exp ;☕ '-' ')' '*' '/' '+'
// [2] Exp : Exp ⏳ '*' Exp ;☕ '-' ')' '*' '/' '+'
// [3] Exp : Exp ⏳ '/' Exp ;☕ '-' ')' '*' '/' '+'
states[8]/*29*/.Add(st.@RightParenthesis符, new(LRParseAction.Kind.Shift, states[13]));
states[8]/*30*/.Add(st.@Plus符, aShift4);
states[8]/*31*/.Add(st.@Dash符, aShift5);
states[8]/*32*/.Add(st.@Asterisk符, aShift6);
states[8]/*33*/.Add(st.@Slash符, aShift7);
// syntaxStates[9]:
// [0] Exp : Exp '+' Exp ⏳ ;☕ '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp ⏳ '+' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [1] Exp : Exp ⏳ '-' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [2] Exp : Exp ⏳ '*' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [3] Exp : Exp ⏳ '/' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// '+' repeated 2 times
// ⚔ LeftShouldReduce states[9]/*34*/.Add(st.@Plus符, aShift4);
states[9]/*35*/.Add(st.@Plus符, aReduce0);
// '-' repeated 2 times
// ⚔ LeftShouldReduce states[9]/*36*/.Add(st.@Dash符, aShift5);
states[9]/*37*/.Add(st.@Dash符, aReduce0);
// '*' repeated 2 times
states[9]/*38*/.Add(st.@Asterisk符, aShift6);
// ⚔ LowPrecedence states[9]/*39*/.Add(st.@Asterisk符, aReduce0);
// '/' repeated 2 times
states[9]/*40*/.Add(st.@Slash符, aShift7);
// ⚔ LowPrecedence states[9]/*41*/.Add(st.@Slash符, aReduce0);
states[9]/*42*/.Add(st.@RightParenthesis符, aReduce0);
states[9]/*43*/.Add(st.@终, aReduce0);
// syntaxStates[10]:
// [1] Exp : Exp '-' Exp ⏳ ;☕ '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp ⏳ '+' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [1] Exp : Exp ⏳ '-' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [2] Exp : Exp ⏳ '*' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [3] Exp : Exp ⏳ '/' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// '+' repeated 2 times
// ⚔ LeftShouldReduce states[10]/*44*/.Add(st.@Plus符, aShift4);
states[10]/*45*/.Add(st.@Plus符, aReduce1);
// '-' repeated 2 times
// ⚔ LeftShouldReduce states[10]/*46*/.Add(st.@Dash符, aShift5);
states[10]/*47*/.Add(st.@Dash符, aReduce1);
// '*' repeated 2 times
states[10]/*48*/.Add(st.@Asterisk符, aShift6);
// ⚔ LowPrecedence states[10]/*49*/.Add(st.@Asterisk符, aReduce1);
// '/' repeated 2 times
states[10]/*50*/.Add(st.@Slash符, aShift7);
// ⚔ LowPrecedence states[10]/*51*/.Add(st.@Slash符, aReduce1);
states[10]/*52*/.Add(st.@RightParenthesis符, aReduce1);
states[10]/*53*/.Add(st.@终, aReduce1);
// syntaxStates[11]:
// [2] Exp : Exp '*' Exp ⏳ ;☕ '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp ⏳ '+' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [1] Exp : Exp ⏳ '-' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [2] Exp : Exp ⏳ '*' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [3] Exp : Exp ⏳ '/' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// '+' repeated 2 times
// ⚔ LowPrecedence states[11]/*54*/.Add(st.@Plus符, aShift4);
states[11]/*55*/.Add(st.@Plus符, aReduce2);
// '-' repeated 2 times
// ⚔ LowPrecedence states[11]/*56*/.Add(st.@Dash符, aShift5);
states[11]/*57*/.Add(st.@Dash符, aReduce2);
// '*' repeated 2 times
// ⚔ LeftShouldReduce states[11]/*58*/.Add(st.@Asterisk符, aShift6);
states[11]/*59*/.Add(st.@Asterisk符, aReduce2);
// '/' repeated 2 times
// ⚔ LeftShouldReduce states[11]/*60*/.Add(st.@Slash符, aShift7);
states[11]/*61*/.Add(st.@Slash符, aReduce2);
states[11]/*62*/.Add(st.@RightParenthesis符, aReduce2);
states[11]/*63*/.Add(st.@终, aReduce2);
// syntaxStates[12]:
// [3] Exp : Exp '/' Exp ⏳ ;☕ '-' ')' '*' '/' '+' '¥'
// [0] Exp : Exp ⏳ '+' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [1] Exp : Exp ⏳ '-' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [2] Exp : Exp ⏳ '*' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// [3] Exp : Exp ⏳ '/' Exp ;☕ '-' ')' '*' '/' '+' '¥'
// '+' repeated 2 times
// ⚔ LowPrecedence states[12]/*64*/.Add(st.@Plus符, aShift4);
states[12]/*65*/.Add(st.@Plus符, aReduce3);
// '-' repeated 2 times
// ⚔ LowPrecedence states[12]/*66*/.Add(st.@Dash符, aShift5);
states[12]/*67*/.Add(st.@Dash符, aReduce3);
// '*' repeated 2 times
// ⚔ LeftShouldReduce states[12]/*68*/.Add(st.@Asterisk符, aShift6);
states[12]/*69*/.Add(st.@Asterisk符, aReduce3);
// '/' repeated 2 times
// ⚔ LeftShouldReduce states[12]/*70*/.Add(st.@Slash符, aShift7);
states[12]/*71*/.Add(st.@Slash符, aReduce3);
states[12]/*72*/.Add(st.@RightParenthesis符, aReduce3);
states[12]/*73*/.Add(st.@终, aReduce3);
// syntaxStates[13]:
// [4] Exp : '(' Exp ')' ⏳ ;☕ '-' ')' '*' '/' '+' '¥'
states[13]/*74*/.Add(st.@Dash符, aReduce4);
states[13]/*75*/.Add(st.@RightParenthesis符, aReduce4);
states[13]/*76*/.Add(st.@Asterisk符, aReduce4);
states[13]/*77*/.Add(st.@Slash符, aReduce4);
states[13]/*78*/.Add(st.@Plus符, aReduce4);
states[13]/*79*/.Add(st.@终, aReduce4);
#endregion init actions of syntax states
}
bitParser's precedence directives `%nonassoc`, `%left`, `%right` and `%prec` work the same way as in yacc:

- A directive written later has higher precedence.
- `%left` prefers Reduce.
- `%right` prefers Shift.
- `%nonassoc` turns the conflict into a syntax error.
- `%prec` names a specific token type whose precedence is used instead of the default one (the precedence of the right-most Vt of the rule being reduced). The named Vt may even be one that does not occur in the grammar, i.e. a pure placeholder; see the sketch after this list.
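For instance, a unary-minus rule could borrow the precedence of such a placeholder. Below is a hypothetical grammar sketch in the style of the Calc example ('UMINUS' is an assumed placeholder Vt that never appears in the input; the `%prec` placement follows the yacc convention):
smalltalk
Exp : Exp '+' Exp
| Exp '-' Exp
| Exp '*' Exp
| Exp '/' Exp
| '-' Exp %prec 'UMINUS' // reduce with the precedence of 'UMINUS', not of '-'
| 'number' ;
%%[0-9]+%% 'number'
%left '+' '-'
%left '*' '/'
%right 'UMINUS' // written last, so it has the highest precedence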
举例 Example: the `/` postfix
In a STEP-format file, the `1` in `1=2` should be treated as an `'entityId'`, while the `2` should be treated as a `'refEntity'`, i.e. a reference to entity 2. How do we tell them apart? When a number is followed by `=` it is an `'entityId'`; otherwise it is a `'refEntity'`. This is exactly what the `/` postfix is for, as shown below:
smalltalk
// postfix.st
// regulations:
Items : Items Item | Item ;
Item : 'entityId' '=' 'refEntity' ;
// lexi statements:
%%[0-9]+/[ \t]*=%% 'entityId' // digits followed by optional blanks and '=' form an 'entityId'
%%[0-9]+%% 'refEntity'
点击查看 Click to view the generated code of lexical state 0
csharp
// lexicalState0
private static readonly Action<LexicalContext, char> lexicalState0 =
static (context, c) => {
if (false) { /* for simpler code generation purpose. */ }
/* = */
else if (/* possible Vt : '=' */
c == '='/*'\u003D'(61)*/) {
BeginToken(context);
ExtendToken(context);
context.currentState = lexicalState2;
}
/* [0-9] */
else if (/* possible Vt : 'entityId' 'refEntity' */
'0'/*'\u0030'(48)*/ <= c && c <= '9'/*'\u0039'(57)*/) {
BeginToken(context);
ExtendToken(context);
context.currentState = lexicalState3;
}
/* deal with everything else. */
else if (c == ' ' || c == '\r' || c == '\n' || c == '\t' || c == '\0') {
context.currentState = lexicalState0; // skip them.
}
else { // unexpected char.
BeginToken(context);
ExtendToken(context);
AcceptToken(st.Error错, context);
context.currentState = lexicalState0;
}
};
点击查看 Click to view the generated code of lexical state 1
csharp
// lexicalState1
private static readonly Action<LexicalContext, char> lexicalState1 =
static (context, c) => {
if (false) { /* for simpler code generation purpose. */ }
/* = */
else if (/* possible Vt : 'entityId' */
c == '='/*'\u003D'(61)*/) {
context.currentState = lexicalState4;
}
/* [ \t] */
else if (/* possible Vt : 'entityId' */
(c == ' '/*'\u0020'(32)*/)
|| (c == '\t'/*'\u0009'(9)*/)) {
context.currentState = lexicalState1;
}
/* deal with everything else. */
else { // token with error type
ExtendToken(context);
AcceptToken(st.Error错, context);
context.currentState = lexicalState0;
}
};
点击查看 Click to view the generated code of lexical state 2
csharp
// lexicalState2
private static readonly Action<LexicalContext, char> lexicalState2 =
static (context, c) => {
if (false) { /* for simpler code generation purpose. */ }
/* deal with everything else. */
else {
AcceptToken(st.@Equal符, context);
context.currentState = lexicalState0;
}
};
点击查看 Click to view the generated code of lexical state 3
csharp
// lexicalState3
private static readonly Action<LexicalContext, char> lexicalState3 =
static (context, c) => {
if (false) { /* for simpler code generation purpose. */ }
/* = */
else if (/* possible Vt : 'entityId' */
c == '='/*'\u003D'(61)*/) {
context.currentState = lexicalState4;
}
/* [ \t] */
else if (/* possible Vt : 'entityId' */
(c == ' '/*'\u0020'(32)*/)
|| (c == '\t'/*'\u0009'(9)*/)) {
context.currentState = lexicalState1;
}
/* [0-9] */
else if (/* possible Vt : 'entityId' 'refEntity' */
'0'/*'\u0030'(48)*/ <= c && c <= '9'/*'\u0039'(57)*/) {
ExtendToken(context);
context.currentState = lexicalState3;
}
/* deal with everything else. */
else {
AcceptToken(st.@refEntity, context);
context.currentState = lexicalState0;
}
};
点击查看 Click to view the generated code of lexical state 4
csharp
// lexicalState4
private static readonly Action<LexicalContext, char> lexicalState4 =
static (context, c) => {
if (false) { /* for simpler code generation purpose. */ }
/* deal with everything else. */
else {
AcceptToken(st.@entityId, context);
context.currentState = lexicalState0;
}
};
Below is the state-transition table in the form of a two-dimensional `ElseIf[][]` array. It reuses some `ElseIf` objects, which further reduces memory usage. A minimal driver sketch follows the table.
点击查看 Click to view the generated code of the two-dimensional state-transition table
csharp
private static readonly ElseIf[][] lexiStates = new ElseIf[5][];
static void InitializeLexiTable() {
ElseIf s9_9_0_1 = new('\t'/*'\u0009'(9)*/, Acts.None, 1);// referred 2 times
ElseIf s32_32_0_1 = new(' '/*'\u0020'(32)*/, Acts.None, 1);// referred 2 times
ElseIf s61_61_0_4 = new('='/*'\u003D'(61)*/, Acts.None, 4);// referred 2 times
lexiStates[0] = new ElseIf[] {
// possible Vt: 'entityId' 'refEntity'
/*0*/new('0'/*'\u0030'(48)*/, '9'/*'\u0039'(57)*/, Acts.Begin | Acts.Extend, 3),
// possible Vt: '='
/*1*/new('='/*'\u003D'(61)*/, Acts.Begin | Acts.Extend, 2),
};
lexiStates[1] = new ElseIf[] {
// possible Vt: 'entityId'
/*0*/ //new('\t'/*'\u0009'(9)*/, Acts.None, 1),
/*0*/s9_9_0_1,
// possible Vt: 'entityId'
/*1*///new(' '/*'\u0020'(32)*/, Acts.None, 1),
/*1*/s32_32_0_1,
// possible Vt: 'entityId'
/*2*///new('='/*'\u003D'(61)*/, Acts.None, 4),
/*2*/s61_61_0_4,
};
lexiStates[2] = new ElseIf[] {
// possible Vt: '='
/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*(65535)*/, Acts.Accept, 0, st.@Equal符),
};
lexiStates[3] = new ElseIf[] {
// possible Vt: 'entityId'
/*0*///new('\t'/*'\u0009'(9)*/, Acts.None, 1),
/*0*/s9_9_0_1,
// possible Vt: 'entityId'
/*1*///new(' '/*'\u0020'(32)*/, Acts.None, 1),
/*1*/s32_32_0_1,
// possible Vt: 'entityId' 'refEntity'
/*2*/new('0'/*'\u0030'(48)*/, '9'/*'\u0039'(57)*/, Acts.Extend, 3),
// possible Vt: 'entityId'
/*3*///new('='/*'\u003D'(61)*/, Acts.None, 4),
/*3*/s61_61_0_4,
// possible Vt: 'refEntity'
/*4*/new('\u0000'/*(0)*/, '\uFFFF'/*(65535)*/, Acts.Accept, 0, st.@refEntity),
};
lexiStates[4] = new ElseIf[] {
// possible Vt: 'entityId'
/*0*/new('\u0000'/*(0)*/, '\uFFFF'/*(65535)*/, Acts.Accept, 0, st.@entityId),
};
}
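The generated code above does not show how such a table is consumed; the following is a minimal, self-contained sketch of the table-driven idea. It is NOT bitParser's actual driver; `ElseIfSketch` and `Acts` below are local stand-ins defined only for illustration.
csharp
// A minimal, self-contained sketch of one table-driven lexer step.
// NOT bitParser's actual driver; ElseIfSketch/Acts are local stand-ins.
using System;

[Flags]
enum Acts { None = 0, Begin = 1, Extend = 2, Accept = 4 }

// one branch: if Min <= c <= Max, perform Act and go to NextState (emitting AcceptVt on Accept)
record ElseIfSketch(char Min, char Max, Acts Act, int NextState, int AcceptVt = -1);

static class TableDriverSketch {
    // Find the first branch of the current state that covers the char c,
    // and return the next state plus the accepted Vt (or -1 if nothing is accepted).
    public static (int nextState, int acceptedVt) Step(ElseIfSketch[][] table, int state, char c) {
        foreach (var branch in table[state]) {
            if (branch.Min <= c && c <= branch.Max) {
                var accepted = (branch.Act & Acts.Accept) != 0 ? branch.AcceptVt : -1;
                return (branch.NextState, accepted);
            }
        }
        return (0, -1); // no branch matched: fall back to the initial state
    }
}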
Its miniDFA state diagram is as follows:
Combining the code with the state diagram, it is easy to see that:
- When in the `miniDFA3` state, if the lexer reads a space, `\t` or `=`, it can conclude that the next token type is `'entityId'` (or the error type `'Error错'`); otherwise the token is a `'refEntity'` (and the lexer returns to the `miniDFA0` state). This matches our expectation.
- When in the `miniDFA0` state (the initial state), if the lexer reads a char in `[0-9]`, the next token type may be either `'entityId'` or `'refEntity'`; at this point it cannot yet be determined uniquely.

In other words, this example also involves the problem of converting an NFA into a DFA, because its two token types `'entityId'` and `'refEntity'` begin with the same pattern `[0-9]+`.
Let us compare its NFA and DFA.
Below is its NFA state diagram:
Below is its DFA state diagram:
As we can see:

- In the NFA state diagram, the `NFA0-0` state has two choices on `[0-9]`: the `NFA2-1` state and the `NFA3-1` state. In the corresponding DFA state diagram, the `NFA2-1` and `NFA3-1` states are merged into a single DFA state (`DFA2`). This is achieved by the subset construction; a sketch of it follows below.
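The subset construction itself is standard textbook material; the following is a minimal, self-contained sketch. It is NOT bitParser's actual code, and the NFA representation used here is assumed purely for illustration.
csharp
// A minimal, self-contained sketch of the subset construction.
// NOT bitParser's actual code; the NFA representation is assumed for illustration only.
using System.Collections.Generic;
using System.Linq;

class NfaSketch {
    // Move[state][c] = set of next NFA states on char c.
    public Dictionary<int, Dictionary<char, HashSet<int>>> Move = new();
    // Epsilon[state] = set of NFA states reachable by one ε-edge.
    public Dictionary<int, HashSet<int>> Epsilon = new();

    // ε-closure: every state reachable from 'states' via ε-edges only.
    public HashSet<int> EpsilonClosure(IEnumerable<int> states) {
        var closure = new HashSet<int>(states);
        var stack = new Stack<int>(closure);
        while (stack.Count > 0) {
            var s = stack.Pop();
            if (Epsilon.TryGetValue(s, out var nexts)) {
                foreach (var t in nexts) {
                    if (closure.Add(t)) { stack.Push(t); }
                }
            }
        }
        return closure;
    }

    // Each DFA state is a set of NFA states; NFA states reachable on the same input
    // end up inside the same DFA state (e.g. NFA2-1 and NFA3-1 above).
    public List<HashSet<int>> ToDfaStates(int startState, IEnumerable<char> alphabet) {
        var dfaStates = new List<HashSet<int>> { EpsilonClosure(new[] { startState }) };
        for (var i = 0; i < dfaStates.Count; i++) {
            foreach (var c in alphabet) {
                var moved = new HashSet<int>();
                foreach (var s in dfaStates[i]) {
                    if (Move.TryGetValue(s, out var map) && map.TryGetValue(c, out var nexts)) {
                        moved.UnionWith(nexts);
                    }
                }
                if (moved.Count == 0) { continue; }
                var target = EpsilonClosure(moved);
                // reuse an existing DFA state if this NFA-state set was seen before
                if (!dfaStates.Any(set => set.SetEquals(target))) { dfaStates.Add(target); }
            }
        }
        return dfaStates;
    }
}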
As an aside, here is an example of constructing a miniDFA:
smalltalk
// regulations:
Left : 'min' ;
// lexical statements
%%[a-zA-Z_][a-zA-Z0-9_]*/,%% 'min' // an identifier followed by ',' is a 'min'
The DFA for this `'min'` is as follows:
The miniDFA for this `'min'` is as follows:
As we can see, the miniDFA merges the `DFA1` and `DFA3` states of the DFA. The two can be merged because, on any given `char`, both of them jump to the same next state. This is achieved by Hopcroft's algorithm; a simplified sketch follows below.
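The following is a minimal, self-contained sketch of DFA minimization by partition refinement. It is the straightforward textbook variant, NOT bitParser's actual code (which uses Hopcroft's worklist optimization); the DFA representation is assumed purely for illustration.
csharp
// A minimal, self-contained sketch of DFA minimization by partition refinement.
// NOT bitParser's actual code; the DFA representation is assumed for illustration only.
using System.Collections.Generic;
using System.Linq;

static class MinimizeSketch {
    // dfa[state][c] = next state; accepting = the set of accepting states.
    // For a lexer, states accepting different Vt types should start in different blocks.
    public static List<HashSet<int>> Minimize(
        Dictionary<int, Dictionary<char, int>> dfa,
        HashSet<int> accepting,
        IEnumerable<char> alphabet) {
        var partition = new List<HashSet<int>> {
            new(dfa.Keys.Where(accepting.Contains)),
            new(dfa.Keys.Where(s => !accepting.Contains(s))),
        };
        partition.RemoveAll(block => block.Count == 0);
        var changed = true;
        while (changed) {
            changed = false;
            foreach (var block in partition.ToList()) {
                foreach (var c in alphabet) {
                    // group the block's states by which block they reach on c
                    var groups = block
                        .GroupBy(s => dfa[s].TryGetValue(c, out var t)
                            ? partition.FindIndex(b => b.Contains(t))
                            : -1)
                        .ToList();
                    if (groups.Count > 1) { // not uniform: split the block
                        partition.Remove(block);
                        partition.AddRange(groups.Select(g => new HashSet<int>(g)));
                        changed = true;
                        break;
                    }
                }
                if (changed) { break; }
            }
        }
        return partition; // each block becomes one miniDFA state (e.g. {DFA1, DFA3} above)
    }
}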
举例 Example: the <'Vt'> prefix
In GLSL, to make syntax parsing easier, I need the `Point` in `struct Point { float x; float y; }` to be recognized as a "type_name" token rather than an "identifier" token. This can be done with a prefix.
smalltalk
// ..
struct_specifier :
'struct' 'type_name' '{' struct_declaration_list '}' ;
// ..
// lexical statements
%%<'struct'>[a-zA-Z_][a-zA-Z0-9_]*%% 'type_name' // a token right after 'struct' should be typed as "type_name"
%%[a-zA-Z_][a-zA-Z0-9_]*%% 'identifier' // otherwise it should be typed as "identifier"
Adding the prefix `<'struct'>` does not change the states of the lexer; it only checks, when assigning a token type, whether the previous token is of type "struct": if so, the new token is typed "type_name", otherwise "identifier".
In addition, a list is needed to record all the "type_name" tokens recognized so far, so that when one of them is encountered again it can also be typed as "type_name".
点击查看 Click to view the part of the state machine affected by the prefix
csharp
// lexicalState1
private static readonly Action<LexicalContext, char> lexicalState1 =
static (context, c) => {
if (false) { /* for simpler code generation purpose. */ }
/* [a-zA-Z0-9_] */
else if (/* possible Vt : 'type_name' 'identifier' */
('a' <= c && c <= 'z')
|| ('A' <= c && c <= 'Z')
|| ('0' <= c && c <= '9')
|| (c == '_')) {
ExtendToken(context);
context.currentState = lexicalState3;
}
/* deal with everything else. */
else {
AcceptToken2(context
// if the previous token is 'struct', the new token is a type_name
, new(/*<'Vt'>*/st.@struct, st.@type_name)
// otherwise the new token is an identifier
, new(/*default preVt*/0, st.@identifier));
context.currentState = lexicalState0;
}
};
The matching `AcceptToken2(context, ifVts);` function is accordingly a bit more complex:
点击查看 Click to view the code of void AcceptToken2(LexicalContext context, params IfVt[] ifVts);
csharp
struct IfVt {
// the required previous token type (0 means no prefix is required)
public readonly int preVt;
// the token type to assign when preVt matches
public readonly int Vt;
public IfVt(int preVt, int Vt) {
this.preVt = preVt;
this.Vt = Vt;
}
}
private static void AcceptToken2(LexicalContext context, params IfVt[] ifVts) {
var startIndex = context.analyzingToken.start.index;
var endIndex = context.analyzingToken.end.index;
context.analyzingToken.value = context.sourceCode.Substring(
startIndex, endIndex - startIndex + 1);
var typeSet = false;
const string key = "type_name"; var hadThisTypeName = false;
if (!typeSet) {
if (context.tagDict.TryGetValue(key, out var type_nameList)) {
var list = type_nameList as List<string>;
if (list.Contains(context.analyzingToken.value)) {
// already a previously recognized type_name
context.analyzingToken.type = st.type_name;
typeSet = true;
hadThisTypeName = true;
}
}
}
if (!typeSet) {
int lastType = 0;
if (context.lastSyntaxValidToken != null) {
lastType = context.lastSyntaxValidToken.type;
}
for (var i = 0; i < ifVts.Length; i++) {
var ifVt = ifVts[i];
if (ifVt.preVt == 0 // no prefix is required by default
|| ifVt.preVt == lastType) { // the <'Vt'> prefix matched
context.analyzingToken.type = ifVt.Vt;
typeSet = true;
break;
}
}
}
if (!typeSet) {
// we failed to assign type according to lexi statements.
// this indicates token error in source code or inappropriate lexi statements.
context.analyzingToken.type = st.Error错;
context.signalCondition = LexicalContext.defaultSignal;
}
// cancel forward steps for post-regex
var backStep = context.cursor.index - context.analyzingToken.end.index;
if (backStep > 0) { context.MoveBack(backStep); }
// next operation: context.MoveForward();
var token = context.analyzingToken.Dump();
context.result.Add(token);
// skip comments
if (context.analyzingToken.type != st.blockComment
&& context.analyzingToken.type != st.inlineComment) {
context.lastSyntaxValidToken = token;
}
if (!hadThisTypeName && context.analyzingToken.type == st.type_name) {
// record the newly recognized type_name in the list
if (!context.tagDict.TryGetValue(key, out var type_nameList)) {
type_nameList = new List<string>();
context.tagDict.Add(key, type_nameList);
}
var list = type_nameList as List<string>;
list.Add(context.analyzingToken.value);
}
}
Note that syntax parsing does not need `blockComment` and `inlineComment` tokens (tokens of "comment" type), so when recording the previous token type we must skip comments.
举例 Example: state signals <signal1, signal2, ..>
In GLSL, to make syntax parsing easier, I need the `r1` and `r2` in `subroutine ( r1, r2 )` to be recognized as "type_name" tokens rather than "identifier" tokens. This cannot be done with a prefix, but it can be done with the state signal `LexicalContext.signal`.
The state signal works as follows:

- When a "subroutine" token is read, `LexicalContext.signal` is set to `subroutine0`.
- When a "(" token is read, if `LexicalContext.signal` is `subroutine0`, it is set to `subroutine1`.
- When an identifier matching `[a-zA-Z_][a-zA-Z0-9_]*` is read, if `LexicalContext.signal` is `subroutine1` (which means the lexer has just read `'subroutine' '('`), the identifier is recognized as a "type_name" token; otherwise it is recognized as an "identifier" token.
- When a ")" token is read, if `LexicalContext.signal` is `subroutine1`, it is set back to `default`, i.e. the state signal is no longer taken into account.
smalltalk
storage_qualifier :
| 'subroutine' '(' type_name_list ')' ;
// lexical statements
%%subroutine%% 'subroutine' subroutine0
<subroutine0>%%[(]%% '(' subroutine1
<subroutine1>%%[a-zA-Z_][a-zA-Z0-9_]*%% 'type_name'
<subroutine1>%%[,]%% ','
<subroutine1>%%[)]%% ')' default
A state signal is effectively a second state machine stacked on top of the lexer's own state machine, so it is relatively complex and error-prone and should be used sparingly. A minimal sketch of the mechanism follows.
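The sketch below is NOT bitParser's generated code; the `Signal` enum and `Classify` helper are assumed names, used only to make the four rules above concrete.
csharp
// A minimal, self-contained sketch of signal-driven token typing.
// NOT bitParser's generated code; Signal/Classify are illustrative names only.
using System.Text.RegularExpressions;

enum Signal { Default, Subroutine0, Subroutine1 }

static class SignalSketch {
    static Signal signal = Signal.Default;

    // Decide a token's type and update the signal, mirroring the four rules above.
    public static string Classify(string token) {
        if (token == "subroutine") { signal = Signal.Subroutine0; return "'subroutine'"; }
        if (token == "(") {
            if (signal == Signal.Subroutine0) { signal = Signal.Subroutine1; }
            return "'('";
        }
        if (token == ")") {
            if (signal == Signal.Subroutine1) { signal = Signal.Default; }
            return "')'";
        }
        if (Regex.IsMatch(token, "^[a-zA-Z_][a-zA-Z0-9_]*$")) {
            return signal == Signal.Subroutine1 ? "'type_name'" : "'identifier'";
        }
        return token == "," ? "','" : "'Error'";
    }
}
// Feeding "subroutine ( r1 , r2 )" token by token yields
// 'subroutine' '(' 'type_name' ',' 'type_name' ')'.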
举例 Example: Chinese characters
To recognize all the Chinese characters in a text file (assuming Chinese characters lie between `\u4E00` and `\u9FFF`), you can write:
smalltalk
Items : Items Item | Item ;
Item : 'chineseChar' | 'other' ;
%%[\u4E00-\u9FFF]%% 'chineseChar'
%%[^\u4E00-\u9FFF]%% 'other'
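Following the calling convention shown earlier in this article, extracting the Chinese characters then amounts to filtering the token list by type. Below is a hypothetical usage sketch; the class name CompilerChinese and the token/constant names are assumptions, not the generator's exact output.
csharp
// A hypothetical usage sketch; CompilerChinese, token.type/value and st.@chineseChar
// are assumed names following the conventions shown earlier in this article.
var compiler = new CompilerChinese();
var sourceCode = File.ReadAllText("input.txt");
var tokens = compiler.Analyze(sourceCode);
foreach (var token in tokens) {
    if (token.type == st.@chineseChar) { // keep only the 'chineseChar' tokens
        Console.Write(token.value);
    }
}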