Part10_编写自己的解释器

今天我们将会继续去缩小理想和现实的差距。我们的理想是构建一个Pascal编程语言子集的功能全面的解释器。

在这篇文章中,我们将会更新我们的解释器,使其能够去解析以及解释我们的第一个非常完整的Pascal程序。这个程序也能够被开源的Pascal编译器成功编译。

下面是完整的Pascal程序:

Pascal 复制代码
PROGRAM Part10;
VAR
   number     : INTEGER;
   a, b, c, x : INTEGER;
   y          : REAL;

BEGIN {Part10}
   BEGIN
      number := 2;
      a := number;
      b := 10 * a + 10 * number DIV 4;
      c := a - - b
   END;
   x := 11;
   y := 20 / 7 + 3.14;
   { writeln('a = ', a); }
   { writeln('b = ', b); }
   { writeln('c = ', c); }
   { writeln('number = ', number); }
   { writeln('x = ', x); }
   { writeln('y = ', y); }
END.  {Part10}

好了,让我们来看一看今天将要学习哪些内容:

  1. 我们将要学习如何解析以及解释Pascal的PROGRAM头。
  2. 我们将要学习如何解析Pascal的变量声明。
  3. 我们将会更新我们的解释器,使其能够使用DIV关键字进行整数除法,使用前导斜线(/)进行小数除法。
  4. 我们将会添加对Pascal注释的支持。

首先,让我们仔细看一下语法发生的变化。今天,我们将会添加一些新的规则,同时也会更新一些现有的规则:

  1. program 的语法规则定义更新了,其包含PROGRAM保留关键字、程序名称、一个语句块、以及一个英文句点作为结尾。下面是一个完整的Pascal程序的例子:

    Pascal 复制代码
    PROGRAM Part10;
    BEGIN
    END.
  2. block 规则合并了 declarationscompound_statement 规则。系列的后面当我们添加过程声明的时候,还会再次使用该规则。下面是语句块的一个示例:

    Pascal 复制代码
     VAR
        number : INTEGER;
     
     BEGIN
     END

    另一个示例:

    Pascal 复制代码
    BEGIN
    END
  3. Pascal声明语法由很多部分组成,并且每一个部分都是可选的。在这篇文章中,我们讲仅仅只关注变量声明部分。declarations 规则要么是一个变量声明的子规则,要么是空。

  4. Pascal是一个静态类型编程语言,这也就意味着每个变量在声明的时候都需要显式指定其类型。在Pascal中,变量在使用前必须先声明。变量声明通过在变量声明部分使用关键字VAR进行声明。你可以这样去定义变量:

    Pascal 复制代码
    VAR
        number: INTEGER;
        a, b, c, x: INTEGER;
        y: REAL;
  5. type_spec 规则用于在变量声明时指定类型(INTEGER 或者 REAL)。例子如下:

    Pascal 复制代码
    VAR
    	a: INTEGER;
    	b: REAL;

    变量'a'声明时指定的类型是INTEGER 。变量'b'声明时指定的类型是REAL。在这篇文章中,我们不进行类型检查,但是不久的将来,我们会添加对类型检查的支持。

  6. term 规则被更新了,其能够使用DIV 关键字进行整数除法,使用前导斜线(/)进行小数除法。

    之前,20除以7等于整数2:

    Pascal 复制代码
    20 / 7 = 2

    现在,20除以7会得到一个小数2.85714285714 :

    Pascal 复制代码
    20 / 7 = 2.85714285714 

    如果还想使用整数除法,使用DIV 保留关键字:

    Pascal 复制代码
    20 DIV 7 = 2
  7. factor 规则更新了,其能够同时处理整型和浮点型的常量。我将INTEGER的子规则移除掉了,这是因为现在常量使用INTEGER_CONSTREAL_CONST 去表示,而且 INTEGER 符号已经被用于表示整型数据类型了。下面例子中,我们的词法解析器将会为20和7生成一个INTEGER_CONST 的符号实例,为3.14生成一个 REAL_CONST 的符号实例:

    Pascal 复制代码
    y := 20 / 7 + 3.14

下面就是我们今天完整的BNF:

bash 复制代码
program: 
	PROGRAM variable SEMI block DOT
block:
	declarations compound_statement
declarations:
	empty |
	VAR (variable_declaration SEMI)+
variable_declaration:
	variable (COMMA variable)* COLON type_spec
type_spec:
	INTEGER |
	REAL
compound_statement:
	BEGIN statement_list END
statement_list:
	statement |
	statement SEMI statement_list
statement:
	empty |
	compound_statement |
	assign_statement
empty:
assign_statement:
	variable ASSIGN expr
expr: 
	term ((PLUS|MINUS)term)*
term: 
	factor ((MUL|DIV|REAL_DIV)factor)*
factor:
	INTEGER_CONST |
	REAL_CONST |
	PLUS factor |
	MINUS factor |
	LPAREN factor RPAREN |
	variable
variable:
	ID

我们将会以和上次一样的轨迹介绍今天的变化内容:

  1. 更新词法解析器;
  2. 更新语法解析器;
  3. 更新解释器;

更新词法解析器

下面是词法解析器的变化总结:

  1. 新的符号实例;
  2. 新的以及要修改的保留关键字;
  3. 用于处理Pascal注释的新方法:skip_comment
  4. integer方法重命名,并且修改内部的逻辑;
  5. 更新get_next_token方法,使其能够返回新的符号实例对象;

让我们仔细看看上面提到的变化吧:

  1. 为了能够处理程序头、变量声明、整型和浮点型常量以及整数和小数除法,我们需要去添加一些新的符号(有些符号是保留关键字)。我们也需要去更新一下INTEGER 符号的含义,使其表示一个整型的数据类型,而不是表示一个整型常量。下面是一份完整的新增以及发生变更的符号清单:

    • PROGRAM(保留关键字)
    • VAR(保留关键字)
    • COLON(:)
    • COMMA(,)
    • INTEGER (我们使其表示一个整型的数据类型,而不是一个整型的常量)
    • REAL (Pascal的REAL类型)
    • INTEGER_CONST(例如:3或者5)
    • REAL_CONST(例如:3.14)
    • DIV (整数除法,保留关键字)
    • REAL_DIV(小数除法,/)
  2. 下面是保留关键字到符号实例对象的完整映射:

    python 复制代码
     RESERVED_KEYWORDS: Dict[str, Token] = {
         PROGRAM: Token(PROGRAM, PROGRAM),
         VAR: Token(VAR, VAR),
         INTEGER: Token(INTEGER, INTEGER),
         REAL: Token(REAL, REAL),
         BEGIN: Token(BEGIN, BEGIN),
         END: Token(END, END),
         DIV: Token(DIV, DIV)
     }
  3. 我们添加了skip_comment 方法去处理Pascal的注释。方法相当简单。它的工作就是跳过所有的字符,直到该字符是右花括号(}):

    python 复制代码
     def skip_comment(self) -> None:
         while self.current_char != '}':
             self.advance()
         self.advance() # 右花括号,表示注释结束
  4. 我们讲方法integer 更名为 number。它可以同时处理整型(3)和浮点型(3.14)常量:

    python 复制代码
     def number(self) -> Token:
         """
         返回text输入中(多数位)整数或小数 
         """
         result: str = '0'
         while self.current_char is not None and self.current_char.isdigit():
             result = f'{result}{self.current_char}'
             self.advance()
    
         if self.current_char == '.':
             result = f'{result}{self.current_char}'
             self.advance()
             while self.current_char is not None and self.current_char.isdigit():
                 result = f'{result}{self.current_char}'
                 self.advance()
             return Token(REAL_CONST, float(result))
         return Token(INTEGER_CONST, int(result))
  5. 我们也更新了get_next_token 方法,使其能够返回新的符号实例对象:

    python 复制代码
     def get_next_token(self) -> Token:
         while self.current_char is not None:
             if self.current_char.isspace():
                 self.skip_whitespace()
                 continue
             if self.current_char == '{':
                 self.skip_comment()
                 continue
             if self.current_char.isalpha():
                 return self.__id()
             if self.current_char.isdigit():
                 return self.number()
             if self.current_char == ';':
                 self.advance()
                 return Token(SEMI, ';')
             if self.current_char == '.':
                 self.advance()
                 return Token(DOT, '.')
             if self.current_char == ',':
                 self.advance()
                 return Token(COMMA, ',')
             if self.current_char == ':':
                 if self.peek() == '=':
                     self.advance()
                     self.advance()
                     return Token(ASSIGN, ':=')
                 self.advance()
                 return Token(COLON, ':')
             if self.current_char == '+':
                 self.advance()
                 return Token(PLUS, '+')
             if self.current_char == '-':
                 self.advance()
                 return Token(MINUS, '-')
             if self.current_char == '*':
                 self.advance()
                 return Token(MUL, '*')
             if self.current_char == '/':
                 self.advance()
                 return Token(REAL_DIV, '/')
             if self.current_char == '(':
                 self.advance()
                 return Token(LPAREN, '(')
             if self.current_char == ')':
                 self.advance()
                 return Token(RPAREN, ')')
             self.error()
         return Token(EOF, None)

更新语法解析器

现在来看看语法解析器的变化。

下面是它的变化总结:

  1. 新的AST节点:ProgramBlockVarDeclarationTypeSpec
  2. BNF中新规则对应的方法:blockdeclatationsvariable_declarationtype_spec
  3. 更新现有的方法:programtermfactor

让我们一个一个的过一下:

  1. 我们首先从新的AST节点开始。新的节点有四个:

    • Program 节点,其代表一个程序,是我们的根节点:

      python 复制代码
        class Program(NoOp):
        	
        	    def __init__(
        	            self,
        	            program_node: AST,
        	            block_node: AST
        	    ):
        	        super().__init__()
        	        self.program_node: AST = program_node
        	        self.block_node: AST = block_node
          ```
    • Block 节点,其保存有declarationscompound_statement :

      python 复制代码
      class Block(NoOp):
      
          def __init__(
                  self,
                  declarations: Iterable[AST],
                  compound_statement: AST
          ):
              super().__init__()
              self.declarations: Iterable[AST] = declarations
              self.compound_statement: AST = compound_statement
    • VarDeclaration 节点,其代表一个变量声明。保存有变量节点和类型节点:

      python 复制代码
        class VarDeclaration(NoOp):
        
            def __init__(
                    self,
                    vars: Iterable[AST],
                    type_spec_node: AST
            ):
                super().__init__()
                self.vars: Iterable[AST] = vars
                self.type_spec_node: AST = type_spec_node
    • TypeSpec 节点,其代表变量的类型(INTEGER、REAL):

      python 复制代码
      class TypeSpec(AST):
      
          def __init__(
                  self,
                  type_spec: Token
          ):
              super().__init__(type_spec)
              self.type_spec: Token = type_spec
  2. 你或许还记得,BNF中的每一个规则都要映射为语法解析器中的一个方法。今天,我们添加了四个新方法:blockdeclarationsvariable_declarationtype_spec。这些方法负责解析新的语言结构,以及构建AST节点:

    python 复制代码
    def block(self) -> AST:
        """
        block:
            declarations compound_statement
        """
        return Block(
            declarations=list([declaration for declaration in self.declarations()]),
            compound_statement=self.compound_statement()
        )
    
    def declarations(self) -> Iterable[AST]:
        """
        declarations:
            empty |
            VAR (variable_declaration SEMI)+
        """
        if self.current_token.type == VAR:
            self.eat(VAR)
            while True:
                yield self.variable_declaration()
                self.eat(SEMI)
                if self.current_token.type != ID: break
        else:
            yield self.empty()
    
    def variable_declaration(self) -> AST:
        """
        variable_declaration:
            variable (COMMA variable)* COLON type_spec
        """
        vars: List[AST] = [self.variable()]
        while self.current_token.type == COMMA:
            self.eat(COMMA)
            vars.append(self.variable())
        self.eat(COLON)
        type_spec_node: AST = self.type_spec()
        return VarDeclaration(
            vars=vars,
            type_spec_node=type_spec_node
        )
    
    def type_spec(self) -> AST:
        """
        type_spec:
            INTEGER |
            REAL
        """
        type_spec: Token = self.current_token
        if self.current_token.type == INTEGER:
            self.eat(INTEGER)
        else:
            self.eat(REAL)
        return TypeSpec(type_spec)
  3. 我们也需要去更新programtermfactor方法,以适应语法规则的变化:

    python 复制代码
    def program(self) -> AST:
        """
        program:
            PROGRAM variable SEMI block DOT
        """
        self.eat(PROGRAM)
        program_node: AST = self.variable()
        self.eat(SEMI)
        block_node: AST = self.block()
        self.eat(DOT)
        return Program(
            program_node=program_node,
            block_node=block_node
        )
    def term(self) -> AST:
        """
        term:
            factor ((MUL|DIV|REAL_DIV)factor)*
        """
        node: AST = self.factor()
        while self.current_token.type in (MUL, DIV, REAL_DIV):
            token: Token = self.current_token
            if token.type == MUL:
                self.eat(MUL)
            elif token.type == DIV:
                self.eat(DIV)
            else:
                self.eat(REAL_DIV)
            node = BinaryOp(
                left=node,
                op=token,
                right=self.factor()
            )
        return node
    
    def factor(self) -> AST:
        """
        factor:
            PLUS factor |
            MINUS factor |
            INTEGER_CONST |
            REAL_CONST |
            LPAREN expr RPAREN |
            variable
        """
        token: Token = self.current_token
        if token.type == PLUS:
            self.eat(PLUS)
            return UnaryOp(token, self.factor())
        if token.type == MINUS:
            self.eat(MINUS)
            return UnaryOp(token, self.factor())
        if token.type == INTEGER_CONST:
            self.eat(INTEGER_CONST)
            return Num(token)
        if token.type == REAL_CONST:
            self.eat(REAL_CONST)
            return Num(token)
        if token.type == LPAREN:
            self.eat(LPAREN)
            expr: AST = self.expr()
            self.eat(RPAREN)
            return expr
        return self.variable()
    	

更新解释器

我们已经做完了对词法解析器和语法解析器的改变。剩下的内容就是向Interpreter类中添加对应的访问方法。为了能够访问新的节点,需要添加四个新方法:

  • visit_Program
  • visit_Block
  • visit_VarDeclatation
  • visit_TypeSepc

这些方法都相当直接,你会看到visit_VarDeclatationvisit_TypeSepc 方法中啥也没做:

python 复制代码
def visit_Program(
        self,
        node: Program
) -> None:
    self.visit(node.block_node)

def visit_Block(
        self,
        node: Block
) -> None:
    for declaration in node.declarations:
        self.visit(declaration)
    self.visit(node.compound_statement)

def visit_VarDeclaration(
        self,
        node: VarDeclaration
) -> None: pass
def visit_TypeSpec(
        self,
        node: TypeSpec
) -> None: pass

我们也需要去更新visit_BinaryOp 方法,使其能够正确处理整型和浮点型除法:

python 复制代码
def visit_BinaryOp(
        self,
        node: BinaryOp
) -> int | float:
    if node.op.type == PLUS:
        return self.visit(node.left) + self.visit(node.right)
    if node.op.type == MINUS:
        return self.visit(node.left) - self.visit(node.right)
    if node.op.type == MUL:
        return self.visit(node.left) * self.visit(node.right)
    if node.op.type == DIV:
        return self.visit(node.left) // self.visit(node.right)
    return self.visit(node.left) / self.visit(node.right)

完整代码如下:

python 复制代码
import abc
from typing import Dict, Iterable, List, Callable

PROGRAM, SEMI, DOT, VAR, COMMA, COLON, INTEGER, REAL, BEGIN, END, ASSIGN, PLUS, MINUS, MUL, DIV, REAL_DIV, INTEGER_CONST, REAL_CONST, LPAREN, RPAREN, ID, EOF = (
    'PROGRAM',
    'SEMI',
    'DOT',
    'VAR',
    'COMMA',
    'COLON',
    'INTEGER',
    'REAL',
    'BEGIN',
    'END',
    'ASSIGN',
    'PLUS',
    'MINUS',
    'MUL',
    'DIV',
    'REAL_DIV',
    'INTEGER_CONST',
    'REAL_CONST',
    'LPAREN',
    'RPAREN',
    'ID',
    'EOF'
)

class Token:

    def __init__(
            self,
            type: str,
            value: int | float | str | None
    ):
        self.type = type
        self.value = value

    def __str__(self) -> str:
        return 'Token(type={type}), value={value}'.format(
            type=self.type,
            value=self.value
        )

    def __repr__(self) -> str: return self.__str__()

class Lexer:

    RESERVED_KEYWORDS: Dict[str, Token] = {
        PROGRAM: Token(PROGRAM, PROGRAM),
        VAR: Token(VAR, VAR),
        INTEGER: Token(INTEGER, INTEGER),
        REAL: Token(REAL, REAL),
        BEGIN: Token(BEGIN, BEGIN),
        END: Token(END, END),
        DIV: Token(DIV, DIV)
    }

    def __init__(
            self,
            text: str
    ):
        self.text: str = text
        self.pos: int = 0
        self.current_char: str | None = self.text[self.pos]

    def error(self) -> None:
        raise Exception('非法字符')

    def advance(self) -> None:
        self.pos += 1
        if self.pos > len(self.text) - 1:
            self.current_char = None
        else:
            self.current_char = self.text[self.pos]

    def peek(self) -> str | None:
        pos = self.pos + 1
        if pos > len(self.text) - 1:
            return None
        return self.text[pos]

    def skip_whitespace(self) -> None:
        while self.current_char is not None and self.current_char.isspace():
            self.advance()

    def skip_comment(self) -> None:
        while self.current_char != '}':
            self.advance()
        self.advance() # 右花括号,表示注释结束

    def number(self) -> Token:
        """
        返回text输入中(多数位)整数或小数
        """
        result: str = '0'
        while self.current_char is not None and self.current_char.isdigit():
            result = f'{result}{self.current_char}'
            self.advance()

        if self.current_char == '.':
            result = f'{result}{self.current_char}'
            self.advance()
            while self.current_char is not None and self.current_char.isdigit():
                result = f'{result}{self.current_char}'
                self.advance()
            return Token(REAL_CONST, float(result))
        return Token(INTEGER_CONST, int(result))

    def __id(self) -> Token:
        name: str = ''
        while self.current_char is not None and self.current_char.isalnum():
            name = f'{name}{self.current_char}'
            self.advance()
        return self.RESERVED_KEYWORDS.get(name, Token(ID, name))

    def get_next_token(self) -> Token:
        while self.current_char is not None:
            if self.current_char.isspace():
                self.skip_whitespace()
                continue
            if self.current_char == '{':
                self.skip_comment()
                continue
            if self.current_char.isalpha():
                return self.__id()
            if self.current_char.isdigit():
                return self.number()
            if self.current_char == ';':
                self.advance()
                return Token(SEMI, ';')
            if self.current_char == '.':
                self.advance()
                return Token(DOT, '.')
            if self.current_char == ',':
                self.advance()
                return Token(COMMA, ',')
            if self.current_char == ':':
                if self.peek() == '=':
                    self.advance()
                    self.advance()
                    return Token(ASSIGN, ':=')
                self.advance()
                return Token(COLON, ':')
            if self.current_char == '+':
                self.advance()
                return Token(PLUS, '+')
            if self.current_char == '-':
                self.advance()
                return Token(MINUS, '-')
            if self.current_char == '*':
                self.advance()
                return Token(MUL, '*')
            if self.current_char == '/':
                self.advance()
                return Token(REAL_DIV, '/')
            if self.current_char == '(':
                self.advance()
                return Token(LPAREN, '(')
            if self.current_char == ')':
                self.advance()
                return Token(RPAREN, ')')
            self.error()
        return Token(EOF, None)

class AST(abc.ABC):

    def __init__(
            self,
            token: Token | None
    ):
        self.token: Token = token

class NoOp(AST):

    def __init__(self):
        super().__init__(None)

class Program(NoOp):

    def __init__(
            self,
            program_node: AST,
            block_node: AST
    ):
        super().__init__()
        self.program_node: AST = program_node
        self.block_node: AST = block_node

class Block(NoOp):

    def __init__(
            self,
            declarations: Iterable[AST],
            compound_statement: AST
    ):
        super().__init__()
        self.declarations: Iterable[AST] = declarations
        self.compound_statement: AST = compound_statement

class VarDeclaration(NoOp):

    def __init__(
            self,
            vars: Iterable[AST],
            type_spec_node: AST
    ):
        super().__init__()
        self.vars: Iterable[AST] = vars
        self.type_spec_node: AST = type_spec_node

class Var(AST):

    def __init__(
            self,
            var: Token
    ):
        super().__init__(var)
        self.var: Token = var
        self.value: str = self.var.value

class TypeSpec(AST):

    def __init__(
            self,
            type_spec: Token
    ):
        super().__init__(type_spec)
        self.type_spec: Token = type_spec

class Compound(NoOp):

    def __init__(self):
        super().__init__()
        self.children: List[AST] = []

class BinaryOp(AST):

    def __init__(
            self,
            left: AST,
            op: Token,
            right: AST
    ):
        super().__init__(op)
        self.left: AST = left
        self.op: Token = op
        self.right: AST = right

class UnaryOp(AST):

    def __init__(
            self,
            op: Token,
            operand: AST
    ):
        super().__init__(op)
        self.op: Token = op
        self.operand: AST = operand

class Assign(BinaryOp): pass

class Num(AST):

    def __init__(
            self,
            num: Token
    ):
        super().__init__(num)
        self.num: Token = num
        self.value: int | float = self.num.value

class Parser:

    def __init__(
            self,
            lexer: Lexer
    ):
        self.lexer: Lexer = lexer
        self.current_token: Token = self.lexer.get_next_token()

    def error(self) -> None:
        raise Exception('不合法的语法规则')

    def eat(
            self,
            token_type: str
    ):
        if self.current_token.type == token_type:
            self.current_token = self.lexer.get_next_token()
        else:
            self.error()

    def program(self) -> AST:
        """
        program:
            PROGRAM variable SEMI block DOT
        """
        self.eat(PROGRAM)
        program_node: AST = self.variable()
        self.eat(SEMI)
        block_node: AST = self.block()
        self.eat(DOT)
        return Program(
            program_node=program_node,
            block_node=block_node
        )

    def block(self) -> AST:
        """
        block:
            declarations compound_statement
        """
        return Block(
            declarations=list([declaration for declaration in self.declarations()]),
            compound_statement=self.compound_statement()
        )

    def declarations(self) -> Iterable[AST]:
        """
        declarations:
            empty |
            VAR (variable_declaration SEMI)+
        """
        if self.current_token.type == VAR:
            self.eat(VAR)
            while True:
                yield self.variable_declaration()
                self.eat(SEMI)
                if self.current_token.type != ID: break
        else:
            yield self.empty()

    def variable_declaration(self) -> AST:
        """
        variable_declaration:
            variable (COMMA variable)* COLON type_spec
        """
        vars: List[AST] = [self.variable()]
        while self.current_token.type == COMMA:
            self.eat(COMMA)
            vars.append(self.variable())
        self.eat(COLON)
        type_spec_node: AST = self.type_spec()
        return VarDeclaration(
            vars=vars,
            type_spec_node=type_spec_node
        )

    def type_spec(self) -> AST:
        """
        type_spec:
            INTEGER |
            REAL
        """
        type_spec: Token = self.current_token
        if self.current_token.type == INTEGER:
            self.eat(INTEGER)
        else:
            self.eat(REAL)
        return TypeSpec(type_spec)

    def compound_statement(self) -> AST:
        """
        compound_statement:
            BEGIN statement_list END
        """
        node: Compound = Compound()
        self.eat(BEGIN)
        statement_list: List[AST] = list([statement for statement in self.statement_list()])
        self.eat(END)
        node.children = statement_list
        return node

    def statement_list(self) -> Iterable[AST]:
        """
        statement_list:
            statement |
            statement SEMI statement_list
        """
        yield self.statement()
        if self.current_token.type == SEMI:
            self.eat(SEMI)
            yield from self.statement_list()

    def statement(self) -> AST:
        """
        statement:
            empty |
            compound_statement |
            assign_statement
        """
        if self.current_token.type == BEGIN:
            return self.compound_statement()
        if self.current_token.type == ID:
            return self.assign_statement()
        return self.empty()

    def assign_statement(self) -> AST:
        """
        assign_statement:
            variable ASSIGN expr
        """
        var: AST = self.variable()
        op: Token = self.current_token
        self.eat(ASSIGN)
        expr: AST = self.expr()
        return Assign(
            left=var,
            op=op,
            right=expr
        )

    def empty(self) -> AST: return NoOp()

    def expr(self) -> AST:
        """
        expr:
            term ((PLUS|MINUS)term)*
        """
        node: AST = self.term()
        while self.current_token.type in (PLUS, MINUS):
            token: Token = self.current_token
            if token.type == PLUS:
                self.eat(PLUS)
            else:
                self.eat(MINUS)
            node = BinaryOp(
                left=node,
                op=token,
                right=self.term()
            )
        return node

    def term(self) -> AST:
        """
        term:
            factor ((MUL|DIV|REAL_DIV)factor)*
        """
        node: AST = self.factor()
        while self.current_token.type in (MUL, DIV, REAL_DIV):
            token: Token = self.current_token
            if token.type == MUL:
                self.eat(MUL)
            elif token.type == DIV:
                self.eat(DIV)
            else:
                self.eat(REAL_DIV)
            node = BinaryOp(
                left=node,
                op=token,
                right=self.factor()
            )
        return node

    def factor(self) -> AST:
        """
        factor:
            PLUS factor |
            MINUS factor |
            INTEGER_CONST |
            REAL_CONST |
            LPAREN expr RPAREN |
            variable
        """
        token: Token = self.current_token
        if token.type == PLUS:
            self.eat(PLUS)
            return UnaryOp(token, self.factor())
        if token.type == MINUS:
            self.eat(MINUS)
            return UnaryOp(token, self.factor())
        if token.type == INTEGER_CONST:
            self.eat(INTEGER_CONST)
            return Num(token)
        if token.type == REAL_CONST:
            self.eat(REAL_CONST)
            return Num(token)
        if token.type == LPAREN:
            self.eat(LPAREN)
            expr: AST = self.expr()
            self.eat(RPAREN)
            return expr
        return self.variable()

    def variable(self) -> AST:
        """
        variable: ID
        """
        var: Token = self.current_token
        self.eat(ID)
        return Var(var)

    def parse(self) -> AST:
        tree: AST = self.program()
        assert self.current_token.type == EOF
        return tree

class NodeVisitor:

    def visit(
            self,
            node: AST
    ) -> int | float | None:
        name: str = f'visit_{type(node).__name__}'
        visitor: Callable[[AST], int | float | None] = getattr(self, name, self.visit_generic)
        return visitor(node)

    def visit_generic(
            self,
            node: AST
    ) -> None:
        raise Exception(f'未找到方法:visit_{type(node).__name__}')

class Interpreter(NodeVisitor):

    def __init__(
            self,
            parser: Parser
    ):
        self.parser: Parser = parser
        self.GLOBAL_SCOPE: Dict[str, int | float | str] = {}

    def visit_NoOp(
            self,
            node: NoOp
    ) -> None: pass

    def visit_Program(
            self,
            node: Program
    ) -> None:
        self.visit(node.block_node)

    def visit_Block(
            self,
            node: Block
    ) -> None:
        for declaration in node.declarations:
            self.visit(declaration)
        self.visit(node.compound_statement)

    def visit_VarDeclaration(
            self,
            node: VarDeclaration
    ) -> None: pass

    def visit_Var(
            self,
            node: Var
    ) -> int | float | str:
        return self.GLOBAL_SCOPE[node.value]

    def visit_TypeSpec(
            self,
            node: TypeSpec
    ) -> None: pass

    def visit_Compound(
            self,
            node: Compound
    ) -> None:
        for child in node.children:
            self.visit(child)

    def visit_BinaryOp(
            self,
            node: BinaryOp
    ) -> int | float:
        if node.op.type == PLUS:
            return self.visit(node.left) + self.visit(node.right)
        if node.op.type == MINUS:
            return self.visit(node.left) - self.visit(node.right)
        if node.op.type == MUL:
            return self.visit(node.left) * self.visit(node.right)
        if node.op.type == DIV:
            return self.visit(node.left) // self.visit(node.right)
        return self.visit(node.left) / self.visit(node.right)

    def visit_UnaryOp(
            self,
            node: UnaryOp
    ) -> int | float:
        if node.op.type == PLUS:
            return self.visit(node.operand)
        return -self.visit(node.operand)

    def visit_Assign(
            self,
            node: Assign
    ) -> None:
        name: str = node.left.token.value
        self.GLOBAL_SCOPE[name] = self.visit(node.right)

    def visit_Num(
            self,
            node: Num
    ) -> int | float:
        return node.value

    def interpret(self) -> None:
        tree: AST = self.parser.parse()
        self.visit(tree)

text: str = """
PROGRAM Part10;
VAR
   number     : INTEGER;
   a, b, c, x : INTEGER;
   y          : REAL;

BEGIN {Part10}
   BEGIN
      number := 2;
      a := number;
      b := 10 * a + 10 * number DIV 4;
      c := a - - b
   END;
   x := 11;
   y := 20 / 7 + 3.14;
   y := y + 2
   { writeln('a = ', a); }
   { writeln('b = ', b); }
   { writeln('c = ', c); }
   { writeln('number = ', number); }
   { writeln('x = ', x); }
   { writeln('y = ', y); }
END.  {Part10}
"""

lexer: Lexer = Lexer(text)
parser: Parser = Parser(lexer)
interpreter = Interpreter(parser)
interpreter.interpret()
print(interpreter.GLOBAL_SCOPE)
相关推荐
Zero_to_zero12342 小时前
Claude code系列(一):claude安装、入门及基础操作指令
人工智能·python
Yeats_Liao2 小时前
异步推理架构:CPU-NPU流水线设计与并发效率提升
python·深度学习·神经网络·架构·开源
hnxaoli2 小时前
统信小程序(八)归档目录自动调整
linux·python
喵手2 小时前
Python爬虫实战:把“菜鸟教程”的知识树连根拔起(递归/遍历实战)(附 CSV 导出)!
爬虫·python·爬虫实战·python爬虫工程化实战·零基础python爬虫教学·菜鸟教程数据采集·采集菜鸟教程于csv
七夜zippoe3 小时前
gRPC高性能RPC框架实战:从Protocol Buffers到流式传输的完整指南
网络·python·网络协议·rpc·protocol
claem3 小时前
Mac端 Python脚本创建与理解
开发语言·python·macos
lixzest3 小时前
目标检测算法应用工程师 面试高频题 + 标准答案
python·yolo·目标检测·计算机视觉
癫狂的兔子3 小时前
【BUG】【Python】【Spider】Compound class names are not allowed.
开发语言·python·bug