使用 TypeScript 从零搭建自己的 Web 框架：领域特定语言(DSL) 与 Prisma 模型

使用 TypeScript 从零搭建自己的 Web 框架：领域特定语言(DSL) 与 Prisma 模型

DSL（领域特定语言）是一种针对特定应用领域设计的编程语言或标记语言，旨在提供比通用编程语言更简洁、更直观的语法和工具集，以便开发者能够更高效地完成特定任务。DSL 的出现大大简化了复杂领域的工作流程，提高了开发效率，降低了出错率。

以 Prisma 模型为例，它是一种用于定义和操作数据库模式的 DSL。Prisma 模型提供了一种直观、简洁的方式来描述数据库中的实体、关系以及它们之间的约束。通过定义 Prisma 模型，开发者可以清晰地表达数据库的结构和规则，而无需直接编写繁琐的 SQL 语句或 ORM 配置。

这是一个定义 Prisma Schema 的例子:

kotlin 复制代码

model User {
  id      Int      @id @default(autoincrement())
  email   String   @unique
  name    String?
  role    Role     @default(USER)
  posts   Post[]
  profile Profile?
}

创建一个 DSL 的步骤如下:

lua 复制代码

+----------------+    +----------------+    +----------------+
| 定义DSL语法    | --> | 编写解析器      | --> | 实现解释器/编译器 |
+----------------+    +----------------+    +----------------+
        ^                     |                      |
        |                     V                      V
        |                +----------------+    +----------------+
        |                | 生成AST（抽象  | --> | 生成目标代码   |
        |                | 语法树）        |    +----------------+
        |                +----------------+
        |
        +-----------------------------------------------------+

那么如何为上面的 Prisma 模型创建一个解析器呢？

解析器原理

解析器主要涉及到词法分析和语法分析两个关键步骤。

词法分析（也称为扫描器或词法器）根据语言的词法规则，从左到右扫描源程序的字符，将其分割成一个个 Token，并为每个 Token 附上相应的属性（如 Token 的类型、值等）。

语法分析，这一步使用语法规则集合（通常以文法的形式表示）来分析 Token 序列，确定它们之间的关系和结构。语法分析器会检查 Token 序列是否符合预定义的语法规则，从而验证输入数据是否满足特定的语法要求。

这是一个 TypeScript 实现词法分析器的例子:

typescript 复制代码

type TokenType = 'INTEGER' | 'PLUS' | 'MINUS' | 'EOF';

interface Token {
  type: TokenType;
  value?: string | number;
}

class Lexer {
  private text: string;
  private position: number = 0;
  private currentChar: string = '';

  constructor(text: string) {
    this.text = text;
    this.advance();
  }

  private advance() {
    this.position++;
    if (this.position > this.text.length - 1) {
      this.currentChar = 'EOF';
    } else {
      this.currentChar = this.text[this.position];
    }
  }

  private isDigit(char: string): boolean {
    return /[0-9]/.test(char);
  }

  private isWhitespace(char: string): boolean {
    return /\s/.test(char);
  }

  nextToken(): Token {
    while (this.currentChar !== 'EOF') {
      // 忽略空白字符
      if (this.isWhitespace(this.currentChar)) {
        this.advance();
        continue;
      }

      if (this.isDigit(this.currentChar)) {
        let value = '';
        // 收集整数数字
        while (this.isDigit(this.currentChar)) {
          value += this.currentChar;
          this.advance();
        }
        return { type: 'INTEGER', value: parseInt(value, 10) };
      }

      if (this.currentChar === '+') {
        this.advance();
        return { type: 'PLUS' };
      }

      if (this.currentChar === '-') {
        this.advance();
        return { type: 'MINUS' };
      }

      // 如果当前字符不是数字、加号或减号，则抛出错误
      throw new Error(`Unexpected character: ${this.currentChar}`);
    }

    return { type: 'EOF' };
  }
}

const lexer = new Lexer('100 + 200 - 300');
let token: Token;
while ((token = lexer.nextToken()).type !== 'EOF') {
  console.log(token);
}

nextToken 方法是词法分析器的核心，它不断读取字符并根据词法规则返回相应的记号。当遇到数字时，它会收集所有连续的数字字符并返回一个 INTEGER 类型的记号；当遇到加号或减号时，它会返回相应的记号；当遇到其他字符时，它会抛出一个错误。循环不断调用 nextToken 方法，直到返回 EOF 记号为止。

解析器生成器

手动编写解析器是一个复杂且漫长的过程，在实际开发中一般使用解析器生成器来创建解析器。peg.js 是一种基于 Parsing Expression Grammar（解析表达式语法）的解析器生成器，PEG 类似于正则表达式和巴科斯范式 BNF，我们使用它来生成一个 Prisma 模型解析器。

为了简单起见我们简化模型如下:

arduino 复制代码

model User {
  id      Int
  email   String
  name    String?
}

定义模型语法

makefile 复制代码

// prisma.pegjs
start
  = model EOF


// 这一行的意思是 model 由 "model"字符 + 空白字符(至少一个或多个) + "{" 字符 + 空白字符(零个或多个) + fields + 空白字符(零个或多个) + "}" 组成
model
  = "model" space+ identifier space* "{" space* fields space* "}"

// fields 由 field(至少一个或多个) 组成
fields
  = field+

// field 由 空白字符(零个或多个) + identifier + 空白字符(零个或多个) + fieldType (空白字符(零个或多个) + "?"字符)(可以为空) + 空白字符(零个或多个) 组成
field
  = space* identifier space+ fieldType (space* "?")? space*

// identifier 由小写字母或大写字母开头 小写字母或大写字母+数字和下划线（零个或多个）组成
identifier
  = [a-zA-Z_] [a-zA-Z0-9_]*

// fieldType 由 "Int"字符 或 "String"字符 或 identifier 组成
fieldType
  = "Int" / "String" / identifier

// stringLiteral 由 " 开始 + 中间不为 " 和 any 的字符（零个或多个）+ " 结尾 组成
stringLiteral
  = "\"" (!"\"" any)* "\""

// space 由 空格 或 \t 或 \n 或 \r 组成
space
  = " " / "\t" / "\n" / "\r"

// 不为换行符的任意字符
any
  = .

EOF
  = !.

生成解析器

npx pegjs parser.pegjs
使用解析器

javascript 复制代码

// index.js
const parser = require('./parser');

const input = `model User {
  id      Int
  email   String
  name    String?
}`;

const ast = parser.parse(input);
console.log(JSON.stringify(ast, null, 2));

运行

less 复制代码

node index.js

// 输出 AST
[  [    "model",    [      " "    ],
    [      "U",      [        "s",        "e",        "r"      ]
    ],
    [      " "    ],
    "{",
    [      "\n",      " ",      " "    ],
    [      [        [],
        [          "i",          [            "d"          ]
        ],
        [          " ",          " ",          " ",          " ",          " ",          " "        ],
        "Int",
        null,
        [          " ",          "\n",          " ",          " "        ]
      ],
      [        [],
        [          "e",          [            "m",            "a",            "i",            "l"          ]
        ],
        [          " ",          " ",          " "        ],
        "String",
        null,
        [          "\n",          " ",          " "        ]
      ],
      [        [],
        [          "n",          [            "a",            "m",            "e"          ]
        ],
        [          " ",          " ",          " ",          " "        ],
        "String",
        [          [],
          "?"
        ],
        [          " ",          "\n"        ]
      ]
    ],
    [],
    "}"
  ],
  null
]

有了 AST（抽象语法树）之后我们就可以根据 AST 生成 SQL 和 DAO 层代码

总结

DSL（领域特定语言）允许我们以更直观、更贴近业务逻辑的方式表达和解决复杂问题。DSL 能够简化复杂系统的构建和维护过程，提高开发效率，同时减少错误和误解。通过 DSL，我们可以创建出针对特定领域的强大工具，从而推动业务创新和提升软件系统的整体质量。