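Chapter 4 - Reading the TypeScript AST and Extracting Useful Data

Learn how to read the TypeScript abstract syntax tree programmatically and extract useful data from it.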

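This is one of the most exciting chapters. We will learn how to read TypeScript code programmatically and look at tools that make the job easier.

Reading code programmatically is the foundation of automated code processing. For example, suppose we want to find every function in a codebase declared as function fn() {} so that we can later convert it automatically into an arrow function, const fn = () => {}. To do that we first have to read each file, walk its abstract syntax tree, and find all the function declarations.

That is only one example. The next chapter explores more real-world use cases. For now, let's see how to get started.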

Using the Compiler API Directly

The TypeScript compiler API gives us every tool we need. Let's start with the following snippet:

const add = (first: number, second: number): number => {
  return first + second
}

export function sum(...numbers: number[]): number {
  return numbers.reduce(add, 0)
}

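This small snippet contains two functions: the exported sum function, and add, an unexported arrow function used only inside the module.

The goals of our first exercise are:

Count the total number of functions declared in the file
Count exported and unexported functions separately

Paste this code into the fun.ts file created in the previous chapter. We will use it as our exercise material.

In index.mjs (the file that runs when we execute npm run compile) we will write the logic that walks the code and counts the functions. Keep the program-creation code at the top of the file: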

import ts from "typescript"

const program = ts.createProgram(["fun.ts"], {
  module: ts.ModuleKind.ESNext,
  noImplicitAny: true
})

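Next we want to inspect the contents of fun.ts. To avoid hard-coding the file name (in a real project the file names are usually not known in advance), we call getRootFileNames() to get every root file in the program: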

const rootFileNames = program.getRootFileNames()
console.log(`Found ${rootFileNames.length} root file(s)`)

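This prints the number of root files in the program. Right now it should print 1, because fun.ts is the only file.

Then we loop over the root file names and use getSourceFile() to get the source file for each one: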

for (const rootFileName of rootFileNames) {
  const sourceFile = program.getSourceFile(rootFileName)

  if (sourceFile) {
    console.log("Inspecting source file:", sourceFile.fileName)
    // ...
  }
}

Run npm run compile and you should see the following output:

Found 1 root file(s)
Inspecting source file: fun.ts

Good progress! 🎉
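The if (sourceFile) guard in the loop is there because getSourceFile() can return undefined, for instance when a root file cannot be read. The sketch below illustrates this under two assumptions of mine: the file name missing-file-for-demo.ts is hypothetical and is assumed not to exist on disk, and noLib is set only to keep the demo light.

```typescript
import * as ts from "typescript";

// Hypothetical file name, assumed NOT to exist in the working directory.
const demoProgram = ts.createProgram(["missing-file-for-demo.ts"], {
  noLib: true, // skip the default lib files; they are irrelevant to this demo
});

// The program still reports the root names it was created with...
const demoRootFileNames = demoProgram.getRootFileNames();

// ...but a file that could not be read has no SourceFile object.
const demoSourceFiles = demoRootFileNames.map((name) =>
  demoProgram.getSourceFile(name)
);

console.log("root files:", demoRootFileNames.length);
console.log(
  "all source files missing:",
  demoSourceFiles.every((file) => file === undefined)
);
```

Skipping undefined source files, as the chapter's loop does, keeps the script from crashing when a root file is missing.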

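Now we can start walking the AST nodes of the source file. We use the forEachChild helper method on sourceFile, which visits each direct child of a node; for a source file, those are its top-level nodes. Add the following inside the if (sourceFile) {...} statement: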

sourceFile.forEachChild((childNode) => {
  console.log(childNode.kind)
})

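forEachChild hands us each top-level AST node in the file, and we print the kind property of each one. Every AST node has a kind, an enum value that describes what sort of node it is. The name kind is used rather than type to avoid confusion with TypeScript's own notion of types.

The file should now look like this: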

import ts from "typescript"

const program = ts.createProgram(["fun.ts"], {
  module: ts.ModuleKind.ESNext,
  noImplicitAny: true
})

const rootFileNames = program.getRootFileNames()
console.log(`Found ${rootFileNames.length} root file(s)`)

for (const rootFileName of rootFileNames) {
  const sourceFile = program.getSourceFile(rootFileName)

  if (sourceFile) {
    console.log("Inspecting source file:", sourceFile.fileName)

    sourceFile.forEachChild((childNode) => {
      console.log(childNode.kind)
    })
  }
}

Running it prints:

Found 1 root file(s)
Inspecting source file: fun.ts
240
259
1

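The kind values map to the SyntaxKind enum. The exact numbers depend on the TypeScript version, so yours may differ. These are the relevant members: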

enum SyntaxKind {
  // ...
  EndOfFileToken = 1,
  VariableStatement = 240,
  FunctionDeclaration = 259
  // ...
}

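Matching the output against the enum: 240 is the VariableStatement (the const add declaration), 259 is the FunctionDeclaration (sum), and 1 is the end-of-file token. That is exactly what the file contains.

Note: although add is an arrow function, TypeScript first sees it as a VariableStatement node. We will see shortly that the arrow function itself sits further down inside that node.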
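Raw numbers are hard to read, so it can help to print the enum member name next to each value. The sketch below parses the sample text in memory with ts.createSourceFile (no file on disk is assumed) and uses the reverse mapping that numeric enums provide. The variable names are illustrative only.

```typescript
import * as ts from "typescript";

const sampleText = `
const add = (first: number, second: number): number => first + second
export function sum(...numbers: number[]): number {
  return numbers.reduce(add, 0)
}
`;

// Parse the text directly; no ts.Program is needed just to look at syntax.
const sampleFile = ts.createSourceFile(
  "sample.ts",
  sampleText,
  ts.ScriptTarget.Latest,
  true
);

const topLevelKinds: { kind: ts.SyntaxKind; name: string }[] = [];

sampleFile.forEachChild((childNode) => {
  // Numeric enums map back from value to member name.
  // Caveat: SyntaxKind also declares range markers (e.g. FirstStatement) that
  // share a value with a real kind, so a VariableStatement may be reported
  // under the marker's name instead.
  topLevelKinds.push({
    kind: childNode.kind,
    name: ts.SyntaxKind[childNode.kind],
  });
});

for (const entry of topLevelKinds) {
  console.log(entry.kind, entry.name);
}
```

Comparing against ts.SyntaxKind.VariableStatement, or using the ts.is* guards introduced next, sidesteps both the version-dependent numbers and the alias names.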

Now back to the original goal: counting all functions and the exported ones. To dig deeper into the nodes, replace console.log(childNode.kind); with:

if (ts.isFunctionDeclaration(childNode)) {
  // ...
}

if (ts.isVariableStatement(childNode)) {
  // ...
}

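A source file can contain many kinds of nodes, but we only care about two cases: function declarations and arrow functions. ts.isFunctionDeclaration checks for function nodes and ts.isVariableStatement checks for variable statements.

Add two counter variables: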
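The ts.is* helpers are type guards, so besides filtering nodes they also narrow the node's static type. A minimal sketch of what that buys us, using an in-memory sample (the sample text and variable names are mine, not part of the exercise):

```typescript
import * as ts from "typescript";

const guardSample = ts.createSourceFile(
  "guard-sample.ts",
  "export function sum(...numbers: number[]): number { return 0 }\nconst answer = 42\n",
  ts.ScriptTarget.Latest,
  true
);

const declaredFunctionNames: string[] = [];

guardSample.forEachChild((childNode) => {
  if (ts.isFunctionDeclaration(childNode)) {
    // Inside this branch childNode is typed as ts.FunctionDeclaration,
    // so .name (an optional Identifier) is available and type-checked.
    declaredFunctionNames.push(childNode.name?.text ?? "(anonymous)");
  }
});

console.log(declaredFunctionNames);
```

The const answer statement is skipped because it is a VariableStatement, not a FunctionDeclaration.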

let functionsCount = 0
let exportedFunctionsCount = 0

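These variables go inside the for loop, before the forEachChild call. When the node is a FunctionDeclaration we increment functionsCount, then increment exportedFunctionsCount only if the function is exported: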

if (ts.isFunctionDeclaration(childNode)) {
  functionsCount++

  const hasExportSpecifier = childNode.modifiers?.find(
    (modifier) => modifier.kind === ts.SyntaxKind.ExportKeyword
  )

  if (hasExportSpecifier) {
    exportedFunctionsCount++
  }
}

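To check whether a function is exported we look at the node's modifiers property. It is an array of the modifiers attached to the declaration, such as export, default, async, or declare on a function (abstract and public appear on class members), and it is undefined when there are none, hence the ?. operator. We only care about export, so we look for an ExportKeyword in the array.

Next we handle arrow functions disguised as variable declarations. Fill in the if (ts.isVariableStatement(childNode)) {} statement: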

if (ts.isVariableStatement(childNode)) {
  childNode.declarationList.forEachChild((node) => {
    // A declaration such as `let x` has no initializer, so guard before testing it
    if (
      ts.isVariableDeclaration(node) &&
      node.initializer &&
      ts.isArrowFunction(node.initializer)
    ) {
      functionsCount++
    }
  })
}

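A variable can hold many kinds of values, so we have to check what it was initialized with. We look at every node in the declaration list and, when it is a variable declaration whose initializer is an arrow function, increment functionsCount. We also need to check whether it is exported, right after incrementing functionsCount: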

const hasExportSpecifier = childNode.modifiers?.find((modifier) => modifier.kind === ts.SyntaxKind.ExportKeyword)
if (hasExportSpecifier) {
  exportedFunctionsCount++
}

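This is the same code we used for function declarations, only applied to the variable statement node (childNode), because the export modifier lives on the statement rather than on the individual declaration.

Note: don't worry about the duplicated code for now; it can be extracted into a shared function later. For teaching purposes, keeping the code straightforward matters more.

The final code should look like this: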
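For readers curious what that shared function might look like, here is one possible sketch. The names hasExportModifier and countTopLevelFunctions are hypothetical, and the code reads declarationList.declarations instead of calling forEachChild; this is an illustration, not the listing the chapter continues with.

```typescript
import * as ts from "typescript";

// Hypothetical helper: wraps the modifier check that the chapter writes twice.
function hasExportModifier(
  node: ts.FunctionDeclaration | ts.VariableStatement
): boolean {
  return (
    node.modifiers?.some(
      (modifier) => modifier.kind === ts.SyntaxKind.ExportKeyword
    ) ?? false
  );
}

// Hypothetical helper: counts top-level function declarations and
// top-level variables initialized with an arrow function.
function countTopLevelFunctions(sourceFile: ts.SourceFile): {
  total: number;
  exported: number;
} {
  let total = 0;
  let exported = 0;

  for (const statement of sourceFile.statements) {
    if (ts.isFunctionDeclaration(statement)) {
      total++;
      if (hasExportModifier(statement)) exported++;
    } else if (ts.isVariableStatement(statement)) {
      for (const declaration of statement.declarationList.declarations) {
        if (declaration.initializer && ts.isArrowFunction(declaration.initializer)) {
          total++;
          if (hasExportModifier(statement)) exported++;
        }
      }
    }
  }

  return { total, exported };
}

const chapterSample = ts.createSourceFile(
  "fun.ts",
  `const add = (first: number, second: number): number => first + second
export function sum(...numbers: number[]): number {
  return numbers.reduce(add, 0)
}
`,
  ts.ScriptTarget.Latest,
  true
);

const chapterCounts = countTopLevelFunctions(chapterSample);
console.log(chapterCounts);
```

Returning both counts from one function keeps the traversal in a single place, which makes it easier to extend later.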

import ts from "typescript";

const program = ts.createProgram(["fun.ts"], {
  module: ts.ModuleKind.ESNext,
  noImplicitAny: true,
});

const rootFileNames = program.getRootFileNames();
console.log(`Found ${rootFileNames.length} root file(s)`);

for (const rootFileName of rootFileNames) {
  const sourceFile = program.getSourceFile(rootFileName);

  if (sourceFile) {
    let functionsCount = 0;
    let exportedFunctionsCount = 0;

    console.log("Inspecting source file:", sourceFile.fileName);

    sourceFile.forEachChild((childNode) => {
      if (ts.isFunctionDeclaration(childNode)) {
        functionsCount++;

        const hasExportSpecifier = childNode.modifiers?.find(
          (modifier) => modifier.kind === ts.SyntaxKind.ExportKeyword
        );

        if (hasExportSpecifier) {
          exportedFunctionsCount++;
        }
      }

      if (ts.isVariableStatement(childNode)) {
        childNode.declarationList.forEachChild((node) => {
          // A declaration such as `let x` has no initializer, so guard first
          if (
            ts.isVariableDeclaration(node) &&
            node.initializer &&
            ts.isArrowFunction(node.initializer)
          ) {
            functionsCount++;

            const hasExportSpecifier = childNode.modifiers?.find(
              (modifier) => modifier.kind === ts.SyntaxKind.ExportKeyword
            );

            if (hasExportSpecifier) {
              exportedFunctionsCount++;
            }
          }
        });
      }
    });

    console.log("Total functions:", functionsCount);
    console.log("Exported functions:", exportedFunctionsCount);
  }
}

Running it prints:

Found 1 root file(s)
Inspecting source file: fun.ts
Total functions: 2
Exported functions: 1

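So we have successfully read the TypeScript AST and extracted the information we wanted! 🎉

To test further, we can change the source file so that both functions are exported: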

export const add = (first: number, second: number): number => {
  return first + second
}

export function sum(...numbers: number[]): number {
  return numbers.reduce(add, 0)
}

Run it again and you will see:

Found 1 root file(s)
Inspecting source file: fun.ts
Total functions: 2
Exported functions: 2

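Everything works as expected.

Note: this approach only counts functions declared at the top level. To count every function, including those declared inside blocks, we would have to descend into each block as well.

You should now have a basic feel for walking AST nodes. The ts namespace offers hundreds of helper functions for checking node kinds.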
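A deeper traversal can be sketched with a recursive visitor built on ts.forEachChild. Note the different counting rule I assume here: every function declaration, function expression, and arrow function at any depth is counted, including inline callbacks, which is broader than the chapter's top-level count. The helper name countAllFunctions is hypothetical.

```typescript
import * as ts from "typescript";

// Hypothetical helper: counts function-like nodes anywhere below `root`.
function countAllFunctions(root: ts.Node): number {
  let count = 0;

  const visit = (node: ts.Node): void => {
    if (
      ts.isFunctionDeclaration(node) ||
      ts.isFunctionExpression(node) ||
      ts.isArrowFunction(node)
    ) {
      count++;
    }
    // Recurse into the children; forEachChild only visits direct children.
    ts.forEachChild(node, visit);
  };

  visit(root);
  return count;
}

const nestedSample = ts.createSourceFile(
  "nested.ts",
  `export function outer() {
  function inner() {}
  const helper = () => [1, 2].map((n) => n + 1)
  return inner
}
`,
  ts.ScriptTarget.Latest,
  true
);

// outer, inner, helper's arrow function, and the map callback
const allFunctionsCount = countAllFunctions(nestedSample);
console.log("functions at any depth:", allFunctionsCount);
```

Class methods, getters, and constructors are separate node kinds and are not counted by this sketch.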

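Using an AST Viewer

You may be wondering whether there is a more visual way to look at the AST of TypeScript code. There is! I recommend ts-ast-viewer.com. It lets you pick the TypeScript version and shows the AST structure, the node properties, and the compiler API factory code that would construct the tree (covered in detail in Chapter 7).

An AST viewer is extremely useful when writing code analysis programs. We can paste in sample code, see the AST structure at a glance, and write the traversal code much faster. For example, paste in the following: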

export function sum(...numbers: number[]): number {
  return numbers.reduce(add, 0)
}

You will see this AST structure:

SourceFile
    FunctionDeclaration
        ExportKeyword
        Identifier
        Parameter
            DotDotDotToken
            Identifier
            ArrayType
                NumberKeyword
        NumberKeyword
        Block
            ReturnStatement
                CallExpression
                    PropertyAccessExpression
                        Identifier
                        Identifier
                    Identifier
                    NumericLiteral
    EndOfFileToken

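From this we can tell right away that we need to start at the root node, then visit the function declaration and its export modifier. Until the node kinds become familiar, this saves a lot of time.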
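A rough, text-only version of such a tree can also be produced with the compiler API itself. The sketch below is my own illustration, not how ts-ast-viewer is implemented. It assumes the SyntaxKind range markers (such as FirstLiteralToken) are declared after the real kinds, so keeping the first name seen for each value yields the real kind name. renderTree and kindNames are hypothetical names.

```typescript
import * as ts from "typescript";

// Value -> name lookup that keeps the FIRST name declared for each value,
// so later range markers do not shadow the real kind names (assumption above).
const kindNames = new Map<number, string>();
for (const [name, value] of Object.entries(ts.SyntaxKind)) {
  if (typeof value === "number" && !kindNames.has(value)) {
    kindNames.set(value, name);
  }
}

// Hypothetical helper: one line per node, indented four spaces per level.
function renderTree(node: ts.Node, depth = 0, lines: string[] = []): string[] {
  lines.push("    ".repeat(depth) + (kindNames.get(node.kind) ?? String(node.kind)));
  ts.forEachChild(node, (child) => {
    // Block body on purpose: returning a truthy value would stop the iteration.
    renderTree(child, depth + 1, lines);
  });
  return lines;
}

const viewerSample = ts.createSourceFile(
  "sum.ts",
  "export function sum(...numbers: number[]): number {\n  return numbers.reduce(add, 0)\n}\n",
  ts.ScriptTarget.Latest,
  true
);

const treeLines = renderTree(viewerSample);
console.log(treeLines.join("\n"));
```

Printing a tree like this from a script is handy when the code to inspect is too large, or too private, to paste into a website.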

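The TSQuery Tool

There is also a tool called TSQuery, which queries the AST with a syntax similar to CSS selectors. Compared with the native API it extracts AST data with less code. That helps on large projects, but it has drawbacks: the CSS-style selectors are easy to get wrong, and there is no IDE autocompletion for them.

For completeness, let's implement the same thing (counting functions and exported functions) with TSQuery. Inside the if (sourceFile) { } block, keep the two let counters and the final console.log calls, and replace the rest with: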

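// At the top of index.mjs (assumes the @phenomnomnominal/tsquery package is installed):
// import { tsquery } from "@phenomnomnominal/tsquery"

console.log("Inspecting source file:", sourceFile.fileName)
const sourceText = sourceFile.getFullText()
const ast = tsquery.ast(sourceText)

// Get arrow functions and function declarations
const nodes = tsquery(ast, "ArrowFunction, FunctionDeclaration")
functionsCount = nodes.length

// Get exported function declarations,
// plus arrow functions inside exported variable statements
const exportedNodes = tsquery(
  ast,
  `FunctionDeclaration:has(ExportKeyword), VariableStatement:has(ExportKeyword) ArrowFunction`
)
exportedFunctionsCount = exportedNodes.length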

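This code is shorter, but also harder to follow. Let's go through it step by step.

First we get the full text of the file with sourceFile.getFullText() (a compiler API feature), then build an AST from it with tsquery.ast().

The first selector, ArrowFunction, FunctionDeclaration, finds both kinds of node. The length of the resulting array is the total number of functions.

The second selector is more involved:
FunctionDeclaration:has(ExportKeyword) finds function declarations that carry an export modifier
VariableStatement:has(ExportKeyword) ArrowFunction finds arrow functions inside variable statements that carry an export modifier

For fun.ts the result matches the native API implementation. Keep in mind that these selectors match nodes at any depth, so a file with nested arrow functions (callbacks, for example) would give different totals than the top-level-only native version. The selector syntax is also hard to write and debug. Tools such as the TSQuery Playground help, but it remains challenging.

This book does not go deeper into TSQuery; this is only a demonstration of basic usage. The following chapters continue to use the native compiler API.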

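Chapter Summary

We covered a lot in this chapter! We learned how to walk the abstract syntax tree with the TypeScript compiler API, and how to do the same with the TSQuery library. We completed the same counting exercise both ways.

These exercises may be fun, but you might be asking: "Why would I want to count the functions in a file?" Good question. There are countless reasons to write code analysis programs, and that is the subject of the next chapter: real-world use cases for reading the AST.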