Vue 3 Source Code Analysis: compiler (Part 1)

This is an original article. Reproduction without permission is prohibited.

This is part 19 of the Vue 3 source code analysis series.

Preface

Earlier articles covered Vue's reactivity system, the h function, the render function, and the diff algorithm. Next, we turn to the compiler.

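A compiler is a complex piece of software that shows up in many languages, and different kinds of compilers differ considerably in how they are implemented. For our purposes, though, we only need a compiler for a domain-specific language (DSL).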

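A DSL is not general-purpose: it is designed for one particular domain, yet it is expressive enough to describe that domain's problems and to build solutions to them.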

Here the domain-specific language is the template: compiling a template into a render function is exactly what Vue's compiler does.

Let's use an example to see how the compiler runs.

Example

First we pull in the compile function and declare a template, then compile the template into a render function with compile and print the result.

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Document</title>
    <script src="../../../dist/vue.global.js"></script>
  </head>
  <body>
    <div id="app"></div>
    <script>
      const { compile } = Vue

      const template = `<div>hello world</div>`

      const renderFn = compile(template)

      console.log(renderFn)
    </script>
  </body>
</html>

The compiler

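Vue compiles templates mainly through the compile function, whose job is to turn a template into a render function. It is defined in packages/compiler-dom/src/index.ts. (The compile exposed on the global Vue object is a wrapper in packages/vue/src/index.ts that calls this function and turns the generated code into a render function.)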

export function compile(
  template: string,
  options: CompilerOptions = {}
): CodegenResult {
  return baseCompile(
    template,
    extend({}, parserOptions, options, {
      nodeTransforms: [
        // ignore <script> and <tag>
        // this is not put inside DOMNodeTransforms because that list is used
        // by compiler-ssr to generate vnode fallback branches
        ignoreSideEffectTags,
        ...DOMNodeTransforms,
        ...(options.nodeTransforms || [])
      ],
      directiveTransforms: extend(
        {},
        DOMDirectiveTransforms,
        options.directiveTransforms || {}
      ),
      transformHoist: __BROWSER__ ? null : stringifyStatic
    })
  )
}

As you can see, it actually delegates to baseCompile. The template argument is the template passed in, which in our example is <div>hello world</div>. baseCompile is defined in packages/compiler-core/src/compile.ts:

export function baseCompile(
  template: string | RootNode,
  options: CompilerOptions = {}
): CodegenResult {
  // omitted

  const ast = isString(template) ? baseParse(template, options) : template
  
  // omitted

  transform(
    ast,
    extend({}, options, {
      prefixIdentifiers,
      nodeTransforms: [
        ...nodeTransforms,
        ...(options.nodeTransforms || []) // user transforms
      ],
      directiveTransforms: extend(
        {},
        directiveTransforms,
        options.directiveTransforms || {} // user transforms
      )
    })
  )

  return generate(
    ast,
    extend({}, options, {
      prefixIdentifiers
    })
  )
}

The function works in roughly three steps: first, baseParse parses the template into an AST; second, transform converts that AST into a JavaScript AST; third, generate produces the render function from the AST.
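
To make the three stages concrete, here is a toy pipeline with the same shape. It is only a sketch: toyParse, toyTransform, toyGenerate and toyCompile are hypothetical names, the parser accepts nothing but a single element wrapping plain text, and h stands in for whatever helper the real generated code calls.

```typescript
// Toy three-stage pipeline: parse -> transform -> generate.
// Every name here is hypothetical; it only mirrors the shape of baseCompile.

interface ToyTemplateAst {
  tag: string
  text: string
}

interface ToyJsAst {
  callee: string
  args: string[]
}

// Stage 1: parse a template of the exact form <tag>text</tag>.
function toyParse(template: string): ToyTemplateAst {
  const match = /^<([a-z]+)>([^<]*)<\/\1>$/i.exec(template)
  if (!match) {
    throw new Error(`unsupported template: ${template}`)
  }
  return { tag: match[1], text: match[2] }
}

// Stage 2: turn the template AST into a description of a function call.
function toyTransform(ast: ToyTemplateAst): ToyJsAst {
  return {
    callee: 'h',
    args: [JSON.stringify(ast.tag), JSON.stringify(ast.text)]
  }
}

// Stage 3: generate the source code of a render function.
function toyGenerate(jsAst: ToyJsAst): string {
  return `function render() { return ${jsAst.callee}(${jsAst.args.join(', ')}) }`
}

function toyCompile(template: string): string {
  return toyGenerate(toyTransform(toyParse(template)))
}

const toyCode = toyCompile('<div>hello world</div>')
// toyCode is: function render() { return h("div", "hello world") }
```

Keeping each stage as a separate function is what lets the real compiler plug extra node and directive transforms into the middle stage without touching parsing or code generation.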

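In Vue, the compiler pipeline looks roughly like this: template → parse → AST → transform → JavaScript AST → generate → render function.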

The AST (abstract syntax tree) generated here is a JavaScript object that describes the template:

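- type: a value of the NodeTypes enum that indicates the node's type.
- children: the child nodes.
- loc: location, the position of the node's content
    1. start: start position
    2. end: end position
    3. source: the original text
- Note: the set of properties varies with the node's type.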
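
The properties above can be written down as types. The following is a trimmed-down sketch, not Vue's own definitions: every Sketch* name and the singleLineLoc helper are hypothetical, and only the three node types that appear in this article are modelled. The numeric values match the type fields shown in the AST later on.

```typescript
// Hypothetical, trimmed-down node shapes; Vue's real nodes carry more fields.
enum SketchNodeTypes {
  ROOT = 0,
  ELEMENT = 1,
  TEXT = 2
}

interface SketchPosition {
  offset: number // 0-based index into the template
  line: number // 1-based
  column: number // 1-based
}

interface SketchLoc {
  start: SketchPosition
  end: SketchPosition
  source: string
}

interface SketchText {
  type: SketchNodeTypes.TEXT
  content: string
  loc: SketchLoc
}

interface SketchElement {
  type: SketchNodeTypes.ELEMENT
  tag: string
  children: SketchChild[]
  loc: SketchLoc
}

type SketchChild = SketchElement | SketchText

interface SketchRoot {
  type: SketchNodeTypes.ROOT
  children: SketchChild[]
  loc: SketchLoc
}

// Builds a loc for a span of a single-line template.
function singleLineLoc(template: string, start: number, end: number): SketchLoc {
  return {
    start: { offset: start, line: 1, column: start + 1 },
    end: { offset: end, line: 1, column: end + 1 },
    source: template.slice(start, end)
  }
}

// The available properties depend on type, so we switch on it.
function describeNode(node: SketchRoot | SketchChild): string {
  switch (node.type) {
    case SketchNodeTypes.ROOT:
      return `root(${node.children.map(describeNode).join(', ')})`
    case SketchNodeTypes.ELEMENT:
      return `<${node.tag}>(${node.children.map(describeNode).join(', ')})`
    case SketchNodeTypes.TEXT:
      return `text(${JSON.stringify(node.content)})`
  }
}

const sampleTemplate = '<div>hello world</div>'
const sampleRoot: SketchRoot = {
  type: SketchNodeTypes.ROOT,
  loc: singleLineLoc(sampleTemplate, 0, sampleTemplate.length),
  children: [
    {
      type: SketchNodeTypes.ELEMENT,
      tag: 'div',
      loc: singleLineLoc(sampleTemplate, 0, sampleTemplate.length),
      children: [
        {
          type: SketchNodeTypes.TEXT,
          content: 'hello world',
          loc: singleLineLoc(sampleTemplate, 5, 16)
        }
      ]
    }
  ]
}
```

Using type as a discriminant is what allows each node kind to have its own set of properties while still being handled through one switch.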

Finite-state machines

Building the AST is a fairly involved process that relies on a finite-state machine (also called a finite-state automaton).

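In Vue, the AST is produced by baseParse, which takes the template and returns an AST object. In other words, parsing turns the template into an AST, and that parsing process is driven by a finite-state automaton.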

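Template parsing exhibits the three characteristics of a finite-state machine (Ruan Yifeng's article on the subject is a good reference):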

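- The total number of states is finite (each part of a tag corresponds to a state)
    1. Initial state
    2. Tag open state (<)
    3. Tag name state (div)
    4. Text state (hello world)
    5. End tag open state (</)
    6. End tag name state (div)
    7. ......
- At any moment, the parser is in exactly one state
    1. For example, reading <div>hello world</div> from left to right, it is in the start tag state, the text state, or the end tag state; it is never in no state at all.
- Under certain conditions, it moves from one state to another
    1. It begins in the initial state and then switches to the text state or some other state. A switch from one state to another always happens because a specific condition was met.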

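These three points capture what a finite-state automaton is: a mathematical model of computation that describes a finite number of states together with the transitions and actions between them.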

Using the example <div>hello world</div>, here is how the finite-state automaton parses it:

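1. Read <: from the initial state to the tag open state
2. Read div: from the tag open state to the tag name state
3. Read >: from the tag name state to the initial state
4. Read hello world: from the initial state to the text state
5. Read <: from the text state to the tag open state
6. Read /: from the tag open state to the end tag open state
7. Read div: from the end tag open state to the end tag name state
8. Read >: from the end tag name state to the initial state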

After this sequence of steps, <div>hello world</div> yields three tokens:

- Start tag: <div>
- Text node: hello world
- End tag: </div>

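Using the state transitions of a finite-state automaton to obtain tokens in this way is called tokenization of the template. The tokens are the key to generating the AST: building the AST is a matter of scanning the tokens.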
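
The state transitions described above can be sketched as a small tokenizer. This is an illustration of the idea rather than Vue's parser: the State enum, the SketchToken type and tokenize are hypothetical names, and attributes, comments, interpolations and self-closing tags are deliberately not handled.

```typescript
// A minimal finite-state tokenizer (hypothetical; handles only <tag>, </tag> and text).
type SketchToken =
  | { type: 'startTag'; name: string }
  | { type: 'text'; content: string }
  | { type: 'endTag'; name: string }

enum State {
  Initial,
  TagOpen,
  TagName,
  Text,
  EndTagOpen,
  EndTagName
}

const isLetter = (c: string): boolean => /[a-z]/i.test(c)

function tokenize(input: string): SketchToken[] {
  const tokens: SketchToken[] = []
  let state: State = State.Initial
  let buffer = ''

  for (let i = 0; i < input.length; i++) {
    const c = input.charAt(i)
    switch (state) {
      case State.Initial:
        if (c === '<') {
          state = State.TagOpen
        } else {
          state = State.Text
          buffer = c
        }
        break
      case State.TagOpen:
        if (c === '/') {
          state = State.EndTagOpen
        } else if (isLetter(c)) {
          state = State.TagName
          buffer = c
        } else {
          throw new Error(`unexpected "${c}" after "<"`)
        }
        break
      case State.TagName:
        if (c === '>') {
          tokens.push({ type: 'startTag', name: buffer })
          buffer = ''
          state = State.Initial
        } else {
          buffer += c
        }
        break
      case State.Text:
        if (c === '<') {
          tokens.push({ type: 'text', content: buffer })
          buffer = ''
          state = State.TagOpen
        } else {
          buffer += c
        }
        break
      case State.EndTagOpen:
        if (isLetter(c)) {
          state = State.EndTagName
          buffer = c
        } else {
          throw new Error(`unexpected "${c}" after "</"`)
        }
        break
      case State.EndTagName:
        if (c === '>') {
          tokens.push({ type: 'endTag', name: buffer })
          buffer = ''
          state = State.Initial
        } else {
          buffer += c
        }
        break
    }
  }

  // Flush trailing text that was never followed by "<".
  if (state === State.Text) {
    tokens.push({ type: 'text', content: buffer })
  }
  return tokens
}

const fsmTokens = tokenize('<div>hello world</div>')
// fsmTokens holds a start tag (div), a text token (hello world) and an end tag (div).
```

Each case of the switch is one state, and each assignment to state is one transition, so the code lines up with the eight steps listed earlier.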

Scanning tokens brings in the idea of a recursive descent parser. Take the following HTML structure as an example:

<div>
    <p>hello world</p>
</div>

This HTML can be broken into the following tokens:

Start tag: <div>
Start tag: <p>
Text node: hello world
End tag: </p>
End tag: </div>

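Scanning starts from an initial state in which the stack holds only the root node, Root.

Each start tag is pushed onto the stack and converted into the corresponding AST node.

As soon as an end tag such as </p> is reached, the matching element is popped off the stack, and scanning continues with the next node, if there is one. Once every tag has been processed, the AST node tree is complete.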

In summary, producing the AST takes roughly two steps: first the finite-state automaton turns the template into tokens, then the tokens are turned into AST node objects.
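
The second step, turning tokens into a tree, can be sketched as follows. Note the swap: this version uses an explicit stack where Vue's parser uses recursion (parseChildren and parseElement, shown later), which plays the same role. Tok, TreeNode and buildTree are hypothetical names.

```typescript
// Builds a node tree from a token list with an explicit stack (hypothetical sketch).
type Tok =
  | { type: 'startTag'; name: string }
  | { type: 'text'; content: string }
  | { type: 'endTag'; name: string }

interface TreeText {
  type: 'text'
  content: string
}

interface TreeElement {
  type: 'element'
  tag: string
  children: TreeNode[]
}

type TreeNode = TreeElement | TreeText

interface TreeRoot {
  type: 'root'
  children: TreeNode[]
}

function buildTree(tokens: Tok[]): TreeRoot {
  const root: TreeRoot = { type: 'root', children: [] }
  // The top of the stack is always the node currently being filled.
  const stack: Array<TreeRoot | TreeElement> = [root]

  for (let i = 0; i < tokens.length; i++) {
    const token = tokens[i]
    const parent = stack[stack.length - 1]
    if (token.type === 'startTag') {
      const element: TreeElement = { type: 'element', tag: token.name, children: [] }
      parent.children.push(element)
      stack.push(element)
    } else if (token.type === 'text') {
      parent.children.push({ type: 'text', content: token.content })
    } else {
      // An end tag must close the element on top of the stack.
      if (parent.type !== 'element' || parent.tag !== token.name) {
        throw new Error(`unexpected end tag </${token.name}>`)
      }
      stack.pop()
    }
  }

  if (stack.length !== 1) {
    throw new Error('some elements were never closed')
  }
  return root
}

const sampleTree = buildTree([
  { type: 'startTag', name: 'div' },
  { type: 'startTag', name: 'p' },
  { type: 'text', content: 'hello world' },
  { type: 'endTag', name: 'p' },
  { type: 'endTag', name: 'div' }
])
```

Here sampleTree is a root whose only child is the div element, which in turn contains the p element and its text node.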

The AST object

In Vue, the template is converted into AST node objects by baseParse, defined in packages/compiler-core/src/parse.ts:

export function baseParse(
  content: string,
  options: ParserOptions = {}
): RootNode {
  const context = createParserContext(content, options)
  const start = getCursor(context)
  return createRoot(
    parseChildren(context, TextModes.DATA, []),
    getSelection(context, start)
  )
}

For our example, the resulting AST object looks roughly like this:

const ast = {
    "type": 0,
    "children": [
        {
            "type": 1,
            "tag": "div",
            "tagType": 0,
            "props": [],
            "children": [{ "type": 2, "content": "hello world" }]
        }
    ],
    "loc": {}
}

baseParse first calls createParserContext to create the context object:

function createParserContext(
  content: string,
  rawOptions: ParserOptions
): ParserContext {
  const options = extend({}, defaultParserOptions)

  let key: keyof ParserOptions
  for (key in rawOptions) {
    // @ts-ignore
    options[key] =
      rawOptions[key] === undefined
        ? defaultParserOptions[key]
        : rawOptions[key]
  }
  return {
    options,
    column: 1,
    line: 1,
    offset: 0,
    originalSource: content,
    source: content,
    inPre: false,
    inVPre: false,
    onWarn: options.onWarn
  }
}

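The result is a plain object holding the parser options, the cursor position (line, column, offset), and the template source that is still to be parsed (source).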

Finally, baseParse returns the object produced by createRoot:

export function createRoot(
  children: TemplateChildNode[],
  loc = locStub
): RootNode {
  return {
    type: NodeTypes.ROOT,
    children,
    helpers: [],
    components: [],
    directives: [],
    hoists: [],
    imports: [],
    cached: 0,
    temps: 0,
    codegenNode: undefined,
    loc
  }
}

The part to focus on is the first argument, which is the core of baseParse. It is obtained by calling parseChildren:

function parseChildren(
  context: ParserContext,
  mode: TextModes,
  ancestors: ElementNode[]
): TemplateChildNode[] {
  const parent = last(ancestors)
  const ns = parent ? parent.ns : Namespaces.HTML
  const nodes: TemplateChildNode[] = []

  while (!isEnd(context, mode, ancestors)) {
    __TEST__ && assert(context.source.length > 0)
    const s = context.source
    let node: TemplateChildNode | TemplateChildNode[] | undefined = undefined

    if (mode === TextModes.DATA || mode === TextModes.RCDATA) {
      if (!context.inVPre && startsWith(s, context.options.delimiters[0])) {
        // '{{'
        node = parseInterpolation(context, mode)
      } else if (mode === TextModes.DATA && s[0] === '<') {
        // https://html.spec.whatwg.org/multipage/parsing.html#tag-open-state
        if (s.length === 1) {
          emitError(context, ErrorCodes.EOF_BEFORE_TAG_NAME, 1)
        } else if (s[1] === '!') {
          // https://html.spec.whatwg.org/multipage/parsing.html#markup-declaration-open-state
          if (startsWith(s, '<!--')) {
            node = parseComment(context)
          } else if (startsWith(s, '<!DOCTYPE')) {
            // Ignore DOCTYPE by a limitation.
            node = parseBogusComment(context)
          } else if (startsWith(s, '<![CDATA[')) {
            if (ns !== Namespaces.HTML) {
              node = parseCDATA(context, ancestors)
            } else {
              emitError(context, ErrorCodes.CDATA_IN_HTML_CONTENT)
              node = parseBogusComment(context)
            }
          } else {
            emitError(context, ErrorCodes.INCORRECTLY_OPENED_COMMENT)
            node = parseBogusComment(context)
          }
        } else if (s[1] === '/') {
          // https://html.spec.whatwg.org/multipage/parsing.html#end-tag-open-state
          if (s.length === 2) {
            emitError(context, ErrorCodes.EOF_BEFORE_TAG_NAME, 2)
          } else if (s[2] === '>') {
            emitError(context, ErrorCodes.MISSING_END_TAG_NAME, 2)
            advanceBy(context, 3)
            continue
          } else if (/[a-z]/i.test(s[2])) {
            emitError(context, ErrorCodes.X_INVALID_END_TAG)
            parseTag(context, TagType.End, parent)
            continue
          } else {
            emitError(
              context,
              ErrorCodes.INVALID_FIRST_CHARACTER_OF_TAG_NAME,
              2
            )
            node = parseBogusComment(context)
          }
        } else if (/[a-z]/i.test(s[1])) {
          node = parseElement(context, ancestors)

          // 2.x <template> with no directive compat
          if (
            __COMPAT__ &&
            isCompatEnabled(
              CompilerDeprecationTypes.COMPILER_NATIVE_TEMPLATE,
              context
            ) &&
            node &&
            node.tag === 'template' &&
            !node.props.some(
              p =>
                p.type === NodeTypes.DIRECTIVE &&
                isSpecialTemplateDirective(p.name)
            )
          ) {
            __DEV__ &&
              warnDeprecation(
                CompilerDeprecationTypes.COMPILER_NATIVE_TEMPLATE,
                context,
                node.loc
              )
            node = node.children
          }
        } else if (s[1] === '?') {
          emitError(
            context,
            ErrorCodes.UNEXPECTED_QUESTION_MARK_INSTEAD_OF_TAG_NAME,
            1
          )
          node = parseBogusComment(context)
        } else {
          emitError(context, ErrorCodes.INVALID_FIRST_CHARACTER_OF_TAG_NAME, 1)
        }
      }
    }
    if (!node) {
      node = parseText(context, mode)
    }

    if (isArray(node)) {
      for (let i = 0; i < node.length; i++) {
        pushNode(nodes, node[i])
      }
    } else {
      pushNode(nodes, node)
    }
  }

  // Whitespace handling strategy like v2
  let removedWhitespace = false
  if (mode !== TextModes.RAWTEXT && mode !== TextModes.RCDATA) {
    const shouldCondense = context.options.whitespace !== 'preserve'
    for (let i = 0; i < nodes.length; i++) {
      const node = nodes[i]
      if (!context.inPre && node.type === NodeTypes.TEXT) {
        if (!/[^\t\r\n\f ]/.test(node.content)) {
          const prev = nodes[i - 1]
          const next = nodes[i + 1]
          // Remove if:
          // - the whitespace is the first or last node, or:
          // - (condense mode) the whitespace is adjacent to a comment, or:
          // - (condense mode) the whitespace is between two elements AND contains newline
          if (
            !prev ||
            !next ||
            (shouldCondense &&
              (prev.type === NodeTypes.COMMENT ||
                next.type === NodeTypes.COMMENT ||
                (prev.type === NodeTypes.ELEMENT &&
                  next.type === NodeTypes.ELEMENT &&
                  /[\r\n]/.test(node.content))))
          ) {
            removedWhitespace = true
            nodes[i] = null as any
          } else {
            // Otherwise, the whitespace is condensed into a single space
            node.content = ' '
          }
        } else if (shouldCondense) {
          // in condense mode, consecutive whitespaces in text are condensed
          // down to a single space.
          node.content = node.content.replace(/[\t\r\n\f ]+/g, ' ')
        }
      }
      // Remove comment nodes if desired by configuration.
      else if (node.type === NodeTypes.COMMENT && !context.options.comments) {
        removedWhitespace = true
        nodes[i] = null as any
      }
    }
    if (context.inPre && parent && context.options.isPreTag(parent.tag)) {
      // remove leading newline per html spec
      // https://html.spec.whatwg.org/multipage/grouping-content.html#the-pre-element
      const first = nodes[0]
      if (first && first.type === NodeTypes.TEXT) {
        first.content = first.content.replace(/^\r?\n/, '')
      }
    }
  }

  return removedWhitespace ? nodes.filter(Boolean) : nodes
}

This function is what actually carries out AST parsing, so let's walk through its logic. It runs a while loop controlled by isEnd:

function isEnd(
  context: ParserContext,
  mode: TextModes,
  ancestors: ElementNode[]
): boolean {
  const s = context.source

  switch (mode) {
    case TextModes.DATA:
      if (startsWith(s, '</')) {
        // TODO: probably bad performance
        for (let i = ancestors.length - 1; i >= 0; --i) {
          if (startsWithEndTagOpen(s, ancestors[i].tag)) {
            return true
          }
        }
      }
      break

    case TextModes.RCDATA:
    case TextModes.RAWTEXT: {
      const parent = last(ancestors)
      if (parent && startsWithEndTagOpen(s, parent.tag)) {
        return true
      }
      break
    }

    case TextModes.CDATA:
      if (startsWith(s, ']]>')) {
        return true
      }
      break
  }

  return !s
}

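s is the remaining template, currently <div>hello world</div>. Since startsWith(s, '</') is false (the template does not begin with an end tag), the switch breaks out and the function returns !s, which is false because s is not empty.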
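
The DATA branch of isEnd can be reduced to a few lines to see what it returns at each point of the walkthrough. This is a simplified sketch: isEndInDataMode is a hypothetical name, ancestors are represented by their tag names only, and the check on the character following the tag name is left out.

```typescript
// Simplified version of the DATA-mode branch of isEnd (hypothetical sketch).
function isEndInDataMode(source: string, ancestorTags: string[]): boolean {
  if (source.slice(0, 2) === '</') {
    // Stop if the end tag closes any currently open ancestor.
    for (let i = ancestorTags.length - 1; i >= 0; i--) {
      const tag = ancestorTags[i]
      if (source.slice(2, 2 + tag.length).toLowerCase() === tag.toLowerCase()) {
        return true
      }
    }
  }
  // Otherwise stop only when nothing is left to parse.
  return source.length === 0
}

const atStart = isEndInDataMode('<div>hello world</div>', []) // false: keep looping
const insideDiv = isEndInDataMode('hello world</div>', ['div']) // false: text to parse
const atEndTag = isEndInDataMode('</div>', ['div']) // true: children of div are done
const atEof = isEndInDataMode('', []) // true: template exhausted
```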

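With isEnd returning false, the while body runs. It checks whether the first character s[0] is <, then looks at the second character s[1], which is d here. Because /[a-z]/i.test(s[1]) is true, parseElement is called: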
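
The branching on s[0] and s[1] amounts to a small dispatcher. The sketch below keeps only the main branches of DATA mode with the default {{ delimiter; pickBranch and the Branch labels are hypothetical, and the error-reporting branches of parseChildren are omitted.

```typescript
// Which parser would parseChildren pick for the current source? (hypothetical sketch)
type Branch = 'interpolation' | 'comment' | 'strayEndTag' | 'element' | 'text'

function pickBranch(s: string): Branch {
  if (s.slice(0, 2) === '{{') {
    return 'interpolation' // parseInterpolation
  }
  if (s.charAt(0) === '<') {
    if (s.slice(0, 4) === '<!--') {
      return 'comment' // parseComment
    }
    if (s.charAt(1) === '/' && /[a-z]/i.test(s.charAt(2))) {
      // Reached only for an end tag that matches no open ancestor,
      // because isEnd stops the loop for matching end tags.
      return 'strayEndTag'
    }
    if (/[a-z]/i.test(s.charAt(1))) {
      return 'element' // parseElement
    }
  }
  return 'text' // parseText is the fallback
}

const branchForExample = pickBranch('<div>hello world</div>') // 'element'
const branchForText = pickBranch('hello world</div>') // 'text'
```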


function parseElement(
  context: ParserContext,
  ancestors: ElementNode[]
): ElementNode | undefined {
  __TEST__ && assert(/^<[a-z]/i.test(context.source))

  // Start tag.
  const wasInPre = context.inPre
  const wasInVPre = context.inVPre
  const parent = last(ancestors)
  const element = parseTag(context, TagType.Start, parent)
  const isPreBoundary = context.inPre && !wasInPre
  const isVPreBoundary = context.inVPre && !wasInVPre

  if (element.isSelfClosing || context.options.isVoidTag(element.tag)) {
    // #4030 self-closing <pre> tag
    if (isPreBoundary) {
      context.inPre = false
    }
    if (isVPreBoundary) {
      context.inVPre = false
    }
    return element
  }

  // Children.
  ancestors.push(element)
  const mode = context.options.getTextMode(element, parent)
  const children = parseChildren(context, mode, ancestors)
  ancestors.pop()

  // 2.x inline-template compat
  if (__COMPAT__) {
    const inlineTemplateProp = element.props.find(
      p => p.type === NodeTypes.ATTRIBUTE && p.name === 'inline-template'
    ) as AttributeNode
    if (
      inlineTemplateProp &&
      checkCompatEnabled(
        CompilerDeprecationTypes.COMPILER_INLINE_TEMPLATE,
        context,
        inlineTemplateProp.loc
      )
    ) {
      const loc = getSelection(context, element.loc.end)
      inlineTemplateProp.value = {
        type: NodeTypes.TEXT,
        content: loc.source,
        loc
      }
    }
  }

  element.children = children

  // End tag.
  if (startsWithEndTagOpen(context.source, element.tag)) {
    parseTag(context, TagType.End, parent)
  } else {
    emitError(context, ErrorCodes.X_MISSING_END_TAG, 0, element.loc.start)
    if (context.source.length === 0 && element.tag.toLowerCase() === 'script') {
      const first = children[0]
      if (first && startsWith(first.loc.source, '<!--')) {
        emitError(context, ErrorCodes.EOF_IN_SCRIPT_HTML_COMMENT_LIKE_TEXT)
      }
    }
  }

  element.loc = getSelection(context, element.loc.start)

  if (isPreBoundary) {
    context.inPre = false
  }
  if (isVPreBoundary) {
    context.inVPre = false
  }
  return element
}

Next, let's look at parseTag:

function parseTag(
  context: ParserContext,
  type: TagType,
  parent: ElementNode | undefined
): ElementNode | undefined {
  // omitted
  // Tag open.
  const start = getCursor(context)
  const match = /^<\/?([a-z][^\t\r\n\f />]*)/i.exec(context.source)!
  const tag = match[1]
  const ns = context.options.getNamespace(tag, parent)

  advanceBy(context, match[0].length)
  advanceSpaces(context)

  // save current state in case we need to re-parse attributes with v-pre
  const cursor = getCursor(context)
  const currentSource = context.source

  // check <pre> tag
  if (context.options.isPreTag(tag)) {
    context.inPre = true
  }

  // Attributes.
  let props = parseAttributes(context, type)

  // check v-pre
  // omitted

  // Tag close.
  let isSelfClosing = false
  if (context.source.length === 0) {
    emitError(context, ErrorCodes.EOF_IN_TAG)
  } else {
    isSelfClosing = startsWith(context.source, '/>')
    if (type === TagType.End && isSelfClosing) {
      emitError(context, ErrorCodes.END_TAG_WITH_TRAILING_SOLIDUS)
    }
    advanceBy(context, isSelfClosing ? 2 : 1)
  }

  if (type === TagType.End) {
    return
  }

  // omitted

  let tagType = ElementTypes.ELEMENT
  // omitted

  return {
    type: NodeTypes.ELEMENT,
    ns,
    tag,
    tagType,
    props,
    isSelfClosing,
    children: [],
    loc: getSelection(context, start),
    codegenNode: undefined // to be created during transform phase
  }
}

The regular expression is matched against the template to capture the tag name, div in this case. advanceBy then moves the cursor forward:
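
To see what the regular expression captures, it can be run on its own. The pattern below is copied from parseTag; the matchTagOpen wrapper is a hypothetical helper added for the demonstration.

```typescript
// The tag-open pattern from parseTag, wrapped in a hypothetical helper.
const tagOpenPattern = /^<\/?([a-z][^\t\r\n\f />]*)/i

function matchTagOpen(source: string): { consumed: string; tag: string } | null {
  const match = tagOpenPattern.exec(source)
  return match ? { consumed: match[0], tag: match[1] } : null
}

const startMatch = matchTagOpen('<div>hello world</div>')
// startMatch: consumed is '<div', tag is 'div'
const endMatch = matchTagOpen('</div>')
// endMatch: consumed is '</div', tag is 'div'
const noMatch = matchTagOpen('hello world</div>')
// noMatch is null because the source does not start with a tag
```

The length of consumed (match[0] in parseTag) is what gets passed to advanceBy, which is why the closing > is still in the source afterwards.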


function advanceBy(context: ParserContext, numberOfCharacters: number): void {
  const { source } = context
  __TEST__ && assert(numberOfCharacters <= source.length)
  advancePositionWithMutation(context, source, numberOfCharacters)
  context.source = source.slice(numberOfCharacters)
}

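After this step, the remaining template is >hello world</div>.

At this point <div has been consumed, and parsing continues from >hello world</div>. Next comes isSelfClosing = startsWith(context.source, '/>'), which checks whether the tag is self-closing, that is, whether the remaining template starts with />. If it does, the cursor advances two characters; otherwise it advances one. Here it does not, so after this step the remaining template is hello world</div>.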
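
The two cursor moves can be replayed with a small stand-in for advanceBy. This is a sketch under assumptions: SketchCursor and advance are hypothetical names, and the position bookkeeping is a simplified version of what advancePositionWithMutation is expected to do (count newlines, then update offset, line and column).

```typescript
// A stand-in for advanceBy that also tracks offset, line and column (hypothetical).
interface SketchCursor {
  source: string
  offset: number
  line: number
  column: number
}

function advance(cursor: SketchCursor, count: number): void {
  const consumed = cursor.source.slice(0, count)
  let newLines = 0
  let lastNewLinePos = -1
  for (let i = 0; i < consumed.length; i++) {
    if (consumed.charCodeAt(i) === 10) {
      // "\n" starts a new line
      newLines++
      lastNewLinePos = i
    }
  }
  cursor.offset += consumed.length
  cursor.line += newLines
  cursor.column =
    lastNewLinePos === -1
      ? cursor.column + consumed.length
      : consumed.length - lastNewLinePos
  cursor.source = cursor.source.slice(count)
}

const cursor: SketchCursor = {
  source: '<div>hello world</div>',
  offset: 0,
  line: 1,
  column: 1
}

advance(cursor, 4) // consume "<div"
const afterTagName = cursor.source // '>hello world</div>'

advance(cursor, 1) // consume ">" (not self-closing, so one character)
const afterTagClose = cursor.source // 'hello world</div>'
```

Parsing never copies or indexes into the original template; it keeps slicing the front off source, so "where we are" is always simply the start of the remaining string.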

parseTag finally returns the element node object, with properties such as type, tag, and codegenNode.

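The remaining template is now hello world</div>, which means <div> has been fully parsed. Back in parseElement, the element is pushed onto ancestors and parseChildren is called again; this time it falls through to parseText to parse the text: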

function parseText(context: ParserContext, mode: TextModes): TextNode {
  __TEST__ && assert(context.source.length > 0)

  const endTokens =
    mode === TextModes.CDATA ? [']]>'] : ['<', context.options.delimiters[0]]

  let endIndex = context.source.length
  for (let i = 0; i < endTokens.length; i++) {
    const index = context.source.indexOf(endTokens[i], 1)
    if (index !== -1 && endIndex > index) {
      endIndex = index
    }
  }

  __TEST__ && assert(endIndex > 0)

  const start = getCursor(context)
  const content = parseTextData(context, endIndex, mode)

  return {
    type: NodeTypes.TEXT,
    content,
    loc: getSelection(context, start)
  }
}

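endTokens is ['<', '{{'] and marks where the text ends: at the next tag or the next interpolation. The current template hello world</div> has length 17. After the loop, endIndex is 11, the index of the < that follows hello world. parseTextData is called next: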
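
The endIndex computation can be checked in isolation. The loop below follows the one in parseText; findTextEnd and the sample* constants are hypothetical names used for the demonstration.

```typescript
// The end-of-text search from parseText as a standalone function (hypothetical name).
function findTextEnd(source: string, endTokens: string[]): number {
  let endIndex = source.length
  for (let i = 0; i < endTokens.length; i++) {
    // Search from index 1 so that at least one character is always consumed.
    const index = source.indexOf(endTokens[i], 1)
    if (index !== -1 && endIndex > index) {
      endIndex = index
    }
  }
  return endIndex
}

const sampleSource = 'hello world</div>'
const sampleEnd = findTextEnd(sampleSource, ['<', '{{']) // 11
const sampleText = sampleSource.slice(0, sampleEnd) // 'hello world'
const sampleRest = sampleSource.slice(sampleEnd) // '</div>'
```

Starting the search at index 1 rather than 0 matters: when the source itself begins with < or {{ but could not be parsed as a tag or interpolation, the text parser still makes progress instead of returning an empty node.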


function parseTextData(
  context: ParserContext,
  length: number,
  mode: TextModes
): string {
  const rawText = context.source.slice(0, length)
  advanceBy(context, length)
  if (
    mode === TextModes.RAWTEXT ||
    mode === TextModes.CDATA ||
    !rawText.includes('&')
  ) {
    return rawText
  } else {
    // DATA or RCDATA containing "&"". Entity decoding required.
    return context.options.decodeEntities(
      rawText,
      mode === TextModes.ATTRIBUTE_VALUE
    )
  }
}

It extracts the text content rawText = 'hello world' and moves the cursor with advanceBy, leaving </div> as the remaining template.

parseText then returns the text node object, and parseChildren calls pushNode to add that text node to nodes:

function pushNode(nodes: TemplateChildNode[], node: TemplateChildNode): void {
  if (node.type === NodeTypes.TEXT) {
    const prev = last(nodes)
    // Merge if both this and the previous node are text and those are
    // consecutive. This happens for cases like "a < b".
    if (
      prev &&
      prev.type === NodeTypes.TEXT &&
      prev.loc.end.offset === node.loc.start.offset
    ) {
      prev.content += node.content
      prev.loc.end = node.loc.end
      prev.loc.source += node.loc.source
      return
    }
  }

  nodes.push(node)
}
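
The merge case mentioned in the comment, text such as "a < b", can be reproduced with a reduced node shape. In this sketch the loc object is flattened into start and end offsets, and MergeNode and pushMerged are hypothetical names; the merge condition itself mirrors pushNode.

```typescript
// pushNode's merge rule on a reduced node shape (hypothetical sketch).
interface MergeText {
  type: 'text'
  content: string
  start: number // offset where the node begins
  end: number // offset just past the node
}

interface MergeElement {
  type: 'element'
  tag: string
  start: number
  end: number
}

type MergeNode = MergeText | MergeElement

function pushMerged(nodes: MergeNode[], node: MergeNode): void {
  const prev = nodes[nodes.length - 1]
  if (
    node.type === 'text' &&
    prev !== undefined &&
    prev.type === 'text' &&
    prev.end === node.start
  ) {
    // Two text nodes that touch each other become one.
    prev.content += node.content
    prev.end = node.end
    return
  }
  nodes.push(node)
}

// "a < b" is read as two text chunks, "a " and "< b", because the text
// parser stops at "<" even though no tag follows.
const merged: MergeNode[] = []
pushMerged(merged, { type: 'text', content: 'a ', start: 0, end: 2 })
pushMerged(merged, { type: 'text', content: '< b', start: 2, end: 5 })
// merged now holds a single text node with content 'a < b'
```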

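Control then returns to parseElement, where ancestors.pop() removes the div element from the ancestor stack, bringing us back to the top level, and element.children = children assigns the text node as the div element's child.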

Only </div> remains unprocessed. startsWithEndTagOpen(context.source, element.tag) checks whether the remaining template begins with the matching end tag:

function startsWithEndTagOpen(source: string, tag: string): boolean {
  return (
    startsWith(source, '</') &&
    source.slice(2, 2 + tag.length).toLowerCase() === tag.toLowerCase() &&
    /[\t\r\n\f />]/.test(source[2 + tag.length] || '>')
  )
}

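Since it does, parseTag runs again, this time with TagType.End. Once the tag name has been consumed, the remaining template is >.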

The isSelfClosing flag again decides how far to move the cursor (one character here), and with that the whole source has been parsed.

The element node, now containing its text child, is then pushed onto nodes. parseChildren finishes, createRoot runs, and the finished template object, the AST, is returned.

In the next article, we will look at how the compiler's transform function converts the AST into a JavaScript AST, and how generate turns that AST into a render function.

Summary

The core of AST generation is the parseChildren function.
During generation, the template <div>hello world</div> is parsed in five steps:
1. Step 1 parses <div, leaving context.source: >hello world</div>
2. Step 2 parses >, leaving context.source: hello world</div>
3. Step 3 parses hello world, leaving context.source: </div>
4. Step 4 parses </div, leaving context.source: >
5. Step 5 parses >, leaving context.source: ''
Across these steps, each token is scanned in turn, and the result is the corresponding AST object.
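
The five steps can be replayed with a short trace. This is a sketch, not Vue's parser: traceParse is a hypothetical function that assumes tags have no attributes and are never self-closing, and it records the remaining source after each cursor move instead of building nodes.

```typescript
// Records the remaining source after every cursor move (hypothetical sketch).
// Assumes tags without attributes and without self-closing syntax.
function traceParse(template: string): string[] {
  const snapshots: string[] = []
  const tagOpen = /^<\/?([a-z][^\t\r\n\f />]*)/i
  let source = template

  const advance = (count: number): void => {
    source = source.slice(count)
    snapshots.push(source)
  }

  while (source.length > 0) {
    const match = tagOpen.exec(source)
    if (match) {
      advance(match[0].length) // "<div" or "</div"
      advance(1) // the closing ">"
    } else {
      // Text runs up to the next "<", or to the end of the template.
      const end = source.indexOf('<', 1)
      advance(end === -1 ? source.length : end)
    }
  }
  return snapshots
}

const parseSteps = traceParse('<div>hello world</div>')
// parseSteps contains, in order:
//   '>hello world</div>'
//   'hello world</div>'
//   '</div>'
//   '>'
//   ''
```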

Vue 3 source implementation

vue-next-mini