Adding index numbers and a reference list to links when rendering your Markdown files

1. Why?

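When reading articles on WeChat Official Accounts, I noticed that some authors mark their external links (references) very clearly, probably a habit carried over from writing academic papers. Presenting references well is a sign of respect for the original authors. Markdown, however, has no comparable citation feature. How can we build one, and what problems will we run into?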

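How do we collect the contents of every [alt](url)? We could match them directly with a regular expression, but writing that regex by hand is not ideal. Is there a better way? Yes: an AST. A parser turns the Markdown into a syntax tree, represented as JavaScript arrays and objects, and from that structure we can easily pick out what we need.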
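The sketch below makes the regex concern concrete. It shows a minimal hand-written approach; the pattern naiveLinkRe and the sample text are illustrative assumptions and are not part of the final implementation. The pattern treats anything shaped like [text](url) as a link. As a result, it also matches the inner part of an image ![alt](src) and text inside inline code, neither of which is a link.

```javascript
// Hypothetical hand-written pattern for [text](url); not used in the final code.
const naiveLinkRe = /\[([^\]]*)\]\(([^)\s]+)\)/g;

// Collect every [text](url) match as { alt, href }.
function naiveExtract(markdown) {
  return [...markdown.matchAll(naiveLinkRe)].map(m => ({ alt: m[1], href: m[2] }));
}

const sample = [
  "See [markdown-it](https://github.com/markdown-it/markdown-it).", // a real link
  "![logo](https://example.com/logo.png)",                          // an image
  "`[not a link](https://example.com/code)`",                       // inline code
].join("\n");

const found = naiveExtract(sample);
console.log(found.length); // 3, although only the first line contains a link
```

A Markdown parser already knows these distinctions, which is why the rest of this article works with markdown-it's tokens instead.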

2. Goals

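Append a list of indexed references at the bottom of the article, in the format [1] {alt}:{href}.
The goal of this article is not to build a Markdown formatting tool for WeChat Official Accounts, but to share a way of handling references.
Add an index to each link (usually with a sup tag, which you can style), in the format `<sup>[index + 1]</sup>`.
Run in a Node.js ESM environment.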

3. Parsing with markdown-it

There are many Markdown compilers and interpreters. Here we use markdown-it, and our goal is to obtain the AST tokens that its parse method produces.

4. Setting up a Node.js ES module environment


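```shell
cd your_dir
pnpm init
pnpm add markdown-it
# index.js uses `import`, so mark the package as an ES module ("type": "module")
npm pkg set type=module
touch utils.js index.js README.md
```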

utils.js: helper functions
index.js: the main script; it outputs an HTML file
README.md: the Markdown file we read

5. A quick look at markdown-it tokens

Instantiate markdownIt:


```javascript
import markdownIt from "markdown-it";

const mdi = new markdownIt();
```

All Markdown parsing work is done through the mdi instance.

Read the Markdown file with Node.js to get its contents:


```javascript
import fs from "fs";

fs.readFile("./README.md", "utf-8", (err, data) => {
  if (err) throw err;
  // data holds the Markdown source as a string
});
```

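A note on security: by default, markdown-it does not render a standalone raw HTML tag such as `<a href="/adb">adb</a>` as an HTML element. Instead, it outputs an escaped, relatively safe string. Keep this behavior in mind.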
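To show what "escaped" means here, the sketch below uses a small escapeHtml helper written for illustration. It is not markdown-it's internal function, but it escapes the same four characters: &, <, > and ". Raw HTML in the source ends up as text like this. It therefore produces neither a link_open token nor an `<a href>` that the regex in the next step could match. If you really need raw HTML, markdown-it offers an `html: true` option, which also makes XSS your responsibility.

```javascript
// Illustrative helper: escape the characters that would otherwise start markup.
function escapeHtml(str) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;" };
  return str.replace(/[&<>"]/g, ch => map[ch]);
}

const raw = '<a href="/adb">adb</a>';
const escaped = escapeHtml(raw);
console.log(escaped); // &lt;a href=&quot;/adb&quot;&gt;adb&lt;/a&gt;
```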

6. Rendering Markdown to HTML and adding indexes to a tags


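```javascript
const _html = mdi.render(data, {});
let linkIndex = 1;
// Rebuild each simple <a href="...">text</a> produced by markdown-it from the
// captured href and link text (alt), then append a superscript index.
// (Inserting the whole match here would nest one <a> inside another.)
const html = _html.replace(/<a href="([^"]+)">([^<]+)<\/a>/g, (match, href, alt) => {
  const modifiedLink = `<a href="${href}" style="color: blue">${alt}<sup>[${linkIndex}]</sup></a>`;
  linkIndex++;
  return modifiedLink;
});
```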

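A regular expression matches each a tag, and we rebuild the tag from its href and link text:

Initialize the index linkIndex to 1.
Add an inline style, style="color: blue"; you can of course customize it.
Append the index `<sup>[${linkIndex}]</sup>`, which you can also style as needed.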

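At this point, the links in the HTML carry their indexes. For the simple anchors markdown-it generates, a single regex replacement is all it takes.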
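The word "simple" matters here. The pattern only matches an anchor whose only attribute is href and whose text contains no tags. The sketch below runs the same regex over hand-written HTML shaped like markdown-it's default output for three cases: a plain link, a link with a title, and a link whose text is inline code. The URLs are placeholders.

```javascript
// Same pattern as in the replacement step.
const anchorRe = /<a href="([^"]+)">([^<]+)<\/a>/g;

// Hand-written HTML resembling markdown-it output for:
//   [plain text](https://a.example)
//   [with title](https://b.example "B")
//   a link whose text is inline code, pointing to https://c.example
const rendered = [
  '<p><a href="https://a.example">plain text</a></p>',
  '<p><a href="https://b.example" title="B">with title</a></p>',
  '<p><a href="https://c.example"><code>inline code</code></a></p>',
].join("\n");

// Which links would receive a <sup> index?
const indexed = [...rendered.matchAll(anchorRe)].map(m => m[1]);
console.log(indexed); // [ 'https://a.example' ]
```

The extractLinks function in the next section still returns an entry for all three links. When such links appear, the numbers in the text and in the reference list can therefore drift apart. One way to keep them in sync is to hook into markdown-it's renderer rules instead of post-processing the HTML string, but that is beyond this sketch.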

7. Extracting the list of links from the AST

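markdown-it's parse returns an array of tokens. Block-level tokens sit in this flat array. Each token of type inline holds its inline content (links, text and so on) in a children array, and this nesting gives the tokens their tree-like structure.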
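For orientation, here is roughly what mdi.parse returns for a single line containing [markdown-it](https://github.com/markdown-it/markdown-it). The objects are hand-built and trimmed to the fields this article uses; real Token instances also carry tag, nesting, markup, map and more. The outline function is a hypothetical helper that prints the nesting.

```javascript
// Hand-built approximation of mdi.parse("[markdown-it](https://github.com/markdown-it/markdown-it)", {})
const tokens = [
  { type: "paragraph_open", children: null },
  {
    type: "inline",
    content: "[markdown-it](https://github.com/markdown-it/markdown-it)",
    children: [
      { type: "link_open", attrs: [["href", "https://github.com/markdown-it/markdown-it"]], children: null },
      { type: "text", content: "markdown-it", children: null },
      { type: "link_close", children: null },
    ],
  },
  { type: "paragraph_close", children: null },
];

// Walk the tokens depth-first and record each type, indented by nesting depth.
function outline(list, depth = 0, out = []) {
  for (const t of list) {
    out.push("  ".repeat(depth) + t.type);
    if (t.children) outline(t.children, depth + 1, out);
  }
  return out;
}

console.log(outline(tokens).join("\n"));
```

The link itself appears as the link_open / text / link_close triple inside the inline token's children. This is exactly the pattern that extractLinks looks for.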


```javascript
// Recursively walk the markdown-it tokens and collect { href, alt } for every link
export function extractLinks(tokens) {
  const links = [];
  let currentLink = null;

  tokens.forEach(token => {
    if (token.type === 'link_open') {
      currentLink = {
        href: token.attrs.find(attr => attr[0] === 'href')[1],
        alt: '', // start with empty link text
      };
    } else if (token.type === 'text' && currentLink !== null) {
      currentLink.alt += token.content; // append text inside the link
    } else if (token.type === 'link_close' && currentLink !== null) {
      links.push(currentLink); // link complete, store it
      currentLink = null;
    } else if (token.children) {
      links.push(...extractLinks(token.children)); // recurse into child tokens
    }
  });

  return links;
}

// Render the collected links as a numbered ul/li reference list
export function genList(list) {
  let html = '<ul style="list-style: none; padding-left: 0px">';
  list.forEach((li, index) => {
    html += `<li>[${index + 1}]: ${li.alt} ${li.href}</li>`;
  });
  html += '</ul>';

  return html;
}
```

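extractLinks collects the href and the link text (stored as alt) of every a tag from the tokens array returned by markdown-it's parse.
A token of type link_open marks the start of an a tag. We read its href there and then accumulate the text tokens that follow until link_close, which gives us the link text (alt).
Because the data is tree-shaped, it is processed recursively.
genList turns the list into a ul/li list (customize it as you like).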

8. Writing the HTML file


```javascript
import fs from "fs";

// html: the rendered Markdown from step 6; list: the reference list from genList
const output = html + list;
fs.writeFileSync("./index.html", output);
```

The writeFileSync method from Node.js's fs module writes index.html directly. Open the file in a browser to preview the result.
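The written file is only an HTML fragment: the rendered Markdown followed by the list. Browsers usually display it anyway, but a locally opened file without a charset declaration may show non-ASCII text (such as Chinese) garbled. The sketch below wraps the fragment in a complete page first, assuming you want one. wrapHtml and its title parameter are hypothetical.

```javascript
// Hypothetical helper: wrap the rendered fragment in a minimal HTML document
// that declares UTF-8, matching the "utf-8" encoding used when reading README.md.
// Note: title is inserted as-is, so only pass trusted text.
function wrapHtml(body, title = "index") {
  return [
    "<!DOCTYPE html>",
    "<html>",
    "<head>",
    '  <meta charset="utf-8">',
    `  <title>${title}</title>`,
    "</head>",
    "<body>",
    body,
    "</body>",
    "</html>",
  ].join("\n");
}

// In index.js this would become: fs.writeFileSync("./index.html", wrapHtml(html + list));
const page = wrapHtml("<p>hello</p><ul><li>[1]: markdown-it https://github.com/markdown-it/markdown-it</li></ul>");
console.log(page.startsWith("<!DOCTYPE html>")); // true
```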

9. Full implementation

utils.js


```javascript
// Recursively walk the markdown-it tokens and collect { href, alt } for every link
export function extractLinks(tokens) {
  const links = [];
  let currentLink = null;

  tokens.forEach(token => {
    if (token.type === 'link_open') {
      currentLink = {
        href: token.attrs.find(attr => attr[0] === 'href')[1],
        alt: '', // start with empty link text
      };
    } else if (token.type === 'text' && currentLink !== null) {
      currentLink.alt += token.content; // append text inside the link
    } else if (token.type === 'link_close' && currentLink !== null) {
      links.push(currentLink); // link complete, store it
      currentLink = null;
    } else if (token.children) {
      links.push(...extractLinks(token.children)); // recurse into child tokens
    }
  });

  return links;
}

// Render the collected links as a numbered ul/li reference list
export function genList(list) {
  let html = '<ul style="list-style: none; padding-left: 0px">';
  list.forEach((li, index) => {
    html += `<li>[${index + 1}]: ${li.alt} ${li.href}</li>`;
  });
  html += '</ul>';

  return html;
}
```

index.js


```javascript
import markdownIt from "markdown-it";
import fs from "fs";
import { extractLinks, genList } from "./utils.js";

const mdi = new markdownIt();

fs.readFile("./README.md", "utf-8", (err, data) => {
  if (err) throw err;

  // 1. Render Markdown and add a superscript index to each simple link
  const _html = mdi.render(data, {});
  let linkIndex = 1;
  const html = _html.replace(/<a href="([^"]+)">([^<]+)<\/a>/g, (match, href, alt) => {
    const modifiedLink = `<a href="${href}" style="color: blue">${alt}<sup>[${linkIndex}]</sup></a>`;
    linkIndex++;
    return modifiedLink;
  });

  // 2. Parse into tokens (passing an env object, as render() does) and build the reference list
  const tokens = mdi.parse(data, {});
  const list = genList(extractLinks(tokens));

  // 3. Write the rendered article followed by the reference list
  const output = html + list;
  fs.writeFileSync("./index.html", output);
});
```

10. Edge cases

Links can be more complex. For example, the content of a link can be an image:


```markdown
[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](https://reactjs.org/docs/how-to-contribute.html#your-first-pull-request)
```

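Here the link wraps an image. In an article, though, image links are rarely paired with citation indexes; citations are almost always link text paired with a URL. This case is therefore not handled here.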
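Leaving this case unhandled does have one visible effect. The sketch below shows it with simplified, hand-built tokens resembling markdown-it's output for the badge above. The image sits inside the link as an image token, and its own children hold the alt text. extractLinks recurses into those children with a fresh currentLink, so the badge still produces an entry, but with an empty alt. The regex from step 6 does not match `<a ...><img ...></a>` either. The list therefore gains an entry that has no [n] marker in the text. Dropping links with empty text keeps the two aligned for this case; whether you want that is an assumption.

```javascript
// Copy of extractLinks from utils.js, logic unchanged.
function extractLinks(tokens) {
  const links = [];
  let currentLink = null;
  tokens.forEach(token => {
    if (token.type === "link_open") {
      currentLink = { href: token.attrs.find(attr => attr[0] === "href")[1], alt: "" };
    } else if (token.type === "text" && currentLink !== null) {
      currentLink.alt += token.content;
    } else if (token.type === "link_close" && currentLink !== null) {
      links.push(currentLink);
      currentLink = null;
    } else if (token.children) {
      links.push(...extractLinks(token.children));
    }
  });
  return links;
}

// Hand-built, simplified tokens for the badge link above.
const badgeTokens = [
  { type: "paragraph_open", children: null },
  {
    type: "inline",
    children: [
      { type: "link_open", attrs: [["href", "https://reactjs.org/docs/how-to-contribute.html#your-first-pull-request"]], children: null },
      {
        type: "image",
        attrs: [["src", "https://img.shields.io/badge/PRs-welcome-brightgreen.svg"], ["alt", ""]],
        children: [{ type: "text", content: "PRs Welcome", children: null }],
      },
      { type: "link_close", children: null },
    ],
  },
  { type: "paragraph_close", children: null },
];

const links = extractLinks(badgeTokens);
console.log(links.length, JSON.stringify(links[0].alt)); // 1 ""

// Hypothetical filter: keep only links that have visible text.
const textLinks = links.filter(link => link.alt.trim() !== "");
console.log(textLinks.length); // 0
```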

11. Extensions

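With the basic citation feature in place, WeChat-style formatting can be built on top of it fairly easily.
Real editing scenarios may be more complex than this and call for analyzing more use cases.
You can write your own CSS to build your own formatting tool.
For any formatting tool, security issues such as XSS still need attention.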

12. Summary

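This article adds index numbers and reference markers to the links in a rendered Markdown article. The practice is common in academic papers and shows respect for other people's work. It uses markdown-it's parse method to obtain the AST tokens and extract the list of links. It then applies a regular expression to the HTML produced by render, matching a tags and replacing their content. This is one possible implementation, and I hope it helps readers.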