前言

本文主要介绍纯前端实现PDF、MD、WORD类型导出的处理思路之一全文涉及到的代码不具有适用性，仅供参考思路

欢迎提供改进意见等🫶！

需求描述

需求场景：在dataset 生成数据事实report的时候，需要导出功能，将report 进行分享，分享格式为PDF、MD、WORD的文件

导出元数据：report源数据为report的content list，类型为ContentBlock数组，可以理解为有内容类型标识的内容数组，其中的内容类型主要有：
- 静态文本（已由后端处理成为md文本）
- 静态图片
- echart图表
- 代码块
样式要求：导出内容有排版需求，特别是PDF文件，需要分页

难点疑点

需求的难点在于以下几点

1. 数据格式转化

元数据只是json数据，如何进行处理生成多类型文件

对于静态图片数据
- 元数据中只提供了s3地址，因为图片的渲染是浏览器端实现的，并且图片地址是无法直接转到pdf、word的，需要额外处理
对于echart实时渲染图表
- 如何将内容转到其他文件类型内容，并且保证导出的是实时渲染的图表，需要处理

2. 样式处理

主要问题在于PDF导出中，PDF会进行分页截断，如何避免report内容不会被直接截断，导致内容跨页的情况，需要处理

技术方案

初期的技术方案探索基本上由导师完成🙇‍♂️ 初期技术方案思路为：

由于content list其中文本内容、代码块内容本身已经是md文本，所以直接将content list转为md文本，首先实现导出md文件
使用marked.js将md文件转为html, 基于html文本可以进行纯js解析，以实现导出成为word文件
实现解析html函数processNode，结合docx进行对于不同节点类型生成不同的word内容对象，例如paragraph对象，docx自动将这些对象转化为word内容，进而实现导出word文件
使用html2canvas将html转为canvas，再使用jsPDF根据canvas内容生成PDF内容，进而实现PDF导出

具体实现

生成md文件

所以想要生成MD文件，需要：

遍历元数据，对于数据进行针对性地处理
遇到文本、代码的类型内容直接加入
对于图片类型，md文档可以插入图片地址，也可以指定dataurl。md规则为![alt](url)。
对于图表类型，由于图表是由echarts动态生成的，想要动态获取"截图"，就需要

使用html2canvas将图表dom节点绘制到canvas上
将canvas转为图片，使用toDataURL方法可以直接导出dataurl
使用dataurl换掉原地址

对于表格类型，由于后端数据格式为tableColumns数组，所以需要动态按照md的表格规则进行拼凑出表格内容。其中md规则主要拆分三部分

tableHeadRow，例如｜head1｜head2｜head3｜，表头行
tableStyleRow，例如｜:----:｜:----:｜:----:｜，表格样式控制行，具体样式控制规则，参考：www.runoob.com/markdown/md...
tableDataRows, 例如｜1｜2｜3｜，表内容行，需要严格对应表头行

针对图表、表格的转化，大致思路可以参考以下函数

typescript 复制代码

async function transformChart(chartID: string,) {
  const chartEle = document.getElementById(chartID);
  const chartCanvas = await html2canvas(chartEle as HTMLElement, {
      useCORS: true,
      scrollX: 0,
      scrollY: 0,
    });

   return `![${c.block}](${chartCanvas.toDataURL("image/png", 0.8)})`
}

typescript 复制代码

// tableColumns 例子 仅供参考，具体视自己数据结构而定
// tableColumns:
// [
//   {
//   Average_Start_Position_X: -1,
//   Average_Start_Position_Y: -1,
//   Year: 1960
//   },
//   {
//   Average_Start_Position_X: -2,
//   Average_Start_Position_Y: -1,
//   Year: 1961
//   }
// ]
async function transformTable(
  tableContent: any,
) {
  // 获取
  const tableDataSource = await fetchJson(tableContent.url); // 实际tableData的内容，即tableColumns数组
  const { columns = [], excludeColumns = [] } = tableContent
    .config;
  const tableHeadColumns = columns.filter(
    (item) => !excludeColumns.includes(item),
  );
  
  const tableHeadRow = `| ${tableHeadColumns.join(" | ")} |`;
  const tableStyleRow = `| ${new Array(tableHeadColumns.length).fill(" :----: ").join(" | ")} |`;
  const tableDataRows = tableDataSource
    .map(
      (row: { [x: string]: any }) =>
        `| ${tableHeadColumns.map((th) => row[th]).join(" | ")} |`,
    )
    .join("\n");
  return `${tableHeadRow} \n ${tableStyleRow} \n ${tableDataRows}`
}

1.2. 输出md文件

至此基本上，md文档内容的处理就完成了，只需要将拼接的文本内容转为类型为text/markdown的Blob对象，接着使用saveAs输出为md文件即可

javascript 复制代码

saveAs(
  new Blob([mdText],
  { type: "text/markdown" }),`${title || "example"}.md`
);

生成word文件

1.1. dom节点获取

word文件生成的核心流程是需要根据dom节点，去生成word的节点，那么dom节点怎么获取？

需要根据输出内容是否已经在当前页面展示而定。

是，则可以直接获取；
否，则需要考虑提前插入html并隐藏显示，或者输出时动态生成插入html。

1.2. 节点转化

对于dom节点转化为word节点，由于docx不提供解析函数，所以需要手动实现。。。

我们需要首先知道内容里面会有哪些dom节点，再去docx文档查找对应的节点，例如p标签对应的是Paragraph节点对象

特别注意，需要严格遵守docx定义的类型，避免节点处理、嵌套错误，轻则展示不出，重则直接报错无法打开文件。

以下是自主实现的解析函数 (WARNING：仅供参考思路，不具有适用性)

typescript 复制代码

function processTextNode(node: any): TextRunWithStyle {
  return { text: he.decode(node.textContent || "") };
}

function processNode(
  node: any,
): (
  | Paragraph
  | Table
  | TextRunWithStyle
  | ExternalHyperlink
  | ParagraphChild
)[] {
  const nodeName = node.nodeName.toLowerCase();
  let elements: (Paragraph | Table | TextRunWithStyle | ParagraphChild)[] = [];

  if (node.nodeType === 3 && node.textContent !== "\n") {
    // Text node
    return [processTextNode(node)];
  }

  switch (nodeName) {
    case "h1":
    case "h2":
    case "h3":
    case "h4":
    case "h5":
    case "h6":
      elements.push(
        new Paragraph({
          heading:
            HeadingLevel[`HEADING_${nodeName[1]}` as keyof typeof HeadingLevel],
          children: node.childNodes
            .flatMap(processNode)
            .map((el: any) => new TextRun(el as TextRunWithStyle)),
          spacing: { after: 0 },
        }),
      );
      break;
    case "p":
      const pChildNodes = flatten<ParagraphChild>(
        node.childNodes.flatMap((c: any) => {
          if (c.nodeType === 3) {
            return new TextRun({
              text: he.decode(c.text || ""),
            }); // 对于纯文本的p，需要TextRun处理，不能使用processTextNode否则无法被读取
          }
          return processNode(c);
        }),
      );
      const ParagraphChildren: ParagraphChild[] = [];
      pChildNodes.forEach((el: any) => {
        if (el instanceof Paragraph) {
          elements.push(el);
        } else {
          // 兜底，如果有未被处理的text
          ParagraphChildren.push(el?.text ? new TextRun(el as any) : el);
        }
      });
  
      elements.push(
        new Paragraph({
          children: ParagraphChildren,
          spacing: { after: 0 },
        }),
      );
      break;
    case "ul":
    case "ol":
      node.childNodes.forEach((listItem: any) => {
        if (listItem.nodeName.toLowerCase() === "li") {
          // 首先处理当前列表项的文本内容
          const itemChildren = Array.from(listItem.childNodes);
          const currentItemContent: any[] = [];

          itemChildren.forEach((child: any) => {
            if (
              child.nodeType === 3 || // 处理文本节点
              (child.nodeType === 1 &&
                child.nodeName.toLowerCase() !== "ul" &&
                child.nodeName.toLowerCase() !== "ol" &&
                child.nodeName.toLowerCase() !== "p") // 处理非嵌套列表项的元素
            ) {
              currentItemContent.push(...processNode(child));
            } else if (
              child.nodeType === 1 &&
              child.nodeName.toLowerCase() === "p"
            ) {
              // 处理特殊p标签
              elements.push(...processNode(child));
            }
          });

          // 然后处理当前列表项的嵌套列表
          const nestedLists = itemChildren.filter(
            (child: any) =>
              child.nodeType === 1 &&
              (child.nodeName.toLowerCase() === "ul" ||
                child.nodeName.toLowerCase() === "ol"),
          );

          if (nodeName === "ul") {
            const paragraphOptions: any = {
              children: currentItemContent.map(
                (el: any) => new TextRun(el as TextRunWithStyle),
              ),
              spacing: { before: 0, after: 0 },
            };
            paragraphOptions.bullet = { level: 0 };
            elements.push(new Paragraph(paragraphOptions));
          }

          // 递归处理嵌套的列表项
          nestedLists.forEach((nestedList: any) => {
            elements.push(...processNode(nestedList));
          });
        } else {
          elements.push(...processNode(listItem));
        }
      });

      break;
    case "table":
      const rows = node.getElementsByTagName("tr");
      const cellMargin = {
        top: convertInchesToTwip(0.12),
        bottom: convertInchesToTwip(0.12),
        left: convertInchesToTwip(0.16),
        right: convertInchesToTwip(0.16),
      };
      const cellBorder = {
        top: {
          style: BorderStyle.SINGLE,
          size: 1,
          color: "e8e8e8",
        },
        bottom: {
          style: BorderStyle.SINGLE,
          size: 1,
          color: "e8e8e8",
        },
        left: {
          style: BorderStyle.SINGLE,
          size: 1,
          color: "e8e8e8",
        },
        right: {
          style: BorderStyle.SINGLE,
          size: 1,
          color: "e8e8e8",
        },
      };
      const tableHeaderRow = Array.from(rows)
        .filter((row: any) => row.parentNode.nodeName === "thead")
        .map((row: any) => {
          return new TableRow({
            children: Array.from(row.childNodes)
              .filter((cell: any) => cell.nodeName.toLowerCase() === "th")
              .map(
                (cell: any) =>
                  new TableCell({
                    children: [
                      new Paragraph({
                        alignment: AlignmentType.CENTER,
                        children: cell.childNodes.flatMap(processNode).map(
                          (el: any) =>
                            new TextRun({
                              ...el,
                              bold: true,
                            } as TextRunWithStyle),
                        ),
                      }),
                    ],
                    shading: {
                      type: ShadingType.SOLID,
                      color: "eeeeee",
                    },
                    borders: cellBorder,
                    margins: cellMargin,
                  }),
              ),
          });
        });

      const tableDataRows = Array.from(rows)
        .filter((row: any) => row.parentNode.nodeName === "tbody")
        .map(
          (row: any, rowIndex) =>
            new TableRow({
              children: Array.from(row.childNodes)
                .filter((cell: any) => cell.nodeName.toLowerCase() === "td")
                .map(
                  (cell: any) =>
                    new TableCell({
                      children: [
                        new Paragraph({
                          alignment: AlignmentType.CENTER,
                          children: cell.childNodes
                            .flatMap(processNode)
                            .map(
                              (el: any) => new TextRun(el as TextRunWithStyle),
                            ),
                        }),
                      ],
                      verticalAlign: VerticalAlign.CENTER,
                      borders: cellBorder,
                      margins: cellMargin,
                      shading: {
                        type: ShadingType.SOLID,
                        color: rowIndex % 2 === 0 ? "f9f9fa" : "ffffff",
                      },
                    }),
                ),
            }),
        );

      elements.push(
        new Table({
          rows: [...tableHeaderRow, ...tableDataRows],
          width: {
            size: 100,
            type: WidthType.PERCENTAGE,
          },
        }),
      );
      break;
    case "strong":
    case "b":
      elements = node.childNodes.flatMap(processNode).map((el: any) => ({
        ...(el as TextRunWithStyle),
        bold: true,
      }));
      break;
    case "em":
    case "i":
      elements = node.childNodes.flatMap(processNode).map((el: any) => ({
        ...(el as TextRunWithStyle),
        italics: true,
      }));
      break;
    case "a":
      // FixMe: 插入不生效
      const link = new Paragraph({
        children: [
          new ExternalHyperlink({
            children: [
              new TextRun({
                text: he.decode(node.textContent || ""),
                style: "Hyperlink",
              }),
            ],
            link: he.decode(node.getAttribute("href") || ""),
          }),
        ],
        spacing: { after: 0 },
      });
      elements.push(link);
      break;
    case "img":
      const imgSrc = node.getAttribute("src");
      const alt = node.getAttribute("alt");
      // 仅支持base64的dataurl
      if (imgSrc.startsWith("data:")) {
        const imageBase64Data = imgSrc.split(",")[1];
        let image = null;
        if (alt === "cover") {
          // 封面宽高比2:1
          image = new ImageRun({
            data: Buffer.from(imageBase64Data, "base64"),
            transformation: {
              width: 595,
              height: 297.5,
            },
          });
        } else {
          image = new ImageRun({
            data: Buffer.from(imageBase64Data, "base64"),
            transformation: {
              width: 595,
              height: 417,
            },
          });
        }

        // MD 文档的img标签![]()会被转化为<p><img/><p>,导致嵌套
        if (node.parentNode.nodeName === "body") {
          elements.push(
            new Paragraph({
              children: [image],
            }),
          );
        } else {
          elements.push(image);
        }
      }
      break;
    case "code":
      const codeChildNodes = node.childNodes.map(
        (el: any) =>
          new TextRun({
            text: el.text,
          }),
      );

      // MD 文档的code标签有行内代码，会导致导致嵌套Paragraph
      if (node.parentNode.nodeName === "body") {
        elements.push(
          new Paragraph({
            children: codeChildNodes,
          }),
        );
      } else {
        elements.push(codeChildNodes);
      }
      break;
    default:
      elements = node.childNodes.flatMap(processNode);
  }
  return elements;
}
export const convertMarkdownToDocx = async (markdown: string) => {
  const html = await convertMarkdownToHtml(markdown); // 假设这个函数已经存在
  const docElements = htmlStrToParagraph(html);

  return new Document({
    styles: {
      paragraphStyles: [
        {
          id: "Hyperlink",
          name: "Hyperlink",
          basedOn: "Normal",
          next: "Normal",
          run: {
            color: "0000FF",
            underline: {
              type: "single",
              color: "0000FF",
            },
          },
        },
      ],
    },
    numbering: {
      config: [
        {
          reference: "myOrderedList",
          levels: [
            { level: 0, format: "decimal", text: "%1." },
            { level: 1, format: "lowerLetter", text: "%2." },
            { level: 2, format: "lowerRoman", text: "%3." },
          ],
        },
      ],
    },
    sections: [{ children: docElements }],
  });
};

1.3. 处理节点

解析完节点之后，需要进行进一步处理，包含TextRunWithStyle类型检查兜底，空行间距样式。

typescript 复制代码

export const htmlStrToParagraph = (str: string): (Paragraph | Table)[] => {
  const dom = parseFromString(`<body>${str}</body>`);

  const body = dom.getElementsByTagName("body")[0];
  const docElements = body.childNodes.flatMap(processNode);

  // Ensure all TextRunWithStyle are wrapped in Paragraphs
  const finalElements: (Paragraph | Table)[] = [];
  let currentParagraph: Paragraph | null = null;

  docElements.forEach((el) => {
    if (el instanceof Paragraph || el instanceof Table) {
      if (currentParagraph) {
        finalElements.push(currentParagraph);
        currentParagraph = null;
      }
      finalElements.push(el);
    } else {
      if (!currentParagraph) {
        currentParagraph = new Paragraph({ children: [] });
      }
      currentParagraph.addChildElement(new TextRun(el as any));
    }
  });

  if (currentParagraph) {
    finalElements.push(currentParagraph);
  }

  // Add empty paragraphs for spacing
  const spacedElements: (Paragraph | Table)[] = [];
  finalElements.forEach((el) => {
    spacedElements.push(el);
    spacedElements.push(
      new Paragraph({
        children: [new TextRun({ text: "" })],
        spacing: { before: 0, after: 0 },
      }),
    );
  });

  return spacedElements;
};

1.4. 生成word文件对象

使用处理完毕的节点列表就可以生成word文件对象，相关的word样式定义都在此处声明

typescript 复制代码

export const generateDocx = async () => {
  const html = await getHtmlDom(); // 获取文件源html dom，假设这个函数已经存在
  const docElements = htmlStrToParagraph(html);

  return new Document({
    styles: {
      paragraphStyles: [
        {
          id: "Hyperlink",
          name: "Hyperlink",
          basedOn: "Normal",
          next: "Normal",
          run: {
            color: "0000FF",
            underline: {
              type: "single",
              color: "0000FF",
            },
          },
        },
      ],
    },
    numbering: {
      config: [
        {
          reference: "myOrderedList",
          levels: [
            { level: 0, format: "decimal", text: "%1." },
            { level: 1, format: "lowerLetter", text: "%2." },
            { level: 2, format: "lowerRoman", text: "%3." },
          ],
        },
      ],
    },
    sections: [{ children: docElements }],
  });
};

1.5. 导出文件

最后使用word自带的Packer即可导出word文件的Blob，使用file-saver的saveAs存储

typescript 复制代码

export const exportDocfile = async (title: string) => {
  const doc = await generateDocx();
  Packer.toBlob(doc).then((blob) => {
    saveAs(blob, `${title}.docx`);
  });
};

生成PDF文件

1.1. 模拟分页

拿到源html，进行模拟PDF 分页逻辑生成新的Html模版。

基本处理思路就是，每一页是一个固定高度的dom容器，遍历源dom节点，动态计算当前容器是否还有空余空间存放当前节点，有则插入，无则创建新的容器即新页面，插入到新容器。

1.1.1. 创建PDF内容父容器

创建一个PDF的父容器，父容器存放每页的子容器。同时需要获取源html的实际渲染结果，因为我们的PDF是有固定宽高，此处按照A4纸；我们需要保证A4纸宽高比的情况下，设置好子容器的实际高度。

typescript 复制代码

async function mockA4PageEle(sourceContainer: HTMLDivElement) {
  // 1. 创建A4页面父容器进行存放内容
  const pageEle = document.createElement("div");
  pageEle.style.position = "fixed";
  pageEle.style.zIndex = "-99";
  pageEle.style.top = "-9999999";
  pageEle.style.left = "0";
  document.body.appendChild(pageEle);
  // 2. 获取元素的实际渲染结果
  const PAGE_WIDTH = sourceContainer.getBoundingClientRect().width;
  const PAGE_HEIGHT = Math.ceil(PAGE_WIDTH / 0.707);
  const MARGIN = 80;
  const MAX_CONTENT_HEIGHT = PAGE_HEIGHT - 2 * MARGIN - 25; // 安全高度25
  const nodeList = Array.from(sourceContainer.children);
  const pageConfig: IPageConfig = {
    PAGE_WIDTH,
    PAGE_HEIGHT,
    MARGIN,
    MAX_CONTENT_HEIGHT,
  };
  // 3. 进行分页
  splitPage(pageConfig, nodeList, pageEle);

  return pageEle;
}

1.1.2. 分页

注意！！！⚠️：以下逻辑只使用于嵌套不深的情况，复杂html源请勿参考。

第一步创建第一页子容器，使用到封装函数generateA4Page

typescript 复制代码

function generateA4Page({ PAGE_WIDTH, PAGE_HEIGHT, MARGIN }: IPageConfig) {
  const currentPage = document.createElement("div");
  currentPage.style.width = `${PAGE_WIDTH - 2 * MARGIN}px`;
  currentPage.style.height = `${PAGE_HEIGHT - 2 * MARGIN}px`;
  currentPage.style.margin = `${MARGIN}px`;
  currentPage.style.marginBottom = `${2 * MARGIN}px`; // 强制处理间距重叠
  currentPage.style.pageBreakAfter = "always";
  currentPage.style.display = `flex`;
  currentPage.style.flexDirection = `column`;
  // currentPage.style.overflow = "hidden";
  // currentPage.style.border = `1px solid black`;
  return currentPage;
}

第二步进行遍历dom节点，只遍历第一层节点。所有节点插入页面子容器需要克隆。

对于特殊类型节点需要处理：

表格类型：

PDF展示空间有限，无法展示大型表格且阅读效率低。所以对于超大型表格，我们此处考虑直接截断前10行展示。
考虑当前页是否能插入表格，否则创建新页面；判断是否需要截断，是则取出前十行进行创建一个新的table节点插入，否则直接克隆节点

图片类型：

图片可能太大，则需要进行缩放，缩放方法视需求而定，此处的处理逻辑可以忽略。主要可以参考缩放图片函数scaleImg。

typescript 复制代码

function scaleImg(
  width: string,
  height: string,
  imgSrc: string,
  objectFit: CSSStyleDeclaration["objectFit"] = "cover",
  imgMargin: CSSStyleDeclaration["margin"] = "0 auto",
) {
  const scaledImg = document.createElement("img");
  const container = document.createElement("div");

  scaledImg.src = imgSrc;

  container.style.width = width;
  container.style.height = height;
  container.style.overflow = "hidden";
  container.style.marginBottom = "16px";
  // container.style.border = `1px solid black`;
  scaledImg.style.maxHeight = `100%`;
  scaledImg.style.maxWidth = `100%`;
  scaledImg.style.width = `auto`;
  scaledImg.style.height = `auto`;
  scaledImg.style.objectFit = objectFit;
  scaledImg.style.margin = imgMargin;

  container.appendChild(scaledImg);
  return container;
}

typescript 复制代码

function splitPage(
  pageConfig: IPageConfig,
  nodeList: Element[],
  pageEle: HTMLDivElement,
) {
  const { MAX_CONTENT_HEIGHT, PAGE_WIDTH, MARGIN } = pageConfig;

  // 创建第一页子容器
  let currentPage = generateA4Page(pageConfig);
  pageEle.appendChild(currentPage);
  let currentPageHeight = MAX_CONTENT_HEIGHT; // 当前页面剩余高度
  // 开始遍历节点
  for (let nodeIndex = 0; nodeIndex < nodeList.length; nodeIndex += 1) {
    const node = nodeList[nodeIndex];

    const nodeHeight = getNodeHeight(node);
    if (node.nodeName === "TABLE") {
      // 对于表格需要特殊处理, 超过十行展示预览表格，预览表格只展示最多十行
      const tableBody = Array.from(node.querySelector("tbody")?.children || []);
      const totalRowLen = tableBody.length;
      const canInsertRowLen = Math.floor(currentPageHeight / 55) - 1;
      // 若无法直接插入原表格/预览表格，则需要创建新页
      if (canInsertRowLen < 1 || canInsertRowLen < Math.min(totalRowLen, 10)) {
        currentPage = generateA4Page(pageConfig);
        pageEle.appendChild(currentPage);
        currentPageHeight = MAX_CONTENT_HEIGHT; // 重置当前页面剩余高度
      }
      if (totalRowLen <= 10) {
        // 少于10行直接插入
        const cloneNode = node.cloneNode(true); // 克隆节点，包括其子节点和属性
        currentPage.appendChild(cloneNode);
        currentPageHeight -= nodeHeight;
      } else {
        // 创建新的 table
        const tableHead = node.querySelector("thead");
        const curNewTable = document.createElement("table");
        curNewTable.appendChild(tableHead!.cloneNode(true));
        // 创建新的 tbody
        const newTbody = document.createElement("tbody");
        // 将每行分配到新的 tbody 中
        for (let i = 0; i < 10; i += 1) {
          newTbody.appendChild(tableBody[i].cloneNode(true));
        }
        // 将新的 tbody 添加到新的 table 中
        curNewTable.appendChild(newTbody);
        curNewTable.style.width = "100%";
        const cloneNode = curNewTable.cloneNode(true); // 克隆节点，包括其子节点和属性
        currentPage.appendChild(cloneNode);
        const tip = document.createElement("span");
        tip.innerText =
          "The PDF format cannot display the full content of the table. Please download it through the web version of the report.";
        tip.style.color = "grey";
        currentPage.appendChild(tip);
        currentPageHeight -= 30; // tip高度减去
        const rowHeight = 55;
        currentPageHeight -= rowHeight * Math.min(totalRowLen + 1, 11); // 需要计算table高度，因为未被渲染，无高度
      }
    } else if (node.querySelector("img")) {
      const StaticImgHeight = 498;
      const imgEle = node.querySelector("img");
      const imgAlt = imgEle?.getAttribute("alt");
      const imgSrc = imgEle?.getAttribute("src") || "";

      if (currentPageHeight < StaticImgHeight) {
        // 当前页面空间不足，需要新建一个页面
        currentPage = generateA4Page(pageConfig);
        pageEle.appendChild(currentPage);
        currentPageHeight = MAX_CONTENT_HEIGHT;
      }
      if (nodeHeight > StaticImgHeight && imgAlt !== "cover") {
        // 对于超过498高的图片，需要进行等比例缩放到固定高度498
        currentPage.appendChild(
          scaleImg("100%", `${StaticImgHeight}px`, imgSrc),
        );
        currentPageHeight -= StaticImgHeight + 16;
      } else if (imgAlt === "cover") {
        // 对于封面，需要进行2:1比例缩放
        const contentWidth = PAGE_WIDTH - 2 * MARGIN;
        const scaledImg = scaleImg(
          `${contentWidth}px`,
          `${contentWidth / 2}px`,
          imgSrc,
          "cover",
          "0",
        );
        scaledImg.style.marginBottom = "40px";
        currentPage.appendChild(scaledImg);
        currentPageHeight -= contentWidth / 2 + 40;
      } else {
        currentPage.appendChild(node.cloneNode(true));
        currentPageHeight -= nodeHeight;
      }
    } else if (
      nodeHeight > currentPageHeight &&
      nodeHeight < MAX_CONTENT_HEIGHT
    ) {
      // 其他超高内容需要插入新页，再插入
      currentPage = generateA4Page(pageConfig);
      pageEle.appendChild(currentPage);
      const cloneNode = node.cloneNode(true);
      currentPage.appendChild(cloneNode);
      currentPageHeight = MAX_CONTENT_HEIGHT;
      currentPageHeight -= nodeHeight;
    } else {
      // 将可以直接插入的节点添加到当前页面
      const cloneNode = node.cloneNode(true);
      currentPage?.appendChild(cloneNode);
      currentPageHeight -= nodeHeight;
    }
  }
}

1.2. 输出PDF

拿到上一步mockA4PageEle返回的dom，我们就可以进行输出PDF

核心关键方法为： addImage(curPageCanvasData,"JPEG",0,0,curPageCanvas.width,curPageCanvas.height)

PDF的输出主要流程：

将dom节点使用html2Canvas转化为图片canvasData
计算有多少页，也就是这个图片需要被分割成几页PDF，获取分割次数splitCount
分割图片，将当前页的内容绘制到另一个canvas上，再将当前页面的canvas使用addImage添加到PDF

此处分割页面的方法就是进行动态计算当前页面在canvasData的起始坐标y
当添加完一页内容之后，需要addPage，因为当前页已经填满了，需要创建新页面

typescript 复制代码

export const html2PDF = async (contentEle: HTMLElement, PDFTitle: string) => {
  try {
    // 创绘制切割后绘制canvas用的canvas标签以及对应的context对象
    const perCanvas = document.createElement("canvas");
    perCanvas.style.backgroundColor = "#fff";
    const context = perCanvas.getContext("2d");
    // document.body.appendChild(perCanvas);

    // 将需要下载的html标签转成canvas标签，并获取对应的base64码
    const canvas = await html2canvas(contentEle, {
      scrollX: 0,
      scrollY: 0,
    });
    const canvasData = canvas.toDataURL("image/jpeg", 1.0);

    // pdf的尺寸
    const pdfWidth = canvas.width;

    const pdfHeight = pdfWidth / 0.707;

    // 切割后的canvas图片的宽高，就等于每页pdf的宽高
    perCanvas.width = canvas.width;
    perCanvas.height = pdfHeight;

    const perHeight = pdfHeight;

    // 计算切割次数
    const splitCount = Math.floor(canvas.height / perHeight);

    // if (splitCount * perHeight < canvas.height) splitCount += 1;

    // 创建img对象，加载完整的canvas图片
    const img = new Image();
    img.src = canvasData;

    // 创建pdf对象
    const pdf = new JSPDF("p", "pt", [pdfWidth, pdfHeight]);
    // 待图片加载完成
    img.onload = () => {
      // console.log("loaded");

      // 切割canvas图片，贴到每一页pdf中
      for (let i = 0; i < splitCount; i += 1) {
        const startY = i * perHeight; // 起始y坐标

        // 清空画布
        context!.clearRect(0, 0, perCanvas.width, pdfHeight);
        context!.fillStyle = "#fff";
        context!.fillRect(0, 0, perCanvas.width, pdfHeight);
        // 绘制当前切割区域的图片
        context!.drawImage(
          img,
          0,
          startY,
          perCanvas.width,
          perHeight,
          0,
          0,
          perCanvas.width,
          perHeight,
        );
        const perCanvasData = perCanvas.toDataURL("image/jpeg", 1.0);
        pdf.addImage(
          perCanvasData,
          "JPEG",
          0,
          0,
          perCanvas.width,
          perCanvas.height,
        );
        if (i < splitCount - 1) pdf.addPage();
      }

      pdf.save(`${PDFTitle}.pdf`);
    };
    document.body.removeChild(contentEle);
  } catch (error) {
    console.warn(error);
  }
};

纯前端实现PDF、MD、WORD文件导出

前言

需求描述

难点疑点

1. 数据格式转化

2. 样式处理

技术方案

具体实现

生成md文件

1.2. 输出md文件

生成word文件

1.1. dom节点获取

1.2. 节点转化

1.3. 处理节点

1.4. 生成word文件对象

1.5. 导出文件

生成PDF文件

1.1. 模拟分页

1.1.1. 创建PDF内容父容器

1.1.2. 分页

1.2. 输出PDF