Generating PDFs from a Word Template

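Hi everyone. I recently ran into a requirement to generate different PDF reports from a specific Word template. I figured it would be simple and done in no time, but I lacked experience and hit several pitfalls along the way. No requirement should be underestimated!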

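Initial approach: render the Word template directly, then convert it to PDF

At first, I rendered the Word template and then converted the resulting Word document to PDF.

Rendering:

I used the poi-tl library to render the Word template. It is a solid third-party library for working with Word templates, and a good choice if your requirement is to produce Word documents.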

<dependency>
    <groupId>com.deepoove</groupId>
    <artifactId>poi-tl</artifactId>
    <version>1.10.3</version>
</dependency>

This sample loads the template file, calls the render function with the context object, and finally writes the rendered result to an output stream.

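import com.deepoove.poi.XWPFTemplate;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;

// filePath: path of the .docx template; contentMap: data for the template tags (both defined elsewhere)
FileInputStream fileInputStream = new FileInputStream(filePath);
XWPFTemplate template = XWPFTemplate.compile(fileInputStream).render(contentMap);
ByteArrayOutputStream bos = new ByteArrayOutputStream();
template.write(bos);
byte[] byteArray = bos.toByteArray(); // bytes of the rendered .docx
// Release resources once the bytes have been captured
template.close();
bos.close();
fileInputStream.close();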
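
As a rough illustration of what the context object might contain: poi-tl replaces {{name}}-style tags in the template with the values stored under the same keys. The sketch below uses only the JDK; the class name ReportContext and the keys title, author, and date are made up for this example, and the resulting map is what would be passed to render.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ReportContext {

    /** Builds the data model for one report; keys must match the tags used in the template. */
    public static Map<String, Object> build(String title, String author, String date) {
        Map<String, Object> contentMap = new LinkedHashMap<>();
        contentMap.put("title", title);   // would fill a {{title}} tag (hypothetical tag name)
        contentMap.put("author", author); // would fill an {{author}} tag (hypothetical tag name)
        contentMap.put("date", date);     // would fill a {{date}} tag (hypothetical tag name)
        return contentMap;
    }

    public static void main(String[] args) {
        Map<String, Object> contentMap = build("Monthly Report", "Alice", "2024-01-01");
        System.out.println(contentMap);
    }
}
```

Keeping the map construction in its own method makes it easy to produce a different report from the same template by changing only the data.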

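Conversion:

For this approach I used the third-party library documents4j, which converts Word to PDF in just a few lines of code. Its drawbacks are that conversion is slow (a one-page Word document took about 4 s) and that it only works on Windows, not on Linux.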

<!-- Dependencies needed for Word-to-PDF conversion -->
        <dependency>
            <groupId>com.documents4j</groupId>
            <artifactId>documents4j-local</artifactId>
            <version>1.0.3</version>
        </dependency>
        <dependency>
            <groupId>com.documents4j</groupId>
            <artifactId>documents4j-transformer-msoffice-word</artifactId>
            <version>1.0.3</version>
        </dependency>

IConverter converter = LocalConverter.builder().build();
// bis is an InputStream over the rendered .docx bytes
converter.convert(bis).as(DocumentType.DOCX)
        .to(response.getOutputStream()).as(DocumentType.PDF).execute();

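According to the official documentation, documents4j delegates the conversion to the Microsoft Office APIs. Some posts online suggest that it can run on Linux if **OpenOffice/LibreOffice** is installed, but readers in the comments of those posts reported that this did not work for them. I have not tried it myself.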
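
Because the MS Word-backed converter only works on Windows, a service could check the platform up front and fail with a clear message rather than at the first conversion. The following is a minimal sketch using only the JDK; the class and method names are hypothetical and are not part of documents4j.

```java
import java.util.Locale;

public class ConverterPlatformCheck {

    /** Returns true when the given os.name value looks like a Windows system. */
    public static boolean isWindows(String osName) {
        return osName != null && osName.toLowerCase(Locale.ROOT).startsWith("windows");
    }

    /** Throws early with a readable message when the platform is not Windows. */
    public static void requireWindows(String osName) {
        if (!isWindows(osName)) {
            throw new IllegalStateException(
                    "MS Word-based conversion needs Windows, but os.name is: " + osName);
        }
    }

    public static void main(String[] args) {
        String osName = System.getProperty("os.name");
        System.out.println(osName + " -> Windows? " + isWindows(osName));
    }
}
```

Calling requireWindows(System.getProperty("os.name")) before building the converter would surface a deployment mistake immediately.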

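Summary

If your final output is a Word document, I recommend poi-tl for the job. If you need PDF files and the server runs Windows, this approach is a reasonable choice.

Note: among third-party Word-to-PDF libraries, Aspose produces good results, but it is a paid product. Cracked copies circulate online, but they should not be used in commercial projects; staying away from them avoids unnecessary trouble.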

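Middle approach: convert the Word template to HTML, then generate the PDF from the HTML

Rendering: Word to HTML

I used poi for the Word-to-HTML conversion. Simple Word files convert well, but complex files, such as those with tables and other elaborate formatting, do not convert cleanly and lose styles.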

<!-- Dependencies needed for Word-to-HTML conversion (first step of the Word-HTML-PDF pipeline) -->
        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi</artifactId>
            <version>4.1.2</version>
            <!--            <version>3.16</version>-->
        </dependency>

        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi-scratchpad</artifactId>
            <version>4.1.2</version>
            <!--            <version>3.16</version>-->
        </dependency>

        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi-ooxml</artifactId>
            <version>4.1.2</version>
        </dependency>

        <dependency>
            <groupId>fr.opensagres.xdocreport</groupId>
            <artifactId>xdocreport</artifactId>
            <version>2.0.2</version>
        </dependency>

        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>ooxml-schemas</artifactId>
            <version>1.4</version>
        </dependency>

The sample code below only works for Word 2007 format files (.docx).

    public static String Word2007ToHtml(InputStream input) throws IOException {
        // 1) Load the .docx document
        XWPFDocument document = new XWPFDocument(input);
        // 2) Configure the XHTML options (the IURIResolver decides the image paths written into the HTML)
        XHTMLOptions options = XHTMLOptions.create();
        Map<String, String> imgMap = new HashMap<>();
        options.setExtractor(new IImageExtractor() {
            @Override
            public void extract(String imagePath, byte[] imageData) throws IOException {
                // TODO: save imageData through the project's upload component (for example
                // into a static file directory) and record the resulting URL; left empty here
                String imageUrl = "";
                imgMap.put(imagePath, imageUrl);
            }
        });
        // Image paths used in the HTML (relative paths)
        options.URIResolver(new IURIResolver() {
            @Override
            public String resolve(String uri) {
                // Map the image path inside the document to the saved image URL
                return imgMap.get(uri);
            }
        });

        options.setIgnoreStylesIfUnused(false);
        options.setFragment(true);

        // 3) Convert the XWPFDocument to XHTML
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        XHTMLConverter.getInstance().convert(document, baos, options);
        String content = baos.toString(StandardCharsets.UTF_8.name());

        // Debug output of the generated HTML
        System.out.println(content);
        System.out.println("------------------------------------");
        return content;
    }
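
With setFragment(true), the converter is expected to emit only the body content, without the surrounding html, head, and body elements. If that fragment is later fed to an HTML-to-PDF converter, it may help to wrap it in a complete document that declares the encoding and a default font. The following is a minimal sketch under those assumptions: the helper class is hypothetical, and the font family name STSong is only an example that would have to match whatever font is actually registered with the PDF converter.

```java
public class HtmlFragmentWrapper {

    /** Wraps an XHTML fragment in a full document that declares UTF-8 and a default font. */
    public static String wrap(String fragment, String fontFamily) {
        return "<!DOCTYPE html>\n"
                + "<html>\n<head>\n"
                + "<meta charset=\"UTF-8\"/>\n"
                + "<style>body { font-family: '" + fontFamily + "'; }</style>\n"
                + "</head>\n<body>\n"
                + fragment
                + "\n</body>\n</html>";
    }

    public static void main(String[] args) {
        // "STSong" is an assumed font family name used only for illustration
        System.out.println(wrap("<p>Report body</p>", "STSong"));
    }
}
```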

Conversion: HTML to PDF

Many third-party libraries handle HTML-to-PDF conversion, and the results are better than converting Word to PDF directly. Here is my code:

<dependency>
    <groupId>com.lowagie</groupId>
    <artifactId>itext</artifactId>
    <version>2.1.7</version>
</dependency>

Note: the code below relies on classes such as HtmlConverter, ConverterProperties, and DefaultFontProvider, which belong to the iText 7 html2pdf add-on (the com.itextpdf:html2pdf artifact), not to the com.lowagie:itext 2.1.7 artifact listed above. Make sure html2pdf is on the classpath as well.

    /** Reads the whole input stream into a byte array. */
    private static byte[] toByteArray(InputStream input) throws IOException {
        ByteArrayOutputStream output = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while (-1 != (n = input.read(buffer))) {
            output.write(buffer, 0, n);
        }
        return output.toByteArray();
    }

    /**
     * Builds the converter properties with a font that covers Chinese characters.
     *
     * @param fontPath font path (currently unused; the bundled table/STSONG.TTF is always loaded)
     * @return converter properties carrying the font provider
     */
    private static ConverterProperties creatBaseFont(String fontPath) throws IOException {
        // Load the font from the classpath so it also works from a packaged application
        ClassPathResource classPathResource = new ClassPathResource("table/STSONG.TTF");
        InputStream resourceAsStream = classPathResource.getInputStream();

        ConverterProperties properties = new ConverterProperties();
        FontProvider fontProvider = new DefaultFontProvider();
        // The extra variable closes the stream when the block ends
        try (InputStream in = resourceAsStream) {
            FontProgram fontProgram = FontProgramFactory.createFont(Pdf7Kit.toByteArray(in));
            fontProvider.addFont(fontProgram);
            properties.setFontProvider(fontProvider);
        } catch (IOException e) {
            log.error("create base font error", e);
        }
        return properties;
    }
    
    
    
    /**
     * Converts an HTML string to a PDF file.
     */
    public static void creatPdfBycontent(String htmlContent, String pdfPath, String fontPath) throws IOException {
        if (StrUtil.isBlank(htmlContent) || StrUtil.isBlank(pdfPath)) {
            log.warn("html2pdf fail. htmlContent or pdfPath is blank.");
            return;
        }

        ConverterProperties properties = creatBaseFont(fontPath);
        File file = new File(pdfPath);
        // Note: the PDF is only generated when the target file does not exist yet
        if (!file.exists()) {
            File parentFile = file.getParentFile();
            if (parentFile != null && !parentFile.exists()) {
                parentFile.mkdirs();
            }
            file.createNewFile();
            // try-with-resources makes sure the output stream is closed
            try (FileOutputStream out = new FileOutputStream(file)) {
                HtmlConverter.convertToPdf(htmlContent, out, properties);
            }
        }
    }
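
The file-preparation part of creatPdfBycontent could also be written with java.nio. The sketch below is an alternative rather than the original code: the class name PdfTarget is made up, and unlike the method above it throws FileAlreadyExistsException when the target already exists instead of silently skipping the conversion.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PdfTarget {

    /** Creates missing parent directories and opens the target; fails if the file already exists. */
    public static OutputStream openNew(Path pdfPath) throws IOException {
        Path parent = pdfPath.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent); // does nothing when the directories already exist
        }
        return Files.newOutputStream(pdfPath, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("pdf-demo");
        Path target = dir.resolve("reports/out.pdf");
        try (OutputStream out = openNew(target)) {
            // Placeholder bytes; a real caller would hand this stream to the PDF converter
            out.write("placeholder".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("created: " + Files.exists(target));
    }
}
```

The returned stream could be passed to the converter in place of the FileOutputStream, which keeps directory creation and overwrite handling in one place.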

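Summary

This approach turns Word-to-PDF into Word-HTML-PDF, but it has problems of its own: Word-to-HTML conversion does not work well for complex documents. I suggest bringing in a rich-text editor and adjusting the HTML template by hand.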

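Final approach: build a PDF template directly

Building a PDF template directly keeps the code simple. The downsides are that creating the PDF template is fairly laborious and that tables cannot be added dynamically, so it only handles simple data filling.
Rendering is done with the iText library for Java.

Reference article: blog.csdn.net/weixin_4420...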

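Summary

The output format is accurate and the results are good. The drawback is that dynamic tables cannot be added, so it suits templates with a fixed layout.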

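Final decision

Weighing the options above: since the customer had no requirements regarding the server environment, and rendering the Word template involves dynamic tables, I ultimately chose approach one, with the production server running Windows Server.

Note: if you know a better approach, I would appreciate your advice. If there are any mistakes in this article, please point them out.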