使用docx4j转换word为pdf处理中文乱码问题

word转pdf

实现方法

java 复制代码
import org.docx4j.Docx4J;
import org.docx4j.fonts.IdentityPlusMapper;
import org.docx4j.fonts.Mapper;
import org.docx4j.fonts.PhysicalFonts;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import cn.hutool.core.io.FileUtil;
import java.io.*;

@Slf4j
@Service
public class FileService {
    /**
     * 获取pdf文件,通过word文件转换
     *
     * @param file word文件
     * @date 2023/08/26 23:13
     */
    public File getPdfByWordFile(File file) {
        Mapper fontMapper = new IdentityPlusMapper();
        fontMapper.put("隶书", PhysicalFonts.get("LiSu"));
        fontMapper.put("宋体", PhysicalFonts.get("SimSun"));
        fontMapper.put("微软雅黑", PhysicalFonts.get("Microsoft Yahei"));
        fontMapper.put("黑体", PhysicalFonts.get("SimHei"));
        fontMapper.put("楷体", PhysicalFonts.get("KaiTi"));
        fontMapper.put("新宋体", PhysicalFonts.get("NSimSun"));
        fontMapper.put("华文行楷", PhysicalFonts.get("STXingkai"));
        fontMapper.put("华文仿宋", PhysicalFonts.get("STFangsong"));
        fontMapper.put("仿宋", PhysicalFonts.get("FangSong"));
        fontMapper.put("幼圆", PhysicalFonts.get("YouYuan"));
        fontMapper.put("华文宋体", PhysicalFonts.get("STSong"));
        fontMapper.put("华文中宋", PhysicalFonts.get("STZhongsong"));
        fontMapper.put("等线", PhysicalFonts.get("SimSun"));
        fontMapper.put("等线 Light", PhysicalFonts.get("SimSun"));
        fontMapper.put("华文琥珀", PhysicalFonts.get("STHupo"));
        fontMapper.put("华文隶书", PhysicalFonts.get("STLiti"));
        fontMapper.put("华文新魏", PhysicalFonts.get("STXinwei"));
        fontMapper.put("华文彩云", PhysicalFonts.get("STCaiyun"));
        fontMapper.put("方正姚体", PhysicalFonts.get("FZYaoti"));
        fontMapper.put("方正舒体", PhysicalFonts.get("FZShuTi"));
        fontMapper.put("华文细黑", PhysicalFonts.get("STXihei"));
        fontMapper.put("宋体扩展", PhysicalFonts.get("simsun-extB"));
        fontMapper.put("仿宋_GB2312", PhysicalFonts.get("FangSong_GB2312"));
        fontMapper.put("新細明體", PhysicalFonts.get("SimSun"));
        //解决宋体(正文)和宋体(标题)的乱码问题
        PhysicalFonts.put("PMingLiU", PhysicalFonts.get("SimSun"));
        PhysicalFonts.put("新細明體", PhysicalFonts.get("SimSun"));

        //输出空文件
        File outFile = FileUtil.createTempFile(".pdf", true);
        WordprocessingMLPackage pkg = null;
        try (InputStream resources = FileUtil.getInputStream(file)) {
            pkg = Docx4J.load(resources);
            pkg.setFontMapper(fontMapper);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }

        try (FileOutputStream outputStream = new FileOutputStream(outFile)) {
            Docx4J.toPDF(pkg, outputStream);
        } catch (Exception e) {
            log.error("生成pdf文件异常");
        }
        return outFile;
    }
}

maven

docx4j版本自己酌情升级

可能存在漏洞

Dependency maven:org.apache.xmlgraphics:xmlgraphics-commons:2.3 is vulnerable

Upgrade to 2.9

CVE-2020-11988, Score: 8.2

Apache XmlGraphics Commons 2.4 and earlier is vulnerable to server-side request forgery, caused by improper input validation by the XMPParser. By using a specially-crafted argument, an attacker could exploit this vulnerability to cause the underlying server to make arbitrary GET requests. Users should upgrade to 2.6 or later.

xml 复制代码
        <!-- word转pdf -->
        <dependency>
            <groupId>org.docx4j</groupId>
            <artifactId>docx4j-JAXB-Internal</artifactId>
            <version>8.2.4</version>
            <exclusions>
                <exclusion>
                    <groupId>xerces</groupId>
                    <artifactId>xercesImpl</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.docx4j</groupId>
            <artifactId>docx4j-export-fo</artifactId>
            <version>8.2.4</version>
        </dependency>
相关推荐
爱上语文12 分钟前
Springboot的三层架构
java·开发语言·spring boot·后端·spring
serve the people15 分钟前
springboot 单独新建一个文件实时写数据,当文件大于100M时按照日期时间做文件名进行归档
java·spring boot·后端
qmx_071 小时前
HTB-Jerry(tomcat war文件、msfvenom)
java·web安全·网络安全·tomcat
为风而战1 小时前
IIS+Ngnix+Tomcat 部署网站 用IIS实现反向代理
java·tomcat
技术无疆3 小时前
快速开发与维护:探索 AndroidAnnotations
android·java·android studio·android-studio·androidx·代码注入
架构文摘JGWZ6 小时前
Java 23 的12 个新特性!!
java·开发语言·学习
拾光师7 小时前
spring获取当前request
java·后端·spring
aPurpleBerry7 小时前
neo4j安装启动教程+对应的jdk配置
java·neo4j
我是苏苏7 小时前
Web开发:ABP框架2——入门级别的增删改查Demo
java·开发语言
xujinwei_gingko7 小时前
Spring IOC容器Bean对象管理-Java Config方式
java·spring