使用docx4j转换word为pdf处理中文乱码问题

word转pdf

实现方法

java 复制代码
import org.docx4j.Docx4J;
import org.docx4j.fonts.IdentityPlusMapper;
import org.docx4j.fonts.Mapper;
import org.docx4j.fonts.PhysicalFonts;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import cn.hutool.core.io.FileUtil;
import java.io.*;

@Slf4j
@Service
public class FileService {
    /**
     * 获取pdf文件,通过word文件转换
     *
     * @param file word文件
     * @date 2023/08/26 23:13
     */
    public File getPdfByWordFile(File file) {
        Mapper fontMapper = new IdentityPlusMapper();
        fontMapper.put("隶书", PhysicalFonts.get("LiSu"));
        fontMapper.put("宋体", PhysicalFonts.get("SimSun"));
        fontMapper.put("微软雅黑", PhysicalFonts.get("Microsoft Yahei"));
        fontMapper.put("黑体", PhysicalFonts.get("SimHei"));
        fontMapper.put("楷体", PhysicalFonts.get("KaiTi"));
        fontMapper.put("新宋体", PhysicalFonts.get("NSimSun"));
        fontMapper.put("华文行楷", PhysicalFonts.get("STXingkai"));
        fontMapper.put("华文仿宋", PhysicalFonts.get("STFangsong"));
        fontMapper.put("仿宋", PhysicalFonts.get("FangSong"));
        fontMapper.put("幼圆", PhysicalFonts.get("YouYuan"));
        fontMapper.put("华文宋体", PhysicalFonts.get("STSong"));
        fontMapper.put("华文中宋", PhysicalFonts.get("STZhongsong"));
        fontMapper.put("等线", PhysicalFonts.get("SimSun"));
        fontMapper.put("等线 Light", PhysicalFonts.get("SimSun"));
        fontMapper.put("华文琥珀", PhysicalFonts.get("STHupo"));
        fontMapper.put("华文隶书", PhysicalFonts.get("STLiti"));
        fontMapper.put("华文新魏", PhysicalFonts.get("STXinwei"));
        fontMapper.put("华文彩云", PhysicalFonts.get("STCaiyun"));
        fontMapper.put("方正姚体", PhysicalFonts.get("FZYaoti"));
        fontMapper.put("方正舒体", PhysicalFonts.get("FZShuTi"));
        fontMapper.put("华文细黑", PhysicalFonts.get("STXihei"));
        fontMapper.put("宋体扩展", PhysicalFonts.get("simsun-extB"));
        fontMapper.put("仿宋_GB2312", PhysicalFonts.get("FangSong_GB2312"));
        fontMapper.put("新細明體", PhysicalFonts.get("SimSun"));
        //解决宋体(正文)和宋体(标题)的乱码问题
        PhysicalFonts.put("PMingLiU", PhysicalFonts.get("SimSun"));
        PhysicalFonts.put("新細明體", PhysicalFonts.get("SimSun"));

        //输出空文件
        File outFile = FileUtil.createTempFile(".pdf", true);
        WordprocessingMLPackage pkg = null;
        try (InputStream resources = FileUtil.getInputStream(file)) {
            pkg = Docx4J.load(resources);
            pkg.setFontMapper(fontMapper);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }

        try (FileOutputStream outputStream = new FileOutputStream(outFile)) {
            Docx4J.toPDF(pkg, outputStream);
        } catch (Exception e) {
            log.error("生成pdf文件异常");
        }
        return outFile;
    }
}

maven

docx4j版本自己酌情升级

可能存在漏洞

Dependency maven:org.apache.xmlgraphics:xmlgraphics-commons:2.3 is vulnerable

Upgrade to 2.9

CVE-2020-11988, Score: 8.2

Apache XmlGraphics Commons 2.4 and earlier is vulnerable to server-side request forgery, caused by improper input validation by the XMPParser. By using a specially-crafted argument, an attacker could exploit this vulnerability to cause the underlying server to make arbitrary GET requests. Users should upgrade to 2.6 or later.

xml 复制代码
        <!-- word转pdf -->
        <dependency>
            <groupId>org.docx4j</groupId>
            <artifactId>docx4j-JAXB-Internal</artifactId>
            <version>8.2.4</version>
            <exclusions>
                <exclusion>
                    <groupId>xerces</groupId>
                    <artifactId>xercesImpl</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.docx4j</groupId>
            <artifactId>docx4j-export-fo</artifactId>
            <version>8.2.4</version>
        </dependency>
相关推荐
RoboWizard7 分钟前
扩容刚需 金士顿新款Canvas Plus存储卡
java·spring·缓存·电脑·金士顿
lang2015092823 分钟前
Spring Boot 入门:5分钟搭建Hello World
java·spring boot·后端
失散1332 分钟前
分布式专题——47 ElasticSearch搜索相关性详解
java·分布式·elasticsearch·架构
serve the people35 分钟前
LangChain 表达式语言核心组合:Prompt + LLM + OutputParser
java·langchain·prompt
想ai抽37 分钟前
深入starrocks-多列联合统计一致性探查与策略(YY一下)
java·数据库·数据仓库
武子康1 小时前
Java-152 深入浅出 MongoDB 索引详解 从 MongoDB B-树 到 MySQL B+树 索引机制、数据结构与应用场景的全面对比分析
java·开发语言·数据库·sql·mongodb·性能优化·nosql
杰克尼1 小时前
JavaWeb_p165部门管理
java·开发语言·前端
longgyy1 小时前
5 分钟用火山引擎 DeepSeek 调用大模型生成小红书文案
java·数据库·火山引擎
一成码农1 小时前
JavaSE面向对象(下)
java·开发语言
Madison-No71 小时前
【C++】探秘vector的底层实现
java·c++·算法