Fixing garbled Chinese text when converting Word to PDF with docx4j

Word to PDF

Implementation

java
import org.docx4j.Docx4J;
import org.docx4j.fonts.IdentityPlusMapper;
import org.docx4j.fonts.Mapper;
import org.docx4j.fonts.PhysicalFonts;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import cn.hutool.core.io.FileUtil;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;
import java.io.*;

@Slf4j
@Service
public class FileService {
    /**
     * Converts a Word file to a PDF file.
     *
     * @param file the Word (.docx) file to convert
     * @return a temporary file containing the generated PDF
     * @date 2023/08/26 23:13
     */
    public File getPdfByWordFile(File file) {
        Mapper fontMapper = new IdentityPlusMapper();
        fontMapper.put("隶书", PhysicalFonts.get("LiSu"));
        fontMapper.put("宋体", PhysicalFonts.get("SimSun"));
        fontMapper.put("微软雅黑", PhysicalFonts.get("Microsoft Yahei"));
        fontMapper.put("黑体", PhysicalFonts.get("SimHei"));
        fontMapper.put("楷体", PhysicalFonts.get("KaiTi"));
        fontMapper.put("新宋体", PhysicalFonts.get("NSimSun"));
        fontMapper.put("华文行楷", PhysicalFonts.get("STXingkai"));
        fontMapper.put("华文仿宋", PhysicalFonts.get("STFangsong"));
        fontMapper.put("仿宋", PhysicalFonts.get("FangSong"));
        fontMapper.put("幼圆", PhysicalFonts.get("YouYuan"));
        fontMapper.put("华文宋体", PhysicalFonts.get("STSong"));
        fontMapper.put("华文中宋", PhysicalFonts.get("STZhongsong"));
        fontMapper.put("等线", PhysicalFonts.get("SimSun"));
        fontMapper.put("等线 Light", PhysicalFonts.get("SimSun"));
        fontMapper.put("华文琥珀", PhysicalFonts.get("STHupo"));
        fontMapper.put("华文隶书", PhysicalFonts.get("STLiti"));
        fontMapper.put("华文新魏", PhysicalFonts.get("STXinwei"));
        fontMapper.put("华文彩云", PhysicalFonts.get("STCaiyun"));
        fontMapper.put("方正姚体", PhysicalFonts.get("FZYaoti"));
        fontMapper.put("方正舒体", PhysicalFonts.get("FZShuTi"));
        fontMapper.put("华文细黑", PhysicalFonts.get("STXihei"));
        fontMapper.put("宋体扩展", PhysicalFonts.get("simsun-extB"));
        fontMapper.put("仿宋_GB2312", PhysicalFonts.get("FangSong_GB2312"));
        fontMapper.put("新細明體", PhysicalFonts.get("SimSun"));
        // Work around garbled text for the "SimSun (Body)" and "SimSun (Headings)" fonts
        PhysicalFonts.put("PMingLiU", PhysicalFonts.get("SimSun"));
        PhysicalFonts.put("新細明體", PhysicalFonts.get("SimSun"));

        // Create an empty temporary file for the PDF output
        File outFile = FileUtil.createTempFile(".pdf", true);
        WordprocessingMLPackage pkg = null;
        try (InputStream resources = FileUtil.getInputStream(file)) {
            pkg = Docx4J.load(resources);
            pkg.setFontMapper(fontMapper);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }

        try (FileOutputStream outputStream = new FileOutputStream(outFile)) {
            Docx4J.toPDF(pkg, outputStream);
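        } catch (Exception e) {
            // Log the cause and fail loudly instead of silently returning an empty PDF
            log.error("Failed to generate PDF file", e);
            throw new RuntimeException(e);
        }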
        return outFile;
    }
}
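
One thing the mapping above does not guard against: as far as I can tell, `PhysicalFonts.get(...)` returns `null` when the named font is not installed on the host (typical on a Linux server without the Windows fonts), and that family then stays unmapped. The sketch below models a "first installed candidate, else fallback" lookup with plain collections so it runs without docx4j. The class `FontFallbackSketch`, its methods, and the sample font sets are hypothetical names for illustration, not part of docx4j.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: a stand-in for checking PhysicalFonts.get(name) != null
// before calling fontMapper.put(...). It does not call docx4j itself.
public class FontFallbackSketch {

    /**
     * Returns the first candidate that is installed, or the fallback when none is.
     * Names are compared case-insensitively (an assumption of this sketch).
     */
    public static String resolve(List<String> candidates, Set<String> installed, String fallback) {
        for (String candidate : candidates) {
            for (String name : installed) {
                if (name.equalsIgnoreCase(candidate)) {
                    return name;
                }
            }
        }
        return fallback;
    }

    /** Builds a document-font to physical-font table in which every entry is resolved. */
    public static Map<String, String> buildMapping(Map<String, List<String>> wanted,
                                                   Set<String> installed,
                                                   String fallback) {
        Map<String, String> mapping = new LinkedHashMap<>();
        wanted.forEach((docFont, candidates) ->
                mapping.put(docFont, resolve(candidates, installed, fallback)));
        return mapping;
    }

    public static void main(String[] args) {
        // Pretend only these two fonts exist on the server.
        Set<String> installed = Set.of("SimSun", "SimHei");

        // Font families used by the document, each with preferred physical fonts.
        Map<String, List<String>> wanted = new LinkedHashMap<>();
        wanted.put("PMingLiU", List.of("PMingLiU", "SimSun"));
        wanted.put("SimHei", List.of("SimHei"));
        wanted.put("KaiTi", List.of("KaiTi"));

        // KaiTi is not installed, so it falls back to SimSun.
        System.out.println(buildMapping(wanted, installed, "SimSun"));
    }
}
```

In the real service the same idea would amount to mapping a family only when the looked-up font is non-null and substituting a font known to be installed (SimSun in the listing above) otherwise. The keys could just as well be the Chinese family names.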

Maven

Upgrade the docx4j version at your own discretion.

The version used here may contain vulnerabilities:

Dependency maven:org.apache.xmlgraphics:xmlgraphics-commons:2.3 is vulnerable

Upgrade to 2.9

CVE-2020-11988, Score: 8.2

Apache XmlGraphics Commons 2.4 and earlier is vulnerable to server-side request forgery, caused by improper input validation by the XMPParser. By using a specially-crafted argument, an attacker could exploit this vulnerability to cause the underlying server to make arbitrary GET requests. Users should upgrade to 2.6 or later.

xml
        <!-- Word to PDF -->
        <dependency>
            <groupId>org.docx4j</groupId>
            <artifactId>docx4j-JAXB-Internal</artifactId>
            <version>8.2.4</version>
            <exclusions>
                <exclusion>
                    <groupId>xerces</groupId>
                    <artifactId>xercesImpl</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.docx4j</groupId>
            <artifactId>docx4j-export-fo</artifactId>
            <version>8.2.4</version>
        </dependency>