Converting DOCX to PDF with Apache POI and Apache PDFBox
Apache POI reads the content of the DOCX file, and Apache PDFBox writes the PDF. Add the following dependencies:
XML
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>5.2.3</version>
</dependency>
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.27</version>
</dependency>
Implementation:
java
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import java.io.FileInputStream;
import java.io.IOException;

public class DocxToPdfConverter {
    // Copies paragraph text only: styles, tables, and images are not carried over.
    public static void convert(String docxPath, String pdfPath) throws IOException {
        try (FileInputStream in = new FileInputStream(docxPath);
             XWPFDocument doc = new XWPFDocument(in);
             PDDocument pdfDoc = new PDDocument()) {
            PDPage page = new PDPage();
            pdfDoc.addPage(page);
            try (PDPageContentStream contentStream = new PDPageContentStream(pdfDoc, page)) {
                // Helvetica covers only WinAnsi characters; text outside that set
                // (Chinese, for example) makes showText throw unless an embedded font is used.
                contentStream.setFont(PDType1Font.HELVETICA, 12);
                contentStream.beginText();
                contentStream.newLineAtOffset(25, 700);
                for (XWPFParagraph paragraph : doc.getParagraphs()) {
                    // showText rejects control characters such as tabs and line breaks.
                    String text = paragraph.getText().replaceAll("\\p{Cntrl}", " ");
                    contentStream.showText(text);
                    // Move down one line; there is no wrapping or page-break handling here.
                    contentStream.newLineAtOffset(0, -15);
                }
                contentStream.endText();
            }
            pdfDoc.save(pdfPath);
        }
    }
}
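The loop above hands each paragraph to `showText` as a single line, so a long paragraph runs past the right edge of the page. One possible way to break paragraphs into lines first is a greedy word wrap. The sketch below is an illustration only: `LineWrapper` and `wrap` are hypothetical names, and the limit is a plain character count rather than a width measured from the font.

```java
import java.util.ArrayList;
import java.util.List;

final class LineWrapper {
    /**
     * Greedy word wrap: packs whitespace-separated words into lines of at most
     * maxChars characters. A single word longer than maxChars is kept whole on
     * its own line. Tabs and line breaks count as whitespace and are collapsed.
     */
    static List<String> wrap(String text, int maxChars) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // only happens when the input is blank
            }
            // Start a new line if adding " word" would exceed the limit.
            if (current.length() > 0 && current.length() + 1 + word.length() > maxChars) {
                lines.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(word);
        }
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }
}
```

Each returned line could then be written with its own `showText` call followed by `newLineAtOffset(0, -15)`. Text without spaces, such as Chinese, would need a different splitting rule.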
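Converting with the LibreOffice command line
This approach calls LibreOffice's command-line tool to do the conversion, so LibreOffice must be installed first (on some installations the executable is named soffice rather than libreoffice):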
java
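import java.io.IOException;
public class LibreOfficeConverter {
    // outputDir is a directory: --outdir takes a folder, and the PDF keeps the input's base name.
    public static void convert(String inputPath, String outputDir) throws IOException, InterruptedException {
        // Pass arguments as a list so that paths containing spaces are not split.
        Process process = new ProcessBuilder("libreoffice", "--headless",
                "--convert-to", "pdf", "--outdir", outputDir, inputPath)
                .inheritIO().start();
        if (process.waitFor() != 0) {
            throw new IOException("LibreOffice exited with a non-zero status");
        }
    }
}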
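Because `--outdir` names a directory rather than a file, the caller has to work out where the PDF ends up. The sketch below shows one way to build the argument list and derive the expected output path. `LibreOfficeCommand`, `build`, and `expectedOutput` are hypothetical helper names, and the executable name `libreoffice` is taken from the command shown above.

```java
import java.nio.file.Path;
import java.util.List;

final class LibreOfficeCommand {
    /** Argument list for a headless PDF conversion; each path is a separate argument, so spaces are safe. */
    static List<String> build(Path input, Path outDir) {
        return List.of("libreoffice", "--headless", "--convert-to", "pdf",
                "--outdir", outDir.toString(), input.toString());
    }

    /** Expected output location: outDir plus the input's base name with a .pdf extension. */
    static Path expectedOutput(Path input, Path outDir) {
        String name = input.getFileName().toString();
        int dot = name.lastIndexOf('.');
        // Keep the whole name when there is no extension (or only a leading dot).
        String base = dot > 0 ? name.substring(0, dot) : name;
        return outDir.resolve(base + ".pdf");
    }
}
```

The list can be passed straight to `new ProcessBuilder(LibreOfficeCommand.build(input, outDir))`, and once the process exits the caller can check that the file at `expectedOutput(input, outDir)` exists.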
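Using the third-party libraries docx4j and flying-saucer-pdf
docx4j handles the DOCX file and flying-saucer-pdf generates the PDF. Depending on the docx4j version, Docx4J.toPDF may also need docx4j's FO export module (docx4j-export-fo) on the classpath: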
XML
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j</artifactId>
    <version>11.4.4</version>
</dependency>
<dependency>
    <groupId>org.xhtmlrenderer</groupId>
    <artifactId>flying-saucer-pdf</artifactId>
    <version>9.1.22</version>
</dependency>
Conversion code:
java
import org.docx4j.Docx4J;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;

public class Docx4jConverter {
    public static void convert(String docxPath, String pdfPath) throws Exception {
        WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new File(docxPath));
        // Close the output stream even if the conversion fails.
        try (OutputStream out = new FileOutputStream(pdfPath)) {
            Docx4J.toPDF(wordMLPackage, out);
        }
    }
}
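Notes
The fidelity of a DOCX-to-PDF conversion depends on the tool chosen. The Apache POI approach has limited support for complex formatting; LibreOffice generally gives the most faithful output of the three but requires the software to be installed; the docx4j approach may need additional font configuration to render Chinese text.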
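Since Chinese text is the case that needs font configuration, it can help to check up front whether a document contains any. The sketch below is one possible check using only the JDK. `CjkDetector` and `containsHan` are hypothetical names, and how the result is used (choosing a converter, loading a CJK font) is left to the caller.

```java
final class CjkDetector {
    /** Returns true if the text contains at least one Han (Chinese) character. */
    static boolean containsHan(String text) {
        // Iterate by code point so characters outside the BMP are classified correctly.
        return text.codePoints()
                .anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN);
    }
}
```

For example, a caller might run `containsHan` over each paragraph's text and fall back to the LibreOffice route when it returns true and no CJK font has been configured.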