Converting PDF Files to Images
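1. Add the Maven Dependencies
The implementation mainly relies on two libraries, pdfbox and hutool:
pdfbox renders PDF pages as images;
hutool handles reading and writing files.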
```xml
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.27</version>
</dependency>
<dependency>
    <groupId>cn.hutool</groupId>
    <artifactId>hutool-all</artifactId>
    <version>5.8.34</version>
</dependency>
```
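2. Implementation
Approach:
pdfbox can load a PDF either from a byte array or from an input stream, which gives two ways to do the conversion.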
2.1 Byte Array
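```java
import cn.hutool.core.img.ImgUtil;
import cn.hutool.core.io.FileUtil;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;

import java.awt.image.BufferedImage;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;

public void pdf2Image() {
    // For simplicity, read a local file; fill in the PDF path and the output directory
    File file = new File("");
    File outDir = new File("");
    // Reads the whole PDF into memory at once
    byte[] bytes = FileUtil.readBytes(file);
    String formatName = "png";
    try (PDDocument document = PDDocument.load(bytes)) {
        PDFRenderer pdfRenderer = new PDFRenderer(document);
        int numberOfPages = document.getNumberOfPages();
        for (int i = 0; i < numberOfPages; i++) {
            // Render page i (0-based) as a BufferedImage at 50 DPI
            BufferedImage bufferedImage = pdfRenderer.renderImageWithDPI(i, 50);
            // One file per page (illustrative naming): page-1.png, page-2.png, ...
            File pageFile = new File(outDir, "page-" + (i + 1) + "." + formatName);
            try (OutputStream outputStream = new FileOutputStream(pageFile)) {
                ImgUtil.write(bufferedImage, formatName, outputStream);
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
```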
Loading from a byte array works, but the whole PDF is read into memory before pdfbox even parses it, so a large file can cause an `OutOfMemoryError`.
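Reading the file is only part of the memory cost: every page rendered by `PDFRenderer` also becomes a `BufferedImage` on the heap, and its size grows with the square of the DPI. The sketch below gives a rough estimate, assuming pdfbox's default RGB image type (an `int`, i.e. 4 bytes, per pixel) and PDF's 72 points per inch; the class and method names are hypothetical, and pdfbox's own pixel rounding may differ by one pixel.

```java
// Hypothetical helper: rough in-memory size of one rendered page (not part of pdfbox).
final class RenderSizeEstimate {

    private RenderSizeEstimate() {
    }

    // PDF user space has 72 points per inch, so pixels ~= points * dpi / 72
    static int pixels(float points, float dpi) {
        return Math.round(points * dpi / 72f);
    }

    // Assumes an RGB image stored as one 4-byte int per pixel
    static long bytesForPage(float widthPt, float heightPt, float dpi) {
        long width = pixels(widthPt, dpi);
        long height = pixels(heightPt, dpi);
        return width * height * 4L;
    }
}
```

For an A4 page (about 595 x 842 points), `bytesForPage(595f, 842f, 50f)` is 966,420 bytes (under 1 MB), while at 300 DPI it is 34,785,328 bytes (about 35 MB). Writing each page out as soon as it is rendered keeps only one such image alive at a time.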
2.2 File Stream
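```java
import cn.hutool.core.img.ImgUtil;
import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;

import java.awt.image.BufferedImage;
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

public void pdf2Image() {
    // Fill in the PDF path and the output directory
    File file = new File("");
    File outDir = new File("");
    String formatName = "png";
    // Both the input stream and the document are closed automatically
    try (InputStream is = new BufferedInputStream(new FileInputStream(file));
         // Buffer the document in temporary files instead of main memory
         PDDocument document = PDDocument.load(is, MemoryUsageSetting.setupTempFileOnly())) {
        PDFRenderer pdfRenderer = new PDFRenderer(document);
        int numberOfPages = document.getNumberOfPages();
        for (int i = 0; i < numberOfPages; i++) {
            // Render page i (0-based) as a BufferedImage at 50 DPI
            BufferedImage bufferedImage = pdfRenderer.renderImageWithDPI(i, 50);
            // One file per page (illustrative naming): page-1.png, page-2.png, ...
            File pageFile = new File(outDir, "page-" + (i + 1) + "." + formatName);
            try (OutputStream outputStream = new FileOutputStream(pageFile)) {
                ImgUtil.write(bufferedImage, formatName, outputStream);
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
```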
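Concatenating the encoded bytes of several PNGs into one file does not produce a valid multi-page image. If a single output image containing every page is what you want, one option is to stack the rendered pages vertically. The sketch below uses only the JDK; the class and method names are hypothetical, and note that it must hold all page images in memory at once, which works against the large-file concern above.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.util.List;

// Hypothetical helper: stacks page images top to bottom into one image.
final class PageStitcher {

    private PageStitcher() {
    }

    static BufferedImage stitchVertically(List<BufferedImage> pages) {
        if (pages.isEmpty()) {
            throw new IllegalArgumentException("no pages to stitch");
        }
        // Result is as wide as the widest page and as tall as all pages combined
        int width = 0;
        int height = 0;
        for (BufferedImage page : pages) {
            width = Math.max(width, page.getWidth());
            height += page.getHeight();
        }
        BufferedImage result = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = result.createGraphics();
        try {
            // White background so narrower pages do not leave black margins
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, width, height);
            int y = 0;
            for (BufferedImage page : pages) {
                g.drawImage(page, 0, y, null);
                y += page.getHeight();
            }
        } finally {
            g.dispose();
        }
        return result;
    }
}
```

To use it, collect the images returned by `pdfRenderer.renderImageWithDPI(i, 50)` into a list, pass the list to `stitchVertically`, and write the result once with `ImgUtil.write`.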