解析邮件文本内容; Mime文本解析; MimeStreamParser; multipart解析

原始文本

bash 复制代码
------=_Part_46705_715015081.1699589700255
Content-Type: text/html;charset=UTF-8
Content-Transfer-Encoding: base64

PGh0bWw+CiAgICA8aGVhZD4KICAgICAgICA8bWV0YSBodHRwLW
VxdWl2PSJDb250ZW50LVR5cGUiIGNvbnRlbnQ9InRleHQvaHRt
bDsgY2hhcnNldD1VVEYtOCI+CiAgICAgICAgPHRpdGxlPkpTUC
BQYWdlPC90aXRsZT4KICAgIDwvaGVhZD4KICAgIDxib2R5Pgog
ICAgICAgIDxoMT5IZWxsbyBXb3JsZCE8L2gxPgogICAgPC9ib2
R5Pgo8L2h0bWw+
------=_Part_46705_715015081.1699589700255--

Maven

xml 复制代码
 <dependency>
     <groupId>org.apache.james</groupId>
     <artifactId>apache-mime4j-core</artifactId>
     <version>0.8.9</version>
 </dependency>

解析方法

java 复制代码
String data = "------=_Part_46705_715015081.1699589700255\n" +
        "Content-Type: text/html;charset=UTF-8\n" +
        "Content-Transfer-Encoding: base64\n" +
        "\n" +
        "PGh0bWw+CiAgICA8aGVhZD4KICAgICAgICA8bWV0YSBodHRwLW\n" +
        "VxdWl2PSJDb250ZW50LVR5cGUiIGNvbnRlbnQ9InRleHQvaHRt\n" +
        "bDsgY2hhcnNldD1VVEYtOCI+CiAgICAgICAgPHRpdGxlPkpTUC\n" +
        "BQYWdlPC90aXRsZT4KICAgIDwvaGVhZD4KICAgIDxib2R5Pgog\n" +
        "ICAgICAgIDxoMT5IZWxsbyBXb3JsZCE8L2gxPgogICAgPC9ib2\n" +
        "R5Pgo8L2h0bWw+\n" +
        "------=_Part_46705_715015081.1699589700255--";
System.out.println(data);
HtmContentHandler contentHandler = new HtmContentHandler();
MimeConfig mime4jParserConfig = MimeConfig.DEFAULT;
BodyDescriptorBuilder bodyDescriptorBuilder = new DefaultBodyDescriptorBuilder();
MimeStreamParser mime4jParser = new MimeStreamParser(mime4jParserConfig, DecodeMonitor.SILENT, bodyDescriptorBuilder);
mime4jParser.setContentDecoding(true);
mime4jParser.setContentHandler(contentHandler);
mime4jParser.parse(new ByteArrayInputStream(data.getBytes(UTF_8)));
System.out.println(contentHandler.getData());

HtmContentHandler

java 复制代码
import org.apache.commons.io.IOUtils;
import org.apache.james.mime4j.MimeException;
import org.apache.james.mime4j.dom.Header;
import org.apache.james.mime4j.field.ContentTypeFieldImpl;
import org.apache.james.mime4j.message.SimpleContentHandler;
import org.apache.james.mime4j.stream.BodyDescriptor;
import org.apache.james.mime4j.stream.Field;

import java.io.IOException;
import java.io.InputStream;
import java.util.Optional;

/**
 * @author zengrenyuan
 * @date 2023/11/10
 **/
public class HtmContentHandler extends SimpleContentHandler {
    private String data;
    private String charset;
    private String contentType;

    @Override
    public void body(BodyDescriptor bd, InputStream is) throws MimeException, IOException {
        this.data = IOUtils.toString(is, Optional.ofNullable(charset).orElse("UTF-8"));
        //这里可以处理文本内容
    }

    @Override
    public void headers(Header header) {
         //在这里解析头信息
        Field contentType = header.getField("Content-Type");
        if (contentType != null) {
            if (contentType instanceof ContentTypeFieldImpl) {
                this.contentType = ((ContentTypeFieldImpl) contentType).getMimeType();
                charset = ((ContentTypeFieldImpl) contentType).getParameter("charset");
            }
        }
    }
    public String getData() {
        return data;
    }

    public String getCharset() {
        return charset;
    }

    public String getContentType() {
        return contentType;
    }
}

参考资料

https://james.apache.org/mime4j/index.html

https://github.com/apache/james-mime4j

如果想解析一段Email数据也可以参考

https://github.com/ram-sharma-6453/email-mime-parser

相关推荐
禾小西14 分钟前
Java 逐梦力扣之旅_[204. 计数质数]
java·算法·leetcode
ゞ 正在缓冲99%…19 分钟前
leetcode295.数据流的中位数
java·数据结构·算法·leetcode·
有梦想的攻城狮2 小时前
spring-cloud-alibaba-nacos-config使用说明
java·spring·nacos·springcloud·配置中心
Yan-英杰3 小时前
【百日精通JAVA | SQL篇 | 第三篇】 MYSQL增删改查
java·数据库·sql
矛取矛求5 小时前
C++ 标准库参考手册深度解析
java·开发语言·c++
cijiancao5 小时前
23 种设计模式中的解释器模式
java·设计模式·解释器模式
南七行者5 小时前
对模板方法模式的理解
java·设计模式·模板方法
麻芝汤圆5 小时前
MapReduce 的广泛应用:从数据处理到智能决策
java·开发语言·前端·hadoop·后端·servlet·mapreduce
努力的搬砖人.5 小时前
java如何实现一个秒杀系统(原理)
java·经验分享·后端·面试
哈哈哈哈哈哈哈哈哈...........5 小时前
【java】在 Java 中,获取一个类的`Class`对象有多种方式
java·开发语言·python