Parsing email text content; MIME text parsing; MimeStreamParser; multipart parsing

Raw text

------=_Part_46705_715015081.1699589700255
Content-Type: text/html;charset=UTF-8
Content-Transfer-Encoding: base64

PGh0bWw+CiAgICA8aGVhZD4KICAgICAgICA8bWV0YSBodHRwLW
VxdWl2PSJDb250ZW50LVR5cGUiIGNvbnRlbnQ9InRleHQvaHRt
bDsgY2hhcnNldD1VVEYtOCI+CiAgICAgICAgPHRpdGxlPkpTUC
BQYWdlPC90aXRsZT4KICAgIDwvaGVhZD4KICAgIDxib2R5Pgog
ICAgICAgIDxoMT5IZWxsbyBXb3JsZCE8L2gxPgogICAgPC9ib2
R5Pgo8L2h0bWw+
------=_Part_46705_715015081.1699589700255--
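To make the expected result concrete, the Base64 body of the part above can be decoded with the JDK alone. This is only a minimal sketch for checking the payload, not the mime4j approach this post describes; the class name `Base64PayloadDemo` and its `decode` helper are made up for illustration, and UTF-8 is assumed because the part's Content-Type header declares it.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper class, used only to inspect the sample payload.
class Base64PayloadDemo {

    // Base64 body lines copied from the sample part above.
    static final String PAYLOAD =
            "PGh0bWw+CiAgICA8aGVhZD4KICAgICAgICA8bWV0YSBodHRwLW\n" +
            "VxdWl2PSJDb250ZW50LVR5cGUiIGNvbnRlbnQ9InRleHQvaHRt\n" +
            "bDsgY2hhcnNldD1VVEYtOCI+CiAgICAgICAgPHRpdGxlPkpTUC\n" +
            "BQYWdlPC90aXRsZT4KICAgIDwvaGVhZD4KICAgIDxib2R5Pgog\n" +
            "ICAgICAgIDxoMT5IZWxsbyBXb3JsZCE8L2gxPgogICAgPC9ib2\n" +
            "R5Pgo8L2h0bWw+\n";

    // The MIME decoder skips line breaks, so the wrapped lines can be passed as-is.
    static String decode(String base64Body) {
        byte[] bytes = Base64.getMimeDecoder().decode(base64Body);
        // Assumption: charset taken from "Content-Type: text/html;charset=UTF-8".
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Prints a small HTML page whose body is <h1>Hello World!</h1>
        System.out.println(decode(PAYLOAD));
    }
}
```

The decoded text is the HTML page that the mime4j handler below is expected to return from `getData()`.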

Maven

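 <!-- apache-mime4j-dom provides the message/field classes used below and pulls in apache-mime4j-core transitively -->
 <dependency>
     <groupId>org.apache.james</groupId>
     <artifactId>apache-mime4j-dom</artifactId>
     <version>0.8.9</version>
 </dependency>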

Parsing method

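import static java.nio.charset.StandardCharsets.UTF_8;

import java.io.ByteArrayInputStream;

import org.apache.james.mime4j.codec.DecodeMonitor;
import org.apache.james.mime4j.message.DefaultBodyDescriptorBuilder;
import org.apache.james.mime4j.parser.MimeStreamParser;
import org.apache.james.mime4j.stream.BodyDescriptorBuilder;
import org.apache.james.mime4j.stream.MimeConfig;

// The wrapper class name is arbitrary; it only makes the snippet compilable.
public class MimeParseDemo {
    public static void main(String[] args) throws Exception {
        String data = "------=_Part_46705_715015081.1699589700255\n" +
                "Content-Type: text/html;charset=UTF-8\n" +
                "Content-Transfer-Encoding: base64\n" +
                "\n" +
                "PGh0bWw+CiAgICA8aGVhZD4KICAgICAgICA8bWV0YSBodHRwLW\n" +
                "VxdWl2PSJDb250ZW50LVR5cGUiIGNvbnRlbnQ9InRleHQvaHRt\n" +
                "bDsgY2hhcnNldD1VVEYtOCI+CiAgICAgICAgPHRpdGxlPkpTUC\n" +
                "BQYWdlPC90aXRsZT4KICAgIDwvaGVhZD4KICAgIDxib2R5Pgog\n" +
                "ICAgICAgIDxoMT5IZWxsbyBXb3JsZCE8L2gxPgogICAgPC9ib2\n" +
                "R5Pgo8L2h0bWw+\n" +
                "------=_Part_46705_715015081.1699589700255--";
        System.out.println(data);
        // The handler (defined in the next section) collects headers and body.
        HtmContentHandler contentHandler = new HtmContentHandler();
        MimeConfig mime4jParserConfig = MimeConfig.DEFAULT;
        BodyDescriptorBuilder bodyDescriptorBuilder = new DefaultBodyDescriptorBuilder();
        MimeStreamParser mime4jParser = new MimeStreamParser(mime4jParserConfig, DecodeMonitor.SILENT, bodyDescriptorBuilder);
        // Let the parser undo the Content-Transfer-Encoding (base64 here) before calling body().
        mime4jParser.setContentDecoding(true);
        mime4jParser.setContentHandler(contentHandler);
        mime4jParser.parse(new ByteArrayInputStream(data.getBytes(UTF_8)));
        System.out.println(contentHandler.getData());
    }
}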

HtmContentHandler

import org.apache.commons.io.IOUtils;
import org.apache.james.mime4j.MimeException;
import org.apache.james.mime4j.dom.Header;
import org.apache.james.mime4j.field.ContentTypeFieldImpl;
import org.apache.james.mime4j.message.SimpleContentHandler;
import org.apache.james.mime4j.stream.BodyDescriptor;
import org.apache.james.mime4j.stream.Field;

import java.io.IOException;
import java.io.InputStream;
import java.util.Optional;

/**
 * @author zengrenyuan
 * @date 2023/11/10
 **/
public class HtmContentHandler extends SimpleContentHandler {
    private String data;
    private String charset;
    private String contentType;

    @Override
    public void body(BodyDescriptor bd, InputStream is) throws MimeException, IOException {
        this.data = IOUtils.toString(is, Optional.ofNullable(charset).orElse("UTF-8"));
        // The decoded text content can be processed here
    }

    @Override
    public void headers(Header header) {
        // Parse the header fields here
        Field field = header.getField("Content-Type");
        if (field instanceof ContentTypeFieldImpl) {
            ContentTypeFieldImpl contentTypeField = (ContentTypeFieldImpl) field;
            this.contentType = contentTypeField.getMimeType();
            this.charset = contentTypeField.getParameter("charset");
        }
    }

    public String getData() {
        return data;
    }

    public String getCharset() {
        return charset;
    }

    public String getContentType() {
        return contentType;
    }
}
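What `headers` extracts from the sample is just two values from `Content-Type: text/html;charset=UTF-8`: the media type and the `charset` parameter, with UTF-8 as the fallback used in `body`. The sketch below reproduces that extraction by hand with the JDK only, to show the idea rather than how mime4j does it internally. `ContentTypeSketch` and its methods are hypothetical names, and the parsing is deliberately simplified (for example, it does not handle a `;` inside a quoted parameter value).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical, simplified illustration; real code should rely on the library's field parser.
class ContentTypeSketch {

    // Returns the media type part of the header value, e.g. "text/html".
    static String mimeType(String headerValue) {
        int semicolon = headerValue.indexOf(';');
        String type = semicolon < 0 ? headerValue : headerValue.substring(0, semicolon);
        return type.trim().toLowerCase(Locale.ROOT);
    }

    // Returns the charset parameter, or UTF-8 when it is missing or not usable.
    static Charset charset(String headerValue) {
        String[] parts = headerValue.split(";");
        for (int i = 1; i < parts.length; i++) {
            String[] keyValue = parts[i].split("=", 2);
            if (keyValue.length == 2 && keyValue[0].trim().equalsIgnoreCase("charset")) {
                String name = keyValue[1].trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name: fall through to the default.
                    break;
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        String value = "text/html;charset=UTF-8";
        System.out.println(mimeType(value) + " / " + charset(value));
    }
}
```

Falling back to UTF-8 mirrors the `Optional.ofNullable(charset).orElse("UTF-8")` choice in the handler above; a different default may be more appropriate for other mail sources.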

References

https://james.apache.org/mime4j/index.html

https://github.com/apache/james-mime4j

To parse a complete email message, the following project is also worth a look:

https://github.com/ram-sharma-6453/email-mime-parser
