Parsing email text content; MIME text parsing; MimeStreamParser; multipart parsing

Raw text

------=_Part_46705_715015081.1699589700255
Content-Type: text/html;charset=UTF-8
Content-Transfer-Encoding: base64

PGh0bWw+CiAgICA8aGVhZD4KICAgICAgICA8bWV0YSBodHRwLW
VxdWl2PSJDb250ZW50LVR5cGUiIGNvbnRlbnQ9InRleHQvaHRt
bDsgY2hhcnNldD1VVEYtOCI+CiAgICAgICAgPHRpdGxlPkpTUC
BQYWdlPC90aXRsZT4KICAgIDwvaGVhZD4KICAgIDxib2R5Pgog
ICAgICAgIDxoMT5IZWxsbyBXb3JsZCE8L2gxPgogICAgPC9ib2
R5Pgo8L2h0bWw+
------=_Part_46705_715015081.1699589700255--
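The body of this part is base64, so it helps to see what it actually carries before bringing in a parser. The following is a minimal sketch using only the JDK; the class name `Base64BodyDemo` is made up for illustration, and it assumes the decoded bytes are UTF-8, as the `charset` parameter in the part's header states.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class: decodes the base64 body lines of the raw text above.
final class Base64BodyDemo {
    // The six body lines, copied from the raw text and joined with line breaks.
    static final String BODY = String.join("\n",
            "PGh0bWw+CiAgICA8aGVhZD4KICAgICAgICA8bWV0YSBodHRwLW",
            "VxdWl2PSJDb250ZW50LVR5cGUiIGNvbnRlbnQ9InRleHQvaHRt",
            "bDsgY2hhcnNldD1VVEYtOCI+CiAgICAgICAgPHRpdGxlPkpTUC",
            "BQYWdlPC90aXRsZT4KICAgIDwvaGVhZD4KICAgIDxib2R5Pgog",
            "ICAgICAgIDxoMT5IZWxsbyBXb3JsZCE8L2gxPgogICAgPC9ib2",
            "R5Pgo8L2h0bWw+");

    // The MIME decoder skips the line breaks between the chunks.
    static String decode(String base64Body) {
        byte[] bytes = Base64.getMimeDecoder().decode(base64Body);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Prints a small HTML page whose body contains <h1>Hello World!</h1>
        System.out.println(decode(BODY));
    }
}
```

The MIME variant of the decoder is used because the individual lines are 50 characters long and cannot be decoded one by one; they only form valid base64 once joined.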

Maven

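 <!-- apache-mime4j-dom provides SimpleContentHandler, DefaultBodyDescriptorBuilder and ContentTypeFieldImpl; it pulls in apache-mime4j-core transitively. IOUtils comes from commons-io: add it explicitly if it is not already on the classpath. -->
 <dependency>
     <groupId>org.apache.james</groupId>
     <artifactId>apache-mime4j-dom</artifactId>
     <version>0.8.9</version>
 </dependency>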

Parsing method

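import java.io.ByteArrayInputStream;
import org.apache.james.mime4j.codec.DecodeMonitor;
import org.apache.james.mime4j.message.DefaultBodyDescriptorBuilder;
import org.apache.james.mime4j.parser.MimeStreamParser;
import org.apache.james.mime4j.stream.BodyDescriptorBuilder;
import org.apache.james.mime4j.stream.MimeConfig;
import static java.nio.charset.StandardCharsets.UTF_8;

// The statements below belong inside a method that declares MimeException and IOException
String data = "------=_Part_46705_715015081.1699589700255\n" +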
        "Content-Type: text/html;charset=UTF-8\n" +
        "Content-Transfer-Encoding: base64\n" +
        "\n" +
        "PGh0bWw+CiAgICA8aGVhZD4KICAgICAgICA8bWV0YSBodHRwLW\n" +
        "VxdWl2PSJDb250ZW50LVR5cGUiIGNvbnRlbnQ9InRleHQvaHRt\n" +
        "bDsgY2hhcnNldD1VVEYtOCI+CiAgICAgICAgPHRpdGxlPkpTUC\n" +
        "BQYWdlPC90aXRsZT4KICAgIDwvaGVhZD4KICAgIDxib2R5Pgog\n" +
        "ICAgICAgIDxoMT5IZWxsbyBXb3JsZCE8L2gxPgogICAgPC9ib2\n" +
        "R5Pgo8L2h0bWw+\n" +
        "------=_Part_46705_715015081.1699589700255--";
System.out.println(data);
// Handler that collects the decoded body and the Content-Type details
HtmContentHandler contentHandler = new HtmContentHandler();
MimeConfig mime4jParserConfig = MimeConfig.DEFAULT;
BodyDescriptorBuilder bodyDescriptorBuilder = new DefaultBodyDescriptorBuilder();
MimeStreamParser mime4jParser = new MimeStreamParser(mime4jParserConfig, DecodeMonitor.SILENT, bodyDescriptorBuilder);
// Decode base64 / quoted-printable bodies before they reach the handler
mime4jParser.setContentDecoding(true);
mime4jParser.setContentHandler(contentHandler);
mime4jParser.parse(new ByteArrayInputStream(data.getBytes(UTF_8)));
System.out.println(contentHandler.getData());
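To get a feel for the structure the parser has to walk through, here is a rough JDK-only sketch of reading one body part by hand: a delimiter line, header lines, a blank line, then the encoded body up to the closing delimiter. It is not how mime4j is implemented and is no substitute for it. `SinglePartSketch`, the `--demo` boundary and the short sample are all made up for illustration, folded headers are ignored, and the decoded bytes are assumed to be UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of mime4j: reads one body part by hand.
final class SinglePartSketch {
    final Map<String, String> headers = new LinkedHashMap<>();
    String body = "";

    static SinglePartSketch parse(String raw) {
        String[] lines = raw.split("\r?\n");
        // Assumption: the first line is the delimiter, i.e. "--" followed by the boundary.
        String delimiter = lines[0];
        SinglePartSketch part = new SinglePartSketch();
        StringBuilder content = new StringBuilder();
        boolean inHeaders = true;
        for (int i = 1; i < lines.length; i++) {
            String line = lines[i];
            if (line.equals(delimiter) || line.equals(delimiter + "--")) {
                break; // next part or closing delimiter: this part is finished
            }
            if (!inHeaders) {
                content.append(line).append('\n');
            } else if (line.isEmpty()) {
                inHeaders = false; // the first blank line separates headers from body
            } else {
                int colon = line.indexOf(':');
                if (colon > 0) { // header names are stored lower-cased
                    part.headers.put(line.substring(0, colon).trim().toLowerCase(Locale.ROOT),
                            line.substring(colon + 1).trim());
                }
            }
        }
        String encoding = part.headers.getOrDefault("content-transfer-encoding", "7bit");
        if (encoding.equalsIgnoreCase("base64")) {
            // Assumption: the decoded bytes are UTF-8 text.
            byte[] decoded = Base64.getMimeDecoder().decode(content.toString());
            part.body = new String(decoded, StandardCharsets.UTF_8);
        } else {
            part.body = content.toString();
        }
        return part;
    }

    public static void main(String[] args) {
        String raw = "--demo\n"
                + "Content-Type: text/plain;charset=UTF-8\n"
                + "Content-Transfer-Encoding: base64\n"
                + "\n"
                + "SGVsbG8g\n"
                + "V29ybGQh\n"
                + "--demo--";
        SinglePartSketch part = parse(raw);
        System.out.println(part.headers);
        System.out.println(part.body);
    }
}
```

A real parser also has to cope with nested multiparts, folded headers, other transfer encodings and malformed input, which is the reason to hand the work to MimeStreamParser.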

HtmContentHandler

import org.apache.commons.io.IOUtils;
import org.apache.james.mime4j.MimeException;
import org.apache.james.mime4j.dom.Header;
import org.apache.james.mime4j.field.ContentTypeFieldImpl;
import org.apache.james.mime4j.message.SimpleContentHandler;
import org.apache.james.mime4j.stream.BodyDescriptor;
import org.apache.james.mime4j.stream.Field;

import java.io.IOException;
import java.io.InputStream;
import java.util.Optional;

/**
 * @author zengrenyuan
 * @date 2023/11/10
 **/
public class HtmContentHandler extends SimpleContentHandler {
    private String data;
    private String charset;
    private String contentType;

    @Override
    public void body(BodyDescriptor bd, InputStream is) throws MimeException, IOException {
        this.data = IOUtils.toString(is, Optional.ofNullable(charset).orElse("UTF-8"));
        // The text content can be processed here
    }

    @Override
    public void headers(Header header) {
        // Parse the header information here
        Field contentType = header.getField("Content-Type");
        if (contentType != null) {
            if (contentType instanceof ContentTypeFieldImpl) {
                this.contentType = ((ContentTypeFieldImpl) contentType).getMimeType();
                charset = ((ContentTypeFieldImpl) contentType).getParameter("charset");
            }
        }
    }
    public String getData() {
        return data;
    }

    public String getCharset() {
        return charset;
    }

    public String getContentType() {
        return contentType;
    }
}
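For intuition about what the handler reads out of the Content-Type field, the sketch below splits a header value such as `text/html;charset=UTF-8` into the media type and the `charset` parameter using only the JDK. `ContentTypeSketch` and its fields are hypothetical names. The sketch deliberately ignores cases a real field parser has to deal with, such as quoted parameter values containing semicolons, and it lower-cases the media type purely to make comparisons easier.

```java
import java.util.Locale;

// Hypothetical helper, not part of mime4j: naive Content-Type value splitting.
final class ContentTypeSketch {
    final String mimeType;
    final String charset; // null when the header has no charset parameter

    private ContentTypeSketch(String mimeType, String charset) {
        this.mimeType = mimeType;
        this.charset = charset;
    }

    static ContentTypeSketch parse(String headerValue) {
        // The media type comes first; parameters follow, separated by ';'.
        String[] pieces = headerValue.split(";");
        String mimeType = pieces[0].trim().toLowerCase(Locale.ROOT);
        String charset = null;
        for (int i = 1; i < pieces.length; i++) {
            String[] nameAndValue = pieces[i].split("=", 2);
            if (nameAndValue.length == 2 && nameAndValue[0].trim().equalsIgnoreCase("charset")) {
                // Strip optional surrounding quotes, e.g. charset="UTF-8"
                charset = nameAndValue[1].trim().replace("\"", "");
            }
        }
        return new ContentTypeSketch(mimeType, charset);
    }

    public static void main(String[] args) {
        ContentTypeSketch ct = parse("text/html;charset=UTF-8");
        System.out.println(ct.mimeType + " / " + ct.charset);
    }
}
```

When no charset is present the field stays `null`, which mirrors why the handler above falls back to UTF-8 through `Optional.ofNullable(charset).orElse("UTF-8")`.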

References

https://james.apache.org/mime4j/index.html

https://github.com/apache/james-mime4j

To parse a complete email message, this project is also worth a look:

https://github.com/ram-sharma-6453/email-mime-parser
