Parsing email text content; MIME text parsing; MimeStreamParser; multipart parsing

Raw text

------=_Part_46705_715015081.1699589700255
Content-Type: text/html;charset=UTF-8
Content-Transfer-Encoding: base64

PGh0bWw+CiAgICA8aGVhZD4KICAgICAgICA8bWV0YSBodHRwLW
VxdWl2PSJDb250ZW50LVR5cGUiIGNvbnRlbnQ9InRleHQvaHRt
bDsgY2hhcnNldD1VVEYtOCI+CiAgICAgICAgPHRpdGxlPkpTUC
BQYWdlPC90aXRsZT4KICAgIDwvaGVhZD4KICAgIDxib2R5Pgog
ICAgICAgIDxoMT5IZWxsbyBXb3JsZCE8L2gxPgogICAgPC9ib2
R5Pgo8L2h0bWw+
------=_Part_46705_715015081.1699589700255--
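
To see what the parser should eventually hand back, the base64 body of this part can be decoded on its own with the JDK. This is only a sanity check and not part of the mime4j flow. The class name `Base64BodyDemo` and its `decode` helper are made up for illustration, and the payload is assumed to be UTF-8, as the Content-Type header states.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class Base64BodyDemo {
    // Body lines copied from the sample part above.
    static final String BODY =
            "PGh0bWw+CiAgICA8aGVhZD4KICAgICAgICA8bWV0YSBodHRwLW\n" +
            "VxdWl2PSJDb250ZW50LVR5cGUiIGNvbnRlbnQ9InRleHQvaHRt\n" +
            "bDsgY2hhcnNldD1VVEYtOCI+CiAgICAgICAgPHRpdGxlPkpTUC\n" +
            "BQYWdlPC90aXRsZT4KICAgIDwvaGVhZD4KICAgIDxib2R5Pgog\n" +
            "ICAgICAgIDxoMT5IZWxsbyBXb3JsZCE8L2gxPgogICAgPC9ib2\n" +
            "R5Pgo8L2h0bWw+\n";

    /** Decodes a base64 body; the MIME decoder skips the line breaks between chunks. */
    static String decode(String base64Body) {
        byte[] bytes = Base64.getMimeDecoder().decode(base64Body);
        return new String(bytes, StandardCharsets.UTF_8); // assumption: UTF-8 payload
    }

    public static void main(String[] args) {
        System.out.println(decode(BODY));
    }
}
```

Running it prints a small HTML page whose title is `JSP Page` and whose body holds `<h1>Hello World!</h1>`, which is the text the handler further down is expected to collect.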

Maven

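 <!-- apache-mime4j-dom pulls in apache-mime4j-core transitively; the dom module provides
      SimpleContentHandler, Header and the Content-Type field classes used by HtmContentHandler. -->
 <dependency>
     <groupId>org.apache.james</groupId>
     <artifactId>apache-mime4j-dom</artifactId>
     <version>0.8.9</version>
 </dependency>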

Parsing method

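import org.apache.james.mime4j.codec.DecodeMonitor;
import org.apache.james.mime4j.message.DefaultBodyDescriptorBuilder;
import org.apache.james.mime4j.parser.MimeStreamParser;
import org.apache.james.mime4j.stream.BodyDescriptorBuilder;
import org.apache.james.mime4j.stream.MimeConfig;

import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

// The raw part from the "Raw text" section, as a Java string.
String data = "------=_Part_46705_715015081.1699589700255\n" +
        "Content-Type: text/html;charset=UTF-8\n" +
        "Content-Transfer-Encoding: base64\n" +
        "\n" +
        "PGh0bWw+CiAgICA8aGVhZD4KICAgICAgICA8bWV0YSBodHRwLW\n" +
        "VxdWl2PSJDb250ZW50LVR5cGUiIGNvbnRlbnQ9InRleHQvaHRt\n" +
        "bDsgY2hhcnNldD1VVEYtOCI+CiAgICAgICAgPHRpdGxlPkpTUC\n" +
        "BQYWdlPC90aXRsZT4KICAgIDwvaGVhZD4KICAgIDxib2R5Pgog\n" +
        "ICAgICAgIDxoMT5IZWxsbyBXb3JsZCE8L2gxPgogICAgPC9ib2\n" +
        "R5Pgo8L2h0bWw+\n" +
        "------=_Part_46705_715015081.1699589700255--";
System.out.println(data);

// Handler that collects the decoded body and the Content-Type details.
HtmContentHandler contentHandler = new HtmContentHandler();
MimeConfig mime4jParserConfig = MimeConfig.DEFAULT;
BodyDescriptorBuilder bodyDescriptorBuilder = new DefaultBodyDescriptorBuilder();
MimeStreamParser mime4jParser = new MimeStreamParser(mime4jParserConfig, DecodeMonitor.SILENT, bodyDescriptorBuilder);
// Undo the Content-Transfer-Encoding (base64 here) before body() receives the stream.
mime4jParser.setContentDecoding(true);
mime4jParser.setContentHandler(contentHandler);
// parse() throws MimeException and IOException; declare or catch them in the enclosing method.
mime4jParser.parse(new ByteArrayInputStream(data.getBytes(StandardCharsets.UTF_8)));
System.out.println(contentHandler.getData());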
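
The sample begins and ends with boundary delimiter lines, which are not header fields, so the snippet above relies on the parser tolerating them. One alternative, sketched here with plain JDK code rather than any mime4j API, is to cut the part out from between the delimiters before parsing. `BoundaryStripDemo` and `extractPart` are hypothetical names, the boundary string is assumed to be known already (normally it comes from the `boundary` parameter of the enclosing message's Content-Type), and delimiter lines are assumed to carry no trailing whitespace.

```java
import java.util.ArrayList;
import java.util.List;

class BoundaryStripDemo {
    /**
     * Returns the lines between the first "--boundary" delimiter and the next
     * delimiter (opening or closing), joined with CRLF.
     */
    static String extractPart(String raw, String boundary) {
        String open = "--" + boundary;   // delimiter that starts a part
        String close = open + "--";      // delimiter that ends the multipart body
        List<String> kept = new ArrayList<>();
        boolean inside = false;
        for (String line : raw.split("\r?\n", -1)) {
            if (line.equals(close)) {
                break;
            }
            if (line.equals(open)) {
                if (inside) {
                    break; // a second part starts; this sketch keeps only the first
                }
                inside = true;
                continue;
            }
            if (inside) {
                kept.add(line);
            }
        }
        return String.join("\r\n", kept);
    }

    public static void main(String[] args) {
        String boundary = "----=_Part_46705_715015081.1699589700255";
        String raw = "--" + boundary + "\n" +
                "Content-Type: text/html;charset=UTF-8\n" +
                "Content-Transfer-Encoding: base64\n" +
                "\n" +
                "PGh0bWw+\n" +            // body shortened for the demo
                "--" + boundary + "--";
        System.out.println(extractPart(raw, boundary));
    }
}
```

The returned string starts directly with the `Content-Type` header, so it can be fed to the parser as an ordinary header-plus-body entity.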

HtmContentHandler

import org.apache.commons.io.IOUtils;
import org.apache.james.mime4j.MimeException;
import org.apache.james.mime4j.dom.Header;
import org.apache.james.mime4j.dom.field.ContentTypeField;
import org.apache.james.mime4j.message.SimpleContentHandler;
import org.apache.james.mime4j.stream.BodyDescriptor;
import org.apache.james.mime4j.stream.Field;

import java.io.IOException;
import java.io.InputStream;
import java.util.Optional;

/**
 * @author zengrenyuan
 * @date 2023/11/10
 **/
public class HtmContentHandler extends SimpleContentHandler {
    private String data;
    private String charset;
    private String contentType;

    @Override
    public void body(BodyDescriptor bd, InputStream is) throws MimeException, IOException {
        // Read the (already decoded) body, falling back to UTF-8 when no charset was declared.
        this.data = IOUtils.toString(is, Optional.ofNullable(charset).orElse("UTF-8"));
        // The text content can be processed here.
    }

    @Override
    public void headers(Header header) {
        // Parse the header information here.
        Field contentType = header.getField("Content-Type");
        if (contentType != null) {
            // Check against the ContentTypeField interface so that any parsed implementation matches.
            if (contentType instanceof ContentTypeField) {
                this.contentType = ((ContentTypeField) contentType).getMimeType();
                charset = ((ContentTypeField) contentType).getParameter("charset");
            }
        }
    }

    public String getData() {
        return data;
    }

    public String getCharset() {
        return charset;
    }

    public String getContentType() {
        return contentType;
    }
}
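
As a rough illustration of what the `headers` callback gets out of a Content-Type value, the following JDK-only sketch splits the value into a MIME type and its parameters. It is a hand-rolled simplification and not how mime4j parses the field: it ignores quoted values containing `;`, comments, and RFC 2231 parameter encoding. The names `ContentTypeDemo`, `mimeType` and `parameters` are invented for this example.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class ContentTypeDemo {
    /** Returns the lower-cased MIME type, e.g. "text/html". */
    static String mimeType(String headerValue) {
        int semi = headerValue.indexOf(';');
        String type = semi < 0 ? headerValue : headerValue.substring(0, semi);
        return type.trim().toLowerCase(Locale.ROOT);
    }

    /** Returns parameters such as charset, keyed by lower-cased name. */
    static Map<String, String> parameters(String headerValue) {
        Map<String, String> params = new LinkedHashMap<>();
        String[] pieces = headerValue.split(";");
        for (int i = 1; i < pieces.length; i++) {
            int eq = pieces[i].indexOf('=');
            if (eq < 0) {
                continue; // not a name=value pair
            }
            String name = pieces[i].substring(0, eq).trim().toLowerCase(Locale.ROOT);
            String value = pieces[i].substring(eq + 1).trim();
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1); // drop surrounding quotes
            }
            params.put(name, value);
        }
        return params;
    }

    public static void main(String[] args) {
        String value = "text/html;charset=UTF-8"; // header value from the sample part
        System.out.println(mimeType(value));                  // prints "text/html"
        System.out.println(parameters(value).get("charset")); // prints "UTF-8"
    }
}
```

In real code the parsed field from mime4j is the better source, since it already handles the cases this sketch skips.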

References

https://james.apache.org/mime4j/index.html

https://github.com/apache/james-mime4j

To parse a complete email message, see also:

https://github.com/ram-sharma-6453/email-mime-parser
