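What is the maximum number of values in a MongoDB $in query?

In day-to-day MongoDB development, $in is one of the most frequently used query operators: it matches documents whose field value appears in a given array. Many developers wonder how many values the array after $in can actually hold, and whether there is a hard limit. This article answers the question from four angles: MongoDB's built-in limits, a practical experiment, clues in the source code, and best practices.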

1. $in has no fixed limit on the number of values

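In MongoDB 4.x there is no explicit hard limit on the length of the query parameter. MongoDB does not define a separate "maximum number of array elements" for the $in operator, but the $in argument is subject to two global constraints, which together determine how many values can actually be passed:

  1. BSON document size limit: 16MB by default (16 * 1024 * 1024 bytes). MongoDB stores and transmits data in BSON (Binary JSON), and by design a single BSON document can be at most 16MB. So however complex the query conditions are, the BSON representation of the query cannot exceed this limit (the Java driver allows a small headroom on top, see 1.2).
  2. Memory used by the query conditions: MongoDB loads the query conditions into memory, so an oversized array consumes a lot of memory and can cause performance problems or even query failures.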

1.1 Where in the source code is the 16MB defined?

com.mongodb.internal.connection.MessageSettings

@Immutable
public final class MessageSettings {
    private static final int DEFAULT_MAX_DOCUMENT_SIZE = 0x1000000;  // 16MB
    private static final int DEFAULT_MAX_MESSAGE_SIZE = 0x2000000;   // 32MB
    private static final int DEFAULT_MAX_BATCH_COUNT = 1000;

    private final int maxDocumentSize;
    private final int maxMessageSize;
    private final int maxBatchCount;
    private final int maxWireVersion;
    private final ServerType serverType;

    /**
     * Gets the builder
     *
     * @return the builder
     */
    public static Builder builder() {
        return new Builder();
    }

    /**
     * A MessageSettings builder.
     */
    @NotThreadSafe
    public static final class Builder {
        private int maxDocumentSize = DEFAULT_MAX_DOCUMENT_SIZE;
        private int maxMessageSize = DEFAULT_MAX_MESSAGE_SIZE;
        private int maxBatchCount = DEFAULT_MAX_BATCH_COUNT;
        private int maxWireVersion;
        private ServerType serverType;
...
}

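1.2 Source code that validates the size limit

In com.mongodb.internal.connection.RequestMessage, the size that is actually validated turns out to be 16MB + 16KB rather than 16MB. The extra 16KB is headroom.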

abstract class RequestMessage {

    static final AtomicInteger REQUEST_ID = new AtomicInteger(1);

    static final int MESSAGE_PROLOGUE_LENGTH = 16;

    // Allow an extra 16K to the maximum allowed size of a query or command document, so that, for example,
    // a 16M document can be upserted via findAndModify
    private static final int DOCUMENT_HEADROOM = 16 * 1024;
...
protected void addDocument(final BsonDocument document, final BsonOutput bsonOutput,
                               final FieldNameValidator validator) {
        addDocument(document, getCodec(document), EncoderContext.builder().build(), bsonOutput, validator,
                    settings.getMaxDocumentSize() + DOCUMENT_HEADROOM, null);
    }

    protected void addDocument(final BsonDocument document, final BsonOutput bsonOutput,
                               final FieldNameValidator validator, final List<BsonElement> extraElements) {
        addDocument(document, getCodec(document), EncoderContext.builder().build(), bsonOutput, validator,
                settings.getMaxDocumentSize() + DOCUMENT_HEADROOM, extraElements);
    }
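
To make the effective limit concrete, here is a minimal sketch that only repeats the two constants quoted above and adds them up. The class name EffectiveDocumentLimit is made up for illustration and is not part of the driver; the sketch also assumes the driver defaults are in effect (the maximum document size can be set to a different value through the MessageSettings builder).

```java
// Illustrative only: reuses the constant values quoted from the driver source above.
final class EffectiveDocumentLimit {
    static final int DEFAULT_MAX_DOCUMENT_SIZE = 0x1000000; // 16MB, value from MessageSettings
    static final int DOCUMENT_HEADROOM = 16 * 1024;         // 16KB, value from RequestMessage

    /** Size limit passed to addDocument when the default settings are in effect. */
    static int effectiveLimit() {
        return DEFAULT_MAX_DOCUMENT_SIZE + DOCUMENT_HEADROOM;
    }

    public static void main(String[] args) {
        System.out.println(effectiveLimit()); // 16793600
    }
}
```

This is the same "maximum of 16793600" that appears in the exception message in section 3.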

2. Estimating the $in upper limit

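To keep behavior consistent across platforms, Java fixes the size of its primitive types: int is always 4 bytes, long is always 8 bytes, and char is always 2 bytes (to support Unicode).

The BSON document size limit is 16MB, which is 16 * 1024 * 1024 = 16,777,216 bytes.

An int is 4 bytes: 16,777,216 / 4 = 4,194,304 values.

A long is 8 bytes: 16,777,216 / 8 = 2,097,152 values.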
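
The same naive estimate can be written as a few lines of Java. This is only a sketch of the arithmetic above; the class and method names are made up for illustration, and the estimate deliberately ignores any per-element encoding overhead.

```java
// Naive estimate: assumes each value costs exactly its primitive size and nothing else.
final class NaiveInEstimate {
    static final long BSON_LIMIT_BYTES = 16L * 1024 * 1024; // 16,777,216

    /** Number of fixed-width values that would fit if only the raw value bytes counted. */
    static long maxCount(long limitBytes, int bytesPerValue) {
        return limitBytes / bytesPerValue;
    }

    public static void main(String[] args) {
        System.out.println(maxCount(BSON_LIMIT_BYTES, Integer.BYTES)); // 4194304
        System.out.println(maxCount(BSON_LIMIT_BYTES, Long.BYTES));    // 2097152
    }
}
```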

3. Verifying the upper limit for int values

Test code:

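import com.google.inject.Guice;
import com.google.inject.Injector;
import com.mongodb.BasicDBObject;

import java.util.ArrayList;
import java.util.List;

// MongoDao and TransactionModule are application classes that are not shown in this article.
public class Test {
	public static void main(String[] args) throws Exception {
		Injector injector = Guice.createInjector(new TransactionModule());
		MongoDao mongoDao = injector.getInstance(MongoDao.class);
		
		BasicDBObject query = new BasicDBObject();
		// Pre-fill the $in list with 1,377,250 consecutive int values.
		List<Integer> value = new ArrayList<Integer>();
		for (int i = 0; i < 1377250; i++) {
			value.add(i+1);
		}
		int size = value.size();
		// Grow the list by 10 values per round until the driver rejects the query.
		while(true) {
			for (int i = 0; i < 10; i++) {
				value.add(++size);
			}
			System.out.println(value.size());
			// append() replaces the previous "id" entry, so the query always holds a single $in.
			query.append("id", new BasicDBObject("$in", value));
			mongoDao.query("tianlong8bu", "users", query, null);
		}
		
	}
}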

Running the code, the query fails with a size-limit error once the list reaches 1377280 values.

1377260
1377270
1377280
Exception in thread "main" org.bson.BsonMaximumSizeExceededException: Document size of 16793640 is larger than maximum of 16793600.
	at org.bson.BsonBinaryWriter.validateSize(BsonBinaryWriter.java:418)
	at org.bson.BsonBinaryWriter.backpatchSize(BsonBinaryWriter.java:412)
	at org.bson.BsonBinaryWriter.doWriteEndDocument(BsonBinaryWriter.java:133)
	at org.bson.AbstractBsonWriter.writeEndDocument(AbstractBsonWriter.java:307)
	at com.mongodb.internal.connection.BsonWriterDecorator.writeEndDocument(BsonWriterDecorator.java:53)
	at com.mongodb.internal.connection.LevelCountingBsonWriter.writeEndDocument(LevelCountingBsonWriter.java:48)
	at com.mongodb.internal.connection.ElementExtendingBsonWriter.writeEndDocument(ElementExtendingBsonWriter.java:43)
	at org.bson.codecs.BsonDocumentCodec.encode(BsonDocumentCodec.java:121)
	at org.bson.codecs.BsonDocumentCodec.encode(BsonDocumentCodec.java:42)
	at com.mongodb.internal.connection.RequestMessage.addDocument(RequestMessage.java:238)
	at com.mongodb.internal.connection.RequestMessage.addDocument(RequestMessage.java:188)
	at com.mongodb.internal.connection.CommandMessage.encodeMessageBodyWithMetadata(CommandMessage.java:161)
	at com.mongodb.internal.connection.RequestMessage.encode(RequestMessage.java:138)
	at com.mongodb.internal.connection.CommandMessage.encode(CommandMessage.java:62)
	at com.mongodb.internal.connection.InternalStreamConnection.sendAndReceive(InternalStreamConnection.java:308)
	at com.mongodb.internal.connection.UsageTrackingInternalConnection.sendAndReceive(UsageTrackingInternalConnection.java:114)
	at com.mongodb.internal.connection.DefaultConnectionPool$PooledConnection.sendAndReceive(DefaultConnectionPool.java:601)
	at com.mongodb.internal.connection.CommandProtocolImpl.execute(CommandProtocolImpl.java:81)
	at com.mongodb.internal.connection.DefaultServer$DefaultServerProtocolExecutor.execute(DefaultServer.java:249)
	at com.mongodb.internal.connection.DefaultServerConnection.executeProtocol(DefaultServerConnection.java:214)
	at com.mongodb.internal.connection.DefaultServerConnection.command(DefaultServerConnection.java:123)
	at com.mongodb.internal.connection.DefaultServerConnection.command(DefaultServerConnection.java:113)
	at com.mongodb.internal.operation.CommandOperationHelper.executeCommand(CommandOperationHelper.java:328)
	at com.mongodb.internal.operation.CommandOperationHelper.executeCommand(CommandOperationHelper.java:318)
	at com.mongodb.internal.operation.CommandOperationHelper.executeCommandWithConnection(CommandOperationHelper.java:201)
	at com.mongodb.internal.operation.FindOperation$1.call(FindOperation.java:659)
	at com.mongodb.internal.operation.FindOperation$1.call(FindOperation.java:653)
	at com.mongodb.internal.operation.OperationHelper.withReadConnectionSource(OperationHelper.java:583)
	at com.mongodb.internal.operation.FindOperation.execute(FindOperation.java:653)
	at com.mongodb.internal.operation.FindOperation.execute(FindOperation.java:81)
	at com.mongodb.client.internal.MongoClientDelegate$DelegateOperationExecutor.execute(MongoClientDelegate.java:184)
	at com.mongodb.client.internal.MongoIterableImpl.execute(MongoIterableImpl.java:135)
	at com.mongodb.client.internal.MongoIterableImpl.iterator(MongoIterableImpl.java:92)
	at com.mongodb.client.internal.MongoIterableImpl.cursor(MongoIterableImpl.java:97)

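The limit is 16MB, so in bytes: 16 * 1024 * 1024 / 1377280 ≈ 12.18. On average each int takes about 12 bytes, well above the 4 bytes assumed earlier: a difference of 8 bytes. Where do these 8 bytes go?

I ran a second experiment with Long instead of int and found that each Long takes about 16 bytes on average. Isn't a long supposed to take 8 bytes? Again the difference is 8 bytes. Where do these 8 bytes go?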

4. Analyzing the IntegerCodec encoding logic

The MongoDB Java driver provides a codec for each basic type; for int it is IntegerCodec.

public class IntegerCodec implements Codec<Integer> {

    @Override
    public void encode(final BsonWriter writer, final Integer value, final EncoderContext encoderContext) {
        writer.writeInt32(value);
    }

    @Override
    public Integer decode(final BsonReader reader, final DecoderContext decoderContext) {
        return decodeInt(reader);
    }

    @Override
    public Class<Integer> getEncoderClass() {
        return Integer.class;
    }
}

BsonWriterDecorator

 @Override
    public void writeInt32(final int value) {
        bsonWriter.writeInt32(value);
    }

AbstractBsonWriter

@Override
    public void writeInt32(final int value) {
        checkPreconditions("writeInt32", State.VALUE);
        doWriteInt32(value);
        setState(getNextState());
    }

BsonType

 /**
     * A BSON 32-bit integer.
     */
    INT32(0x10),
    /**
     * A BSON timestamp.
     */
    TIMESTAMP(0x11),
    /**
     * A BSON 64-bit integer.
     */
    INT64(0x12),
    /**
     * A BSON Decimal128.
     *
     * @since 3.4
     */
    DECIMAL128(0x13),

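The key is doWriteInt32 in BsonBinaryWriter: it writes more than just the int value, which is probably where the extra 8 bytes come from.

5. The $in array is not a plain array but a document disguised as one

A BSON array is not a "plain list of values" but a document disguised as an array: each element is stored under a key that is its index written as a string (for example "0", "1", and so on up to "1377279" for an array of 1377280 elements).

Step 1: BSON writes a type marker for each value; for int it is 0x10 (1 byte).

Step 2: writeCurrentName() writes the array index as the key name. The 16th element has the index string "15", and the 1377280th element has the index string "1377279". After the string itself, a terminating 0 byte is written.

Step 3: bsonOutput.writeInt32 writes the value.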

 @Override
    protected void doWriteInt32(final int value) {
        bsonOutput.writeByte(BsonType.INT32.getValue());
        writeCurrentName();
        bsonOutput.writeInt32(value);
    }

private void writeCurrentName() {
        if (getContext().getContextType() == BsonContextType.ARRAY) {
            bsonOutput.writeCString(Integer.toString(getContext().index++));
        } else {
            bsonOutput.writeCString(getName());
        }
    }

OutputBuffer

@Override
    public void writeInt32(final int value) {
        write(value >> 0);
        write(value >> 8);
        write(value >> 16);
        write(value >> 24);
    }

private int writeCharacters(final String str, final boolean checkForNullCharacters) {
        int len = str.length();
        int total = 0;

        for (int i = 0; i < len;) {
            int c = Character.codePointAt(str, i);

            if (checkForNullCharacters && c == 0x0) {
                throw new BsonSerializationException(format("BSON cstring '%s' is not valid because it contains a null character "
                                                            + "at index %d", str, i));
            }
            if (c < 0x80) {
                write((byte) c);
                total += 1;
            } else if (c < 0x800) {
                write((byte) (0xc0 + (c >> 6)));
                write((byte) (0x80 + (c & 0x3f)));
                total += 2;
            } else if (c < 0x10000) {
                write((byte) (0xe0 + (c >> 12)));
                write((byte) (0x80 + ((c >> 6) & 0x3f)));
                write((byte) (0x80 + (c & 0x3f)));
                total += 3;
            } else {
                write((byte) (0xf0 + (c >> 18)));
                write((byte) (0x80 + ((c >> 12) & 0x3f)));
                write((byte) (0x80 + ((c >> 6) & 0x3f)));
                write((byte) (0x80 + (c & 0x3f)));
                total += 4;
            }

            i += Character.charCount(c);
        }

        write((byte) 0);
        total++;
        return total;
    }
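
The three steps can be reproduced without the driver. The following is a minimal sketch using only the JDK; the class Int32ArrayElementSketch and its method are hypothetical names, and the code imitates the byte layout described above rather than calling any driver API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Imitates how one int32 array element is laid out: type byte, index key as cstring, value.
final class Int32ArrayElementSketch {
    static final int BSON_TYPE_INT32 = 0x10;

    static byte[] encodeElement(int index, int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Step 1: one-byte type marker.
        out.write(BSON_TYPE_INT32);
        // Step 2: the array index as a string key, followed by a terminating zero byte.
        byte[] key = Integer.toString(index).getBytes(StandardCharsets.UTF_8);
        out.write(key, 0, key.length);
        out.write(0);
        // Step 3: the value as 4 little-endian bytes (write(int) keeps only the low 8 bits).
        out.write(value);
        out.write(value >> 8);
        out.write(value >> 16);
        out.write(value >> 24);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(encodeElement(0, 1).length);             // 7
        System.out.println(encodeElement(1377279, 1377280).length); // 13
    }
}
```

The first element costs 7 bytes and the last one in the experiment costs 13, so the cost per element grows with the number of digits in its index.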

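The int value itself takes 4 bytes. On top of that, every array element carries the type marker 0x10 (1 byte) and the array index key name (the digit characters plus a terminator). The table below lists this overhead per element, excluding the 4-byte value: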

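Element position     Index key                Overhead (type marker + digits + terminator)   Elements in range
1-10                 "0"-"9"                  1 + 1 + 1 = 3 bytes                            10
11-100               "10"-"99"                1 + 2 + 1 = 4 bytes                            90
101-1000             "100"-"999"              1 + 3 + 1 = 5 bytes                            900
1001-10000           "1000"-"9999"            1 + 4 + 1 = 6 bytes                            9,000
10001-100000         "10000"-"99999"          1 + 5 + 1 = 7 bytes                            90,000
100001-1000000       "100000"-"999999"        1 + 6 + 1 = 8 bytes                            900,000
1000001-1377280      "1000000"-"1377279"      1 + 7 + 1 = 9 bytes                            377,280

Weighted over all 1,377,280 elements, the overhead averages about 8.2 bytes per element, which together with the 4-byte value gives the roughly 12 bytes per int measured in section 3.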
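
The per-digit bookkeeping can be turned into a small calculator. This is a sketch under stated assumptions: the class InArraySizeEstimator and its parameters are illustrative names, it counts only the array elements (not the surrounding command document), and the 110-byte envelope used in main is inferred from the failing run in section 3 (16793640 reported minus 16793530 for the elements), so it will differ for other collections and commands.

```java
// Estimates the encoded size of the elements of a BSON array holding fixed-width values.
final class InArraySizeEstimator {

    /** Bytes used by n elements: type byte + index digits + terminator + value, per element. */
    static long elementBytes(long n, int valueBytes) {
        long total = 0;
        long start = 0; // first index that has `digits` digits
        long end = 10;  // first index that has one more digit
        for (int digits = 1; start < n; digits++) {
            long count = Math.min(n, end) - start;
            total += count * (1 + digits + 1 + valueBytes);
            start = end;
            end *= 10;
        }
        return total;
    }

    /** Largest n whose elements fit into budgetBytes (binary search, elementBytes is monotonic). */
    static long maxCount(long budgetBytes, int valueBytes) {
        long lo = 0;
        long hi = budgetBytes; // safe upper bound: every element needs more than one byte
        while (lo < hi) {
            long mid = lo + (hi - lo + 1) / 2;
            if (elementBytes(mid, valueBytes) <= budgetBytes) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        System.out.println(elementBytes(1377280, 4));    // 16793530
        // Assumed envelope of 110 bytes around the array, taken from the failing run above.
        System.out.println(maxCount(16793600 - 110, 4)); // 1377276
    }
}
```

Passing 8 as valueBytes gives the corresponding estimate for Long values, where each element costs 4 bytes more than an int element with the same index.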

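6. Summary

MongoDB's $in has no fixed limit on the number of values; the binding constraint is the default 16MB BSON document limit (defined in the driver's MessageSettings, with the actual check adding 16KB of headroom). A naive estimate based on Java primitive sizes (int 4 bytes, long 8 bytes) suggests about 4.19 million int values or 2.09 million Long values, but in the experiment an int costs about 12 bytes on average and a Long about 16 bytes, 8 bytes more in both cases. The reason is that, in addition to the value itself, every BSON array element carries a type marker (1 byte) and its index as a string key (the digits plus a terminating zero byte, about 7 bytes on average at the million-element scale). Estimate the limit from the encoding logic and the real per-element cost, so that an oversized $in list does not make the query fail.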
