MongoDB源码delete分析观察者getOpObserver()->onDelete

db.user.deleteOne({ "age" : 2828})的explain()输出结果如下,deleteOne执行的策略是DELETE>FETCH>IXSCAN,先按照FETCH>IXSCAN读取数据,再根据DELETE删除。

mongo/db/exec/delete.cpp的doWork,这个是DeleteStage获取自己儿子节点FetchStage,执行儿子节点FetchStage,FetchStage的doWork方法继续寻找自己儿子节点IndexScan,继续执行儿子节点IndexScan。DeleteStage获取到Document文档,继续调用collection()->deleteDocument()进行文档删除。

db.user.deleteOne({ "age" : 2828})核心类是DeleteStage,mongo/db/exec/delete.cpp的doWork,主要逻辑1、WriteUnitOfWork创建了一个写事务单元,这是 MongoDB 实现 ACID 特性的关键机制之一。在这个事务单元内的所有操作会被视为一个原子操作,要么全部成功提交,要么失败时全部回滚;2、调用集合的deleteDocument方法执行删除操作。getOpObserver()->onDelete通过 MongoDB 的 OpObserver 机制,在文档删除操作执行前触发自定义逻辑,是 MongoDB 提供的用于监控和扩展数据操作行为的重要接口,常见于数据库插件、审计工具或自定义业务逻辑的实现中

1、删除代码触发OpObserver

mongo/db/catalog/collection_impl.cpp的deleteDocument代码

cpp 复制代码
void CollectionImpl::deleteDocument(OperationContext* opCtx,
                                    StmtId stmtId,
                                    RecordId loc,
                                    OpDebug* opDebug,
                                    bool fromMigrate,
                                    bool noWarn,
                                    Collection::StoreDeletedDoc storeDeletedDoc) {
    if (isCapped()) {
        log() << "failing remove on a capped ns " << _ns;
        uasserted(10089, "cannot remove from a capped collection");
        return;
    }

    Snapshotted<BSONObj> doc = docFor(opCtx, loc);
    getGlobalServiceContext()->getOpObserver()->aboutToDelete(opCtx, ns(), doc.value());

    boost::optional<BSONObj> deletedDoc;
    if (storeDeletedDoc == Collection::StoreDeletedDoc::On) {
        deletedDoc.emplace(doc.value().getOwned());
    }

    int64_t keysDeleted;
    _indexCatalog->unindexRecord(opCtx, doc.value(), loc, noWarn, &keysDeleted);
    _recordStore->deleteRecord(opCtx, loc);

    getGlobalServiceContext()->getOpObserver()->onDelete(
        opCtx, ns(), uuid(), stmtId, fromMigrate, deletedDoc);

    if (opDebug) {
        opDebug->additiveMetrics.incrementKeysDeleted(keysDeleted);
    }
}

getGlobalServiceContext()->getOpObserver()->aboutToDelete(opCtx, ns(), doc.value());在删除前调用,此时文档还存在,可用于实现复制、审计等功能。

getGlobalServiceContext()->getOpObserver()->onDelete(opCtx, ns(), uuid(), stmtId, fromMigrate, deletedDoc);在删除后调用,可用于触发变更流 (Change Streams) 等功能。

在 MongoDB中,OpObserver(操作观察者)是一个核心接口,用于监听数据库的各种操作事件 。通过getGlobalServiceContext()->getOpObserver()获取的观察者实例注册了一系列与数据库操作相关的监听服务。mongo/db/op_observer.h定义代码如下:

cpp 复制代码
namespace mongo {

struct InsertStatement;
class OperationContext;

namespace repl {
class OpTime;
}  // namespace repl

/**
 * Holds document update information used in logging.
 */
struct OplogUpdateEntryArgs {
    CollectionUpdateArgs updateArgs;

    NamespaceString nss;
    CollectionUUID uuid;

    OplogUpdateEntryArgs(CollectionUpdateArgs updateArgs, NamespaceString nss, CollectionUUID uuid)
        : updateArgs(std::move(updateArgs)), nss(std::move(nss)), uuid(std::move(uuid)) {}
};

struct TTLCollModInfo {
    Seconds expireAfterSeconds;
    Seconds oldExpireAfterSeconds;
    std::string indexName;
};

class OpObserver {
public:
    enum class CollectionDropType {
        // The collection is being dropped immediately, in one step.
        kOnePhase,

        // The collection is being dropped in two phases, by renaming to a drop pending collection
        // which is registered to be reaped later.
        kTwoPhase,
    };

    virtual ~OpObserver() = default;

    virtual void onCreateIndex(OperationContext* opCtx,
                               const NamespaceString& nss,
                               CollectionUUID uuid,
                               BSONObj indexDoc,
                               bool fromMigrate) = 0;

    virtual void onStartIndexBuild(OperationContext* opCtx,
                                   const NamespaceString& nss,
                                   CollectionUUID collUUID,
                                   const UUID& indexBuildUUID,
                                   const std::vector<BSONObj>& indexes,
                                   bool fromMigrate) = 0;

    virtual void onCommitIndexBuild(OperationContext* opCtx,
                                    const NamespaceString& nss,
                                    CollectionUUID collUUID,
                                    const UUID& indexBuildUUID,
                                    const std::vector<BSONObj>& indexes,
                                    bool fromMigrate) = 0;

    virtual void onAbortIndexBuild(OperationContext* opCtx,
                                   const NamespaceString& nss,
                                   CollectionUUID collUUID,
                                   const UUID& indexBuildUUID,
                                   const std::vector<BSONObj>& indexes,
                                   const Status& cause,
                                   bool fromMigrate) = 0;

    virtual void onInserts(OperationContext* opCtx,
                           const NamespaceString& nss,
                           OptionalCollectionUUID uuid,
                           std::vector<InsertStatement>::const_iterator begin,
                           std::vector<InsertStatement>::const_iterator end,
                           bool fromMigrate) = 0;
    virtual void onUpdate(OperationContext* opCtx, const OplogUpdateEntryArgs& args) = 0;
    virtual void aboutToDelete(OperationContext* opCtx,
                               const NamespaceString& nss,
                               const BSONObj& doc) = 0;
  
    virtual void onDelete(OperationContext* opCtx,
                          const NamespaceString& nss,
                          OptionalCollectionUUID uuid,
                          StmtId stmtId,
                          bool fromMigrate,
                          const boost::optional<BSONObj>& deletedDoc) = 0;
  
    virtual void onInternalOpMessage(OperationContext* opCtx,
                                     const NamespaceString& nss,
                                     const boost::optional<UUID> uuid,
                                     const BSONObj& msgObj,
                                     const boost::optional<BSONObj> o2MsgObj) = 0;

    
    void onOpMessage(OperationContext* opCtx, const BSONObj& msgObj) {
        onInternalOpMessage(opCtx, {}, boost::none, msgObj, boost::none);
    }

    virtual void onCreateCollection(OperationContext* opCtx,
                                    Collection* coll,
                                    const NamespaceString& collectionName,
                                    const CollectionOptions& options,
                                    const BSONObj& idIndex,
                                    const OplogSlot& createOpTime) = 0;
   
    virtual void onCollMod(OperationContext* opCtx,
                           const NamespaceString& nss,
                           OptionalCollectionUUID uuid,
                           const BSONObj& collModCmd,
                           const CollectionOptions& oldCollOptions,
                           boost::optional<TTLCollModInfo> ttlInfo) = 0;
    virtual void onDropDatabase(OperationContext* opCtx, const std::string& dbName) = 0;

   
    virtual repl::OpTime onDropCollection(OperationContext* opCtx,
                                          const NamespaceString& collectionName,
                                          OptionalCollectionUUID uuid,
                                          std::uint64_t numRecords,
                                          CollectionDropType dropType) = 0;

   
    virtual void onDropIndex(OperationContext* opCtx,
                             const NamespaceString& nss,
                             OptionalCollectionUUID uuid,
                             const std::string& indexName,
                             const BSONObj& indexInfo) = 0;

   
    virtual repl::OpTime preRenameCollection(OperationContext* opCtx,
                                             const NamespaceString& fromCollection,
                                             const NamespaceString& toCollection,
                                             OptionalCollectionUUID uuid,
                                             OptionalCollectionUUID dropTargetUUID,
                                             std::uint64_t numRecords,
                                             bool stayTemp) = 0;
  
    virtual void postRenameCollection(OperationContext* opCtx,
                                      const NamespaceString& fromCollection,
                                      const NamespaceString& toCollection,
                                      OptionalCollectionUUID uuid,
                                      OptionalCollectionUUID dropTargetUUID,
                                      bool stayTemp) = 0;
  
    virtual void onRenameCollection(OperationContext* opCtx,
                                    const NamespaceString& fromCollection,
                                    const NamespaceString& toCollection,
                                    OptionalCollectionUUID uuid,
                                    OptionalCollectionUUID dropTargetUUID,
                                    std::uint64_t numRecords,
                                    bool stayTemp) = 0;

    virtual void onApplyOps(OperationContext* opCtx,
                            const std::string& dbName,
                            const BSONObj& applyOpCmd) = 0;
    virtual void onEmptyCapped(OperationContext* opCtx,
                               const NamespaceString& collectionName,
                               OptionalCollectionUUID uuid) = 0;

  
    virtual void onUnpreparedTransactionCommit(
        OperationContext* opCtx, const std::vector<repl::ReplOperation>& statements) = 0;
  
    virtual void onPreparedTransactionCommit(
        OperationContext* opCtx,
        OplogSlot commitOplogEntryOpTime,
        Timestamp commitTimestamp,
        const std::vector<repl::ReplOperation>& statements) noexcept = 0;

   
    virtual void onTransactionPrepare(OperationContext* opCtx,
                                      const std::vector<OplogSlot>& reservedSlots,
                                      std::vector<repl::ReplOperation>& statements) = 0;

   
    virtual void onTransactionAbort(OperationContext* opCtx,
                                    boost::optional<OplogSlot> abortOplogEntryOpTime) = 0;

    ...
    virtual void onReplicationRollback(OperationContext* opCtx,
                                       const RollbackObserverInfo& rbInfo) = 0;

    struct Times;

protected:
    class ReservedTimes;
};

struct OpObserver::Times {
    static Times& get(OperationContext*);

    std::vector<repl::OpTime> reservedOpTimes;

private:
    friend OpObserver::ReservedTimes;

    // Because `OpObserver`s are re-entrant, it is necessary to track the recursion depth to know
    // when to actually clear the `reservedOpTimes` vector, using the `ReservedTimes` scope object.
    int _recursionDepth = 0;
};


class OpObserver::ReservedTimes {
    ReservedTimes(const ReservedTimes&) = delete;
    ReservedTimes& operator=(const ReservedTimes&) = delete;

public:
    explicit ReservedTimes(OperationContext* const opCtx);
    ~ReservedTimes();

    const Times& get() const {
        return _times;
    }

private:
    Times& _times;
};

}  // namespace mongo

2、OpObserver 注册的核心监听服务

2.1. 数据库操作监听
  • onCreateCollection/onCollMod/preRenameCollection/onRenameCollection/onDropCollection。监听集合创建、修改、重命名,删除动作。
2.2. 数据操作监听
  • onInsert:插入操作前后的回调,用于审计、权限控制或数据验证。
  • onUpdate:更新操作前后的回调,可获取更新前和更新后的数据。
  • aboutToDelete/onDelete:删除操作前后的回调(如用户之前代码中使用的场景)。
2.3. 事务与并发控制监听
  • onUnpreparedTransactionCommit/onPreparedTransactionCommit/onTransactionPrepare/onTransactionAbort:监听事务生命周期,支持分布式事务的跟踪和日志记录。
2.4. 索引与存储相关监听
  • onCreateIndex/onDropIndex:监听索引创建和删除事件,用于索引管理和元数据跟踪。
  • onStartIndexBuild/onCommitIndexBuild/onAbortIndexBuild:监听索引构建过程,可用于监控长时间运行的索引操作。
2.5. 索引与存储相关监听
  • onDropDatabase:监听数据库删除事件

mongod启动db.cpp入口流程是:main-> mongoDbMain -> initAndListen -> _initAndListen; _initAndListen初始化OpObserverRegistry对象,OpObserverRegistry增加观察者OpObserverShardingImpl、AuthOpObserver...

mongo/db/db.cpp的_initAndListen观察者注册代码如下:

cpp 复制代码
 auto serviceContext = getGlobalServiceContext();

    serviceContext->setFastClockSource(FastClockSourceFactory::create(Milliseconds(10)));
    auto opObserverRegistry = std::make_unique<OpObserverRegistry>();
    opObserverRegistry->addObserver(std::make_unique<OpObserverShardingImpl>());
    opObserverRegistry->addObserver(std::make_unique<AuthOpObserver>());

    if (serverGlobalParams.clusterRole == ClusterRole::ShardServer) {
        opObserverRegistry->addObserver(std::make_unique<ShardServerOpObserver>());
    } else if (serverGlobalParams.clusterRole == ClusterRole::ConfigServer) {
        opObserverRegistry->addObserver(std::make_unique<ConfigServerOpObserver>());
    }
    setupFreeMonitoringOpObserver(opObserverRegistry.get());


    serviceContext->setOpObserver(std::move(opObserverRegistry));

常见OpObserver关系图如下:

serviceContext->setOpObserver(std::move(opObserverRegistry));上面观察者注册的是OpObserverRegistry对象。

getGlobalServiceContext()->getOpObserver()获取的对象是OpObserverRegistry;

3、执行观察者****OpObserver 监听流程

mongo/db/op_observer_impl.cpp的代码如下:

cpp 复制代码
    void aboutToDelete(OperationContext* const opCtx,
                       const NamespaceString& nss,
                       const BSONObj& doc) override {
        ReservedTimes times{opCtx};
        for (auto& o : _observers){
			 o->aboutToDelete(opCtx, nss, doc);
		}           
    }
    void onDelete(OperationContext* const opCtx,
                  const NamespaceString& nss,
                  OptionalCollectionUUID uuid,
                  StmtId stmtId,
                  bool fromMigrate,
                  const boost::optional<BSONObj>& deletedDoc) override {
        ReservedTimes times{opCtx};
        for (auto& o : _observers)
            o->onDelete(opCtx, nss, uuid, stmtId, fromMigrate, deletedDoc);
    }

MongoDB中OpObserverImpl类的onDelete方法实现,其主要功能是在删除操作发生时记录操作日志,并根据不同情况进行相应处理。/mongo/db/op_observer_impl.cpp中onDelete代码如下:

cpp 复制代码
void OpObserverImpl::onDelete(OperationContext* opCtx,
                              const NamespaceString& nss,
                              OptionalCollectionUUID uuid,
                              StmtId stmtId,
                              bool fromMigrate,
                              const boost::optional<BSONObj>& deletedDoc) {
    auto& documentKey = documentKeyDecoration(opCtx);
    invariant(!documentKey.isEmpty());
	std::cout<<"conca OpObserverImpl aboutToDelete...documentKey="<<documentKey<<std::endl;

    auto txnParticipant = TransactionParticipant::get(opCtx);
    const bool inMultiDocumentTransaction =
        txnParticipant && opCtx->writesAreReplicated() && txnParticipant.transactionIsOpen();

    OpTimeBundle opTime;
    if (inMultiDocumentTransaction) {
      ...
    } else {
		std::cout<<"conca OpObserverImpl aboutToDelete...replLogDelete"<<std::endl;
        opTime = replLogDelete(opCtx, nss, uuid, stmtId, fromMigrate, deletedDoc);
        SessionTxnRecord sessionTxnRecord;
        sessionTxnRecord.setLastWriteOpTime(opTime.writeOpTime);
        sessionTxnRecord.setLastWriteDate(opTime.wallClockTime);
        onWriteOpCompleted(opCtx, std::vector<StmtId>{stmtId}, sessionTxnRecord);
    }

    ...
}

inMultiDocumentTransaction代码块 如果是多文档事务的一部分,将删除操作添加到事务操作列表中

replLogDelete代码块非事务操作则直接记录删除操作到 oplog,并更新会话事务记录。replLogDelete代码是 MongoDB 中处理删除操作日志记录的核心函数,主要功能是将删除操作写入操作日志(oplog)。这个函数在分布式环境中非常重要,因为 oplog 是 MongoDB 复制集和分片集群实现数据一致性的基础。

/mongo/db/op_observer_impl.cpp中replLogDelete代码如下:

cpp 复制代码
/**
 * Write oplog entry(ies) for the delete operation.
 */
OpTimeBundle replLogDelete(OperationContext* opCtx,
                           const NamespaceString& nss,
                           OptionalCollectionUUID uuid,
                           StmtId stmtId,
                           bool fromMigrate,
                           const boost::optional<BSONObj>& deletedDoc) {
    MutableOplogEntry oplogEntry;
    oplogEntry.setNss(nss);
    oplogEntry.setUuid(uuid);

    repl::OplogLink oplogLink;
    repl::appendRetryableWriteInfo(opCtx, &oplogEntry, &oplogLink, stmtId);

    OpTimeBundle opTimes;

    if (deletedDoc && opCtx->getTxnNumber()) {
		std::cout<<"conca OpObserverImpl aboutToDelete...replLogDelete...getTxnNumber is not null"<<std::endl;

        MutableOplogEntry noopEntry = oplogEntry;
        noopEntry.setOpType(repl::OpTypeEnum::kNoop);
        noopEntry.setObject(deletedDoc.get());
        auto noteOplog = logOperation(opCtx, &noopEntry);
        opTimes.prePostImageOpTime = noteOplog;
        oplogLink.preImageOpTime = noteOplog;
    }

    oplogEntry.setOpType(repl::OpTypeEnum::kDelete);
    oplogEntry.setObject(documentKeyDecoration(opCtx));
    oplogEntry.setFromMigrateIfTrue(fromMigrate);
    // oplogLink could have been changed to include preImageOpTime by the previous no-op write.
    repl::appendRetryableWriteInfo(opCtx, &oplogEntry, &oplogLink, stmtId);
    opTimes.writeOpTime = logOperation(opCtx, &oplogEntry);
    opTimes.wallClockTime = oplogEntry.getWallClockTime();
    return opTimes;
}

if (deletedDoc && opCtx->getTxnNumber()) {}代码块:如果删除的文档存在且当前操作属于事务(有事务编号),创建一个特殊的 noop 类型操作日志,记录被删除文档的完整内容,这一步是为了支持 MongoDB 的变更流(Change Streams)和文档级别的回滚。

oplogEntry.setOpType(repl::OpTypeEnum::kDelete)代码块:记录删除操作:设置操作类型为 kDelete,设置删除的文档键documentKey(documentKeyDecoration(opCtx)值是{ _id: ObjectId('681dc7c924a18a334118bc27') }),调用 logOperation 函数将删除操作写入 oplog,返回操作时间(OpTime)和时钟时间。

4、oplog是MongoDB 实现复制集和分片集群的基础

/mongo/db/op_observer_impl.cpp中logOperation代码如下:

cpp 复制代码
repl::OpTime logOperation(OperationContext* opCtx, MutableOplogEntry* oplogEntry) {
    oplogEntry->setWallClockTime(getWallClockTimeForOpLog(opCtx));
    auto& times = OpObserver::Times::get(opCtx).reservedOpTimes;
    auto opTime = repl::logOp(opCtx, oplogEntry);
    times.push_back(opTime);
    return opTime;
}

这段代码是 MongoDB 中将操作写入 oplog(操作日志)的核心函数之一。oplog 是 MongoDB 实现复制集和分片集群的基础,所有对数据的变更都会记录在这里,用于在副本之间同步数据。

  1. 设置操作时间戳

    • 获取当前操作的时钟时间(wall clock time)
    • 将这个时间戳设置到 oplog 条目中
  2. 预留操作时间

    • 从操作上下文中获取 OpObserver::Times 结构
    • 这个结构包含一个 reservedOpTimes 列表,用于记录操作的时间戳
  3. 写入操作日志

    • 调用 repl::logOp 函数将 oplog 条目写入磁盘
    • 这个函数会分配一个唯一的操作时间戳(OpTime)
  4. 记录操作时间

    • 将分配的 OpTime 添加到预留时间列表中
    • 返回这个 OpTime 给调用者
相关推荐
dovens1 分钟前
PostgreSQL 中进行数据导入和导出
大数据·数据库·postgresql
IOT.FIVE.NO.11 分钟前
claude code desktop cowork报错解决和记录Workspace..The isolated Linux environment ...
linux·服务器·数据库
Rick199310 分钟前
mysql 慢查询怎么快速定位
android·数据库·mysql
科技小花7 小时前
全球化深水区,数据治理成为企业出海 “核心竞争力”
大数据·数据库·人工智能·数据治理·数据中台·全球化
X56618 小时前
如何在 Laravel 中正确保存嵌套动态表单数据(主服务与子服务)
jvm·数据库·python
虹科网络安全9 小时前
艾体宝干货|数据复制详解:类型、原理与适用场景
java·开发语言·数据库
2301_7717172110 小时前
解决mysql报错:1406, Data too long for column
android·数据库·mysql
小江的记录本10 小时前
【Kafka核心】架构模型:Producer、Broker、Consumer、Consumer Group、Topic、Partition、Replica
java·数据库·分布式·后端·搜索引擎·架构·kafka
dvjr cloi10 小时前
MySQL Workbench菜单汉化为中文
android·数据库·mysql
dFObBIMmai11 小时前
MySQL主从同步中大事务导致的延迟_如何拆分大事务优化同步
jvm·数据库·python