[Big Data: Federated Cluster RBF] Why the DFSRouter log keeps reporting Membership records being overridden to EXPIRED

In production we kept seeing the following messages (logged at INFO level) in the Router log:

2025-04-23 17:44:15,780 INFO  store.CachedRecordStore (CachedRecordStore.java:overrideExpiredRecords(192)) - Override State Store record MembershipState: router1:8888->hh-fed-sub25:nn2:nn2:8020-EXPIRED
2025-04-23 17:44:15,781 INFO  store.CachedRecordStore (CachedRecordStore.java:overrideExpiredRecords(192)) - Override State Store record MembershipState: router1:8888->hh-fed-sub25:nn1:nn1:8020-EXPIRED
2025-04-23 17:44:15,781 INFO  store.CachedRecordStore (CachedRecordStore.java:overrideExpiredRecords(192)) - Override State Store record MembershipState: router2:8888->hh-fed-sub25:nn1:nn1:8020-EXPIRED
2025-04-23 17:44:15,781 INFO  store.CachedRecordStore (CachedRecordStore.java:overrideExpiredRecords(192)) - Override State Store record MembershipState: router2:8888->hh-fed-sub25:nn2:nn2:8020-EXPIRED

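The cause: the sub-cluster was originally configured with 3 Routers and 2 NameNodes, so 6 MembershipState records (3 Routers × 2 NameNodes) were written to the State Store.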
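The record count can be illustrated with a short sketch that enumerates one key per (Router, NameNode) pair. The key layout imitates the log lines above; `router3:8888` is a hypothetical name for the Router that stays up, and `membershipKeys` is an illustrative helper, not a Hadoop API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MembershipCount {
    /** Builds one key per (router, namenode) pair, imitating the log format above. */
    static List<String> membershipKeys(List<String> routers, String nameservice,
                                       List<String> namenodes) {
        List<String> keys = new ArrayList<>();
        for (String router : routers) {
            for (String nn : namenodes) {
                // Layout taken from the log: router->nameservice:namenodeId:host:port
                keys.add(router + "->" + nameservice + ":" + nn + ":" + nn + ":8020");
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        // router3:8888 is a hypothetical name; the log only shows router1 and router2.
        List<String> routers = Arrays.asList("router1:8888", "router2:8888", "router3:8888");
        List<String> namenodes = Arrays.asList("nn1", "nn2");
        List<String> keys = membershipKeys(routers, "hh-fed-sub25", namenodes);
        keys.forEach(System.out::println);
        System.out.println("total = " + keys.size());
    }
}
```

Four of these six keys belong to router1 and router2, which matches the four lines in the log excerpt.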

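Later, two of the sub-cluster's Routers were stopped, leaving only one running. From then on, the messages above showed up in the log of the Router that was still running.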

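The Router periodically loads the MembershipState records from the State Store and checks each one for expiration on every pass. The two stopped Routers had registered their memberships with the NameNodes and reported them to the State Store, but they no longer refresh those records. On top of that, deletion of expired records (dfs.federation.router.store.membership.expiration.deletion) was disabled in our configuration, so the stale records are never removed and the running Router logs the message above on every check.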

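To fix it, either of the following works:

  1. Enable deletion of expired records
    1. dfs.federation.router.store.membership.expiration defaults to 5 min. If dfs.federation.router.store.membership.expiration.deletion is set to 2 min, a membership that has expired (no heartbeat for more than 5 min) is deleted once a further 2 min have passed.
  2. Start the stopped Routers again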
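The timing in option 1 can be checked with a small stand-alone sketch. It copies the two comparisons from BaseRecord#checkExpired and BaseRecord#shouldBeDeleted (quoted in the source section below) into plain static methods. The class name, method names, and the `lastHeartbeat` value are hypothetical, and the 5 min / 2 min values are the ones from the example above.

```java
import java.time.Duration;

public class MembershipTimeline {
    /** Same comparison as BaseRecord#checkExpired: strictly after modified + expiration. */
    static boolean isPastExpiration(long modifiedMs, long expirationMs, long nowMs) {
        return modifiedMs > 0 && expirationMs > 0 && (modifiedMs + expirationMs) < nowMs;
    }

    /** Same comparison as BaseRecord#shouldBeDeleted, for a record already marked EXPIRED. */
    static boolean isPastDeletion(long modifiedMs, long expirationMs, long deletionMs,
                                  long nowMs) {
        if (deletionMs <= 0) {
            return false; // a non-positive deletion time means deletion is disabled
        }
        long elapsedSinceExpiry = nowMs - (modifiedMs + expirationMs);
        return elapsedSinceExpiry > deletionMs;
    }

    public static void main(String[] args) {
        long expiration = Duration.ofMinutes(5).toMillis();
        long deletion = Duration.ofMinutes(2).toMillis();
        long lastHeartbeat = 1_000L; // hypothetical timestamp of the last report

        for (int minute = 4; minute <= 8; minute++) {
            long now = lastHeartbeat + Duration.ofMinutes(minute).toMillis();
            System.out.printf("t+%dmin expired=%b deletable=%b%n", minute,
                isPastExpiration(lastHeartbeat, expiration, now),
                isPastDeletion(lastHeartbeat, expiration, deletion, now));
        }
    }
}
```

Both comparisons are strict, so a record is treated as expired only after the 5 min mark and as deletable only after the 7 min mark, counted from the last report.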

Reference source code

org.apache.hadoop.hdfs.server.federation.store.CachedRecordStore#overrideExpiredRecords

  public void overrideExpiredRecords(QueryResult<R> query) throws IOException {
    List<R> commitRecords = new ArrayList<>();
    List<R> deleteRecords = new ArrayList<>();
    List<R> newRecords = query.getRecords();
    long currentDriverTime = query.getTimestamp();
    if (newRecords == null || currentDriverTime <= 0) {
      LOG.error("Cannot check overrides for record");
      return;
    }
    for (R record : newRecords) {
      if (record.shouldBeDeleted(currentDriverTime)) {
        String recordName = StateStoreUtils.getRecordName(record.getClass());
        if (getDriver().remove(record)) {
          deleteRecords.add(record);
          LOG.info("Deleted State Store record {}: {}", recordName, record);
        } else {
          LOG.warn("Couldn't delete State Store record {}: {}", recordName,
              record);
        }
      } else if (record.checkExpired(currentDriverTime)) {
        String recordName = StateStoreUtils.getRecordName(record.getClass());
        LOG.info("Override State Store record {}: {}", recordName, record);
        commitRecords.add(record);
      }
    }
    if (commitRecords.size() > 0) {
      getDriver().putAll(commitRecords, true, false);
    }
    if (deleteRecords.size() > 0) {
      newRecords.removeAll(deleteRecords);
    }
  }
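To see why the message repeats on every refresh, here is a simplified model of the loop above. It is a sketch, not Hadoop code: `Rec` and `refresh` are hypothetical stand-ins, and it assumes the modification time of an expired record is not refreshed by the override, which is consistent with the behaviour we observed.

```java
import java.util.ArrayList;
import java.util.List;

public class OverrideLoopSketch {
    /** Minimal stand-in for a membership record (not the Hadoop class). */
    static class Rec {
        final String name;
        final long modifiedMs;
        boolean expired;

        Rec(String name, long modifiedMs) {
            this.name = name;
            this.modifiedMs = modifiedMs;
        }
    }

    /** Models one cache refresh; returns the messages it would log. */
    static List<String> refresh(List<Rec> records, long nowMs, long expirationMs,
                                long deletionMs) {
        List<String> log = new ArrayList<>();
        List<Rec> deleted = new ArrayList<>();
        for (Rec r : records) {
            long elapsed = nowMs - (r.modifiedMs + expirationMs);
            if (r.expired && deletionMs > 0 && elapsed > deletionMs) {
                // shouldBeDeleted branch: only reachable when deletion is enabled
                deleted.add(r);
                log.add("Deleted " + r.name);
            } else if (r.modifiedMs > 0 && expirationMs > 0
                    && r.modifiedMs + expirationMs < nowMs) {
                // checkExpired branch: mark EXPIRED and log the override again
                r.expired = true;
                log.add("Override " + r.name);
            }
        }
        records.removeAll(deleted);
        return log;
    }

    public static void main(String[] args) {
        long expiration = 5 * 60_000L;
        // Compare deletion disabled (-1) with a 2 min deletion time.
        for (long deletion : new long[] {-1L, 2 * 60_000L}) {
            List<Rec> records = new ArrayList<>();
            records.add(new Rec("router1:8888->hh-fed-sub25:nn1", 1L));
            System.out.println("deletion=" + deletion);
            for (int minute = 6; minute <= 9; minute++) {
                long now = 1L + minute * 60_000L;
                System.out.println("  t+" + minute + "min "
                    + refresh(records, now, expiration, deletion));
            }
        }
    }
}
```

With deletion disabled, the first branch is never taken, so every refresh ends up in the override branch. With deletion enabled, the record is removed once the deletion time has passed and the message stops.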

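org.apache.hadoop.hdfs.server.federation.store.records.MembershipState#checkExpired, which overrides org.apache.hadoop.hdfs.server.federation.store.records.BaseRecord#checkExpired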

  // Override in the record subclass (MembershipState): marks the record EXPIRED.
  @Override
  public boolean checkExpired(long currentTime) {
    if (super.checkExpired(currentTime)) {
      this.setState(EXPIRED);
      // Commit it
      return true;
    }
    return false;
  }

  // Base implementation (BaseRecord): expired once modified + expiration < now.
  public boolean checkExpired(long currentTime) {
    long expiration = getExpirationMs();
    long modifiedTime = getDateModified();
    if (modifiedTime > 0 && expiration > 0) {
      return (modifiedTime + expiration) < currentTime;
    }
    return false;
  }

org.apache.hadoop.hdfs.server.federation.store.records.BaseRecord#shouldBeDeleted

public boolean shouldBeDeleted(long currentTime) {
  long deletionTime = getDeletionMs();
  if (isExpired() && deletionTime > 0) {
    long elapsedTime = currentTime - (getDateModified() + getExpirationMs());
    return elapsedTime > deletionTime;
  } else {
    return false;
  }
}