[Big Data Technology - Federated Cluster RBF] Analyzing why the DFSRouter log keeps reporting Membership records being changed to the EXPIRED state

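In production, the DFSRouter log kept printing the following messages (they are logged at INFO level, not as errors):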

2025-04-23 17:44:15,780 INFO  store.CachedRecordStore (CachedRecordStore.java:overrideExpiredRecords(192)) - Override State Store record MembershipState: router1:8888->hh-fed-sub25:nn2:nn2:8020-EXPIRED
2025-04-23 17:44:15,781 INFO  store.CachedRecordStore (CachedRecordStore.java:overrideExpiredRecords(192)) - Override State Store record MembershipState: router1:8888->hh-fed-sub25:nn1:nn1:8020-EXPIRED
2025-04-23 17:44:15,781 INFO  store.CachedRecordStore (CachedRecordStore.java:overrideExpiredRecords(192)) - Override State Store record MembershipState: router2:8888->hh-fed-sub25:nn1:nn1:8020-EXPIRED
2025-04-23 17:44:15,781 INFO  store.CachedRecordStore (CachedRecordStore.java:overrideExpiredRecords(192)) - Override State Store record MembershipState: router2:8888->hh-fed-sub25:nn2:nn2:8020-EXPIRED

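The cause: the subcluster was originally configured with 3 routers and 2 NameNodes, so 6 MembershipState records (one per router/NameNode pair) were stored in the StateStore.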

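Later, two of the subcluster's routers were stopped, leaving only one router running. As a result, the messages above started appearing in the log of the router that was still running.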

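The router periodically downloads the MembershipState records and checks each one for expiry on every download. The two stopped routers had previously formed Memberships with the NameNodes and reported them to the StateStore, so their records stopped being refreshed and expired. In addition, we had disabled the parameter that deletes expired records, dfs.federation.router.store.membership.expiration.deletion, so the expired records stay in the StateStore and the running router prints the messages above on every check.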
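To make the repetition concrete, here is a minimal sketch that mimics the check-and-override loop in plain Java. It is not Hadoop code: the `Record` class, the `refresh` method, the record name, and the one-minute refresh interval are all hypothetical stand-ins, and the expiry logic is only modeled on the source quoted further down. It assumes a record whose router stopped heartbeating long ago and a deletion setting of -1 (disabled).

```java
import java.util.ArrayList;
import java.util.List;

public class ExpiredMembershipDemo {

  /** Hypothetical, simplified stand-in for a State Store record. */
  static class Record {
    final String name;
    final long dateModified; // last time the owning router refreshed it
    boolean expired;

    Record(String name, long dateModified) {
      this.name = name;
      this.dateModified = dateModified;
    }
  }

  /** Expired once dateModified + expiration lies in the past. */
  static boolean checkExpired(Record r, long now, long expirationMs) {
    if (r.dateModified > 0 && expirationMs > 0
        && r.dateModified + expirationMs < now) {
      r.expired = true;
      return true;
    }
    return false;
  }

  /** Deletable only if already expired AND deletion is enabled (> 0). */
  static boolean shouldBeDeleted(Record r, long now, long expirationMs,
      long deletionMs) {
    if (r.expired && deletionMs > 0) {
      return now - (r.dateModified + expirationMs) > deletionMs;
    }
    return false;
  }

  /** One refresh cycle: returns the log lines it would emit. */
  static List<String> refresh(List<Record> records, long now,
      long expirationMs, long deletionMs) {
    List<String> log = new ArrayList<>();
    List<Record> deleted = new ArrayList<>();
    for (Record r : records) {
      if (shouldBeDeleted(r, now, expirationMs, deletionMs)) {
        deleted.add(r);
        log.add("Deleted " + r.name);
      } else if (checkExpired(r, now, expirationMs)) {
        log.add("Override " + r.name);
      }
    }
    records.removeAll(deleted);
    return log;
  }

  public static void main(String[] args) {
    long minute = 60_000L;
    List<Record> records = new ArrayList<>();
    records.add(new Record("router2->nn1", 1L)); // router stopped long ago
    // Deletion disabled (-1): the record is overridden on every cycle.
    for (int cycle = 1; cycle <= 3; cycle++) {
      long now = (10 + cycle) * minute;
      System.out.println("cycle " + cycle + ": "
          + refresh(records, now, 5 * minute, -1L));
    }
  }
}
```

With deletion disabled, every cycle produces an "Override" line for the same record and the record is never removed, which matches the behavior seen in the log. Passing a positive deletion value to `refresh` instead lets the record be removed once it has been expired for long enough.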

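To fix it, either of the following works:

  1. Enable deletion of expired records
    1. dfs.federation.router.store.membership.expiration defaults to 5 min. If dfs.federation.router.store.membership.expiration.deletion is set to 2 min, a membership that has expired (no report for more than 5 min) is deleted after a further 2 min.
  2. Start the stopped routers again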
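The timing in option 1 can be worked through with a short sketch. This is an illustration only: the class `MembershipTimeline`, the method `phaseAt`, and the phase labels LIVE / EXPIRED / DELETABLE are hypothetical names chosen here, and the comparisons are modeled on the checkExpired and shouldBeDeleted source quoted below. It assumes a 5 min expiration and a 2 min deletion delay.

```java
import java.time.Duration;

public class MembershipTimeline {

  /**
   * Classifies a record by how long ago it was last modified.
   * A non-positive deletion value means deletion is disabled.
   */
  static String phaseAt(Duration sinceLastUpdate, Duration expiration,
      Duration deletion) {
    long elapsedMs = sinceLastUpdate.toMillis();
    long expirationMs = expiration.toMillis();
    long deletionMs = deletion.toMillis();

    // Not expired until strictly more than the expiration has elapsed.
    if (expirationMs <= 0 || elapsedMs <= expirationMs) {
      return "LIVE";
    }
    // Deletable once it has been expired for strictly more than deletionMs.
    if (deletionMs > 0 && elapsedMs - expirationMs > deletionMs) {
      return "DELETABLE";
    }
    return "EXPIRED";
  }

  public static void main(String[] args) {
    Duration expiration = Duration.ofMinutes(5);
    Duration deletion = Duration.ofMinutes(2);
    for (long minutes : new long[] {4, 6, 8}) {
      System.out.println(minutes + " min since last update: "
          + phaseAt(Duration.ofMinutes(minutes), expiration, deletion));
    }
  }
}
```

Under these assumptions, a record is live at 4 min, expired but kept at 6 min, and eligible for deletion at 8 min. The actual removal only happens the next time a router runs its check, so the real delay is slightly longer than the 7 min sum.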

Relevant source code

org.apache.hadoop.hdfs.server.federation.store.CachedRecordStore#overrideExpiredRecords

  public void overrideExpiredRecords(QueryResult<R> query) throws IOException {
    List<R> commitRecords = new ArrayList<>();
    List<R> deleteRecords = new ArrayList<>();
    List<R> newRecords = query.getRecords();
    long currentDriverTime = query.getTimestamp();
    if (newRecords == null || currentDriverTime <= 0) {
      LOG.error("Cannot check overrides for record");
      return;
    }
    for (R record : newRecords) {
      if (record.shouldBeDeleted(currentDriverTime)) {
        String recordName = StateStoreUtils.getRecordName(record.getClass());
        if (getDriver().remove(record)) {
          deleteRecords.add(record);
          LOG.info("Deleted State Store record {}: {}", recordName, record);
        } else {
          LOG.warn("Couldn't delete State Store record {}: {}", recordName,
              record);
        }
      } else if (record.checkExpired(currentDriverTime)) {
        String recordName = StateStoreUtils.getRecordName(record.getClass());
        LOG.info("Override State Store record {}: {}", recordName, record);
        commitRecords.add(record);
      }
    }
    if (commitRecords.size() > 0) {
      getDriver().putAll(commitRecords, true, false);
    }
    if (deleteRecords.size() > 0) {
      newRecords.removeAll(deleteRecords);
    }
  }

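org.apache.hadoop.hdfs.server.federation.store.records.MembershipState#checkExpired (the override) and org.apache.hadoop.hdfs.server.federation.store.records.BaseRecord#checkExpired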

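  // Override in the record subclass: when the base check says the record
  // has expired, mark its state EXPIRED so the caller commits the change.
  @Override
  public boolean checkExpired(long currentTime) {
    if (super.checkExpired(currentTime)) {
      this.setState(EXPIRED);
      // Commit it
      return true;
    }
    return false;
  }

  // Base implementation: a record is expired once its last modification
  // time plus the expiration interval lies before the current time.
  public boolean checkExpired(long currentTime) {
    long expiration = getExpirationMs();
    long modifiedTime = getDateModified();
    if (modifiedTime > 0 && expiration > 0) {
      return (modifiedTime + expiration) < currentTime;
    }
    return false;
  }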

org.apache.hadoop.hdfs.server.federation.store.records.BaseRecord#shouldBeDeleted

public boolean shouldBeDeleted(long currentTime) {
  long deletionTime = getDeletionMs();
  if (isExpired() && deletionTime > 0) {
    long elapsedTime = currentTime - (getDateModified() + getExpirationMs());
    return elapsedTime > deletionTime;
  } else {
    return false;
  }
}