【大数据技术-联邦集群RBF】DFSRouter日志一直打印修改Membership为EXPIRED状态的日志分析

生产环境遇到下面报错

复制代码
2025-04-23 17:44:15,780 INFO  store.CachedRecordStore (CachedRecordStore.java:overrideExpiredRecords(192)) - Override State Store record MembershipState: router1:8888->hh-fed-sub25:nn2:nn2:8020-EXPIRED
2025-04-23 17:44:15,781 INFO  store.CachedRecordStore (CachedRecordStore.java:overrideExpiredRecords(192)) - Override State Store record MembershipState: router1:8888->hh-fed-sub25:nn1:nn1:8020-EXPIRED
2025-04-23 17:44:15,781 INFO  store.CachedRecordStore (CachedRecordStore.java:overrideExpiredRecords(192)) - Override State Store record MembershipState: router2:8888->hh-fed-sub25:nn1:nn1:8020-EXPIRED
2025-04-23 17:44:15,781 INFO  store.CachedRecordStore (CachedRecordStore.java:overrideExpiredRecords(192)) - Override State Store record MembershipState: router2:8888->hh-fed-sub25:nn2:nn2:8020-EXPIRED

报错原因是,之前子集群配置了3个router,2个nn,然后会向StateStore中存储6个MembershipState。

后来,将子集群的router停了两个,只运行一个router,这样的后果就是会在运行的router日志发现上面报错。

因为router会周期性下载MembershipState,每次都会去检查是否过期,而我们停了2个Router,这俩Router之前和NameNode形成Membership并上报到了StateStore,并且我们关闭了删除过期记录的参数dfs.federation.router.store.membership.expiration.deletion,所以,会在运行的Router中打印上面报错。

修复做法,选择下面之一都可以:

  1. 开启删除过期参数
    1. dfs.federation.router.store.membership.expiration默认未5min,若设置dfs.federation.router.store.membership.expiration.deletion=2min,则表示membership过期了(超过5min没汇报),在等2min就删除它。
  2. 启动已停止的router

参考源码

org.apache.hadoop.hdfs.server.federation.store.CachedRecordStore#overrideExpiredRecords

java 复制代码
  public void overrideExpiredRecords(QueryResult<R> query) throws IOException {
    List<R> commitRecords = new ArrayList<>();
    List<R> deleteRecords = new ArrayList<>();
    List<R> newRecords = query.getRecords();
    long currentDriverTime = query.getTimestamp();
    if (newRecords == null || currentDriverTime <= 0) {
      LOG.error("Cannot check overrides for record");
      return;
    }
    for (R record : newRecords) {
      if (record.shouldBeDeleted(currentDriverTime)) {
        String recordName = StateStoreUtils.getRecordName(record.getClass());
        if (getDriver().remove(record)) {
          deleteRecords.add(record);
          LOG.info("Deleted State Store record {}: {}", recordName, record);
        } else {
          LOG.warn("Couldn't delete State Store record {}: {}", recordName,
              record);
        }
      } else if (record.checkExpired(currentDriverTime)) {
        String recordName = StateStoreUtils.getRecordName(record.getClass());
        LOG.info("Override State Store record {}: {}", recordName, record);
        commitRecords.add(record);
      }
    }
    if (commitRecords.size() > 0) {
      getDriver().putAll(commitRecords, true, false);
    }
    if (deleteRecords.size() > 0) {
      newRecords.removeAll(deleteRecords);
    }
  }

org.apache.hadoop.hdfs.server.federation.store.records.BaseRecord#checkExpired

java 复制代码
   @Override
  public boolean checkExpired(long currentTime) {
    if (super.checkExpired(currentTime)) {
      this.setState(EXPIRED);
      // Commit it
      return true;
    }
    return false;
  }

 public boolean checkExpired(long currentTime) {
    long expiration = getExpirationMs();
    long modifiedTime = getDateModified();
    if (modifiedTime > 0 && expiration > 0) {
      return (modifiedTime + expiration) < currentTime;
    }
    return false;
  }

org.apache.hadoop.hdfs.server.federation.store.records.BaseRecord#shouldBeDeleted

java 复制代码
public boolean shouldBeDeleted(long currentTime) {
  long deletionTime = getDeletionMs();
  if (isExpired() && deletionTime > 0) {
    long elapsedTime = currentTime - (getDateModified() + getExpirationMs());
    return elapsedTime > deletionTime;
  } else {
    return false;
  }
}
相关推荐
顾漂亮2 分钟前
Spring AOP 实战案例+避坑指南
java·后端·spring
SimonKing32 分钟前
Mybatis-Plus的竞争对手来了,试试 MyBatis-Flex
java·后端·程序员
光军oi38 分钟前
JAVA全栈JVM篇————初识JVM
java·开发语言·jvm
我命由我1234543 分钟前
PDFBox - PDFBox 加载 PDF 异常清单(数据为 null、数据为空、数据异常、文件为 null、文件不存在、文件异常)
java·服务器·后端·java-ee·pdf·intellij-idea·intellij idea
7哥♡ۣۖᝰꫛꫀꪝۣℋ1 小时前
Spring Boot
java·spring boot·后端
Moniane1 小时前
C++深度解析:从核心特性到现代编程实践
java·开发语言·jvm
攻城狮CSU1 小时前
C# 数据加载专题 之泛型序列化
java·servlet·c#
浩泽学编程1 小时前
【源码深度 第1篇】LinkedList:双向链表的设计与实现
java·数据结构·后端·链表·jdk
哲此一生9841 小时前
创建一个SpringBoot项目(连接数据库)
java·spring boot·后端
文心快码BaiduComate1 小时前
Comate Zulu实测:不会编程也能做软件?AI程序员现状令人震惊
java·程序员·前端框架