【大数据技术-联邦集群RBF】DFSRouter日志一直打印修改Membership为EXPIRED状态的日志分析

生产环境遇到下面报错

复制代码
2025-04-23 17:44:15,780 INFO  store.CachedRecordStore (CachedRecordStore.java:overrideExpiredRecords(192)) - Override State Store record MembershipState: router1:8888->hh-fed-sub25:nn2:nn2:8020-EXPIRED
2025-04-23 17:44:15,781 INFO  store.CachedRecordStore (CachedRecordStore.java:overrideExpiredRecords(192)) - Override State Store record MembershipState: router1:8888->hh-fed-sub25:nn1:nn1:8020-EXPIRED
2025-04-23 17:44:15,781 INFO  store.CachedRecordStore (CachedRecordStore.java:overrideExpiredRecords(192)) - Override State Store record MembershipState: router2:8888->hh-fed-sub25:nn1:nn1:8020-EXPIRED
2025-04-23 17:44:15,781 INFO  store.CachedRecordStore (CachedRecordStore.java:overrideExpiredRecords(192)) - Override State Store record MembershipState: router2:8888->hh-fed-sub25:nn2:nn2:8020-EXPIRED

报错原因是,之前子集群配置了3个router,2个nn,然后会向StateStore中存储6个MembershipState。

后来,将子集群的router停了两个,只运行一个router,这样的后果就是会在运行的router日志发现上面报错。

因为router会周期性下载MembershipState,每次都会去检查是否过期,而我们停了2个Router,这俩Router之前和NameNode形成Membership并上报到了StateStore,并且我们关闭了删除过期记录的参数dfs.federation.router.store.membership.expiration.deletion,所以,会在运行的Router中打印上面报错。

修复做法,选择下面之一都可以:

  1. 开启删除过期参数
    1. dfs.federation.router.store.membership.expiration默认未5min,若设置dfs.federation.router.store.membership.expiration.deletion=2min,则表示membership过期了(超过5min没汇报),在等2min就删除它。
  2. 启动已停止的router

参考源码

org.apache.hadoop.hdfs.server.federation.store.CachedRecordStore#overrideExpiredRecords

java 复制代码
  public void overrideExpiredRecords(QueryResult<R> query) throws IOException {
    List<R> commitRecords = new ArrayList<>();
    List<R> deleteRecords = new ArrayList<>();
    List<R> newRecords = query.getRecords();
    long currentDriverTime = query.getTimestamp();
    if (newRecords == null || currentDriverTime <= 0) {
      LOG.error("Cannot check overrides for record");
      return;
    }
    for (R record : newRecords) {
      if (record.shouldBeDeleted(currentDriverTime)) {
        String recordName = StateStoreUtils.getRecordName(record.getClass());
        if (getDriver().remove(record)) {
          deleteRecords.add(record);
          LOG.info("Deleted State Store record {}: {}", recordName, record);
        } else {
          LOG.warn("Couldn't delete State Store record {}: {}", recordName,
              record);
        }
      } else if (record.checkExpired(currentDriverTime)) {
        String recordName = StateStoreUtils.getRecordName(record.getClass());
        LOG.info("Override State Store record {}: {}", recordName, record);
        commitRecords.add(record);
      }
    }
    if (commitRecords.size() > 0) {
      getDriver().putAll(commitRecords, true, false);
    }
    if (deleteRecords.size() > 0) {
      newRecords.removeAll(deleteRecords);
    }
  }

org.apache.hadoop.hdfs.server.federation.store.records.BaseRecord#checkExpired

java 复制代码
   @Override
  public boolean checkExpired(long currentTime) {
    if (super.checkExpired(currentTime)) {
      this.setState(EXPIRED);
      // Commit it
      return true;
    }
    return false;
  }

 public boolean checkExpired(long currentTime) {
    long expiration = getExpirationMs();
    long modifiedTime = getDateModified();
    if (modifiedTime > 0 && expiration > 0) {
      return (modifiedTime + expiration) < currentTime;
    }
    return false;
  }

org.apache.hadoop.hdfs.server.federation.store.records.BaseRecord#shouldBeDeleted

java 复制代码
public boolean shouldBeDeleted(long currentTime) {
  long deletionTime = getDeletionMs();
  if (isExpired() && deletionTime > 0) {
    long elapsedTime = currentTime - (getDateModified() + getExpirationMs());
    return elapsedTime > deletionTime;
  } else {
    return false;
  }
}
相关推荐
繁依Fanyi8 分钟前
我的 PDF 工具箱:CodeBuddy 打造 PDFMagician 的全过程记录
java·pdf·uni-app·生活·harmonyos·codebuddy首席试玩官
遗憾皆是温柔16 分钟前
MyBatis—动态 SQL
java·数据库·ide·sql·mybatis
野曙35 分钟前
快速选择算法:优化大数据中的 Top-K 问题
大数据·数据结构·c++·算法·第k小·第k大
LallanaLee37 分钟前
常见面试题
java·开发语言
爱尚你19931 小时前
Java 泛型与类型擦除:为什么解析对象时能保留泛型信息?
java
电商数据girl1 小时前
酒店旅游类数据采集API接口之携程数据获取地方美食品列表 获取地方美餐馆列表 景点评论
java·大数据·开发语言·python·json·旅游
CircleMouse1 小时前
基于 RedisTemplate 的分页缓存设计
java·开发语言·后端·spring·缓存
ktkiko112 小时前
顶层架构 - 消息集群推送方案
java·开发语言·架构
zybsjn2 小时前
后端系统做国际化改造,生成多语言包
java·python·c#
OJAC近屿智能2 小时前
ChatGPT再升级!
大数据·人工智能·百度·chatgpt·近屿智能