[Big Data - Federated Cluster RBF] Why DFSRouter keeps logging that Membership records are being overridden to the EXPIRED state

In production, the DFSRouter log kept repeating the following messages (logged at INFO level):

2025-04-23 17:44:15,780 INFO  store.CachedRecordStore (CachedRecordStore.java:overrideExpiredRecords(192)) - Override State Store record MembershipState: router1:8888->hh-fed-sub25:nn2:nn2:8020-EXPIRED
2025-04-23 17:44:15,781 INFO  store.CachedRecordStore (CachedRecordStore.java:overrideExpiredRecords(192)) - Override State Store record MembershipState: router1:8888->hh-fed-sub25:nn1:nn1:8020-EXPIRED
2025-04-23 17:44:15,781 INFO  store.CachedRecordStore (CachedRecordStore.java:overrideExpiredRecords(192)) - Override State Store record MembershipState: router2:8888->hh-fed-sub25:nn1:nn1:8020-EXPIRED
2025-04-23 17:44:15,781 INFO  store.CachedRecordStore (CachedRecordStore.java:overrideExpiredRecords(192)) - Override State Store record MembershipState: router2:8888->hh-fed-sub25:nn2:nn2:8020-EXPIRED

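Cause: the sub-cluster was originally configured with 3 routers and 2 NameNodes, so 6 MembershipState records (3 routers x 2 NameNodes) were written to the State Store.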

Later, two of the sub-cluster's routers were stopped, leaving only one running. From then on, the running router's log showed the messages above.

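The running router periodically loads the MembershipState records from the State Store and checks each one for expiry on every load. The two stopped routers had formed memberships with the NameNodes and reported them to the State Store before they were shut down, so their records are no longer refreshed. We had also disabled deletion of expired records (dfs.federation.router.store.membership.expiration.deletion), so those records are never removed and the running router logs the messages above on every check.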
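The arithmetic behind the four log lines can be sketched in plain Java. This is an illustration only, not Hadoop code: the class and method names are hypothetical, the key format simply copies the log output above, and the name of the surviving router (router3) is an assumption, since the log only shows router1 and router2.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative only: not Hadoop code. Key format copied from the log above.
class StaleMembershipSketch {

  /** Builds one key per (router, NameNode) pair, shaped like the log entries. */
  static List<String> membershipKeys(List<String> routers, List<String> namenodes) {
    List<String> keys = new ArrayList<>();
    for (String router : routers) {
      for (String nn : namenodes) {
        keys.add(router + ":8888->hh-fed-sub25:" + nn + ":" + nn + ":8020");
      }
    }
    return keys;
  }

  /** Keys reported by routers that are stopped, so nothing refreshes them. */
  static List<String> staleKeys(List<String> keys, Set<String> stoppedRouters) {
    List<String> stale = new ArrayList<>();
    for (String key : keys) {
      String router = key.substring(0, key.indexOf(':'));
      if (stoppedRouters.contains(router)) {
        stale.add(key);
      }
    }
    return stale;
  }

  public static void main(String[] args) {
    // "router3" is an assumed name for the router that is still running.
    List<String> keys = membershipKeys(
        List.of("router1", "router2", "router3"), List.of("nn1", "nn2"));
    List<String> stale = staleKeys(keys, Set.of("router1", "router2"));
    System.out.println(keys.size() + " records, " + stale.size() + " stale");
    stale.forEach(System.out::println);
  }
}
```

Of the 6 records, the 4 that belong to router1 and router2 are stale, which matches the four distinct entries in the log excerpt.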

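To fix it, either of the following works:

  1. Enable deletion of expired records
    1. dfs.federation.router.store.membership.expiration defaults to 5 minutes. If dfs.federation.router.store.membership.expiration.deletion is set to 2 minutes, a membership record that has expired (no report for more than 5 minutes) is deleted once a further 2 minutes have passed.
  2. Start the stopped routers again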
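The timing in option 1 can be worked through with a small stand-alone sketch. It re-implements the two time checks from the source listed further down in plain Java; the class name is hypothetical, and as a simplification it derives "expired" from timestamps alone, whereas the real shouldBeDeleted consults the record's isExpired() state.

```java
import java.time.Duration;

// Illustrative only: a simplified, time-only version of the checks in BaseRecord.
class ExpirationTimelineSketch {

  /** True once dateModified + expirationMs is earlier than now. */
  static boolean isExpired(long dateModified, long now, long expirationMs) {
    return dateModified > 0 && expirationMs > 0
        && (dateModified + expirationMs) < now;
  }

  /** True once the record has been expired for longer than deletionMs. */
  static boolean shouldBeDeleted(long dateModified, long now,
      long expirationMs, long deletionMs) {
    // A non-positive deletionMs means deletion is disabled.
    if (!isExpired(dateModified, now, expirationMs) || deletionMs <= 0) {
      return false;
    }
    long elapsed = now - (dateModified + expirationMs);
    return elapsed > deletionMs;
  }

  public static void main(String[] args) {
    long expiration = Duration.ofMinutes(5).toMillis();
    long deletion = Duration.ofMinutes(2).toMillis();
    long lastReport = 1_000L; // arbitrary positive timestamp of the last report

    for (int minutes : new int[] {4, 6, 8}) {
      long now = lastReport + Duration.ofMinutes(minutes).toMillis();
      System.out.println("+" + minutes + " min: expired="
          + isExpired(lastReport, now, expiration)
          + ", delete=" + shouldBeDeleted(lastReport, now, expiration, deletion));
    }
  }
}
```

With these values the record is still live 4 minutes after the last report, expired but kept at 6 minutes, and eligible for deletion at 8 minutes. Passing a non-positive deletion value keeps the record in the expired state indefinitely, which is the situation described above.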

Reference source code

org.apache.hadoop.hdfs.server.federation.store.CachedRecordStore#overrideExpiredRecords

  public void overrideExpiredRecords(QueryResult<R> query) throws IOException {
    List<R> commitRecords = new ArrayList<>();
    List<R> deleteRecords = new ArrayList<>();
    List<R> newRecords = query.getRecords();
    long currentDriverTime = query.getTimestamp();
    if (newRecords == null || currentDriverTime <= 0) {
      LOG.error("Cannot check overrides for record");
      return;
    }
    for (R record : newRecords) {
      if (record.shouldBeDeleted(currentDriverTime)) {
        String recordName = StateStoreUtils.getRecordName(record.getClass());
        if (getDriver().remove(record)) {
          deleteRecords.add(record);
          LOG.info("Deleted State Store record {}: {}", recordName, record);
        } else {
          LOG.warn("Couldn't delete State Store record {}: {}", recordName,
              record);
        }
      } else if (record.checkExpired(currentDriverTime)) {
        String recordName = StateStoreUtils.getRecordName(record.getClass());
        LOG.info("Override State Store record {}: {}", recordName, record);
        commitRecords.add(record);
      }
    }
    if (commitRecords.size() > 0) {
      getDriver().putAll(commitRecords, true, false);
    }
    if (deleteRecords.size() > 0) {
      newRecords.removeAll(deleteRecords);
    }
  }

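org.apache.hadoop.hdfs.server.federation.store.records.MembershipState#checkExpired (first method below), which overrides org.apache.hadoop.hdfs.server.federation.store.records.BaseRecord#checkExpired (second method below)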

  // MembershipState: sets the state to EXPIRED when the base-class time check fires.
  @Override
  public boolean checkExpired(long currentTime) {
    if (super.checkExpired(currentTime)) {
      this.setState(EXPIRED);
      // Commit it
      return true;
    }
    return false;
  }

  // BaseRecord: expired once dateModified + expiration is earlier than currentTime.
  public boolean checkExpired(long currentTime) {
    long expiration = getExpirationMs();
    long modifiedTime = getDateModified();
    if (modifiedTime > 0 && expiration > 0) {
      return (modifiedTime + expiration) < currentTime;
    }
    return false;
  }

org.apache.hadoop.hdfs.server.federation.store.records.BaseRecord#shouldBeDeleted

public boolean shouldBeDeleted(long currentTime) {
  long deletionTime = getDeletionMs();
  if (isExpired() && deletionTime > 0) {
    long elapsedTime = currentTime - (getDateModified() + getExpirationMs());
    return elapsedTime > deletionTime;
  } else {
    return false;
  }
}