How Spark Manages and Renews Hadoop Tokens

Hadoop Token Management

  • The AM authenticates to the KDC via Kerberos
  • The AM obtains YARN and HDFS delegation tokens
  • The AM sends the tokens to its containers
  • The containers load the tokens at startup (the whole flow is condensed into the sketch below)
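The flow boils down to a handful of Hadoop API calls. A minimal sketch (e.g. pasted into spark-shell), assuming the Hadoop client libraries are on the classpath; the principal, keytab path, and renewer are taken from the logs below and stand in for your own values:

import java.io.{ByteArrayOutputStream, DataOutputStream}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.security.{Credentials, UserGroupInformation}

// 1. The AM logs in to the KDC from its keytab.
UserGroupInformation.loginUserFromKeytab(
  "hadoop_user@PROD.COM", "/home/hadoop_user/hadoop_user.keytab")

// 2. The AM asks HDFS for delegation tokens, naming the RM as renewer.
val creds = new Credentials()
FileSystem.get(new Configuration())
  .addDelegationTokens("rm/hadoop-rm-1.vip.hadoop.COM@PROD.COM", creds)

// 3. The tokens are serialized and shipped to the containers...
val buf = new ByteArrayOutputStream()
creds.writeTokenStorageToStream(new DataOutputStream(buf))

// 4. ...where each container merges them into its current user.
UserGroupInformation.getCurrentUser.addCredentials(creds)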

Enable debug logging

Add the following to log4j.properties on the driver and executors to surface the Hadoop security internals:

log4j.logger.org.apache.hadoop.security=DEBUG
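With log4j 1.x, which the property syntax above implies, the level can also be raised at runtime, e.g. from spark-shell (a sketch):

import org.apache.log4j.{Level, Logger}

// Raise Hadoop security logging to DEBUG for the current JVM only.
Logger.getLogger("org.apache.hadoop.security").setLevel(Level.DEBUG)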

The AM generates tokens

Logs:

23/09/07 22:38:50,375 INFO [main] security.HadoopDelegationTokenManager:57 : Attempting to login to KDC using principal: hadoop_user@PROD.COM, keytab: /home/hadoop_user/hadoop_user.keytab
23/09/07 22:38:50,381 DEBUG [main] security.UserGroupInformation:246 : Hadoop login
23/09/07 22:38:50,381 DEBUG [main] security.UserGroupInformation:192 : hadoop login commit
23/09/07 22:38:50,382 DEBUG [main] security.UserGroupInformation:200 : Using kerberos user: hadoop_user@PROD.COM
23/09/07 22:38:50,382 DEBUG [main] security.UserGroupInformation:218 : Using user: "hadoop_user@PROD.COM" with name: hadoop_user@PROD.COM
23/09/07 22:38:50,382 DEBUG [main] security.UserGroupInformation:230 : User entry: "hadoop_user@PROD.COM"
23/09/07 22:38:50,382 INFO [main] security.HadoopDelegationTokenManager:57 : Successfully logged into KDC.
23/09/07 22:38:51,247 INFO [main] security.HadoopFSDelegationTokenProvider:57 : getting token for: DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-113291108_1, ugi=hadoop_user@PROD.COM (auth:KERBEROS)]] with renewer rm/hadoop-rm-1.vip.hadoop.COM@PROD.COM
23/09/07 22:38:51,391 DEBUG [main] security.SaslRpcClient:493 : Sending sasl message state: NEGOTIATE
23/09/07 22:38:51,398 DEBUG [main] security.SaslRpcClient:288 : Get token info proto:interface org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolPB info:@org.apache.hadoop.security.token.TokenInfo(value=class org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSelector)
23/09/07 22:38:51,399 DEBUG [main] security.SaslRpcClient:241 : tokens aren't supported for this protocol or user doesn't have one
23/09/07 22:38:51,399 DEBUG [main] security.SaslRpcClient:313 : Get kerberos info proto:interface org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolPB info:@org.apache.hadoop.security.KerberosInfo(clientPrincipal=, serverPrincipal=dfs.namenode.kerberos.principal)
23/09/07 22:38:51,420 DEBUG [main] security.SaslRpcClient:260 : RPC Server's Kerberos principal name for protocol=org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolPB is nn/hadoop-nn-2.vip.hadoop.COM@PROD.COM
23/09/07 22:38:51,421 DEBUG [main] security.SaslRpcClient:271 : Creating SASL GSSAPI(KERBEROS)  client to authenticate to service at hadoop-nn-2.vip.hadoop.COM
23/09/07 22:38:51,425 DEBUG [main] security.SaslRpcClient:194 : Use KERBEROS authentication for protocol ClientNamenodeProtocolPB
23/09/07 22:38:51,441 DEBUG [main] security.SaslRpcClient:493 : Sending sasl message state: INITIATE
23/09/07 22:38:51,506 INFO [main] security.HadoopFSDelegationTokenProvider:57 : getting token for: DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-113291108_1, ugi=hadoop_user@PROD.COM (auth:KERBEROS)]] with renewer hadoop_user@PROD.COM
23/09/07 22:38:52,807 INFO [main] security.HadoopDelegationTokenManager:57 : Scheduling renewal in 18.0 h.
23/09/07 22:38:52,809 INFO [main] security.HadoopDelegationTokenManager:57 : Updating delegation tokens.
23/09/07 22:38:52,833 INFO [main] deploy.SparkHadoopUtil:57 : Updating delegation tokens for current user.
23/09/07 22:38:52,858 INFO [dispatcher-CoarseGrainedScheduler] deploy.SparkHadoopUtil:57 : Updating delegation tokens for current user.
23/09/07 22:38:53,119 DEBUG [main] security.SaslRpcClient:288 : Get token info proto:interface org.apache.hadoop.yarn.api.ApplicationClientProtocolPB info:org.apache.hadoop.yarn.security.client.ClientRMSecurityInfo$2@48f2054d
23/09/07 22:38:53,120 DEBUG [main] security.SaslRpcClient:241 : tokens aren't supported for this protocol or user doesn't have one
23/09/07 22:38:53,121 DEBUG [main] security.SaslRpcClient:313 : Get kerberos info proto:interface org.apache.hadoop.yarn.api.ApplicationClientProtocolPB info:org.apache.hadoop.yarn.security.client.ClientRMSecurityInfo$1@6ce26986
23/09/07 22:38:53,124 DEBUG [main] security.SaslRpcClient:343 : getting serverKey: yarn.resourcemanager.principal conf value: rm/_HOST@PROD.COM principal: rm/hadoop-rm-1.hadoop-rm-rm.hm-prod.svc.35.tess.io@PROD.COM
23/09/07 22:38:53,124 DEBUG [main] security.SaslRpcClient:260 : RPC Server's Kerberos principal name for protocol=org.apache.hadoop.yarn.api.ApplicationClientProtocolPB is rm/hadoop-rm-1.hadoop-rm-rm.hm-prod.svc.35.tess.io@PROD.COM
23/09/07 22:38:53,124 DEBUG [main] security.SaslRpcClient:271 : Creating SASL GSSAPI(KERBEROS)  client to authenticate to service at hadoop-rm-1.hadoop-rm-rm.hm-prod.svc.35.tess.io
23/09/07 22:38:53,125 DEBUG [main] security.SaslRpcClient:194 : Use KERBEROS authentication for protocol ApplicationClientProtocolPB
23/09/07 22:38:53,131 DEBUG [main] security.SaslRpcClient:493 : Sending sasl message state: INITIATE
23/09/07 22:38:53,182 DEBUG [main] token.Token:260 : Cloned private token Kind: HDFS_DELEGATION_TOKEN, Service: hadoop-nn-1.vip.hadoop.COM:8020, Ident: (token for hadoop_user: HDFS_DELEGATION_TOKEN owner=hadoop_user@PROD.COM, renewer=yarn, realUser=, issueDate=1694151531461, maxDate=1694756331461, sequenceNumber=1007863, masterKeyId=7139) from Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:hadoop, Ident: (token for hadoop_user: HDFS_DELEGATION_TOKEN owner=hadoop_user@PROD.COM, renewer=yarn, realUser=, issueDate=1694151531461, maxDate=1694756331461, sequenceNumber=1007863, masterKeyId=7139)
23/09/07 22:38:53,182 DEBUG [main] token.Token:260 : Cloned private token Kind: HDFS_DELEGATION_TOKEN, Service: hadoop-nn-2.vip.hadoop.COM:8020, Ident: (token for hadoop_user: HDFS_DELEGATION_TOKEN owner=hadoop_user@PROD.COM, renewer=yarn, realUser=, issueDate=1694151531461, maxDate=1694756331461, sequenceNumber=1007863, masterKeyId=7139) from Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:hadoop, Ident: (token for hadoop_user: HDFS_DELEGATION_TOKEN owner=hadoop_user@PROD.COM, renewer=yarn, realUser=, issueDate=1694151531461, maxDate=1694756331461, sequenceNumber=1007863, masterKeyId=7139)
23/09/07 22:38:53,182 DEBUG [main] token.Token:260 : Cloned private token Kind: HDFS_DELEGATION_TOKEN, Service: hadoop-nn-3.vip.hadoop.COM:8020, Ident: (token for hadoop_user: HDFS_DELEGATION_TOKEN owner=hadoop_user@PROD.COM, renewer=yarn, realUser=, issueDate=1694151531461, maxDate=1694756331461, sequenceNumber=1007863, masterKeyId=7139) from Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:hadoop, Ident: (token for hadoop_user: HDFS_DELEGATION_TOKEN owner=hadoop_user@PROD.COM, renewer=yarn, realUser=, issueDate=1694151531461, maxDate=1694756331461, sequenceNumber=1007863, masterKeyId=7139)
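Two details in these logs are worth calling out. The "Cloned private token" lines show the HDFS client mapping the token issued for the HA logical service (ha-hdfs:hadoop) onto each physical NameNode address, so failover keeps working. The 18.0 h renewal interval is the token's 24 h renewal period multiplied by spark.security.credentials.renewalRatio (default 0.75). To verify which tokens a process actually holds, a small sketch:

import scala.collection.JavaConverters._
import org.apache.hadoop.security.UserGroupInformation

// Dump every token currently attached to the process UGI.
UserGroupInformation.getCurrentUser.getCredentials.getAllTokens.asScala
  .foreach(t => println(s"kind=${t.getKind}, service=${t.getService}"))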

CoarseGrainedSchedulerBackend

Starting the token manager

  override def start(): Unit = {
    if (UserGroupInformation.isSecurityEnabled()) {
      delegationTokenManager = createTokenManager()
      delegationTokenManager.foreach { dtm =>
        val ugi = UserGroupInformation.getCurrentUser()
        val tokens = if (dtm.renewalEnabled) {
          dtm.start()
        } else {
          val creds = ugi.getCredentials()
          dtm.obtainDelegationTokens(creds)
          if (creds.numberOfTokens() > 0 || creds.numberOfSecretKeys() > 0) {
            SparkHadoopUtil.get.serialize(creds)
          } else {
            null
          }
        }
        if (tokens != null) {
          updateDelegationTokens(tokens)
        }
      }
    }
  }
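The branch on dtm.renewalEnabled is the key design decision here: when long-lived credentials are available (a principal plus keytab, or a ticket cache, depending on spark.kerberos.renewalCredentials), dtm.start() obtains the initial tokens and keeps re-fetching them; otherwise the tokens are obtained exactly once at startup, and the application will begin to fail once they pass their maxDate (7 days after issueDate in the logs above).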

HadoopDelegationTokenManager

Periodically refreshing tokens

  def start(): Array[Byte] = {
    require(renewalEnabled, "Token renewal must be enabled to start the renewer.")
    require(schedulerRef != null, "Token renewal requires a scheduler endpoint.")
    renewalExecutor =
      ThreadUtils.newDaemonSingleThreadScheduledExecutor("Credential Renewal Thread")

    val ugi = UserGroupInformation.getCurrentUser()
    if (ugi.isFromKeytab()) {
      // In Hadoop 2.x, renewal of the keytab-based login seems to be automatic, but in Hadoop 3.x,
      // it is configurable (see hadoop.kerberos.keytab.login.autorenewal.enabled, added in
      // HADOOP-9567). This task will make sure that the user stays logged in regardless of that
      // configuration's value. Note that checkTGTAndReloginFromKeytab() is a no-op if the TGT does
      // not need to be renewed yet.
      val tgtRenewalTask = new Runnable() {
        override def run(): Unit = {
          try {
            ugi.checkTGTAndReloginFromKeytab()
          } catch {
            case e: Throwable =>
              logWarning("Failed to renew TGT from keytab file", e)
          }
        }
      }
      val tgtRenewalPeriod = sparkConf.get(KERBEROS_RELOGIN_PERIOD)
      renewalExecutor.scheduleAtFixedRate(tgtRenewalTask, tgtRenewalPeriod, tgtRenewalPeriod,
        TimeUnit.SECONDS)
    }

    updateTokensTask()
  }

  private def updateTokensTask(): Array[Byte] = {
    try {
      val freshUGI = doLogin()
      val creds = obtainTokensAndScheduleRenewal(freshUGI)
      val tokens = SparkHadoopUtil.get.serialize(creds)

      logInfo("Updating delegation tokens.")
      schedulerRef.send(UpdateDelegationTokens(tokens))
      tokens
    } catch {
      case _: InterruptedException =>
        // Ignore, may happen if shutting down.
        null
      case e: Exception =>
        val delay = TimeUnit.SECONDS.toMillis(sparkConf.get(CREDENTIALS_RENEWAL_RETRY_WAIT))
        logWarning(s"Failed to update tokens, will try again in ${UIUtils.formatDuration(delay)}!" +
          " If this happens too often tasks will fail.", e)
        scheduleRenewal(delay)
        null
    }
  }
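Each run of updateTokensTask() performs a fresh login (doLogin()), obtains a new set of tokens, and sends the serialized bytes to the scheduler endpoint. obtainTokensAndScheduleRenewal() also schedules the next run from the tokens' renewal interval times the renewal ratio, which is where the "Scheduling renewal in 18.0 h." line above comes from. On failure the task retries after spark.security.credentials.retryWait rather than giving up.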

CoarseGrainedSchedulerBackend

The driver endpoint handles the message:

      case UpdateDelegationTokens(newDelegationTokens) =>
        updateDelegationTokens(newDelegationTokens)
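On the driver, the handler ends up in updateDelegationTokens(), roughly the following (a paraphrased sketch of the Spark 3.x source, field names approximate, not verbatim):

// Apply the new tokens locally, remember them for executors that register
// later, and push them to every live executor endpoint.
private def updateDelegationTokens(tokens: Array[Byte]): Unit = {
  SparkHadoopUtil.get.addDelegationTokens(tokens, conf)
  delegationTokens.set(tokens)
  executorDataMap.values.foreach { ed =>
    ed.executorEndpoint.send(UpdateDelegationTokens(tokens))
  }
}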

Containers load tokens at startup

23/09/07 23:41:56,279 DEBUG [main] security.UserGroupInformation:246 : Hadoop login
23/09/07 23:41:56,281 DEBUG [main] security.UserGroupInformation:192 : hadoop login commit
23/09/07 23:41:56,284 DEBUG [main] security.UserGroupInformation:214 : Using local user: UnixPrincipal: hadoop_user
23/09/07 23:41:56,285 DEBUG [main] security.UserGroupInformation:218 : Using user: "UnixPrincipal: hadoop_user" with name: hadoop_user
23/09/07 23:41:56,285 DEBUG [main] security.UserGroupInformation:230 : User entry: "hadoop_user"
23/09/07 23:41:56,285 DEBUG [main] security.UserGroupInformation:741 : Reading credentials from location /hadoop/1/yarn/local/usercache/hadoop_user/appcache/application_1684894519955_69959/container_e2311_1684894519955_69959_01_000021/container_tokens
23/09/07 23:41:56,303 DEBUG [main] security.UserGroupInformation:746 : Loaded 7 tokens from /hadoop/1/yarn/local/usercache/hadoop_user/appcache/application_1684894519955_69959/container_e2311_1684894519955_69959_01_000021/container_tokens
23/09/07 23:41:56,304 DEBUG [main] security.UserGroupInformation:787 : UGI loginUser: hadoop_user (auth:SIMPLE)

23/09/07 23:44:54,825 DEBUG [Executor 1 task launch worker for task 1785, task 1757.0 in stage 8.0 of app application_1684894519955_69959] security.SaslRpcClient:284 : Get token info proto:interface org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolPB info:@org.apache.hadoop.security.token.TokenInfo(value=class org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSelector)
23/09/07 23:44:54,831 DEBUG [Executor 1 task launch worker for task 1785, task 1757.0 in stage 8.0 of app application_1684894519955_69959] security.SaslRpcClient:267 : Creating SASL DIGEST-MD5(TOKEN)  client to authenticate to service at default
23/09/07 23:44:54,833 DEBUG [Executor 1 task launch worker for task 1785, task 1757.0 in stage 8.0 of app application_1684894519955_69959] security.SaslRpcClient:190 : Use TOKEN authentication for protocol ClientNamenodeProtocolPB
23/09/07 23:44:54,836 DEBUG [Executor 1 task launch worker for task 1785, task 1757.0 in stage 8.0 of app application_1684894519955_69959] security.SaslRpcClient:690 : SASL client callback: setting username: ABZiX2Nhcm1lbEBQUk9ELkVCQVkuQ09NBHlhcm4AigGKc4ZbmYoBipeS35mMAUIqhY4Ggg==
23/09/07 23:44:54,836 DEBUG [Executor 1 task launch worker for task 1785, task 1757.0 in stage 8.0 of app application_1684894519955_69959] security.SaslRpcClient:695 : SASL client callback: setting userPassword
23/09/07 23:44:54,836 DEBUG [Executor 1 task launch worker for task 1785, task 1757.0 in stage 8.0 of app application_1684894519955_69959] security.SaslRpcClient:700 : SASL client callback: setting realm: default
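Note that no Kerberos handshake happens here: YARN materializes the tokens the AM shipped into the container's container_tokens file and points HADOOP_TOKEN_FILE_LOCATION at it, so the static login comes up as the local Unix user (auth:SIMPLE) with the 7 tokens attached, and HDFS access proceeds over SASL DIGEST-MD5 (TOKEN). A sketch of the same read that UGI performs at startup:

import java.io.File
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.Credentials

// Read a container_tokens file the way UGI does when the JVM starts.
val tokenFile = new File(sys.env("HADOOP_TOKEN_FILE_LOCATION"))
val creds = Credentials.readTokenStorageFile(tokenFile, new Configuration())
println(s"loaded ${creds.numberOfTokens()} tokens")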

The Spark AM and executors apply the received tokens

    case UpdateDelegationTokens(tokenBytes) =>
      logInfo(s"Received tokens of ${tokenBytes.length} bytes")
      SparkHadoopUtil.get.addDelegationTokens(tokenBytes, env.conf)

// ... (inside SparkHadoopUtil.addDelegationTokens, the credentials finally land in:)
  UserGroupInformation.getCurrentUser.addCredentials(creds)
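addDelegationTokens() deserializes the byte blob back into Hadoop Credentials before merging it into the current user; roughly (paraphrased, tokenBytes is the payload from the case pattern above):

import java.io.{ByteArrayInputStream, DataInputStream}
import org.apache.hadoop.security.{Credentials, UserGroupInformation}

// Executor side, paraphrased: bytes -> Credentials -> current UGI.
val creds = new Credentials()
creds.readTokenStorageStream(new DataInputStream(new ByteArrayInputStream(tokenBytes)))
UserGroupInformation.getCurrentUser.addCredentials(creds)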

Debugging notes

# Creating the UGI

UserGroupInformation createRemoteUser(String user, AuthMethod authMethod)

# Why does UGI.commit() choose the local user rather than the Kerberos user?

23/09/07 22:39:05,904 DEBUG [main] security.UserGroupInformation:246 : Hadoop login
23/09/07 22:39:05,905 DEBUG [main] security.UserGroupInformation:192 : hadoop login commit
23/09/07 22:39:05,908 DEBUG [main] security.UserGroupInformation:214 : Using local user: UnixPrincipal: b_carmel
23/09/07 22:39:05,909 DEBUG [main] security.UserGroupInformation:218 : Using user: "UnixPrincipal: b_carmel" with name: b_carmel
23/09/07 22:39:05,909 DEBUG [main] security.UserGroupInformation:230 : User entry: "b_carmel"

Because:
UGI
    HadoopLoginModule
        Subject subject // this Subject carries no KerberosPrincipal


* Inside the RPC Client, the UGI is called ticket
* The SaslRpcClient is created from Client.ticket

UserGroupInformation realUser // usually the system (OS) user
    doSubjectLogin(subject, null);
-->
UserGroupInformation loginUser  // also the proxyUser
    Credentials
        secretKeysMap
            Text alias --> byte[] key
    // prepare the loginUser's Credentials
    // 23/09/07 22:39:05,928 DEBUG [main] security.UserGroupInformation:746 : Loaded 2 tokens from /hadoop/2/yarn/local/usercache/b_carmel/appcache/application_1693993514169_0446/container_e6161_1693993514169_0446_01_000001/container_tokens

    User user // subclass of Principal
        String fullName // equal to name
        AuthenticationMethod authMethod
    // 23/09/07 22:39:05,929 DEBUG [main] security.UserGroupInformation:787 : UGI loginUser: b_carmel (auth:SIMPLE)

    Subject subject;  // the same Subject held by HadoopLoginModule
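A quick way to check which branch commit() took (sketch):

import org.apache.hadoop.security.UserGroupInformation

// auth:KERBEROS means a KerberosPrincipal was found in the Subject;
// auth:SIMPLE means commit() fell back to the OS (Unix) user.
val ugi = UserGroupInformation.getLoginUser
println(s"${ugi.getUserName} (auth:${ugi.getAuthenticationMethod})")
println(s"hasKerberosCredentials=${ugi.hasKerberosCredentials}")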