OpenYurt YurtHub Explained

Positioning

OpenYurt currently provides edge node autonomy: when the cloud-edge network is disconnected, workloads can recover automatically on the edge node after a node restart or a container restart, instead of being evicted and rescheduled by the cloud.

  • A node-level sidecar that proxies traffic between on-node components and the kube-apiserver. It has two working modes, edge and cloud; in edge mode YurtHub caches the data returned by the cloud.
  • Deployment form: runs on every node as a static Pod

Edge Autonomy

If the network is unreliable and keeps dropping, the edge node cannot reliably report its status to the control plane, which triggers eviction of the workloads on that node.

When kubelet, kube-proxy, Flannel, and other components need to access the KAS (kube-apiserver), they go to YurtHub first. YurtHub forwards the request to the KAS and then stores the response in a local cache. If the network later goes down, kubelet and the other components read data directly from YurtHub's cache instead of requesting the KAS.

By caching resources locally, YurtHub allows Pods and kubelet to keep running normally even when the cloud-edge network is disconnected, because they can still obtain the resources they need through YurtHub.
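To make this concrete, below is a heavily simplified sketch of the idea (not YurtHub's actual implementation): a local proxy that forwards to the cloud apiserver and caches response bodies while the network is healthy, and answers from that cache when it is not. The `isCloudHealthy` check, the in-memory map, and the address `kube-apiserver:6443` are placeholders for illustration only.

go
package main

import (
	"bytes"
	"io"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"sync"
)

var (
	mu    sync.RWMutex
	cache = map[string][]byte{} // YurtHub persists to disk; this sketch keeps responses in memory
)

// isCloudHealthy stands in for YurtHub's health checker.
func isCloudHealthy() bool { return true }

func main() {
	apiserver, _ := url.Parse("https://kube-apiserver:6443")
	proxy := httputil.NewSingleHostReverseProxy(apiserver)

	// Cache each response body before handing it back to the client.
	// (Buffering the whole body is fine for GET/LIST, but not for watch streams.)
	proxy.ModifyResponse = func(resp *http.Response) error {
		body, err := io.ReadAll(resp.Body)
		if err != nil {
			return err
		}
		resp.Body.Close()
		mu.Lock()
		cache[resp.Request.URL.Path] = body
		mu.Unlock()
		resp.Body = io.NopCloser(bytes.NewReader(body))
		return nil
	}

	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if isCloudHealthy() {
			proxy.ServeHTTP(w, r) // online: forward to the cloud and cache the result
			return
		}
		mu.RLock()
		body, ok := cache[r.URL.Path] // offline: serve from the local cache
		mu.RUnlock()
		if !ok {
			http.Error(w, "not cached locally", http.StatusServiceUnavailable)
			return
		}
		w.Write(body)
	})

	// kubelet, kube-proxy, Flannel, etc. point at this local address instead of the apiserver.
	log.Fatal(http.ListenAndServe("127.0.0.1:10261", handler))
}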

Seamless Pod Migration

In a vanilla Kubernetes cluster, Pods access the Kube APIServer through the InClusterConfig address by default. In a cloud-edge setup, the cloud and the edge may sit on different network planes, so a Pod cannot reach the Kube APIServer through the InClusterConfig address. Moreover, when the cloud-edge link is down and an edge Pod needs to restart, the restart fails because the Pod cannot connect to the Kube APIServer to fetch its resource configuration.

To solve both problems, YurtHub lets Pods migrate to the edge environment with zero modification. For Pods that use InClusterConfig to access the Kube APIServer, YurtHub automatically adjusts the access address on the node, without touching the Pod's own configuration, so that the Pod's requests are forwarded to YurtHub. The Pod keeps running even when the cloud-edge network is down, which makes the move from a pure cloud setup to a cloud-edge setup seamless.
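The reason this works without touching the Pod is that client-go's rest.InClusterConfig() derives the apiserver address from the KUBERNETES_SERVICE_HOST / KUBERNETES_SERVICE_PORT environment variables injected into the Pod. Once those values resolve to YurtHub's address instead of the real service IP, unchanged application code talks to YurtHub. The sketch below is an ordinary in-cluster client shown for illustration; the specific dummy address is an assumption based on the default addresses listed later in this article.

go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// On an OpenYurt edge node, the environment injected into the Pod is assumed to
	// resolve to YurtHub's dummy address (e.g. KUBERNETES_SERVICE_HOST=169.254.2.1),
	// so this call builds a config that points at YurtHub rather than the apiserver.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// The request now travels Pod -> YurtHub -> kube-apiserver and can be answered
	// from YurtHub's local cache when the cloud is unreachable.
	pods, err := client.CoreV1().Pods("default").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("pods in default namespace:", len(pods.Items))
}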

How It Works

Basic Architecture

How components reach YurtHub in edge mode

HTTP Server

Proxies all client requests and, depending on cloud-edge connectivity, dispatches them to the appropriate handler.

Load Balancer

Forwards edge requests to a specific cloud apiserver. Two scheduling modes are supported: round-robin and priority.
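As a rough illustration of the round-robin mode only (not the actual load balancer code), picking the next backend can be as simple as an atomic counter over the list of cloud apiserver addresses:

go
package main

import (
	"fmt"
	"net/url"
	"sync/atomic"
)

// roundRobinLB picks the next backend in turn; a priority mode would instead
// always prefer the first healthy server in a fixed order.
type roundRobinLB struct {
	backends []*url.URL
	next     uint64
}

func (lb *roundRobinLB) pick() *url.URL {
	n := atomic.AddUint64(&lb.next, 1)
	return lb.backends[int(n)%len(lb.backends)]
}

func main() {
	u1, _ := url.Parse("https://apiserver-1:6443")
	u2, _ := url.Parse("https://apiserver-2:6443")
	lb := &roundRobinLB{backends: []*url.URL{u1, u2}}
	for i := 0; i < 4; i++ {
		fmt.Println(lb.pick()) // alternates between the two servers
	}
}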

Cache Manager

Stores the responses returned by the cloud apiserver in local storage; if the edge node is offline, the local proxy answers requests directly from this local cache.

Health Check

Periodically checks the connectivity between the edge node and the cloud.

Filter

Filters mutate the responses returned from the cloud before they are passed on to clients.

yaml
YurtHubServerAddr:                 "127.0.0.1:10267"
YurtHubProxyServerAddr:            "127.0.0.1:10261"
YurtHubProxyServerSecureAddr:      "127.0.0.1:10268"
YurtHubProxyServerDummyAddr:       "169.254.2.1:10261"
YurtHubProxyServerSecureDummyAddr: "169.254.2.1:10268"

By default YurtHub listens on these local addresses; we can verify on the node that they are indeed being listened on:

shell
root@root:~# netstat -tnpl | grep -i yurthub
tcp        0      0 127.0.0.1:10267         0.0.0.0:*               LISTEN      1974065/yurthub
tcp        0      0 169.254.2.1:10268       0.0.0.0:*               LISTEN      1974065/yurthub
tcp        0      0 127.0.0.1:10268         0.0.0.0:*               LISTEN      1974065/yurthub
tcp        0      0 169.254.2.1:10261       0.0.0.0:*               LISTEN      1974065/yurthub
tcp        0      0 127.0.0.1:10261         0.0.0.0:*               LISTEN      1974065/yurthub

Source Code Walkthrough

If we exec into a container on the edge node, we can confirm that the Kubernetes access address inside the container has indeed been rewritten to point at YurtHub.

Start the transport manager

go
klog.Infof("%d. new transport manager", trace)
transportManager, err := transport.NewTransportManager(cfg.CertManager, ctx.Done())
if err != nil {
    return fmt.Errorf("could not new transport manager, %w", err)
}
trace++

Create the health checker

go
// create the health checker
var cloudHealthChecker healthchecker.MultipleBackendsHealthChecker
if cfg.WorkingMode == util.WorkingModeEdge {
    // The health checker periodically probes whether the cloud kube-apiservers are reachable
    // and records their health state, which determines whether requests are forwarded to the
    // cloud or handled locally. It also reports YurtHub's heartbeat (node lease) to the cloud.
    klog.Infof("%d. create health checkers for remote servers and pool coordinator", trace)
    cloudHealthChecker, err = healthchecker.NewCloudAPIServerHealthChecker(cfg, cloudClients, ctx.Done())
    if err != nil {
        return fmt.Errorf("could not new cloud health checker, %w", err)
    }
} else {
    klog.Infof("%d. disable health checker for node %s because it is a cloud node", trace, cfg.NodeName)
    // In cloud mode, cloud health checker is not needed.
    // This fake checker will always report that the cloud is healthy and pool coordinator is unhealthy.
    cloudHealthChecker = healthchecker.NewFakeChecker(true, make(map[string]int))
}
trace++

The key part of the health check:

go
func (p *prober) Probe(phase string) bool {
    if p.kubeletStopped() {
        p.markAsUnhealthy(phase)
        return false
    }
    
    baseLease := p.getLastNodeLease()           // 1. get the last known node lease
    lease, err := p.nodeLease.Update(baseLease) // 2. update the lease against this remote server
    if err == nil {
        if err := p.setLastNodeLease(lease); err != nil {
            klog.Errorf("failed to store last node lease: %v", err)
        }
        p.markAsHealthy(phase)                  // 3. the update succeeded, mark the server as healthy
        return true
    }

    klog.Errorf("failed to probe: %v, remote server %s", err, p.ServerName())
    p.markAsUnhealthy(phase)
    return false
}

The health-check loop invokes Probe periodically:

go
func (hc *cloudAPIServerHealthChecker) run(stopCh <-chan struct{}) {
    intervalTicker := time.NewTicker(time.Duration(hc.heartbeatInterval) * time.Second)
    defer intervalTicker.Stop()

    for {
        select {
        case <-stopCh:
            klog.Infof("exit normally in health check loop.")
            return
        case <-intervalTicker.C:
            // Ensure that the node heartbeat can be reported when there is a healthy remote server.
            // Try to detect all remote server in a loop, if there is a remote server can update nodeLease, exit the loop.
            for i := 0; i < len(hc.remoteServers); i++ {
                p := hc.getProber()
                if p.Probe(ProbePhaseNormal) {
                    break
                }
            }
        }
    }
}

Create the Cache Manager

The cache manager only needs to be created for edge nodes, not for cloud nodes.

go
var cacheMgr cachemanager.CacheManager
if cfg.WorkingMode == util.WorkingModeEdge {
    klog.Infof("%d. new cache manager with storage wrapper and serializer manager", trace)
    cacheMgr = cachemanager.NewCacheManager(cfg.StorageWrapper, cfg.SerializerManager, cfg.RESTMapperManager, cfg.SharedFactory)
} else {
    klog.Infof("%d. disable cache manager for node %s because it is a cloud node", trace, cfg.NodeName)
}

Let's look at two examples: caching a response and querying the cache.

  • Caching a response
go
func (cm *cacheManager) CacheResponse(req *http.Request, prc io.ReadCloser, stopCh <-chan struct{}) error {
    ctx := req.Context()
    // Extract the request info (resource, verb, etc.) from the request context,
    // then check whether this is a watch request; if so, the streamed objects
    // are persisted into the local cache by saveWatchObject.
    info, _ := apirequest.RequestInfoFrom(ctx)
    if isWatch(ctx) {
        return cm.saveWatchObject(ctx, info, prc, stopCh)
    }
    
    ....
}

func (cm *cacheManager) saveWatchObject(ctx context.Context, info *apirequest.RequestInfo, r io.ReadCloser, stopCh <-chan struct{}) error {
    ....
    ....
    for {
        watchType, obj, err := d.Decode()
        if err != nil {
            klog.Errorf("%s %s watch decode ended with: %v", comp, info.Path, err)
            return err
        }

        switch watchType {
        case watch.Added, watch.Modified, watch.Deleted:
             ....

            key, err := cm.storage.KeyFunc(storage.KeyBuildInfo{
                Component: comp,
                Namespace: ns,
                Name:      name,
                Resources: info.Resource,
                Group:     info.APIGroup,
                Version:   info.APIVersion,
            })
            if err != nil {
                klog.Errorf("failed to get cache path, %v", err)
                continue
            }

            switch watchType {
            // Added and Modified events are stored into the local cache
            // (the cache may be backed by memory or by disk).
            case watch.Added, watch.Modified:
                err = cm.storeObjectWithKey(key, obj)
                if watchType == watch.Added {
                    addObjCnt++
                } else {
                    updateObjCnt++
                }
            
            // Deleted events remove the corresponding entry from the local cache.
            case watch.Deleted:
                err = cm.storage.Delete(key)
                delObjCnt++
            default:
                // impossible go to here
            }

           ....
    }
}

In the cache directory on the node we can find many locally cached resources, such as nodes, pods, secrets, and services:

shell
root@root:/etc/kubernetes/cache/kubelet# ls
configmaps  csidrivers  csinodes  events  leases  nodes  pods  secrets  services

  • Querying the cache
go
func (cm *cacheManager) QueryCache(req *http.Request) (runtime.Object, error) {
    ctx := req.Context()
    info, ok := apirequest.RequestInfoFrom(ctx)
    if !ok || info == nil || info.Resource == "" {
        return nil, fmt.Errorf("failed to get request info for request %s", util.ReqString(req))
    }
    if !info.IsResourceRequest {
        return nil, fmt.Errorf("failed to QueryCache for getting non-resource request %s", util.ReqString(req))
    }

    // dispatch on the verb of the request
    switch info.Verb {
    case "list":
        return cm.queryListObject(req)
    case "get", "patch", "update":
        return cm.queryOneObject(req)
    default:
        return nil, fmt.Errorf("failed to QueryCache, unsupported verb %s of request %s", info.Verb, util.ReqString(req))
    }
}


func (cm *cacheManager) queryOneObject(req *http.Request) (runtime.Object, error) {
    ctx := req.Context()
    info, ok := apirequest.RequestInfoFrom(ctx)
    if !ok || info == nil || info.Resource == "" {
        return nil, fmt.Errorf("failed to get request info for request %s", util.ReqString(req))
    }

    comp, _ := util.ClientComponentFrom(ctx)
    // query in-memory cache first
    var isInMemoryCache = isInMemeoryCache(ctx)
    var isInMemoryCacheMiss bool
    if isInMemoryCache {
        if obj, err := cm.queryInMemeryCache(info); err != nil {
            ....
        }
    }

    // fall back to normal query
    key, err := cm.storage.KeyFunc(storage.KeyBuildInfo{
        Component: comp,
        Namespace: info.Namespace,
        Name:      info.Name,
        Resources: info.Resource,
        Group:     info.APIGroup,
        Version:   info.APIVersion,
    })
    if err != nil {
        return nil, err
    }

    klog.V(4).Infof("component: %s try to get key: %s", comp, key.Key())
    obj, err := cm.storage.Get(key)
    if err != nil {
        klog.Errorf("failed to get obj %s from storage, %v", key.Key(), err)
        return nil, err
    }
     
    ....
}

Note the difference here:

In order to accelerate kubelet get node and lease object, we cache them

The node and lease objects are cached directly in memory so that kubelet can fetch them quickly; looking at the local cache files, though, a copy is also kept on disk.
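The pattern amounts to a two-tier lookup: consult the in-memory copy first and fall back to the files on disk, while writes for the "hot" objects go to both tiers. A simplified sketch of that idea (not the real storage wrapper; the key layout and paths are illustrative only):

go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"sync"
)

// twoTierCache keeps hot objects (e.g. the node and its lease) in memory
// and writes everything, including those hot objects, to disk as well.
type twoTierCache struct {
	mu   sync.RWMutex
	mem  map[string][]byte
	root string // e.g. /etc/kubernetes/cache
}

func (c *twoTierCache) get(key string) ([]byte, error) {
	c.mu.RLock()
	if b, ok := c.mem[key]; ok {
		c.mu.RUnlock()
		return b, nil // fast path for kubelet's node/lease reads
	}
	c.mu.RUnlock()
	b, err := os.ReadFile(filepath.Join(c.root, key))
	if errors.Is(err, os.ErrNotExist) {
		return nil, fmt.Errorf("key %q not cached", key)
	}
	return b, err
}

func (c *twoTierCache) put(key string, val []byte, hot bool) error {
	if hot {
		c.mu.Lock()
		c.mem[key] = val
		c.mu.Unlock()
	}
	path := filepath.Join(c.root, key)
	if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
		return err
	}
	return os.WriteFile(path, val, 0o644) // the disk copy is always written
}

func main() {
	c := &twoTierCache{mem: map[string][]byte{}, root: os.TempDir()}
	_ = c.put("kubelet/nodes/edge-node-1", []byte(`{"kind":"Node"}`), true)
	b, _ := c.get("kubelet/nodes/edge-node-1")
	fmt.Println(string(b))
}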

Create the GC manager

The GC manager does two main things:

  • When YurtHub starts, it deletes locally cached pods that no longer exist in the cloud
  • It periodically reclaims cached Events

The basic logic is the same in both cases: list the local cache, list the objects from the cloud, and anything that exists locally but not in the cloud is garbage-collected.

go
func (m *GCManager) gcPodsWhenRestart() {
    localPodKeys, err := m.store.ListResourceKeysOfComponent("kubelet", schema.GroupVersionResource{
        Group:    "",
        Version:  "v1",
        Resource: "pods",
    })
    
    ....
    ....
    
    listOpts := metav1.ListOptions{FieldSelector: fields.OneTermEqualSelector("spec.nodeName", m.nodeName).String()}
    podList, err := kubeClient.CoreV1().Pods(v1.NamespaceAll).List(context.Background(), listOpts)
    if err != nil {
        klog.Errorf("could not list pods for node(%s), %v", m.nodeName, err)
        return
    }
    ....
     
    // The main diff logic: local keys that are absent from the cloud's pod list
    // are considered stale and will be garbage-collected.
    deletedPods := make([]storage.Key, 0)
    for i := range localPodKeys {
        if _, ok := currentPodKeys[localPodKeys[i]]; !ok {
            deletedPods = append(deletedPods, localPodKeys[i])
        }
    }

    if len(deletedPods) == len(localPodKeys) {
        klog.Infof("it's dangerous to gc all cache pods, so skip gc")
        return
    }

    for _, key := range deletedPods {
        if err := m.store.Delete(key); err != nil {
            klog.Errorf("failed to gc pod %s, %v", key, err)
        } else {
            klog.Infof("gc pod %s successfully", key)
        }
    }
}

Build the Yurt reverse proxy handler

Purpose

Proxies requests from clients on the node.

Source code

go
klog.Infof("%d. new reverse proxy handler for remote servers", trace)
yurtProxyHandler, err := proxy.NewYurtReverseProxyHandler( ... )
if err != nil {
    return fmt.Errorf("could not create reverse proxy handler, %w", err)
}

YurtHub's reverse proxy handles only the following cases:

  1. In cloud mode, requests are simply passed through the LoadBalancer
  2. In edge mode, the request type is checked first (for example, whether it is a kubelet lease request or an Event creation request)
go
func (p *yurtReverseProxy) ServeHTTP(rw http.ResponseWriter, req *http.Request) {
    if p.workingMode == hubutil.WorkingModeCloud {
        p.loadBalancer.ServeHTTP(rw, req)
        return
    }
        
    switch {
    case util.IsKubeletLeaseReq(req):
        p.handleKubeletLease(rw, req)
    case util.IsEventCreateRequest(req):
        p.eventHandler(rw, req)
    case util.IsPoolScopedResouceListWatchRequest(req):
        p.poolScopedResouceHandler(rw, req)
    case util.IsSubjectAccessReviewCreateGetRequest(req):
        p.subjectAccessReviewHandler(rw, req)
    default:
        // For resource request that do not need to be handled by pool-coordinator,
        // handling the request with cloud apiserver or local cache.
        if p.cloudHealthChecker.IsHealthy() {
            p.loadBalancer.ServeHTTP(rw, req)
        } else {
            p.localProxy.ServeHTTP(rw, req)
        }
    }
}

For requests coming from the edge, YurtHub wraps the handler with a chain of middlewares.

Each middleware has its own responsibility:

go
func (p *yurtReverseProxy) buildHandlerChain(handler http.Handler) http.Handler {
    // 1. record how long a request takes from being proxied out of yurthub until it is fully handled
    handler = util.WithRequestTrace(handler)

    // 2. record the Content-Type of the request
    handler = util.WithRequestContentType(handler)

    // 3. check the request headers to decide whether the response should be cached
    if p.workingMode == hubutil.WorkingModeEdge {
        handler = util.WithCacheHeaderCheck(handler)
    }
    ........
    return handler
}

Pay close attention to the order of these middlewares. A wrapper applied further down in the function sits further out in the chain and is therefore called earlier, so the wrapper that measures the actual request handling latency has to be applied first (innermost) for the measurement to be accurate.
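The following self-contained example (unrelated to YurtHub's actual middlewares) demonstrates this ordering rule: the wrapper applied last is outermost and runs first, so the trace wrapper applied first ends up closest to the core handler and measures only the real work.

go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

func wrap(name string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Println("enter", name)
		next.ServeHTTP(w, r)
		fmt.Println("leave", name)
	})
}

func main() {
	core := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Println("core handler")
	})

	// Same construction order as buildHandlerChain: trace first, then the others.
	var handler http.Handler = core
	handler = wrap("trace", handler)       // applied first -> innermost, runs last
	handler = wrap("contentType", handler) // applied later -> wraps trace
	handler = wrap("cacheHeader", handler) // applied last  -> outermost, runs first

	req := httptest.NewRequest("GET", "/api/v1/nodes", nil)
	handler.ServeHTTP(httptest.NewRecorder(), req)
	// Output:
	// enter cacheHeader
	// enter contentType
	// enter trace
	// core handler
	// leave trace
	// leave contentType
	// leave cacheHeader
}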

Start the YurtHub servers

Overall logic

The servers use two main kinds of handlers:

  • hubServerHandler
  • proxyHandler

The proxyHandler itself distinguishes between nonResourceReq and resourceReq handling:

  • nonResourceReq: served directly from the local cache
  • resourceReq: forwarded to the remote kube-apiserver

Source code

go
// start yurthub proxy servers for forwarding requests to cloud kube-apiserver
if cfg.WorkingMode == util.WorkingModeEdge {
    proxyHandler = wrapNonResourceHandler(proxyHandler, cfg, rest)
}
if cfg.YurtHubProxyServerServing != nil {
    if err := cfg.YurtHubProxyServerServing.Serve(proxyHandler, 0, stopCh); err != nil {
        return err
    }
}

if cfg.YurtHubDummyProxyServerServing != nil {
    if err := cfg.YurtHubDummyProxyServerServing.Serve(proxyHandler, 0, stopCh); err != nil {
        return err
    }
}

if cfg.YurtHubSecureProxyServerServing != nil {
    if _, err := cfg.YurtHubSecureProxyServerServing.Serve(proxyHandler, 0, stopCh); err != nil {
        return err
    }
}

Extra handlers are registered for the following non-resource paths:

go
var nonResourceReqPaths = map[string]storage.ClusterInfoType{
    "/version":                       storage.Version,
    "/apis/discovery.k8s.io/v1":      storage.APIResourcesInfo,
    "/apis/discovery.k8s.io/v1beta1": storage.APIResourcesInfo,
}

They are served directly from the local cache:

go
func wrapNonResourceHandler(proxyHandler http.Handler, config *config.YurtHubConfiguration, restMgr *rest.RestConfigManager) http.Handler {
    wrapMux := mux.NewRouter()

    // register handler for non resource requests
    for path := range nonResourceReqPaths {
        wrapMux.Handle(path, localCacheHandler(nonResourceHandler, restMgr, config.StorageWrapper, path)).Methods("GET")
    }

    // register handler for other requests
    wrapMux.PathPrefix("/").Handler(proxyHandler)
    return wrapMux
}

FAQ

After OpenYurt (YurtHub) receives a request from kubelet, how does it identify what kind of request it is so that it can forward it to the cloud kube-apiserver?

The logic, in summary:

After OpenYurt is installed, it automatically rewrites kubelet's configuration so that kubelet no longer talks to the kube-apiserver directly,

but instead talks to local port 10261, which is the port the YurtHub service listens on locally:

yaml
root@root:~ cat /etc/kubernetes/kubelet.conf
apiVersion: v1
clusters:
- cluster:
    server: http://127.0.0.1:10261
  name: default-cluster
contexts:
- context:
    cluster: default-cluster
    namespace: default
    user: default-auth
  name: default-context
current-context: default-context
kind: Config
preferences: {}

After the local service receives a request, it extracts the corresponding RequestInfo from the http.Request's context.

This struct holds, for every HTTP request a client sends, the details of the intended kube-apiserver access:

go
// RequestInfo holds information parsed from the http.Request
type RequestInfo struct {
    // IsResourceRequest indicates whether or not the request is for an API resource or subresource
    IsResourceRequest bool
    // Path is the URL path of the request
    Path string
    // Verb is the kube verb associated with the request for API requests, not the http verb.  This includes things like list and watch.
    // for non-resource requests, this is the lowercase http verb
    Verb string

    APIPrefix  string
    APIGroup   string
    APIVersion string
    Namespace  string
    // Resource is the name of the resource being requested.  This is not the kind.  For example: pods
    Resource string
    // Subresource is the name of the subresource being requested.  This is a different resource, scoped to the parent resource, but it may have a different kind.
    // For instance, /pods has the resource "pods" and the kind "Pod", while /pods/foo/status has the resource "pods", the sub resource "status", and the kind "Pod"
    // (because status operates on pods). The binding resource for a pod though may be /pods/foo/binding, which has resource "pods", subresource "binding", and kind "Binding".
    Subresource string
    // Name is empty for some verbs, but if the request directly indicates a name (not in body content) then this field is filled in.
    Name string
    // Parts are the path parts for the request, always starting with /{resource}/{name}
    Parts []string
}

With this information from the request context, YurtHub can determine which resource the client is operating on and handle the request accordingly.
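As a standalone illustration (not YurtHub's own code), the same parsing can be reproduced with the RequestInfoFactory from k8s.io/apiserver, which turns a request path into the RequestInfo shown above:

go
package main

import (
	"fmt"
	"net/http/httptest"

	"k8s.io/apimachinery/pkg/util/sets"
	apirequest "k8s.io/apiserver/pkg/endpoints/request"
)

func main() {
	// The same resolver kube-apiserver uses to turn a URL path into a RequestInfo.
	resolver := &apirequest.RequestInfoFactory{
		APIPrefixes:          sets.NewString("api", "apis"),
		GrouplessAPIPrefixes: sets.NewString("api"),
	}

	req := httptest.NewRequest("GET", "/api/v1/namespaces/default/pods/nginx", nil)
	info, err := resolver.NewRequestInfo(req)
	if err != nil {
		panic(err)
	}

	// Prints: verb=get resource=pods namespace=default name=nginx
	fmt.Printf("verb=%s resource=%s namespace=%s name=%s\n",
		info.Verb, info.Resource, info.Namespace, info.Name)
}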
