OpenYurt YurtHub Explained

Positioning

OpenYurt currently provides edge node autonomy: when the cloud-edge network is disconnected and the edge node or its business containers restart, the business containers recover automatically on the edge node instead of being evicted and rescheduled by the cloud.

  • A node-level sidecar that proxies the traffic between on-node components and the kube-apiserver. It has two working modes, edge and cloud; in edge mode YurtHub caches the data returned from the cloud.
  • Deployment form: runs as a static Pod on every node.

Edge Autonomy

If the network is flaky and keeps dropping, the edge node cannot reliably report its status to the control plane, and the Pods on that node end up being evicted.

When kubelet, kube-proxy, Flannel, and similar components need to access the kube-apiserver (KAS), they go through YurtHub first. YurtHub forwards the request to the KAS and stores the response in a local cache. If the network is later disconnected, kubelet and the other components read data directly from YurtHub's cache instead of requesting the KAS.

By caching resources locally, YurtHub lets Pods and the kubelet keep running normally during cloud-edge disconnection, since they can still obtain the resources they need through YurtHub.
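As a minimal illustration of this access pattern (not part of the YurtHub source), the sketch below builds a client-go client that talks to YurtHub's insecure local proxy endpoint, assuming the default address 127.0.0.1:10261 listed later in this article. From the client's point of view nothing changes between the connected and disconnected cases; YurtHub decides whether to forward to the cloud or answer from its cache.

go
package main

import (
    "context"
    "fmt"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/rest"
)

func main() {
    // Point the client at YurtHub's local insecure proxy port instead of the
    // remote kube-apiserver (assumed default: 127.0.0.1:10261).
    cfg := &rest.Config{Host: "http://127.0.0.1:10261"}

    clientset, err := kubernetes.NewForConfig(cfg)
    if err != nil {
        panic(err)
    }

    // This call behaves the same whether it is served by the cloud apiserver
    // or by YurtHub's local cache during a cloud-edge disconnection.
    pods, err := clientset.CoreV1().Pods("default").List(context.TODO(), metav1.ListOptions{})
    if err != nil {
        panic(err)
    }
    fmt.Printf("got %d pods via YurtHub\n", len(pods.Items))
}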

Seamless Pod Migration

In a vanilla Kubernetes cluster, a Pod accesses the kube-apiserver through the InClusterConfig address by default. In the cloud-edge scenario, however, the cloud and the edge may sit on different network planes, so the Pod cannot reach the kube-apiserver through the InClusterConfig address. Moreover, while the cloud and the edge are disconnected, an edge Pod that needs to restart will fail to come back up because it cannot reach the kube-apiserver to fetch its resource configuration.

To solve these two problems, YurtHub lets Pods be migrated to the edge environment with zero modification. For Pods that use InClusterConfig to access the kube-apiserver, YurtHub adjusts the access address on the node without touching the Pod's own configuration, so that the Pod's requests are forwarded to YurtHub. The Pod therefore keeps working even when the cloud-edge network is down, which makes the move from a pure cloud setup to a cloud-edge setup seamless.
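To see why adjusting the access address is enough, recall that client-go's rest.InClusterConfig() derives the apiserver endpoint from the KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT environment variables that the kubelet injects into every Pod. The sketch below only demonstrates that behaviour; the dummy address 169.254.2.1:10268 is YurtHub's default shown later, and one way YurtHub steers Pods to it (by rewriting the default/kubernetes Service data seen on the node) is sketched under the Filter component below.

go
package main

import (
    "fmt"
    "os"

    "k8s.io/client-go/rest"
)

func main() {
    // client-go's InClusterConfig builds the apiserver address from these two
    // environment variables, which the kubelet populates from the
    // "default/kubernetes" Service.
    host := os.Getenv("KUBERNETES_SERVICE_HOST")
    port := os.Getenv("KUBERNETES_SERVICE_PORT")
    fmt.Printf("in-cluster endpoint: https://%s:%s\n", host, port)

    // On an OpenYurt edge node this is expected to resolve to YurtHub's dummy
    // address (e.g. 169.254.2.1:10268), so the unmodified Pod transparently
    // talks to YurtHub instead of the remote kube-apiserver.
    // Note: this only works when run inside a Pod.
    cfg, err := rest.InClusterConfig()
    if err != nil {
        panic(err)
    }
    fmt.Println("config host:", cfg.Host)
}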

How It Works

Basic Architecture

Request flow of YurtHub in edge mode

HTTP Server

Proxies all client requests and dispatches each one to the appropriate handler based on cloud-edge connectivity.

Load Balancer

Forwards edge requests to one specific cloud server; two forwarding strategies are available: round-robin and priority-based scheduling.
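A minimal sketch of the round-robin strategy (not YurtHub's actual load balancer code); a priority-based picker would instead always prefer the first reachable server in a configured order.

go
package lbsketch

import (
    "net/url"
    "sync"
)

// rrPicker is a hypothetical round-robin backend picker used only to
// illustrate the forwarding strategy.
type rrPicker struct {
    mu       sync.Mutex
    backends []*url.URL
    next     int
}

// Pick returns the next backend in round-robin order, or nil if none exist.
func (p *rrPicker) Pick() *url.URL {
    p.mu.Lock()
    defer p.mu.Unlock()
    if len(p.backends) == 0 {
        return nil
    }
    b := p.backends[p.next%len(p.backends)]
    p.next++
    return b
}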

Cache Manager

Stores the responses returned from the cloud kube-apiserver in local storage; when the edge node is offline, the local proxy answers requests directly from this local cache.

Health Check

Periodically checks the connectivity between the edge node and the cloud.

Filter

Filters are used to modify the responses returned from the cloud.
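As an example of what a filter does, the sketch below mimics the idea behind YurtHub's masterservice filter: rewriting the default/kubernetes Service in responses so that the in-cluster address seen by Pods points at YurtHub. The ObjectFilter interface and the field names here are invented for illustration and are not YurtHub's actual filter API.

go
package filtersketch

import (
    corev1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/runtime"
)

// ObjectFilter is a hypothetical interface, not YurtHub's real filter framework.
type ObjectFilter interface {
    Filter(obj runtime.Object) runtime.Object
}

// masterServiceFilter rewrites the apiserver Service address so that Pods
// using InClusterConfig end up talking to YurtHub.
type masterServiceFilter struct {
    host string // e.g. YurtHub's dummy IP 169.254.2.1
    port int32  // e.g. the secure proxy port 10268
}

func (f *masterServiceFilter) Filter(obj runtime.Object) runtime.Object {
    svc, ok := obj.(*corev1.Service)
    if !ok || svc.Namespace != "default" || svc.Name != "kubernetes" {
        return obj // leave every other object untouched
    }
    // Point the "kubernetes" Service at YurtHub instead of the cloud apiserver.
    svc.Spec.ClusterIP = f.host
    for i := range svc.Spec.Ports {
        svc.Spec.Ports[i].Port = f.port
    }
    return svc
}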

yaml
YurtHubServerAddr:                 "127.0.0.1:10267"
YurtHubProxyServerAddr:            "127.0.0.1:10261"
YurtHubProxyServerSecureAddr:      "127.0.0.1:10268"
YurtHubProxyServerDummyAddr:       "169.254.2.1:10261"
YurtHubProxyServerSecureDummyAddr: "169.254.2.1:10268"

These are the local addresses YurtHub listens on by default. We can confirm on the host that they are indeed bound:

shell
root@root:~# netstat -tnpl | grep -i yurthub
tcp        0      0 127.0.0.1:10267         0.0.0.0:*               LISTEN      1974065/yurthub
tcp        0      0 169.254.2.1:10268       0.0.0.0:*               LISTEN      1974065/yurthub
tcp        0      0 127.0.0.1:10268         0.0.0.0:*               LISTEN      1974065/yurthub
tcp        0      0 169.254.2.1:10261       0.0.0.0:*               LISTEN      1974065/yurthub
tcp        0      0 127.0.0.1:10261         0.0.0.0:*               LISTEN      1974065/yurthub

Source Code Walkthrough

If we exec into a container and look, we can confirm that the Kubernetes access address inside the container has indeed been rewritten to point at YurtHub.

Starting the transport manager

go
klog.Infof("%d. new transport manager", trace)
transportManager, err := transport.NewTransportManager(cfg.CertManager, ctx.Done())
if err != nil {
    return fmt.Errorf("could not new transport manager, %w", err)
}
trace++

Creating the health checker

go
// create the health checker
var cloudHealthChecker healthchecker.MultipleBackendsHealthChecker
if cfg.WorkingMode == util.WorkingModeEdge {
    // The health checker periodically probes whether the kube-apiserver is reachable and records
    // its health status, which decides whether requests are forwarded to the cloud or handled locally.
    // It is also responsible for reporting YurtHub's heartbeat to the cloud.
    klog.Infof("%d. create health checkers for remote servers and pool coordinator", trace)
    cloudHealthChecker, err = healthchecker.NewCloudAPIServerHealthChecker(cfg, cloudClients, ctx.Done())
    if err != nil {
        return fmt.Errorf("could not new cloud health checker, %w", err)
    }
} else {
    klog.Infof("%d. disable health checker for node %s because it is a cloud node", trace, cfg.NodeName)
    // In cloud mode, cloud health checker is not needed.
    // This fake checker will always report that the cloud is healthy and pool coordinator is unhealthy.
    cloudHealthChecker = healthchecker.NewFakeChecker(true, make(map[string]int))
}
trace++

The key part of the health check:

go
func (p *prober) Probe(phase string) bool {
    if p.kubeletStopped() {
        p.markAsUnhealthy(phase)
        return false
    }
    
    baseLease := p.getLastNodeLease()           // 1. fetch the last known node lease
    lease, err := p.nodeLease.Update(baseLease) // 2. try to renew the lease against this server
    if err == nil {
        if err := p.setLastNodeLease(lease); err != nil {
            klog.Errorf("failed to store last node lease: %v", err)
        }
        p.markAsHealthy(phase)                  // 3. the renewal succeeded, mark the server as healthy
        return true
    }

    klog.Errorf("failed to probe: %v, remote server %s", err, p.ServerName())
    p.markAsUnhealthy(phase)
    return false
}

The node health-check loop runs Probe on a timer:

go
func (hc *cloudAPIServerHealthChecker) run(stopCh <-chan struct{}) {
    intervalTicker := time.NewTicker(time.Duration(hc.heartbeatInterval) * time.Second)
    defer intervalTicker.Stop()

    for {
        select {
        case <-stopCh:
            klog.Infof("exit normally in health check loop.")
            return
        case <-intervalTicker.C:
            // Ensure that the node heartbeat can be reported when there is a healthy remote server.
            // Try to detect all remote server in a loop, if there is a remote server can update nodeLease, exit the loop.
            for i := 0; i < len(hc.remoteServers); i++ {
                p := hc.getProber()
                if p.Probe(ProbePhaseNormal) {
                    break
                }
            }
        }
    }
}

Creating the cache manager

A cache manager only needs to be created for edge nodes, not for cloud nodes.

go
var cacheMgr cachemanager.CacheManager
if cfg.WorkingMode == util.WorkingModeEdge {
    klog.Infof("%d. new cache manager with storage wrapper and serializer manager", trace)
    cacheMgr = cachemanager.NewCacheManager(cfg.StorageWrapper, cfg.SerializerManager, cfg.RESTMapperManager, cfg.SharedFactory)
} else {
    klog.Infof("%d. disable cache manager for node %s because it is a cloud node", trace, cfg.NodeName)
}

Let's look at two examples: caching a response and querying the cache.

  • Caching a response
go
func (cm *cacheManager) CacheResponse(req *http.Request, prc io.ReadCloser, stopCh <-chan struct{}) error {
    ctx := req.Context()
    // Extract the resource info from the request context, then check whether this is
    // a watch request; if so, the watched objects are stored in the local cache.
    info, _ := apirequest.RequestInfoFrom(ctx)
    if isWatch(ctx) {
        return cm.saveWatchObject(ctx, info, prc, stopCh)
    }
    
    ....
}

func (cm *cacheManager) saveWatchObject(ctx context.Context, info *apirequest.RequestInfo, r io.ReadCloser, stopCh <-chan struct{}) error {
    ....
    ....
    for {
        watchType, obj, err := d.Decode()
        if err != nil {
            klog.Errorf("%s %s watch decode ended with: %v", comp, info.Path, err)
            return err
        }

        switch watchType {
        case watch.Added, watch.Modified, watch.Deleted:
             ....

            key, err := cm.storage.KeyFunc(storage.KeyBuildInfo{
                Component: comp,
                Namespace: ns,
                Name:      name,
                Resources: info.Resource,
                Group:     info.APIGroup,
                Version:   info.APIVersion,
            })
            if err != nil {
                klog.Errorf("failed to get cache path, %v", err)
                continue
            }

            switch watchType {
            // For Add and Modify events, store the object in the local cache;
            // the cache backend can be either memory or disk.
            case watch.Added, watch.Modified:            
                err = cm.storeObjectWithKey(key, obj)
                if watchType == watch.Added {
                    addObjCnt++
                } else {
                    updateObjCnt++
                }
            
            // For Delete events, remove the corresponding object from the local cache.
            case watch.Deleted:
                err = cm.storage.Delete(key)
                delObjCnt++
            default:
                // impossible go to here
            }

           ....
    }
}

Looking at the corresponding cache directory, we can see many resources cached locally, such as nodes, pods, secrets, and services:

shell
root@root:/etc/kubernetes/cache/kubelet# ls
configmaps  csidrivers  csinodes  events  leases  nodes  pods  secrets  services
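Judging from this directory listing and the KeyBuildInfo fields used in saveWatchObject, a cache key maps roughly to a path of the form component/resource/namespace/name. The helper below is only a guess at that mapping, not YurtHub's real KeyFunc, which may also encode group and version.

go
package cachesketch

import "path/filepath"

// cachePath is a hypothetical helper that mirrors the on-disk layout observed
// above; the real key construction lives in YurtHub's storage package.
func cachePath(root, component, resource, namespace, name string) string {
    return filepath.Join(root, component, resource, namespace, name)
}

// Example (hypothetical object name):
//   cachePath("/etc/kubernetes/cache", "kubelet", "pods", "kube-system", "coredns-abc")
//   => /etc/kubernetes/cache/kubelet/pods/kube-system/coredns-abc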
  • Querying the cache
go
func (cm *cacheManager) QueryCache(req *http.Request) (runtime.Object, error) {
    ctx := req.Context()
    info, ok := apirequest.RequestInfoFrom(ctx)
    if !ok || info == nil || info.Resource == "" {
        return nil, fmt.Errorf("failed to get request info for request %s", util.ReqString(req))
    }
    if !info.IsResourceRequest {
        return nil, fmt.Errorf("failed to QueryCache for getting non-resource request %s", util.ReqString(req))
    }

    // dispatch on the verb of the request
    switch info.Verb {
    case "list":
        return cm.queryListObject(req)
    case "get", "patch", "update":
        return cm.queryOneObject(req)
    default:
        return nil, fmt.Errorf("failed to QueryCache, unsupported verb %s of request %s", info.Verb, util.ReqString(req))
    }
}


func (cm *cacheManager) queryOneObject(req *http.Request) (runtime.Object, error) {
    ctx := req.Context()
    info, ok := apirequest.RequestInfoFrom(ctx)
    if !ok || info == nil || info.Resource == "" {
        return nil, fmt.Errorf("failed to get request info for request %s", util.ReqString(req))
    }

    comp, _ := util.ClientComponentFrom(ctx)
    // query in-memory cache first
    var isInMemoryCache = isInMemeoryCache(ctx)
    var isInMemoryCacheMiss bool
    if isInMemoryCache {
        if obj, err := cm.queryInMemeryCache(info); err != nil {
            ....
        }
    }

    // fall back to normal query
    key, err := cm.storage.KeyFunc(storage.KeyBuildInfo{
        Component: comp,
        Namespace: info.Namespace,
        Name:      info.Name,
        Resources: info.Resource,
        Group:     info.APIGroup,
        Version:   info.APIVersion,
    })
    if err != nil {
        return nil, err
    }

    klog.V(4).Infof("component: %s try to get key: %s", comp, key.Key())
    obj, err := cm.storage.Get(key)
    if err != nil {
        klog.Errorf("failed to get obj %s from storage, %v", key.Key(), err)
        return nil, err
    }
     
    ....
}

Note the difference here:

In order to accelerate kubelet get node and lease object, we cache them

Node and lease objects are cached directly in memory so that the kubelet can fetch them quickly; at the same time, judging from the local cache files, a copy is also persisted on disk.
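A simplified sketch of that two-level idea (not the actual cacheManager code): hot objects such as nodes and leases sit in an in-memory map for fast kubelet reads, while every object is still written through to a hypothetical disk store so the cache survives a restart.

go
package cachesketch

import (
    "sync"

    "k8s.io/apimachinery/pkg/runtime"
)

// DiskStore is a hypothetical on-disk store (e.g. files under /etc/kubernetes/cache).
type DiskStore interface {
    Save(key string, obj runtime.Object) error
    Get(key string) (runtime.Object, error)
}

// twoLevelCache keeps hot objects (nodes, leases) in memory and writes
// everything through to disk.
type twoLevelCache struct {
    mu     sync.RWMutex
    memory map[string]runtime.Object
    disk   DiskStore
}

func (c *twoLevelCache) Store(key string, obj runtime.Object, hot bool) error {
    if hot {
        c.mu.Lock()
        c.memory[key] = obj
        c.mu.Unlock()
    }
    // Always write through to disk so the cache survives a YurtHub restart.
    return c.disk.Save(key, obj)
}

func (c *twoLevelCache) Get(key string) (runtime.Object, error) {
    c.mu.RLock()
    obj, ok := c.memory[key]
    c.mu.RUnlock()
    if ok {
        return obj, nil
    }
    // Fall back to the on-disk copy.
    return c.disk.Get(key)
}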

Creating the GC manager

The GC manager mainly does two things:

  • When YurtHub starts, it deletes Pods that are cached locally but no longer exist in the cloud
  • It periodically garbage-collects Event objects

The basic logic is to list the local cache, list the corresponding objects in the cloud, and garbage-collect whatever exists locally but not in the cloud.

go
func (m *GCManager) gcPodsWhenRestart() {
    localPodKeys, err := m.store.ListResourceKeysOfComponent("kubelet", schema.GroupVersionResource{
        Group:    "",
        Version:  "v1",
        Resource: "pods",
    })
    
    ....
    ....
    
    listOpts := metav1.ListOptions{FieldSelector: fields.OneTermEqualSelector("spec.nodeName", m.nodeName).String()}
    podList, err := kubeClient.CoreV1().Pods(v1.NamespaceAll).List(context.Background(), listOpts)
    if err != nil {
        klog.Errorf("could not list pods for node(%s), %v", m.nodeName, err)
        return
    }
    ....
     
    // The main diff logic: local keys that no longer exist in the cloud are marked for deletion.
    deletedPods := make([]storage.Key, 0)
    for i := range localPodKeys {
        if _, ok := currentPodKeys[localPodKeys[i]]; !ok {
            deletedPods = append(deletedPods, localPodKeys[i])
        }
    }

    if len(deletedPods) == len(localPodKeys) {
        klog.Infof("it's dangerous to gc all cache pods, so skip gc")
        return
    }

    for _, key := range deletedPods {
        if err := m.store.Delete(key); err != nil {
            klog.Errorf("failed to gc pod %s, %v", key, err)
        } else {
            klog.Infof("gc pod %s successfully", key)
        }
    }
}

Building the Yurt reverse proxy handler

Purpose

Proxies client requests.

Source code

go
klog.Infof("%d. new reverse proxy handler for remote servers", trace)
yurtProxyHandler, err := proxy.NewYurtReverseProxyHandler( ... )
if err != nil {
    return fmt.Errorf("could not create reverse proxy handler, %w", err)
}

YurtHub's reverse proxy only has to handle the following cases:

  1. In cloud mode, requests go straight through the LoadBalancer
  2. In edge mode, the request type is inspected first (for example, whether it is a kubelet lease request or an Event create request)
go
func (p *yurtReverseProxy) ServeHTTP(rw http.ResponseWriter, req *http.Request) {
    if p.workingMode == hubutil.WorkingModeCloud {
        p.loadBalancer.ServeHTTP(rw, req)
        return
    }
        
    switch {
    case util.IsKubeletLeaseReq(req):
        p.handleKubeletLease(rw, req)
    case util.IsEventCreateRequest(req):
        p.eventHandler(rw, req)
    case util.IsPoolScopedResouceListWatchRequest(req):
        p.poolScopedResouceHandler(rw, req)
    case util.IsSubjectAccessReviewCreateGetRequest(req):
        p.subjectAccessReviewHandler(rw, req)
    default:
        // For resource request that do not need to be handled by pool-coordinator,
        // handling the request with cloud apiserver or local cache.
        if p.cloudHealthChecker.IsHealthy() {
            p.loadBalancer.ServeHTTP(rw, req)
        } else {
            p.localProxy.ServeHTTP(rw, req)
        }
    }
}

For requests originating from the edge, YurtHub wraps the handler with a chain of middleware.

Each middleware has a different responsibility:

go
func (p *yurtReverseProxy) buildHandlerChain(handler http.Handler) http.Handler {
    // 1. record the time from when the request is proxied out by YurtHub until it is fully handled
    handler = util.WithRequestTrace(handler)

    // 2. record the Content-Type of the request header
    handler = util.WithRequestContentType(handler)

    // 3. check the request headers to decide whether the response needs to be cached (edge mode only)
    if p.workingMode == hubutil.WorkingModeEdge {
        handler = util.WithCacheHeaderCheck(handler)
    }
    ........
    return handler
}

Pay close attention to the order of these middlewares: the later a wrapper is applied in buildHandlerChain, the further out it sits in the chain and the earlier it is invoked for an incoming request. That is why the wrapper that measures the actual request-handling latency is applied first, so that it ends up innermost and its measurement is accurate.
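A tiny standalone demo of that ordering (independent of the YurtHub code): the handler wrapped last is outermost and therefore runs first, while the first-wrapped WithRequestTrace ends up closest to the real handler.

go
package main

import (
    "fmt"
    "net/http"
    "net/http/httptest"
)

// with wraps a handler and logs when the wrapper is entered and left.
func with(name string, next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        fmt.Println("enter", name)
        next.ServeHTTP(w, r)
        fmt.Println("leave", name)
    })
}

func main() {
    var h http.Handler = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        fmt.Println("real handler")
    })
    // Same wrapping order as buildHandlerChain: trace first, then content type.
    h = with("WithRequestTrace", h)
    h = with("WithRequestContentType", h)

    h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest("GET", "/", nil))
    // Output:
    // enter WithRequestContentType
    // enter WithRequestTrace
    // real handler
    // leave WithRequestTrace
    // leave WithRequestContentType
}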

Starting the YurtHub servers

Overall logic

The servers use two main kinds of handlers:

  • hubServerHandler
  • proxyHandler

The proxyHandler in turn splits its handling into non-resource requests and resource requests:

  • non-resource requests are answered directly from the local cache
  • resource requests are forwarded to the remote kube-apiserver

Source code

go
// start yurthub proxy servers for forwarding requests to cloud kube-apiserver
if cfg.WorkingMode == util.WorkingModeEdge {
    proxyHandler = wrapNonResourceHandler(proxyHandler, cfg, rest)
}
if cfg.YurtHubProxyServerServing != nil {
    if err := cfg.YurtHubProxyServerServing.Serve(proxyHandler, 0, stopCh); err != nil {
        return err
    }
}

if cfg.YurtHubDummyProxyServerServing != nil {
    if err := cfg.YurtHubDummyProxyServerServing.Serve(proxyHandler, 0, stopCh); err != nil {
        return err
    }
}

if cfg.YurtHubSecureProxyServerServing != nil {
    if _, err := cfg.YurtHubSecureProxyServerServing.Serve(proxyHandler, 0, stopCh); err != nil {
        return err
    }
}

The handlers for the following non-resource paths get special treatment:

go
var nonResourceReqPaths = map[string]storage.ClusterInfoType{
    "/version":                       storage.Version,
    "/apis/discovery.k8s.io/v1":      storage.APIResourcesInfo,
    "/apis/discovery.k8s.io/v1beta1": storage.APIResourcesInfo,
}

They are served directly from the local cache:

go
func wrapNonResourceHandler(proxyHandler http.Handler, config *config.YurtHubConfiguration, restMgr *rest.RestConfigManager) http.Handler {
    wrapMux := mux.NewRouter()

    // register handler for non resource requests
    for path := range nonResourceReqPaths {
        wrapMux.Handle(path, localCacheHandler(nonResourceHandler, restMgr, config.StorageWrapper, path)).Methods("GET")
    }

    // register handler for other requests
    wrapMux.PathPrefix("/").Handler(proxyHandler)
    return wrapMux
}

FAQ

After YurtHub receives a request from the kubelet, how does it identify what kind of request it is before forwarding it to the cloud kube-apiserver?

The logic can be summarized as follows.

After OpenYurt is installed, it automatically rewrites the kubelet configuration so that the kubelet no longer accesses the kube-apiserver directly.

Instead, the kubelet talks to local port 10261, which is the port that the YurtHub service listens on:

yaml
root@root:~ cat /etc/kubernetes/kubelet.conf
apiVersion: v1
clusters:
- cluster:
    server: http://127.0.0.1:10261
  name: default-cluster
contexts:
- context:
    cluster: default-cluster
    namespace: default
    user: default-auth
  name: default-context
current-context: default-context
kind: Config
preferences: {}

After the local service receives a request, it extracts the RequestInfo from the HTTP request's context.

This struct holds the details of how each client HTTP request targets the apiserver:

go
// RequestInfo holds information parsed from the http.Request
type RequestInfo struct {
    // IsResourceRequest indicates whether or not the request is for an API resource or subresource
    IsResourceRequest bool
    // Path is the URL path of the request
    Path string
    // Verb is the kube verb associated with the request for API requests, not the http verb.  This includes things like list and watch.
    // for non-resource requests, this is the lowercase http verb
    Verb string

    APIPrefix  string
    APIGroup   string
    APIVersion string
    Namespace  string
    // Resource is the name of the resource being requested.  This is not the kind.  For example: pods
    Resource string
    // Subresource is the name of the subresource being requested.  This is a different resource, scoped to the parent resource, but it may have a different kind.
    // For instance, /pods has the resource "pods" and the kind "Pod", while /pods/foo/status has the resource "pods", the sub resource "status", and the kind "Pod"
    // (because status operates on pods). The binding resource for a pod though may be /pods/foo/binding, which has resource "pods", subresource "binding", and kind "Binding".
    Subresource string
    // Name is empty for some verbs, but if the request directly indicates a name (not in body content) then this field is filled in.
    Name string
    // Parts are the path parts for the request, always starting with /{resource}/{name}
    Parts []string
}

With the information in this HTTP request context, YurtHub can tell which resource the client is operating on; a small example of that kind of dispatch follows.
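The example below uses the same RequestInfo machinery from k8s.io/apiserver to turn a URL path into verb/resource/namespace/name; the factory configuration shown is the standard one and only an assumption about how YurtHub sets it up.

go
package main

import (
    "fmt"
    "net/http/httptest"

    "k8s.io/apimachinery/pkg/util/sets"
    apirequest "k8s.io/apiserver/pkg/endpoints/request"
)

func main() {
    // The standard resolver the apiserver uses to parse a request path into a RequestInfo.
    resolver := &apirequest.RequestInfoFactory{
        APIPrefixes:          sets.NewString("api", "apis"),
        GrouplessAPIPrefixes: sets.NewString("api"),
    }

    req := httptest.NewRequest("GET", "/api/v1/namespaces/default/pods/nginx", nil)
    info, err := resolver.NewRequestInfo(req)
    if err != nil {
        panic(err)
    }
    // Prints: verb=get resource=pods namespace=default name=nginx
    fmt.Printf("verb=%s resource=%s namespace=%s name=%s\n",
        info.Verb, info.Resource, info.Namespace, info.Name)
}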

nginx·kubernetes·ssl