Positioning
OpenYurt currently provides edge node autonomy: when the cloud-edge network is disconnected and the edge node or a workload container restarts, the workload containers can recover automatically on the edge node instead of being evicted and rescheduled by the cloud.
- A node-level sidecar: a traffic proxy between the components on the node and the kube-apiserver, with two working modes, edge and cloud. In edge mode, YurtHub caches the data returned from the cloud.
- Deployment form: runs on every node as a Static Pod.
Edge Autonomy
If the network is unstable and keeps dropping, edge nodes cannot reliably report their status to the control plane, which leads to node eviction.
When kubelet, kube-proxy, Flannel, and similar components need to access the kube-apiserver (KAS), they go through YurtHub first. YurtHub forwards the request to the KAS and stores the response in a local cache. If the network is later disconnected, kubelet and the other components read data directly from YurtHub's cache instead of requesting the KAS.
By caching resources locally, YurtHub allows Pods and the kubelet to keep running normally when the cloud-edge network is disconnected, because they can still fetch the resources they need through YurtHub.
Seamless Pod Migration
In a vanilla Kubernetes cluster, a Pod accesses the kube-apiserver through the InClusterConfig address by default. In cloud-edge scenarios, the cloud and the edge may sit on different network planes, so the Pod cannot reach the kube-apiserver through the InClusterConfig address. Moreover, when the cloud and edge are disconnected and an edge Pod needs to restart, the restart fails because the Pod cannot connect to the kube-apiserver to fetch its resource configuration.
To solve both problems, YurtHub lets Pods migrate to the edge with zero modification. For Pods that use InClusterConfig to access the kube-apiserver, YurtHub adjusts the access address on the node without touching the Pod's own configuration, so the Pod's requests are forwarded to YurtHub. The Pod keeps running even when the cloud-edge network is down, which makes the move from a pure cloud scenario to a cloud-edge scenario seamless.
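As a side note on why rewriting the access address is enough: a Pod built with client-go's InClusterConfig derives the apiserver endpoint from the KUBERNETES_SERVICE_HOST / KUBERNETES_SERVICE_PORT environment variables injected into every container. The minimal sketch below (not OpenYurt code) only prints what such a Pod would see; with YurtHub installed, these variables point at YurtHub on the node.
go
package main

import (
	"fmt"
	"os"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// InClusterConfig reads KUBERNETES_SERVICE_HOST / KUBERNETES_SERVICE_PORT,
	// so redirecting them on the node is transparent to the Pod spec.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	fmt.Println("apiserver address seen by the pod:", cfg.Host)
	fmt.Println("KUBERNETES_SERVICE_HOST =", os.Getenv("KUBERNETES_SERVICE_HOST"))

	// Clients built from this config transparently go through the proxy.
	if _, err := kubernetes.NewForConfig(cfg); err != nil {
		panic(err)
	}
}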
How It Works
Basic Architecture
The access flow of YurtHub in edge mode:
HTTP Server
Proxies all client requests and dispatches them to the appropriate handler based on the connectivity of the cloud-edge network.
Load Balancer
Forwards edge requests to a specific server in the cloud. Two scheduling modes are supported: round-robin and priority (see the sketch after this list).
Cache Manager
Stores responses returned by the cloud kube-apiserver in local storage; when the edge node is offline, the local proxy answers requests directly from this local cache.
Health Check
Periodically checks the connectivity between the edge and the cloud.
Filter
Filters are used to modify the responses returned from the cloud.
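As a rough illustration of the Load Balancer's round-robin mode, here is a minimal backend picker. It is a sketch only: the real YurtHub load balancer also tracks backend health and supports the priority mode, and the names below (rrPicker, Pick, the endpoint URLs) are made up for this example.
go
package main

import (
	"fmt"
	"net/url"
	"sync/atomic"
)

// rrPicker cycles through the configured cloud kube-apiserver endpoints.
type rrPicker struct {
	backends []*url.URL
	next     uint64
}

// Pick returns the next backend in round-robin order; it is safe for
// concurrent use because the counter is advanced atomically.
func (p *rrPicker) Pick() *url.URL {
	i := atomic.AddUint64(&p.next, 1)
	return p.backends[int(i)%len(p.backends)]
}

func main() {
	a, _ := url.Parse("https://apiserver-1:6443") // hypothetical endpoints
	b, _ := url.Parse("https://apiserver-2:6443")
	p := &rrPicker{backends: []*url.URL{a, b}}
	for i := 0; i < 4; i++ {
		fmt.Println(p.Pick())
	}
}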
yaml
YurtHubServerAddr: "127.0.0.1:10267"
YurtHubProxyServerAddr: "127.0.0.1:10261"
YurtHubProxyServerSecureAddr: "127.0.0.1:10268"
YurtHubProxyServerDummyAddr: "169.254.2.1:10261"
YurtHubProxyServerSecureDummyAddr: "169.254.2.1:10268"
By default, YurtHub listens on these local addresses; we can confirm this on the host:
bash
root@root:~# netstat -tnpl | grep -i yurthub
tcp 0 0 127.0.0.1:10267 0.0.0.0:* LISTEN 1974065/yurthub
tcp 0 0 169.254.2.1:10268 0.0.0.0:* LISTEN 1974065/yurthub
tcp 0 0 127.0.0.1:10268 0.0.0.0:* LISTEN 1974065/yurthub
tcp 0 0 169.254.2.1:10261 0.0.0.0:* LISTEN 1974065/yurthub
tcp 0 0 127.0.0.1:10261 0.0.0.0:* LISTEN 1974065/yurthub
Source Code Walkthrough
Before diving in: if we exec into a business container, we can see that the Kubernetes access address inside the container has indeed been rewritten to point at YurtHub.
Starting the transport manager
go
klog.Infof("%d. new transport manager", trace)
transportManager, err := transport.NewTransportManager(cfg.CertManager, ctx.Done())
if err != nil {
return fmt.Errorf("could not new transport manager, %w", err)
}
trace++
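Conceptually, the transport manager provides an HTTP transport for talking to the cloud kube-apiserver whose client certificate can be rotated without restarting YurtHub. The sketch below illustrates that idea only; it is not YurtHub's actual API, and newTransport / getCert are hypothetical names.
go
package main

import (
	"crypto/tls"
	"net/http"
)

// newTransport builds a transport whose client certificate is looked up on
// every TLS handshake, so a renewed node certificate is picked up on the fly.
func newTransport(getCert func() (*tls.Certificate, error)) *http.Transport {
	return &http.Transport{
		TLSClientConfig: &tls.Config{
			GetClientCertificate: func(*tls.CertificateRequestInfo) (*tls.Certificate, error) {
				return getCert()
			},
		},
	}
}

func main() {
	// In YurtHub the certificate would come from the certificate manager;
	// here we plug in a stub so the example compiles and runs.
	tr := newTransport(func() (*tls.Certificate, error) { return &tls.Certificate{}, nil })
	_ = &http.Client{Transport: tr}
}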
Creating the health checker
go
// create the health checker
var cloudHealthChecker healthchecker.MultipleBackendsHealthChecker
if cfg.WorkingMode == util.WorkingModeEdge {
// Health Check periodically detects whether the kube-apiserver is reachable and records its
// health status, which decides whether requests are forwarded to the cloud or handled locally.
// It is also responsible for updating YurtHub's heartbeat to the cloud.
klog.Infof("%d. create health checkers for remote servers and pool coordinator", trace)
cloudHealthChecker, err = healthchecker.NewCloudAPIServerHealthChecker(cfg, cloudClients, ctx.Done())
if err != nil {
return fmt.Errorf("could not new cloud health checker, %w", err)
}
} else {
klog.Infof("%d. disable health checker for node %s because it is a cloud node", trace, cfg.NodeName)
// In cloud mode, cloud health checker is not needed.
// This fake checker will always report that the cloud is healthy and pool coordinator is unhealthy.
cloudHealthChecker = healthchecker.NewFakeChecker(true, make(map[string]int))
}
trace++
Key points of the health check:
go
func (p *prober) Probe(phase string) bool {
if p.kubeletStopped() {
p.markAsUnhealthy(phase)
return false
}
baseLease := p.getLastNodeLease()           // 1. get the node's last known lease
lease, err := p.nodeLease.Update(baseLease) // 2. renew the lease against the remote server
if err == nil {
if err := p.setLastNodeLease(lease); err != nil {
klog.Errorf("failed to store last node lease: %v", err)
}
p.markAsHealthy(phase) // 3. the update succeeded, mark the server as healthy
return true
}
klog.Errorf("failed to probe: %v, remote server %s", err, p.ServerName())
p.markAsUnhealthy(phase)
return false
}
The node health check is driven by a ticker that periodically invokes Probe; a standalone sketch of the lease-renewal idea follows the run loop below.
go
func (hc *cloudAPIServerHealthChecker) run(stopCh <-chan struct{}) {
intervalTicker := time.NewTicker(time.Duration(hc.heartbeatInterval) * time.Second)
defer intervalTicker.Stop()
for {
select {
case <-stopCh:
klog.Infof("exit normally in health check loop.")
return
case <-intervalTicker.C:
// Ensure that the node heartbeat can be reported when there is a healthy remote server.
// Try to detect all remote server in a loop, if there is a remote server can update nodeLease, exit the loop.
for i := 0; i < len(hc.remoteServers); i++ {
p := hc.getProber()
if p.Probe(ProbePhaseNormal) {
break
}
}
}
}
}
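The heartbeat that Probe performs is essentially a renewal of the node's Lease in the kube-node-lease namespace; if the renewal succeeds, the remote server is considered healthy. The following is a minimal, standalone sketch of that idea using plain client-go; the node name "my-edge-node" is hypothetical, and YurtHub's real nodeLease helper adds retries and caches the last lease.
go
package main

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// renewNodeLease bumps the renewTime of the node's lease; reaching the
// apiserver and updating successfully doubles as a health probe.
func renewNodeLease(cs kubernetes.Interface, nodeName string) error {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	lease, err := cs.CoordinationV1().Leases("kube-node-lease").Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	now := metav1.NewMicroTime(time.Now())
	lease.Spec.RenewTime = &now
	_, err = cs.CoordinationV1().Leases("kube-node-lease").Update(ctx, lease, metav1.UpdateOptions{})
	return err
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	if err := renewNodeLease(kubernetes.NewForConfigOrDie(cfg), "my-edge-node"); err != nil {
		panic(err)
	}
}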
Creating the Cache Manager
The cache manager only needs to be created for edge nodes, not for cloud nodes.
go
var cacheMgr cachemanager.CacheManager
if cfg.WorkingMode == util.WorkingModeEdge {
klog.Infof("%d. new cache manager with storage wrapper and serializer manager", trace)
cacheMgr = cachemanager.NewCacheManager(cfg.StorageWrapper, cfg.SerializerManager, cfg.RESTMapperManager, cfg.SharedFactory)
} else {
klog.Infof("%d. disable cache manager for node %s because it is a cloud node", trace, cfg.NodeName)
}
Let's look at two examples: caching a response and querying the cache.
- Caching a response
go
func (cm *cacheManager) CacheResponse(req *http.Request, prc io.ReadCloser, stopCh <-chan struct{}) error {
ctx := req.Context()
// Get the resource info from the request context and check whether this is
// a watch request; if so, the watch stream is saved into the local cache.
info, _ := apirequest.RequestInfoFrom(ctx)
if isWatch(ctx) {
return cm.saveWatchObject(ctx, info, prc, stopCh)
}
....
}
func (cm *cacheManager) saveWatchObject(ctx context.Context, info *apirequest.RequestInfo, r io.ReadCloser, stopCh <-chan struct{}) error {
....
....
for {
watchType, obj, err := d.Decode()
if err != nil {
klog.Errorf("%s %s watch decode ended with: %v", comp, info.Path, err)
return err
}
switch watchType {
case watch.Added, watch.Modified, watch.Deleted:
....
key, err := cm.storage.KeyFunc(storage.KeyBuildInfo{
Component: comp,
Namespace: ns,
Name: name,
Resources: info.Resource,
Group: info.APIGroup,
Version: info.APIVersion,
})
if err != nil {
klog.Errorf("failed to get cache path, %v", err)
continue
}
switch watchType {
// For Added or Modified events, store the object into the local cache;
// the cache backend can be memory or disk.
case watch.Added, watch.Modified:
err = cm.storeObjectWithKey(key, obj)
if watchType == watch.Added {
addObjCnt++
} else {
updateObjCnt++
}
// For Deleted events, remove the corresponding entry from the local cache.
case watch.Deleted:
err = cm.storage.Delete(key)
delObjCnt++
default:
// impossible go to here
}
....
}
}
Looking into the cache directory, we can see many resources cached locally, such as nodes, pods, secrets, and services (a read-back sketch follows the listing):
bash
root@root:/etc/kubernetes/cache/kubelet# ls
configmaps csidrivers csinodes events leases nodes pods secrets services
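For a feel of what these files contain, the sketch below reads one cached object back and decodes it as an unstructured Kubernetes object. It assumes the object was cached as JSON and uses a hypothetical path; the actual layout and encoding depend on the component and on what format the client originally requested.
go
package main

import (
	"fmt"
	"os"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

func main() {
	// Hypothetical path under the kubelet's cache directory.
	data, err := os.ReadFile("/etc/kubernetes/cache/kubelet/pods/default/nginx")
	if err != nil {
		panic(err)
	}
	obj := &unstructured.Unstructured{}
	if err := obj.UnmarshalJSON(data); err != nil {
		panic(err)
	}
	fmt.Println(obj.GetKind(), obj.GetNamespace(), obj.GetName())
}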
- Querying the cache
go
func (cm *cacheManager) QueryCache(req *http.Request) (runtime.Object, error) {
ctx := req.Context()
info, ok := apirequest.RequestInfoFrom(ctx)
if !ok || info == nil || info.Resource == "" {
return nil, fmt.Errorf("failed to get request info for request %s", util.ReqString(req))
}
if !info.IsResourceRequest {
return nil, fmt.Errorf("failed to QueryCache for getting non-resource request %s", util.ReqString(req))
}
// dispatch based on the request verb
switch info.Verb {
case "list":
return cm.queryListObject(req)
case "get", "patch", "update":
return cm.queryOneObject(req)
default:
return nil, fmt.Errorf("failed to QueryCache, unsupported verb %s of request %s", info.Verb, util.ReqString(req))
}
}
func (cm *cacheManager) queryOneObject(req *http.Request) (runtime.Object, error) {
ctx := req.Context()
info, ok := apirequest.RequestInfoFrom(ctx)
if !ok || info == nil || info.Resource == "" {
return nil, fmt.Errorf("failed to get request info for request %s", util.ReqString(req))
}
comp, _ := util.ClientComponentFrom(ctx)
// query in-memory cache first
var isInMemoryCache = isInMemeoryCache(ctx)
var isInMemoryCacheMiss bool
if isInMemoryCache {
if obj, err := cm.queryInMemeryCache(info); err != nil {
....
}
}
// fall back to normal query
key, err := cm.storage.KeyFunc(storage.KeyBuildInfo{
Component: comp,
Namespace: info.Namespace,
Name: info.Name,
Resources: info.Resource,
Group: info.APIGroup,
Version: info.APIVersion,
})
if err != nil {
return nil, err
}
klog.V(4).Infof("component: %s try to get key: %s", comp, key.Key())
obj, err := cm.storage.Get(key)
if err != nil {
klog.Errorf("failed to get obj %s from storage, %v", key.Key(), err)
return nil, err
}
....
}
Note the difference here (the source comment reads):
In order to accelerate kubelet get node and lease object, we cache them
node and lease objects are cached directly in memory so that the kubelet can fetch them quickly; but if you look at the local cache files, a copy is also persisted on disk.
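As a minimal sketch of the in-memory fast path (illustrative only; YurtHub's real cache manager stores serialized runtime.Objects keyed by component and resource, and also writes them through to disk), an in-memory cache is little more than a mutex-protected map:
go
package main

import (
	"fmt"
	"sync"
)

// memCache is a toy in-memory cache: a map guarded by a RWMutex.
type memCache struct {
	mu    sync.RWMutex
	items map[string][]byte
}

func newMemCache() *memCache {
	return &memCache{items: map[string][]byte{}}
}

func (c *memCache) Set(key string, data []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[key] = data
}

func (c *memCache) Get(key string) ([]byte, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	data, ok := c.items[key]
	return data, ok
}

func main() {
	c := newMemCache()
	c.Set("kubelet/nodes/my-edge-node", []byte(`{"kind":"Node"}`)) // hypothetical key
	if data, ok := c.Get("kubelet/nodes/my-edge-node"); ok {
		fmt.Println(string(data))
	}
}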
Creating the GC manager
The GC manager serves two main purposes:
- When YurtHub starts, it deletes pod entries that are cached locally but no longer exist in the cloud.
- It periodically reclaims Event objects.
The basic logic is the same in both cases: list the local cache, fetch the current state from the cloud, and garbage-collect anything that exists locally but not in the cloud.
go
func (m *GCManager) gcPodsWhenRestart() {
localPodKeys, err := m.store.ListResourceKeysOfComponent("kubelet", schema.GroupVersionResource{
Group: "",
Version: "v1",
Resource: "pods",
})
....
....
listOpts := metav1.ListOptions{FieldSelector: fields.OneTermEqualSelector("spec.nodeName", m.nodeName).String()}
podList, err := kubeClient.CoreV1().Pods(v1.NamespaceAll).List(context.Background(), listOpts)
if err != nil {
klog.Errorf("could not list pods for node(%s), %v", m.nodeName, err)
return
}
....
// the main diff logic: collect keys that exist locally but not in the cloud
deletedPods := make([]storage.Key, 0)
for i := range localPodKeys {
if _, ok := currentPodKeys[localPodKeys[i]]; !ok {
deletedPods = append(deletedPods, localPodKeys[i])
}
}
if len(deletedPods) == len(localPodKeys) {
klog.Infof("it's dangerous to gc all cache pods, so skip gc")
return
}
for _, key := range deletedPods {
if err := m.store.Delete(key); err != nil {
klog.Errorf("failed to gc pod %s, %v", key, err)
} else {
klog.Infof("gc pod %s successfully", key)
}
}
}
Constructing the Yurt reverse proxy handler
Purpose
Proxies requests from clients.
Source code
go
klog.Infof("%d. new reverse proxy handler for remote servers", trace)
yurtProxyHandler, err := proxy.NewYurtReverseProxyHandler( ... )
if err != nil {
return fmt.Errorf("could not create reverse proxy handler, %w", err)
}
YurtHub's reverse proxy handles requests in only the following ways:
- In cloud mode, requests go straight through the LoadBalancer.
- In edge mode, the request type is inspected first (for example, whether it is a kubelet lease request or an event creation request).
go
func (p *yurtReverseProxy) ServeHTTP(rw http.ResponseWriter, req *http.Request) {
if p.workingMode == hubutil.WorkingModeCloud {
p.loadBalancer.ServeHTTP(rw, req)
return
}
switch {
case util.IsKubeletLeaseReq(req):
p.handleKubeletLease(rw, req)
case util.IsEventCreateRequest(req):
p.eventHandler(rw, req)
case util.IsPoolScopedResouceListWatchRequest(req):
p.poolScopedResouceHandler(rw, req)
case util.IsSubjectAccessReviewCreateGetRequest(req):
p.subjectAccessReviewHandler(rw, req)
default:
// For resource request that do not need to be handled by pool-coordinator,
// handling the request with cloud apiserver or local cache.
if p.cloudHealthChecker.IsHealthy() {
p.loadBalancer.ServeHTTP(rw, req)
} else {
p.localProxy.ServeHTTP(rw, req)
}
}
}
For requests coming from the edge, YurtHub wraps the handler with a chain of middlewares, each serving a different purpose.
go
func (p *yurtReverseProxy) buildHandlerChain(handler http.Handler) http.Handler {
// 1. record how long a request takes from being proxied out of YurtHub until it is fully handled
handler = util.WithRequestTrace(handler)
// 2. record the Content-Type of the request header
handler = util.WithRequestContentType(handler)
// 3. check the request header to decide whether the response should be cached
if p.workingMode == hubutil.WorkingModeEdge {
handler = util.WithCacheHeaderCheck(handler)
}
........
return handler
}
Pay attention to the order of these middlewares: the handlers wrapped further down are invoked earlier, so the middleware that records the actual request-handling latency must be applied first (innermost) for the measurement to be accurate. The sketch below demonstrates the invocation order.
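To see the wrapping order concretely, the self-contained demo below (not YurtHub code) wraps a handler twice and prints the order in which the middlewares run: the one applied last is outermost and runs first.
go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// with wraps a handler and logs when the wrapper is entered and left.
func with(name string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Println("enter", name)
		next.ServeHTTP(w, r)
		fmt.Println("leave", name)
	})
}

func main() {
	var h http.Handler = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Println("real handler")
	})
	h = with("trace", h)       // applied first  -> innermost, closest to the real work
	h = with("contentType", h) // applied second -> outermost, runs first

	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/", nil))
	// Prints:
	// enter contentType
	// enter trace
	// real handler
	// leave trace
	// leave contentType
}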
Starting the YurtHub servers
Overall logic
The server handlers fall into two main categories:
- hubServerHandler
- proxyHandler
The proxyHandler further splits into non-resource and resource request handling:
- non-resource requests are served directly from the local cache
- resource requests are forwarded to the remote kube-apiserver
Source code
go
// start yurthub proxy servers for forwarding requests to cloud kube-apiserver
if cfg.WorkingMode == util.WorkingModeEdge {
proxyHandler = wrapNonResourceHandler(proxyHandler, cfg, rest)
}
if cfg.YurtHubProxyServerServing != nil {
if err := cfg.YurtHubProxyServerServing.Serve(proxyHandler, 0, stopCh); err != nil {
return err
}
}
if cfg.YurtHubDummyProxyServerServing != nil {
if err := cfg.YurtHubDummyProxyServerServing.Serve(proxyHandler, 0, stopCh); err != nil {
return err
}
}
if cfg.YurtHubSecureProxyServerServing != nil {
if _, err := cfg.YurtHubSecureProxyServerServing.Serve(proxyHandler, 0, stopCh); err != nil {
return err
}
}
Extra handling is registered for the following non-resource paths:
go
var nonResourceReqPaths = map[string]storage.ClusterInfoType{
"/version": storage.Version,
"/apis/discovery.k8s.io/v1": storage.APIResourcesInfo,
"/apis/discovery.k8s.io/v1beta1": storage.APIResourcesInfo,
}
These are handled directly from the local cache:
go
func wrapNonResourceHandler(proxyHandler http.Handler, config *config.YurtHubConfiguration, restMgr *rest.RestConfigManager) http.Handler {
wrapMux := mux.NewRouter()
// register handler for non resource requests
for path := range nonResourceReqPaths {
wrapMux.Handle(path, localCacheHandler(nonResourceHandler, restMgr, config.StorageWrapper, path)).Methods("GET")
}
// register handler for other requests
wrapMux.PathPrefix("/").Handler(proxyHandler)
return wrapMux
}
FAQ
After YurtHub receives a request from the kubelet, how does it figure out what kind of request it is before forwarding it to the cloud kube-apiserver?
The logic can be summarized as follows:
Once OpenYurt is installed, it automatically rewrites the kubelet configuration so that the kubelet no longer talks to the kube-apiserver directly,
but to local port 10261, which is the port the YurtHub service listens on:
yaml
root@root:~ cat /etc/kubernetes/kubelet.conf
apiVersion: v1
clusters:
- cluster:
server: http://127.0.0.1:10261
name: default-cluster
contexts:
- context:
cluster: default-cluster
namespace: default
user: default-auth
name: default-context
current-context: default-context
kind: Config
preferences: {}
After the local service receives a request, it extracts the RequestInfo from the HTTP request's context.
This struct holds the details of the apiserver access that the client's HTTP request is asking for.
go
// RequestInfo holds information parsed from the http.Request
type RequestInfo struct {
// IsResourceRequest indicates whether or not the request is for an API resource or subresource
IsResourceRequest bool
// Path is the URL path of the request
Path string
// Verb is the kube verb associated with the request for API requests, not the http verb. This includes things like list and watch.
// for non-resource requests, this is the lowercase http verb
Verb string
APIPrefix string
APIGroup string
APIVersion string
Namespace string
// Resource is the name of the resource being requested. This is not the kind. For example: pods
Resource string
// Subresource is the name of the subresource being requested. This is a different resource, scoped to the parent resource, but it may have a different kind.
// For instance, /pods has the resource "pods" and the kind "Pod", while /pods/foo/status has the resource "pods", the sub resource "status", and the kind "Pod"
// (because status operates on pods). The binding resource for a pod though may be /pods/foo/binding, which has resource "pods", subresource "binding", and kind "Binding".
Subresource string
// Name is empty for some verbs, but if the request directly indicates a name (not in body content) then this field is filled in.
Name string
// Parts are the path parts for the request, always starting with /{resource}/{name}
Parts []string
}
With this information from the HTTP context, YurtHub can determine which resource the client is operating on and which verb it is using.
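As a standalone sketch of how a URL path maps to a RequestInfo, the example below uses the RequestInfoFactory helper from k8s.io/apiserver, the same machinery the apiserver relies on; the request path is just an example.
go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"

	"k8s.io/apimachinery/pkg/util/sets"
	apirequest "k8s.io/apiserver/pkg/endpoints/request"
)

func main() {
	resolver := &apirequest.RequestInfoFactory{
		APIPrefixes:          sets.NewString("api", "apis"),
		GrouplessAPIPrefixes: sets.NewString("api"),
	}

	req := httptest.NewRequest(http.MethodGet, "/api/v1/namespaces/default/pods/nginx", nil)
	info, err := resolver.NewRequestInfo(req)
	if err != nil {
		panic(err)
	}
	// A proxy can now decide how to route based on verb and resource.
	fmt.Printf("verb=%s resource=%s namespace=%s name=%s\n",
		info.Verb, info.Resource, info.Namespace, info.Name)
	// -> verb=get resource=pods namespace=default name=nginx
}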