data:image/s3,"s3://crabby-images/20773/20773042de6b73a1ad4dbe29563916459a45a6ca" alt=""
写这篇文章是来填 很久之前挖下的坑。
本文涉及组件的源码版本如下:
- Kubernetes 1.24
- CRI 0.25.0
- Containerd 1.6
容器运行时(Container Runtime)是负责管理和执行容器的组件。它负责将容器镜像转化为在主机上运行的实际容器进程,提供镜像管理、容器的生命周期管理、资源隔离、文件系统、网络配置等功能。
data:image/s3,"s3://crabby-images/d2aee/d2aee191627c518cd07bff71fdc1cd0e9563c4e5" alt=""
常见容器运行时有下面这几种,这些容器运行时都提供了不同程度的功能和性能。但他们都遵循容器运行时接口(CRI),以便能够与 Kubernetes 或其他容器编排系统集成,实现容器的调度和管理。
有了 CRI,我们也可以"随意"地在几种容器运行时之间进行切换,而无需重新编译 Kubernetes。简单来讲,CRI 定义了所有对容器的操作,作为容器编排系统与容器运行时间的标准接口存在。
CRI 的前生今世
data:image/s3,"s3://crabby-images/3f20e/3f20e5980a26bbd06d9412e9fc285ad134a985c2" alt=""
CRI 的首次引入是在 Kubernets 1.5,初始版本是 v1alpha1
。在这之前,Kubernetes 需要在 kubelet 源码中维护对各个容器运行时的支持。
有了 CRI 之后,在 kubelet 中仅需支持 CRI 即可,然后通过一个中间层 CRI shim(grpc 服务器)与容器运行时进行交互。因为此时各家容器运行时实现还未支持 CRI。
在去年发布的 Kubernetes 1.24 中,正式移除了 Dockershim,与容易运行时的交互得到了简化。
Kubernetes 目前支持 CRI 的
v1alpha2
和v1
。其中v1
版本是在 Kubernetes 1.23 版本中引入的。
每次 kubelet 启动时,首先会尝试使用v1
的 API 与容器运行时进行连接。如果失败,才会尝试使用v1alpha2
。
kubelet 与 CRI
在之前做过的 kubelet 源码分析 中曾提到 Kubelet#syncLoop()
会持续监控来自 文件 、apiserver 、http 的变更,来更新 pod 的状态。写那篇文章的时候,分析到这里就结束了。因为这之后的工作就交给 容器运行时 来完成 sandbox 和各种容器的创建和运行,见 kubeGenericRuntimeManager#SyncPod()
。
kubelet 启动时便会 初始化 CRI 客户端,与容器运行时建立连接并确认 CRI 的版本。
创建 pod 的过程中,都会通过 CRI 与容器运行时进行交互:
- 创建 sandbox
- 创建容器
- 拉取镜像
参考源码
- pkg/kubelet/kuberuntime/kuberuntime_sandbox.go#L39
- pkg/kubelet/kuberuntime/kuberuntime_container.go#L176
- pkg/kubelet/images/image_manager.go#L89
接下来我们以 Containerd 为例,看下如何处理 kubelet 的请求。
Containerd 与 CRI
Containerd 的 criService
实现了 CRI 接口 RuntimeService
和 ImageService
的 RuntimeServiceServer
和 ImageServiceServer
。
cirService
会进一步包装成instrumentedService
,保证所有的操作都是在k8s.io
命名空间下执行的
RuntimeServiceServer
golang
type RuntimeServiceServer interface {
// Version returns the runtime name, runtime version, and runtime API version.
Version(context.Context, *VersionRequest) (*VersionResponse, error)
// RunPodSandbox creates and starts a pod-level sandbox. Runtimes must ensure
// the sandbox is in the ready state on success.
RunPodSandbox(context.Context, *RunPodSandboxRequest) (*RunPodSandboxResponse, error)
// StopPodSandbox stops any running process that is part of the sandbox and
// reclaims network resources (e.g., IP addresses) allocated to the sandbox.
// If there are any running containers in the sandbox, they must be forcibly
// terminated.
// This call is idempotent, and must not return an error if all relevant
// resources have already been reclaimed. kubelet will call StopPodSandbox
// at least once before calling RemovePodSandbox. It will also attempt to
// reclaim resources eagerly, as soon as a sandbox is not needed. Hence,
// multiple StopPodSandbox calls are expected.
StopPodSandbox(context.Context, *StopPodSandboxRequest) (*StopPodSandboxResponse, error)
// RemovePodSandbox removes the sandbox. If there are any running containers
// in the sandbox, they must be forcibly terminated and removed.
// This call is idempotent, and must not return an error if the sandbox has
// already been removed.
RemovePodSandbox(context.Context, *RemovePodSandboxRequest) (*RemovePodSandboxResponse, error)
// PodSandboxStatus returns the status of the PodSandbox. If the PodSandbox is not
// present, returns an error.
PodSandboxStatus(context.Context, *PodSandboxStatusRequest) (*PodSandboxStatusResponse, error)
// ListPodSandbox returns a list of PodSandboxes.
ListPodSandbox(context.Context, *ListPodSandboxRequest) (*ListPodSandboxResponse, error)
// CreateContainer creates a new container in specified PodSandbox
CreateContainer(context.Context, *CreateContainerRequest) (*CreateContainerResponse, error)
// StartContainer starts the container.
StartContainer(context.Context, *StartContainerRequest) (*StartContainerResponse, error)
// StopContainer stops a running container with a grace period (i.e., timeout).
// This call is idempotent, and must not return an error if the container has
// already been stopped.
// The runtime must forcibly kill the container after the grace period is
// reached.
StopContainer(context.Context, *StopContainerRequest) (*StopContainerResponse, error)
// RemoveContainer removes the container. If the container is running, the
// container must be forcibly removed.
// This call is idempotent, and must not return an error if the container has
// already been removed.
RemoveContainer(context.Context, *RemoveContainerRequest) (*RemoveContainerResponse, error)
// ListContainers lists all containers by filters.
ListContainers(context.Context, *ListContainersRequest) (*ListContainersResponse, error)
// ContainerStatus returns status of the container. If the container is not
// present, returns an error.
ContainerStatus(context.Context, *ContainerStatusRequest) (*ContainerStatusResponse, error)
// UpdateContainerResources updates ContainerConfig of the container synchronously.
// If runtime fails to transactionally update the requested resources, an error is returned.
UpdateContainerResources(context.Context, *UpdateContainerResourcesRequest) (*UpdateContainerResourcesResponse, error)
// ReopenContainerLog asks runtime to reopen the stdout/stderr log file
// for the container. This is often called after the log file has been
// rotated. If the container is not running, container runtime can choose
// to either create a new log file and return nil, or return an error.
// Once it returns error, new container log file MUST NOT be created.
ReopenContainerLog(context.Context, *ReopenContainerLogRequest) (*ReopenContainerLogResponse, error)
// ExecSync runs a command in a container synchronously.
ExecSync(context.Context, *ExecSyncRequest) (*ExecSyncResponse, error)
// Exec prepares a streaming endpoint to execute a command in the container.
Exec(context.Context, *ExecRequest) (*ExecResponse, error)
// Attach prepares a streaming endpoint to attach to a running container.
Attach(context.Context, *AttachRequest) (*AttachResponse, error)
// PortForward prepares a streaming endpoint to forward ports from a PodSandbox.
PortForward(context.Context, *PortForwardRequest) (*PortForwardResponse, error)
// ContainerStats returns stats of the container. If the container does not
// exist, the call returns an error.
ContainerStats(context.Context, *ContainerStatsRequest) (*ContainerStatsResponse, error)
// ListContainerStats returns stats of all running containers.
ListContainerStats(context.Context, *ListContainerStatsRequest) (*ListContainerStatsResponse, error)
// PodSandboxStats returns stats of the pod sandbox. If the pod sandbox does not
// exist, the call returns an error.
PodSandboxStats(context.Context, *PodSandboxStatsRequest) (*PodSandboxStatsResponse, error)
// ListPodSandboxStats returns stats of the pod sandboxes matching a filter.
ListPodSandboxStats(context.Context, *ListPodSandboxStatsRequest) (*ListPodSandboxStatsResponse, error)
// UpdateRuntimeConfig updates the runtime configuration based on the given request.
UpdateRuntimeConfig(context.Context, *UpdateRuntimeConfigRequest) (*UpdateRuntimeConfigResponse, error)
// Status returns the status of the runtime.
Status(context.Context, *StatusRequest) (*StatusResponse, error)
// CheckpointContainer checkpoints a container
CheckpointContainer(context.Context, *CheckpointContainerRequest) (*CheckpointContainerResponse, error)
// GetContainerEvents gets container events from the CRI runtime
GetContainerEvents(*GetEventsRequest, RuntimeService_GetContainerEventsServer) error
}
ImageServiceServer
golang
type ImageServiceServer interface {
// ListImages lists existing images. ListImages(context.Context, *ListImagesRequest) (*ListImagesResponse, error)
// ImageStatus returns the status of the image. If the image is not // present, returns a response with ImageStatusResponse.Image set to // nil. ImageStatus(context.Context, *ImageStatusRequest) (*ImageStatusResponse, error)
// PullImage pulls an image with authentication config. PullImage(context.Context, *PullImageRequest) (*PullImageResponse, error)
// RemoveImage removes the image. // This call is idempotent, and must not return an error if the image has // already been removed. RemoveImage(context.Context, *RemoveImageRequest) (*RemoveImageResponse, error)
// ImageFSInfo returns information of the filesystem that is used to store images.
ImageFsInfo(context.Context, *ImageFsInfoRequest) (*ImageFsInfoResponse, error)
}
下面以创建 sandbox 为例看一下 Containerd 的源码。
Containerd 源码分析
创建 sandbox 容器的请求通过 CRI 的 UDS(Unix domain socket) 接口 /runtime.v1.RuntimeService/RunPodSandbox
,进入到 criService
的处理流程中。在 criService#RunPodSandbox()
,负责创建和运行 sandbox 容器,并保证容器状态正常。
- 下载 sandobx 容器镜像
- 初始化容器元数据
- 初始化 pod 网络命名空间,详细内容可参考之前的文章 源码解析:从 kubelet、容器运行时看 CNI 的使用
- 更新容器元数据
- 写入文件系统
参考源码
总结
CRI 提供了一种标准化的接口,用于与底层容器运行时进行交互。这对与发展和状大 Kubernetes 生态系统非常重要:
- Kubernetes 控制平面与容器管理的具体实现解耦,可以独立升级或者切换容器运行时,方便扩展和优化。
- Kubernetes 作为一个跨云、跨平台和多环境的容器编排系统,在不同的环境和场景下使用不同的容器平台。CRI 的出现,保证平台的多样性和灵活性。
关注"云原生指北"微信公众号 (转载本站文章请注明作者和出处乱世浮生,请勿用于任何商业用途)