前言
K8S使用的API风格为声明式API ,声明式API更加容易理解与维护,因为它们隐藏了底层的实现细节、我们通过yaml文件这种方式去创建、声明对应的对象,在K8S中的每一个对象都对应着一个yaml文件,即通过文件完成一次声明,而不是像命令式API一样通过一条条命令进行API请求交互。
每一个创建对象的yaml文件由又四部分组成,分别是APIVersion、Kind、Metadata和Spec,而一个已经存在的对象则会多一个status部分用于标识对象的当前状态。
实战
我们以创建一个最简单的Deployment对象来进行实验,因为我们不直接部署Pod,而是通过更高一层的抽象:Workload Resource (老版本也叫Controller)来进行对Pod的控制。
首先创建一个如下的my-app-deployment.yaml,这个Deployment旨在部署一个pod,这个pod只有一个容器,容器中运行着nginx服务。工作负载资源在创建Pod的时候,会按照pod模板里面的内容来创建实际的pod 。
yaml
# 指定apiVersion
apiVersion: apps/v1
# 指定对象类型
kind: Deployment
# 元数据 这里只给了必须的name
metadata:
name: my-app-deployment
# 规约:即期望状态
spec:
# pod数量=1
replicas: 1
# 选择标签为"my-app"的pod
selector:
matchLabels:
app: my-app
# pod 模板 同样包含了元数据和规约
template:
metadata:
# 带上标签
labels:
app: my-app
spec:
containers:
- name: nginx-container
image: nginx:latest
ports:
- containerPort: 80
kubectl apply -f my-app-deployment.yaml创建这个Deployment。
然后kubectl get deployment -n default查看default命名空间下的所有deployment 。

再通过kubectl get pods -n default来查看命名空间下的所有pod。

能看到我们这边已经成功创建了Deployment:my-app-deployment,以及对应的Pod,这里腾讯云的策略是在deployment后面加上两个随机的字符串来给Pod命名。
创建完成之后,我们可以看到deployment的yaml如下,附上详细注释,其中有一些字段是TKE腾讯云平台附加上去的,并给上了默认值。因为上面用来创建deployment的yaml非常的简陋,并没有满足TKE对于一个deployment所有字段的要求。
yaml
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "1"
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"name":"my-app-deployment","namespace":"default"},"spec":{"replicas":1,"selector":{"matchLabels":{"app":"my-app"}},"template":{"metadata":{"labels":{"app":"my-app"}},"spec":{"containers":[{"image":"nginx:latest","name":"nginx-container","ports":[{"containerPort":80}]}]}}}}
creationTimestamp: "2023-10-27T04:04:12Z"
//变更次数
generation: 1
//管理历史和变更历史
managedFields:
- apiVersion: apps/v1
fieldsType: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
.: {}
f:kubectl.kubernetes.io/last-applied-configuration: {}
f:spec:
f:progressDeadlineSeconds: {}
f:replicas: {}
f:revisionHistoryLimit: {}
f:selector: {}
f:strategy:
f:rollingUpdate:
.: {}
f:maxSurge: {}
f:maxUnavailable: {}
f:type: {}
f:template:
f:metadata:
f:labels:
.: {}
f:app: {}
f:spec:
f:containers:
k:{"name":"nginx-container"}:
.: {}
f:image: {}
f:imagePullPolicy: {}
f:name: {}
f:ports:
.: {}
k:{"containerPort":80,"protocol":"TCP"}:
.: {}
f:containerPort: {}
f:protocol: {}
f:resources: {}
f:terminationMessagePath: {}
f:terminationMessagePolicy: {}
f:dnsPolicy: {}
f:restartPolicy: {}
f:schedulerName: {}
f:securityContext: {}
f:terminationGracePeriodSeconds: {}
manager: kubectl-client-side-apply
operation: Update
time: "2023-10-27T04:04:12Z"
- apiVersion: apps/v1
fieldsType: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
f:deployment.kubernetes.io/revision: {}
f:status:
f:availableReplicas: {}
f:conditions:
.: {}
k:{"type":"Available"}:
.: {}
f:lastTransitionTime: {}
f:lastUpdateTime: {}
f:message: {}
f:reason: {}
f:status: {}
f:type: {}
k:{"type":"Progressing"}:
.: {}
f:lastTransitionTime: {}
f:lastUpdateTime: {}
f:message: {}
f:reason: {}
f:status: {}
f:type: {}
f:observedGeneration: {}
f:readyReplicas: {}
f:replicas: {}
f:updatedReplicas: {}
manager: kube-controller-manager
operation: Update
time: "2023-10-27T08:08:46Z"
//名字 命名空间 乐观锁版本号和 UID
name: my-app-deployment
namespace: default
resourceVersion: "3031776370"
uid: d16672ca-d399-479b-acb3-67c34e0c2955
spec:
//退出等待时间
progressDeadlineSeconds: 600
//pod 副本数
replicas: 1
//保留旧副本数的数量
revisionHistoryLimit: 10
//选择器 选择管理的Pod
selector:
matchLabels:
app: my-app
//滚动升级策略
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
//Pod 模板
template:
metadata:
creationTimestamp: null
labels:
app: my-app
spec:
containers:
- image: nginx:latest
imagePullPolicy: Always
name: nginx-container
ports:
- containerPort: 80
protocol: TCP
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 30
//当前状态
status:
availableReplicas: 1
conditions:
- lastTransitionTime: "2023-10-27T04:04:12Z"
lastUpdateTime: "2023-10-27T04:04:25Z"
message: ReplicaSet "my-app-deployment-bc747959b" has successfully progressed.
reason: NewReplicaSetAvailable
status: "True"
type: Progressing
- lastTransitionTime: "2023-10-27T08:08:46Z"
lastUpdateTime: "2023-10-27T08:08:46Z"
message: Deployment has minimum availability.
reason: MinimumReplicasAvailable
status: "True"
type: Available
observedGeneration: 1
readyReplicas: 1
replicas: 1
updatedReplicas: 1
再来看看Deployment管理的Pod的yaml文件,附上详细注释。
yaml
apiVersion: v1
kind: Pod
metadata:
// tke带上的注释
annotations:
tke.cloud.tencent.com/networks-status: |-
[{
"name": "tke-route-eni",
"interface": "eth1",
"ips": [
"30.170.46.185"
],
"mac": "62:ed:f6:ce:ed:f2",
"default": true,
"dns": {}
}]
creationTimestamp: "2023-10-27T08:08:41Z"
//一些元信息 标签、名字、命名空间等等
generateName: my-app-deployment-bc747959b-
labels:
app: my-app
pod-template-hash: bc747959b
name: my-app-deployment-bc747959b-8lb6x
namespace: default
//指定拥有这个Pod的资源,即Deployment
ownerReferences:
- apiVersion: apps/v1
blockOwnerDeletion: true
controller: true
kind: ReplicaSet
name: my-app-deployment-bc747959b
uid: 0224f611-f1bf-4381-9cbf-bdef635017d5
resourceVersion: "3031776358"
uid: d73aa562-bd14-4450-8516-eb9e56b97388
spec:
//期望的容器状态,如容器名称 镜像版本 拉取镜像策略等等
containers:
- image: nginx:latest
imagePullPolicy: Always
name: nginx-container
// 协议和端口
ports:
- containerPort: 80
protocol: TCP
// 资源的请求和限制
resources:
limits:
tke.cloud.tencent.com/eni-ip: "1"
requests:
tke.cloud.tencent.com/eni-ip: "1"
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
//数据卷
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-fxt2h
readOnly: true
//pod的一些策略,如dns策略、节点名称、抢占策略、优先级、重启策略、调度器名称、上下文配置、
//终止等待时间、容忍配置、数据卷等等
dnsPolicy: ClusterFirst
enableServiceLinks: true
nodeName: 11.147.33.67
preemptionPolicy: PreemptLowerPriority
priority: 0
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- name: default-token-fxt2h
secret:
defaultMode: 420
secretName: default-token-fxt2h
// 当前状态
status:
// pod条件
conditions:
- lastProbeTime: null
lastTransitionTime: "2023-10-27T08:08:41Z"
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: "2023-10-27T08:08:46Z"
status: "True"
type: Ready
- lastProbeTime: null
lastTransitionTime: "2023-10-27T08:08:46Z"
status: "True"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: "2023-10-27T08:08:41Z"
status: "True"
type: PodScheduled
// 容器状态
containerStatuses:
- containerID: docker://295552b19dcf42f6848bdede8e0b6f24e67b0b8dd1d8e4d62c41b56d939ac8bc
image: nginx:latest
imageID: docker-pullable://nginx@sha256:add4792d930c25dd2abf2ef9ea79de578097a1c175a16ab25814332fe33622de
lastState: {}
name: nginx-container
ready: true
restartCount: 0
started: true
state:
running:
startedAt: "2023-10-27T08:08:46Z"
//pod的阶段、节点IP、自身IP、qos类别、开始时间等等
hostIP: 11.147.33.67
phase: Running
podIP: 30.170.46.185
podIPs:
- ip: 30.170.46.185
qosClass: BestEffort
startTime: "2023-10-27T08:08:41Z"
不难看出,pod最后的实际状态会和deployment的pod template规定的状态一致,包括pod的数量、pod中容器的数量、pod的各类配置等等(腾讯云附加)。
Pod属性与模板
如果是直接部署的pod,修改已经存在的pod template并不会对已经存在的pod实例产生直接影响。
但是如果你修改的是workload resource中的pod template,那么这个资源会重新根据新的模板修改pod,直到和模板中期望的状况一致
pod的某些属性是可以修改的,比如上述的一些策略、配置等等,我们认为这些属性是插拔式赋值的。
而有些属性是只读的,比如namespace\name\uid\createTimestamp 这些自动生成的固有属性,是只读且不可修改的,只能被K8S内部进行修改。
Pod共享资源
主要还是存储资源和网络资源两大块 ,存储方面,pod可以通过挂载volume,同一pod下的容器都可以访问volume。网络方面,容器共享同一IP地址和端口族,并且同一个Pod中的container可以使用localhost进行通信。当Pod内的容器访问外部的时候,就需要先沟通好如何使用这些网络资源。
pod的任何容器都可以以特权形式来运行操作系统的指令,如在Linux操作系统重,需要使用privileged标签来实现pod容器的特权形式。
Static Pod
static pod只由Kubelet进行直接管理,以一个守护进程的形式运行在指定的节点上。API server不需要观察他们,当pod状态错误,kubelet会重启他们,控制平面就是通过static pod的方式运行在某个结点上的。
kubelet会自动的创建一个镜像Pod在API-SERVER层面,以保证这些static pod对于Api server来说是可见的,但是它不能够管理它们
Pod源码实现
通过源码,我们能看到Pod结构体的定义由四部分组成
TypeMeta类型元数据Metadata对象元数据Spec期望规约状态Status当前具体状态
ruby
type Pod struct {
//类型元数据
metav1.TypeMeta `json:",inline"`
// Standard object's metadata.
// More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata
// +optional
metav1.ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"`
// Specification of the desired behavior of the pod.
// More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status
// +optional
Spec PodSpec `json:"spec,omitempty" protobuf:"bytes,2,opt,name=spec"`
// Most recently observed status of the pod.
// This data may not be up to date.
// Populated by the system.
// Read-only.
// More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status
// +optional
Status PodStatus `json:"status,omitempty" protobuf:"bytes,3,opt,name=status"`
}
TypeMeta
c
type TypeMeta struct {
// Kind is a string value representing the REST resource this object represents.
// Servers may infer this from the endpoint the client submits requests to.
// Cannot be updated.
// In CamelCase.
// More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
// +optional
Kind string `json:"kind,omitempty" protobuf:"bytes,1,opt,name=kind"`
// APIVersion defines the versioned schema of this representation of an object.
// Servers should convert recognized schemas to the latest internal value, and
// may reject unrecognized values.
// More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
// +optional
APIVersion string `json:"apiVersion,omitempty" protobuf:"bytes,2,opt,name=apiVersion"`
}
TypeMeta由kind和APIVersion两部分组成,K8S的API是整个K8S系统最核心的部分,所有内部的管理、同步、编排、调度,本质上都是通过API来完成的,不管是通过yaml文件以声明式API的方式创建对象,还是workload 管理 pod、 pod被调度到node、pod管理容器,K8S内部的所有编排、管理、调度,本质上都是一次次JSON数据格式的HTTP 请求打到Api server,然后被处理后返回。
想一想,你想要发送 API Server能够处理的请求,那么是不是就得按照一定的格式进行发送?
最起码要保证Api Server能看懂对不对,想要Api server能够看懂你的请求,就得保证请求是以这几部分构成的:apiversion kind metadata spec,只有带上这些, Api server才能够看懂你的请求。
kind:pod、 service、 depoyment等等,表示对象是哪一种资源。
api-version: 在K8S早期,对象没有那么多,所有的内置资源都在核心的API组中,可随着K8S的发展,新增的资源越来越多,为了更好的组织这些资源,相关API更加模块化增加可管理性,K8S于是开始了API的分组。
例如与应用相关的资源(Deployment、StatefulSet、DaemonSet)等被放在apps组中,它们的版本号是apps/v1,与批处理相关的资源(Job/CornJob)等放在batch组中,与扩缩容相关的放在autoscaling组中。
Metadata
c
type ObjectMeta struct {
// Name must be unique within a namespace. Is required when creating resources, although
// some resources may allow a client to request the generation of an appropriate name
// automatically. Name is primarily intended for creation idempotence and configuration
// definition.
// Cannot be updated.
// More info: http://kubernetes.io/docs/user-guide/identifiers#names
// +optional
Name string `json:"name,omitempty" protobuf:"bytes,1,opt,name=name"`
// GenerateName is an optional prefix, used by the server, to generate a unique
// name ONLY IF the Name field has not been provided.
// If this field is used, the name returned to the client will be different
// than the name passed. This value will also be combined with a unique suffix.
// The provided value has the same validation rules as the Name field,
// and may be truncated by the length of the suffix required to make the value
// unique on the server.
//
// If this field is specified and the generated name exists, the server will
// NOT return a 409 - instead, it will either return 201 Created or 500 with Reason
// ServerTimeout indicating a unique name could not be found in the time allotted, and the client
// should retry (optionally after the time indicated in the Retry-After header).
//
// Applied only if Name is not specified.
// More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#idempotency
// +optional
GenerateName string `json:"generateName,omitempty" protobuf:"bytes,2,opt,name=generateName"`
// Namespace defines the space within which each name must be unique. An empty namespace is
// equivalent to the "default" namespace, but "default" is the canonical representation.
// Not all objects are required to be scoped to a namespace - the value of this field for
// those objects will be empty.
//
// Must be a DNS_LABEL.
// Cannot be updated.
// More info: http://kubernetes.io/docs/user-guide/namespaces
// +optional
Namespace string `json:"namespace,omitempty" protobuf:"bytes,3,opt,name=namespace"`
// SelfLink is a URL representing this object.
// Populated by the system.
// Read-only.
//
// DEPRECATED
// Kubernetes will stop propagating this field in 1.20 release and the field is planned
// to be removed in 1.21 release.
// +optional
SelfLink string `json:"selfLink,omitempty" protobuf:"bytes,4,opt,name=selfLink"`
// UID is the unique in time and space value for this object. It is typically generated by
// the server on successful creation of a resource and is not allowed to change on PUT
// operations.
//
// Populated by the system.
// Read-only.
// More info: http://kubernetes.io/docs/user-guide/identifiers#uids
// +optional
UID types.UID `json:"uid,omitempty" protobuf:"bytes,5,opt,name=uid,casttype=k8s.io/kubernetes/pkg/types.UID"`
// An opaque value that represents the internal version of this object that can
// be used by clients to determine when objects have changed. May be used for optimistic
// concurrency, change detection, and the watch operation on a resource or set of resources.
// Clients must treat these values as opaque and passed unmodified back to the server.
// They may only be valid for a particular resource or set of resources.
//
// Populated by the system.
// Read-only.
// Value must be treated as opaque by clients and .
// More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#concurrency-control-and-consistency
// +optional
ResourceVersion string `json:"resourceVersion,omitempty" protobuf:"bytes,6,opt,name=resourceVersion"`
// A sequence number representing a specific generation of the desired state.
// Populated by the system. Read-only.
// +optional
Generation int64 `json:"generation,omitempty" protobuf:"varint,7,opt,name=generation"`
// CreationTimestamp is a timestamp representing the server time when this object was
// created. It is not guaranteed to be set in happens-before order across separate operations.
// Clients may not set this value. It is represented in RFC3339 form and is in UTC.
//
// Populated by the system.
// Read-only.
// Null for lists.
// More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata
// +optional
CreationTimestamp Time `json:"creationTimestamp,omitempty" protobuf:"bytes,8,opt,name=creationTimestamp"`
// DeletionTimestamp is RFC 3339 date and time at which this resource will be deleted. This
// field is set by the server when a graceful deletion is requested by the user, and is not
// directly settable by a client. The resource is expected to be deleted (no longer visible
// from resource lists, and not reachable by name) after the time in this field, once the
// finalizers list is empty. As long as the finalizers list contains items, deletion is blocked.
// Once the deletionTimestamp is set, this value may not be unset or be set further into the
// future, although it may be shortened or the resource may be deleted prior to this time.
// For example, a user may request that a pod is deleted in 30 seconds. The Kubelet will react
// by sending a graceful termination signal to the containers in the pod. After that 30 seconds,
// the Kubelet will send a hard termination signal (SIGKILL) to the container and after cleanup,
// remove the pod from the API. In the presence of network partitions, this object may still
// exist after this timestamp, until an administrator or automated process can determine the
// resource is fully terminated.
// If not set, graceful deletion of the object has not been requested.
//
// Populated by the system when a graceful deletion is requested.
// Read-only.
// More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata
// +optional
DeletionTimestamp *Time `json:"deletionTimestamp,omitempty" protobuf:"bytes,9,opt,name=deletionTimestamp"`
// Number of seconds allowed for this object to gracefully terminate before
// it will be removed from the system. Only set when deletionTimestamp is also set.
// May only be shortened.
// Read-only.
// +optional
DeletionGracePeriodSeconds *int64 `json:"deletionGracePeriodSeconds,omitempty" protobuf:"varint,10,opt,name=deletionGracePeriodSeconds"`
// Map of string keys and values that can be used to organize and categorize
// (scope and select) objects. May match selectors of replication controllers
// and services.
// More info: http://kubernetes.io/docs/user-guide/labels
// +optional
Labels map[string]string `json:"labels,omitempty" protobuf:"bytes,11,rep,name=labels"`
// Annotations is an unstructured key value map stored with a resource that may be
// set by external tools to store and retrieve arbitrary metadata. They are not
// queryable and should be preserved when modifying objects.
// More info: http://kubernetes.io/docs/user-guide/annotations
// +optional
Annotations map[string]string `json:"annotations,omitempty" protobuf:"bytes,12,rep,name=annotations"`
// List of objects depended by this object. If ALL objects in the list have
// been deleted, this object will be garbage collected. If this object is managed by a controller,
// then an entry in this list will point to this controller, with the controller field set to true.
// There cannot be more than one managing controller.
// +optional
// +patchMergeKey=uid
// +patchStrategy=merge
OwnerReferences []OwnerReference `json:"ownerReferences,omitempty" patchStrategy:"merge" patchMergeKey:"uid" protobuf:"bytes,13,rep,name=ownerReferences"`
// Must be empty before the object is deleted from the registry. Each entry
// is an identifier for the responsible component that will remove the entry
// from the list. If the deletionTimestamp of the object is non-nil, entries
// in this list can only be removed.
// Finalizers may be processed and removed in any order. Order is NOT enforced
// because it introduces significant risk of stuck finalizers.
// finalizers is a shared field, any actor with permission can reorder it.
// If the finalizer list is processed in order, then this can lead to a situation
// in which the component responsible for the first finalizer in the list is
// waiting for a signal (field value, external system, or other) produced by a
// component responsible for a finalizer later in the list, resulting in a deadlock.
// Without enforced ordering finalizers are free to order amongst themselves and
// are not vulnerable to ordering changes in the list.
// +optional
// +patchStrategy=merge
Finalizers []string `json:"finalizers,omitempty" patchStrategy:"merge" protobuf:"bytes,14,rep,name=finalizers"`
// The name of the cluster which the object belongs to.
// This is used to distinguish resources with same name and namespace in different clusters.
// This field is not set anywhere right now and apiserver is going to ignore it if set in create or update request.
// +optional
ClusterName string `json:"clusterName,omitempty" protobuf:"bytes,15,opt,name=clusterName"`
// ManagedFields maps workflow-id and version to the set of fields
// that are managed by that workflow. This is mostly for internal
// housekeeping, and users typically shouldn't need to set or
// understand this field. A workflow can be the user's name, a
// controller's name, or the name of a specific apply path like
// "ci-cd". The set of fields is always in the version that the
// workflow used when modifying the object.
//
// +optional
ManagedFields []ManagedFieldsEntry `json:"managedFields,omitempty" protobuf:"bytes,17,rep,name=managedFields"`
}
Metadata中字段就比较多了,其中要注意的有:
Name:在同一namespace下必须是唯一的,表示这个pod定义的唯一标识,同时pod会带上Namespace\ClusterName这两个字段来表示自己当前的命名空间。UID:是pod实例的唯一标识,与Name不同,Name只能作为pod的唯一标识,但是在pod重启时,其实是创建一个新的pod来替换掉老的pod,所以只有UID能标识pod实例ResourceVersion:是pod的版本号,用来做乐观控制Labels:元数据键值对,用来存储标签的,主要用于标识和选择资源对象,比如想把pod调度到哪个node,或者在deployment里选择要管理的pod,通过service把流量路由到具有指定标签的pod等等。Annotations:元数据键值对,用来存注释,还有一些其他的只读字段,用来做K8S内部细节控制,或者自动生成。可存储我们自己需要的很多信息,比如联系信息、版本等冗余信息,也可以用来实现特殊功能,比如加锁或者实现简易优先级等等,我们可以认为带有某个key的对象是被锁住的,带有某个key的对象应该是被删除的等等。
Spec
c
// PodSpec is a description of a pod.
type PodSpec struct {
// List of volumes that can be mounted by containers belonging to the pod.
// More info: https://kubernetes.io/docs/concepts/storage/volumes
// +optional
// +patchMergeKey=name
// +patchStrategy=merge,retainKeys
Volumes []Volume `json:"volumes,omitempty" patchStrategy:"merge,retainKeys" patchMergeKey:"name" protobuf:"bytes,1,rep,name=volumes"`
// List of initialization containers belonging to the pod.
// Init containers are executed in order prior to containers being started. If any
// init container fails, the pod is considered to have failed and is handled according
// to its restartPolicy. The name for an init container or normal container must be
// unique among all containers.
// Init containers may not have Lifecycle actions, Readiness probes, Liveness probes, or Startup probes.
// The resourceRequirements of an init container are taken into account during scheduling
// by finding the highest request/limit for each resource type, and then using the max of
// of that value or the sum of the normal containers. Limits are applied to init containers
// in a similar fashion.
// Init containers cannot currently be added or removed.
// Cannot be updated.
// More info: https://kubernetes.io/docs/concepts/workloads/pods/init-containers/
// +patchMergeKey=name
// +patchStrategy=merge
InitContainers []Container `json:"initContainers,omitempty" patchStrategy:"merge" patchMergeKey:"name" protobuf:"bytes,20,rep,name=initContainers"`
// List of containers belonging to the pod.
// Containers cannot currently be added or removed.
// There must be at least one container in a Pod.
// Cannot be updated.
// +patchMergeKey=name
// +patchStrategy=merge
Containers []Container `json:"containers" patchStrategy:"merge" patchMergeKey:"name" protobuf:"bytes,2,rep,name=containers"`
// List of ephemeral containers run in this pod. Ephemeral containers may be run in an existing
// pod to perform user-initiated actions such as debugging. This list cannot be specified when
// creating a pod, and it cannot be modified by updating the pod spec. In order to add an
// ephemeral container to an existing pod, use the pod's ephemeralcontainers subresource.
// This field is alpha-level and is only honored by servers that enable the EphemeralContainers feature.
// +optional
// +patchMergeKey=name
// +patchStrategy=merge
EphemeralContainers []EphemeralContainer `json:"ephemeralContainers,omitempty" patchStrategy:"merge" patchMergeKey:"name" protobuf:"bytes,34,rep,name=ephemeralContainers"`
// Restart policy for all containers within the pod.
// One of Always, OnFailure, Never.
// Default to Always.
// More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy
// +optional
RestartPolicy RestartPolicy `json:"restartPolicy,omitempty" protobuf:"bytes,3,opt,name=restartPolicy,casttype=RestartPolicy"`
// Optional duration in seconds the pod needs to terminate gracefully. May be decreased in delete request.
// Value must be non-negative integer. The value zero indicates delete immediately.
// If this value is nil, the default grace period will be used instead.
// The grace period is the duration in seconds after the processes running in the pod are sent
// a termination signal and the time when the processes are forcibly halted with a kill signal.
// Set this value longer than the expected cleanup time for your process.
// Defaults to 30 seconds.
// +optional
TerminationGracePeriodSeconds *int64 `json:"terminationGracePeriodSeconds,omitempty" protobuf:"varint,4,opt,name=terminationGracePeriodSeconds"`
// Optional duration in seconds the pod may be active on the node relative to
// StartTime before the system will actively try to mark it failed and kill associated containers.
// Value must be a positive integer.
// +optional
ActiveDeadlineSeconds *int64 `json:"activeDeadlineSeconds,omitempty" protobuf:"varint,5,opt,name=activeDeadlineSeconds"`
// Set DNS policy for the pod.
// Defaults to "ClusterFirst".
// Valid values are 'ClusterFirstWithHostNet', 'ClusterFirst', 'Default' or 'None'.
// DNS parameters given in DNSConfig will be merged with the policy selected with DNSPolicy.
// To have DNS options set along with hostNetwork, you have to specify DNS policy
// explicitly to 'ClusterFirstWithHostNet'.
// +optional
DNSPolicy DNSPolicy `json:"dnsPolicy,omitempty" protobuf:"bytes,6,opt,name=dnsPolicy,casttype=DNSPolicy"`
// NodeSelector is a selector which must be true for the pod to fit on a node.
// Selector which must match a node's labels for the pod to be scheduled on that node.
// More info: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
// +optional
NodeSelector map[string]string `json:"nodeSelector,omitempty" protobuf:"bytes,7,rep,name=nodeSelector"`
// ServiceAccountName is the name of the ServiceAccount to use to run this pod.
// More info: https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/
// +optional
ServiceAccountName string `json:"serviceAccountName,omitempty" protobuf:"bytes,8,opt,name=serviceAccountName"`
// DeprecatedServiceAccount is a depreciated alias for ServiceAccountName.
// Deprecated: Use serviceAccountName instead.
// +k8s:conversion-gen=false
// +optional
DeprecatedServiceAccount string `json:"serviceAccount,omitempty" protobuf:"bytes,9,opt,name=serviceAccount"`
// AutomountServiceAccountToken indicates whether a service account token should be automatically mounted.
// +optional
AutomountServiceAccountToken *bool `json:"automountServiceAccountToken,omitempty" protobuf:"varint,21,opt,name=automountServiceAccountToken"`
// NodeName is a request to schedule this pod onto a specific node. If it is non-empty,
// the scheduler simply schedules this pod onto that node, assuming that it fits resource
// requirements.
// +optional
NodeName string `json:"nodeName,omitempty" protobuf:"bytes,10,opt,name=nodeName"`
// Host networking requested for this pod. Use the host's network namespace.
// If this option is set, the ports that will be used must be specified.
// Default to false.
// +k8s:conversion-gen=false
// +optional
HostNetwork bool `json:"hostNetwork,omitempty" protobuf:"varint,11,opt,name=hostNetwork"`
// Use the host's pid namespace.
// Optional: Default to false.
// +k8s:conversion-gen=false
// +optional
HostPID bool `json:"hostPID,omitempty" protobuf:"varint,12,opt,name=hostPID"`
// Use the host's ipc namespace.
// Optional: Default to false.
// +k8s:conversion-gen=false
// +optional
HostIPC bool `json:"hostIPC,omitempty" protobuf:"varint,13,opt,name=hostIPC"`
// Share a single process namespace between all of the containers in a pod.
// When this is set containers will be able to view and signal processes from other containers
// in the same pod, and the first process in each container will not be assigned PID 1.
// HostPID and ShareProcessNamespace cannot both be set.
// Optional: Default to false.
// +k8s:conversion-gen=false
// +optional
ShareProcessNamespace *bool `json:"shareProcessNamespace,omitempty" protobuf:"varint,27,opt,name=shareProcessNamespace"`
// SecurityContext holds pod-level security attributes and common container settings.
// Optional: Defaults to empty. See type description for default values of each field.
// +optional
SecurityContext *PodSecurityContext `json:"securityContext,omitempty" protobuf:"bytes,14,opt,name=securityContext"`
// ImagePullSecrets is an optional list of references to secrets in the same namespace to use for pulling any of the images used by this PodSpec.
// If specified, these secrets will be passed to individual puller implementations for them to use. For example,
// in the case of docker, only DockerConfig type secrets are honored.
// More info: https://kubernetes.io/docs/concepts/containers/images#specifying-imagepullsecrets-on-a-pod
// +optional
// +patchMergeKey=name
// +patchStrategy=merge
ImagePullSecrets []LocalObjectReference `json:"imagePullSecrets,omitempty" patchStrategy:"merge" patchMergeKey:"name" protobuf:"bytes,15,rep,name=imagePullSecrets"`
// Specifies the hostname of the Pod
// If not specified, the pod's hostname will be set to a system-defined value.
// +optional
Hostname string `json:"hostname,omitempty" protobuf:"bytes,16,opt,name=hostname"`
// If specified, the fully qualified Pod hostname will be "<hostname>.<subdomain>.<pod namespace>.svc.<cluster domain>".
// If not specified, the pod will not have a domainname at all.
// +optional
Subdomain string `json:"subdomain,omitempty" protobuf:"bytes,17,opt,name=subdomain"`
// If specified, the pod's scheduling constraints
// +optional
Affinity *Affinity `json:"affinity,omitempty" protobuf:"bytes,18,opt,name=affinity"`
// If specified, the pod will be dispatched by specified scheduler.
// If not specified, the pod will be dispatched by default scheduler.
// +optional
SchedulerName string `json:"schedulerName,omitempty" protobuf:"bytes,19,opt,name=schedulerName"`
// If specified, the pod's tolerations.
// +optional
Tolerations []Toleration `json:"tolerations,omitempty" protobuf:"bytes,22,opt,name=tolerations"`
// HostAliases is an optional list of hosts and IPs that will be injected into the pod's hosts
// file if specified. This is only valid for non-hostNetwork pods.
// +optional
// +patchMergeKey=ip
// +patchStrategy=merge
HostAliases []HostAlias `json:"hostAliases,omitempty" patchStrategy:"merge" patchMergeKey:"ip" protobuf:"bytes,23,rep,name=hostAliases"`
// If specified, indicates the pod's priority. "system-node-critical" and
// "system-cluster-critical" are two special keywords which indicate the
// highest priorities with the former being the highest priority. Any other
// name must be defined by creating a PriorityClass object with that name.
// If not specified, the pod priority will be default or zero if there is no
// default.
// +optional
PriorityClassName string `json:"priorityClassName,omitempty" protobuf:"bytes,24,opt,name=priorityClassName"`
// The priority value. Various system components use this field to find the
// priority of the pod. When Priority Admission Controller is enabled, it
// prevents users from setting this field. The admission controller populates
// this field from PriorityClassName.
// The higher the value, the higher the priority.
// +optional
Priority *int32 `json:"priority,omitempty" protobuf:"bytes,25,opt,name=priority"`
// Specifies the DNS parameters of a pod.
// Parameters specified here will be merged to the generated DNS
// configuration based on DNSPolicy.
// +optional
DNSConfig *PodDNSConfig `json:"dnsConfig,omitempty" protobuf:"bytes,26,opt,name=dnsConfig"`
// If specified, all readiness gates will be evaluated for pod readiness.
// A pod is ready when all its containers are ready AND
// all conditions specified in the readiness gates have status equal to "True"
// More info: https://git.k8s.io/enhancements/keps/sig-network/0007-pod-ready%2B%2B.md
// +optional
ReadinessGates []PodReadinessGate `json:"readinessGates,omitempty" protobuf:"bytes,28,opt,name=readinessGates"`
// RuntimeClassName refers to a RuntimeClass object in the node.k8s.io group, which should be used
// to run this pod. If no RuntimeClass resource matches the named class, the pod will not be run.
// If unset or empty, the "legacy" RuntimeClass will be used, which is an implicit class with an
// empty definition that uses the default runtime handler.
// More info: https://git.k8s.io/enhancements/keps/sig-node/runtime-class.md
// This is a beta feature as of Kubernetes v1.14.
// +optional
RuntimeClassName *string `json:"runtimeClassName,omitempty" protobuf:"bytes,29,opt,name=runtimeClassName"`
// EnableServiceLinks indicates whether information about services should be injected into pod's
// environment variables, matching the syntax of Docker links.
// Optional: Defaults to true.
// +optional
EnableServiceLinks *bool `json:"enableServiceLinks,omitempty" protobuf:"varint,30,opt,name=enableServiceLinks"`
// PreemptionPolicy is the Policy for preempting pods with lower priority.
// One of Never, PreemptLowerPriority.
// Defaults to PreemptLowerPriority if unset.
// This field is beta-level, gated by the NonPreemptingPriority feature-gate.
// +optional
PreemptionPolicy *PreemptionPolicy `json:"preemptionPolicy,omitempty" protobuf:"bytes,31,opt,name=preemptionPolicy"`
// Overhead represents the resource overhead associated with running a pod for a given RuntimeClass.
// This field will be autopopulated at admission time by the RuntimeClass admission controller. If
// the RuntimeClass admission controller is enabled, overhead must not be set in Pod create requests.
// The RuntimeClass admission controller will reject Pod create requests which have the overhead already
// set. If RuntimeClass is configured and selected in the PodSpec, Overhead will be set to the value
// defined in the corresponding RuntimeClass, otherwise it will remain unset and treated as zero.
// More info: https://git.k8s.io/enhancements/keps/sig-node/20190226-pod-overhead.md
// This field is alpha-level as of Kubernetes v1.16, and is only honored by servers that enable the PodOverhead feature.
// +optional
Overhead ResourceList `json:"overhead,omitempty" protobuf:"bytes,32,opt,name=overhead"`
// TopologySpreadConstraints describes how a group of pods ought to spread across topology
// domains. Scheduler will schedule pods in a way which abides by the constraints.
// All topologySpreadConstraints are ANDed.
// +optional
// +patchMergeKey=topologyKey
// +patchStrategy=merge
// +listType=map
// +listMapKey=topologyKey
// +listMapKey=whenUnsatisfiable
TopologySpreadConstraints []TopologySpreadConstraint `json:"topologySpreadConstraints,omitempty" patchStrategy:"merge" patchMergeKey:"topologyKey" protobuf:"bytes,33,opt,name=topologySpreadConstraints"`
// If true the pod's hostname will be configured as the pod's FQDN, rather than the leaf name (the default).
// In Linux containers, this means setting the FQDN in the hostname field of the kernel (the nodename field of struct utsname).
// In Windows containers, this means setting the registry value of hostname for the registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters to FQDN.
// If a pod does not have FQDN, this has no effect.
// Default to false.
// +optional
SetHostnameAsFQDN *bool `json:"setHostnameAsFQDN,omitempty" protobuf:"varint,35,opt,name=setHostnameAsFQDN"`
}
Spec的字段有很多,主要也是由几大部分组成
容器相关字段
初始化容器:这些容器肯定会先启动,启动完成之后才会去启动apps container。
容器列表:pod管理的所有容器列表,包括初始化容器、业务容器、暂时容器。
暂时容器列表:可以临时添加和删除,不需要重启整个Pod,通常用来做一些临时操作或者短暂任务。
容器重启策略:在容器失败的时候通过对应的策略来进行重启,有Always、OnFailure、Never三种。
时间相关字段
优雅退出时间:Pod的终止期限,在这个时间段内容器可以进行清理操作,比如释放数据库连接等等,如果超时了未能正常终止,那么K8S将会强制终止容器。
终止期限:Pod的最大执行时间,保证了在运行一段时间之后Pod会终止,而不会无限时间运行。
资源
数据卷:同一个pod的数据卷对于多个容器来说是共享的。
其他资源:Service相关、网络相关字段,比如Ip地址以及Port族等等。
调度相关
Node选择器:选择具有相关标签的Node,将pod调度到对应的结点上。
容忍点:在特定情况下仍然允许Pod调度到Node上,比如存在污点,或者Node处于维护期等等。
亲和性:调度时偏向或者远离某些Node,取决于这是正亲和性还是负亲和性。
调度器:Pod被那个调度器调度,可以是集群默认的调度器,也可以是自己编写的调度器。
Status
c
type PodStatus struct {
// The phase of a Pod is a simple, high-level summary of where the Pod is in its lifecycle.
// The conditions array, the reason and message fields, and the individual container status
// arrays contain more detail about the pod's status.
// There are five possible phase values:
//
// Pending: The pod has been accepted by the Kubernetes system, but one or more of the
// container images has not been created. This includes time before being scheduled as
// well as time spent downloading images over the network, which could take a while.
// Running: The pod has been bound to a node, and all of the containers have been created.
// At least one container is still running, or is in the process of starting or restarting.
// Succeeded: All containers in the pod have terminated in success, and will not be restarted.
// Failed: All containers in the pod have terminated, and at least one container has
// terminated in failure. The container either exited with non-zero status or was terminated
// by the system.
// Unknown: For some reason the state of the pod could not be obtained, typically due to an
// error in communicating with the host of the pod.
//
// More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#pod-phase
// +optional
Phase PodPhase `json:"phase,omitempty" protobuf:"bytes,1,opt,name=phase,casttype=PodPhase"`
// Current service state of pod.
// More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#pod-conditions
// +optional
// +patchMergeKey=type
// +patchStrategy=merge
Conditions []PodCondition `json:"conditions,omitempty" patchStrategy:"merge" patchMergeKey:"type" protobuf:"bytes,2,rep,name=conditions"`
// A human readable message indicating details about why the pod is in this condition.
// +optional
Message string `json:"message,omitempty" protobuf:"bytes,3,opt,name=message"`
// A brief CamelCase message indicating details about why the pod is in this state.
// e.g. 'Evicted'
// +optional
Reason string `json:"reason,omitempty" protobuf:"bytes,4,opt,name=reason"`
// nominatedNodeName is set only when this pod preempts other pods on the node, but it cannot be
// scheduled right away as preemption victims receive their graceful termination periods.
// This field does not guarantee that the pod will be scheduled on this node. Scheduler may decide
// to place the pod elsewhere if other nodes become available sooner. Scheduler may also decide to
// give the resources on this node to a higher priority pod that is created after preemption.
// As a result, this field may be different than PodSpec.nodeName when the pod is
// scheduled.
// +optional
NominatedNodeName string `json:"nominatedNodeName,omitempty" protobuf:"bytes,11,opt,name=nominatedNodeName"`
// IP address of the host to which the pod is assigned. Empty if not yet scheduled.
// +optional
HostIP string `json:"hostIP,omitempty" protobuf:"bytes,5,opt,name=hostIP"`
// IP address allocated to the pod. Routable at least within the cluster.
// Empty if not yet allocated.
// +optional
PodIP string `json:"podIP,omitempty" protobuf:"bytes,6,opt,name=podIP"`
// podIPs holds the IP addresses allocated to the pod. If this field is specified, the 0th entry must
// match the podIP field. Pods may be allocated at most 1 value for each of IPv4 and IPv6. This list
// is empty if no IPs have been allocated yet.
// +optional
// +patchStrategy=merge
// +patchMergeKey=ip
PodIPs []PodIP `json:"podIPs,omitempty" protobuf:"bytes,12,rep,name=podIPs" patchStrategy:"merge" patchMergeKey:"ip"`
// RFC 3339 date and time at which the object was acknowledged by the Kubelet.
// This is before the Kubelet pulled the container image(s) for the pod.
// +optional
StartTime *metav1.Time `json:"startTime,omitempty" protobuf:"bytes,7,opt,name=startTime"`
// The list has one entry per init container in the manifest. The most recent successful
// init container will have ready = true, the most recently started container will have
// startTime set.
// More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#pod-and-container-status
InitContainerStatuses []ContainerStatus `json:"initContainerStatuses,omitempty" protobuf:"bytes,10,rep,name=initContainerStatuses"`
// The list has one entry per container in the manifest. Each entry is currently the output
// of `docker inspect`.
// More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#pod-and-container-status
// +optional
ContainerStatuses []ContainerStatus `json:"containerStatuses,omitempty" protobuf:"bytes,8,rep,name=containerStatuses"`
// The Quality of Service (QOS) classification assigned to the pod based on resource requirements
// See PodQOSClass type for available QOS classes
// More info: https://git.k8s.io/community/contributors/design-proposals/node/resource-qos.md
// +optional
QOSClass PodQOSClass `json:"qosClass,omitempty" protobuf:"bytes,9,rep,name=qosClass"`
// Status for any ephemeral containers that have run in this pod.
// This field is alpha-level and is only populated by servers that enable the EphemeralContainers feature.
// +optional
EphemeralContainerStatuses []ContainerStatus `json:"ephemeralContainerStatuses,omitempty" protobuf:"bytes,13,rep,name=ephemeralContainerStatuses"`
}
相比于Spec,Status的字段会少很多,这是因为它只代表着当前状态,所以只有一些必要的字段,还有很多详情状态在ContainerStatus层面保存,主要由以下几部分组成
- 状态相关:
pod处于什么阶段、有什么条件。 - 信息相关:当前有什么信息,产生的原因是什么。
- 当前资源:
HostIP,即Node节点Ip地址。PodIP,即当前Pod可用的IP族。初始化容器的状态列表,所有容器的状态列表,暂时容器的列表状态等等,Pod QOS优先级类型等等。
结语
今天这篇博客主要从yaml和Go源码两个方面向大家剖析Pod底层究竟有哪些字段,每个字段有什么含义。
我们首先通过一个简单的Deployment来创建了一个简单的Pod,然后探索这两个对象的yaml文件,向大家解释了K8S对象的五大部分,这些部分对应的含义以及具体的字段,以及为什么需要这些部分。
其中ApiVersion、Kind和Metadata是各大对象都通用的,而Spec和Status则每个对象都不同,之后深入K8S项目源码,一窥Pod底层实现及具体字段含义。
《每天十分钟,轻松入门K8S》的第四篇04.源码级别Pod详解(一):K8S对象字段及Pod源码实现 到这里就结束了,感谢您看到这里。
之后的几讲都会和Pod相关,深入源码级别探索K8S核心概念Pod相关内容,感兴趣的小伙伴欢迎点赞、评论、收藏,您的支持就是对我最大的鼓励。