metrics-server概述:
Metrics Server从kubelets收集资源指标,并通过Metrics API将它们暴露在Kubernetes apiserver中,以供HPA(Horizontal Pod Autoscaler)和VPA(Vertical Pod Autoscaler)使用。
Metrics API也可以通过kubectl top访问,从而更容易调试自动缩放管道。
参考链接:
https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/metrics-server
https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/
https://github.com/kubernetes-sigs/metrics-server
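温馨提示:
metrics-server部署成功后(部署步骤见下文),可以用下面的方式直接验证Metrics API是否可用,以下命令仅为示意:
kubectl top nodes
kubectl top pods -n kube-system
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes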
部署metrics-server:
cat > deployment-metric-server.yaml <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: metrics-server:system:auth-delegator
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: metrics-server-auth-reader
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  service:
    name: metrics-server
    namespace: kube-system
  group: metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: metrics-server-config
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: EnsureExists
data:
  NannyConfiguration: |-
    apiVersion: nannyconfig/v1alpha1
    kind: NannyConfiguration
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server-v0.3.3
  namespace: kube-system
  labels:
    k8s-app: metrics-server
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: v0.3.3
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
      version: v0.3.3
  template:
    metadata:
      name: metrics-server
      labels:
        k8s-app: metrics-server
        version: v0.3.3
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
        seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
    spec:
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      tolerations:
      - operator: Exists
      containers:
      - name: metrics-server
        # image: registry.aliyuncs.com/google_containers/metrics-server-amd64:v0.3.3
        image: k8s151.oldboyedu.com:5000/metrics-server-amd64:v0.3.3
        command:
        - /metrics-server
        - --metric-resolution=30s
        # These are needed for GKE, which doesn't support secure communication yet.
        # Remove these lines for non-GKE clusters, and when GKE supports token-based auth.
        #- --kubelet-port=10255
        #- --deprecated-kubelet-completely-insecure=true
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
        ports:
        - containerPort: 443
          name: https
          protocol: TCP
      - name: metrics-server-nanny
        # image: registry.aliyuncs.com/google_containers/addon-resizer:1.8.5
        image: k8s151.oldboyedu.com:5000/addon-resizer:1.8.5
        resources:
          limits:
            cpu: 100m
            memory: 300Mi
          requests:
            cpu: 5m
            memory: 50Mi
        env:
        - name: MY_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: MY_POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        volumeMounts:
        - name: metrics-server-config-volume
          mountPath: /etc/config
        command:
        - /pod_nanny
        - --config-dir=/etc/config
        #- --cpu=80m
        - --extra-cpu=0.5m
        #- --memory=80Mi
        #- --extra-memory=8Mi
        - --threshold=5
        - --deployment=metrics-server-v0.3.3
        - --container=metrics-server
        - --poll-period=300000
        - --estimator=exponential
        # Specifies the smallest cluster (defined in number of nodes)
        # resources will be scaled to.
        - --minClusterSize=2
        #- --minClusterSize={{ metrics_server_min_cluster_size }}
      volumes:
      - name: metrics-server-config-volume
        configMap:
          name: metrics-server-config
---
apiVersion: v1
kind: Service
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: "Metrics-server"
spec:
  selector:
    k8s-app: metrics-server
  ports:
  - port: 443
    protocol: TCP
    targetPort: https
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:metrics-server
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  - nodes/stats
  - namespaces
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - "extensions"
  resources:
  - deployments
  verbs:
  - get
  - list
  - update
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:metrics-server
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
EOF
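应用上述清单并验证部署结果(以下命令仅为示意,前提是相关镜像已推送到k8s151.oldboyedu.com:5000这个本地registry):
kubectl apply -f deployment-metric-server.yaml
kubectl -n kube-system get pods | grep metrics-server
kubectl get apiservice v1beta1.metrics.k8s.io
kubectl top nodes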
面试题: 如果一个Pod做了资源限制,当CPU资源使用率高达90%时如何解决?能否实现自动扩容呢?请说下思路?
---> hpa
Pod水平自动伸缩HPA案例:
1.编写资源清单
cat > deploy-tomcat.yaml <<'EOF'
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: mysql
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: oldboyedu-mysql
    spec:
      tolerations:
      - operator: Exists
      containers:
      - name: mysql
        image: k8s151.oldboyedu.com:5000/oldboyedu-db/mysql:5.7
        ports:
        - containerPort: 3306
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: '123456'
---
apiVersion: v1
kind: Service
metadata:
  name: oldboyedu-mysql
spec:
  selector:
    app: oldboyedu-mysql
  ports:
  - port: 3306
    targetPort: 3306
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: oldboyedu-tomcat-app
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: oldboyedu-tomcat-app
    spec:
      tolerations:
      - operator: Exists
      containers:
      - name: myweb
        # image: jasonyin2020/tomcat-app:v1
        image: k8s151.oldboyedu.com:5000/oldboyedu-tomcat/tomcat-app:v1
        resources:
          limits:
            cpu: "100m"
          requests:
            cpu: "100m"
        ports:
        - containerPort: 8080
        env:
        - name: MYSQL_SERVICE_HOST
          value: oldboyedu-mysql
        - name: MYSQL_SERVICE_PORT
          value: '3306'
---
apiVersion: v1
kind: Service
metadata:
  name: oldboyedu-tomcat-app
spec:
  type: NodePort
  selector:
    app: oldboyedu-tomcat-app
  ports:
  - port: 8080
    targetPort: 8080
    nodePort: 30888
EOF
2.部署tomcat应用
kubectl apply -f deploy-tomcat.yaml
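部署后可以先确认相关资源均已就绪(示意):
kubectl get deploy,svc,pods -o wide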
3.测试
1.支持中文输出
cat >> /etc/mysql/my.cnf <<EOF
[mysql]
default-character-set = utf8
EOF
2.插入测试数据
INSERT INTO HPE_APP.T_USERS (USER_NAME,LEVEL) VALUES ('WangJianPing',888888888),('LiuDong',999999999999);
3.查看webUI
4.创建HPA规则
kubectl autoscale deployment oldboyedu-tomcat-app --max=10 --min=2 --cpu-percent=75
相关参数说明:
--max:
指定最大的Pod副本数量。数量越大,弹性伸缩时可能创建的Pod就越多,对服务器资源的消耗也就越大。
--min:
指定最小的Pod副本数量。
--cpu-percent:
指定触发扩容的CPU平均使用率阈值(百分比)。
温馨提示:
(1)测试时建议将CPU使用百分比修改为5%,生产环境建议设置成75%.
(2)测试时最大Pod数量建议为5个即可,生产环境根据需求而定,通常情况下,10是一个不错的建议;
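上面的kubectl autoscale命令也可以用资源清单的方式来表达,下面是一个等价的参考写法(文件名hpa-tomcat.yaml仅为示意,scaleTargetRef需与上文Deployment的apiVersion和名称保持一致):
cat > hpa-tomcat.yaml <<'EOF'
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: oldboyedu-tomcat-app
spec:
  scaleTargetRef:
    apiVersion: extensions/v1beta1
    kind: Deployment
    name: oldboyedu-tomcat-app
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 75
EOF
kubectl apply -f hpa-tomcat.yaml
kubectl get hpa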
5.压力测试tomcat,观察Pod的水平伸缩
(1)安装测试工具
yum -y install httpd-tools
(2)使用ab工具进行测试
ab -c 1000 -n 2000000 http://10.0.0.153:30888/
ab -c 100 -n 2000000 http://10.0.0.153:30888/
相关参数说明:
-n:
指定压测的总请求数。
-c:
指定压测时的并发请求数。
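压测的同时,可以另开一个终端观察HPA指标与Pod副本数的变化(示意):
kubectl get hpa oldboyedu-tomcat-app -w
kubectl get pods -l app=oldboyedu-tomcat-app -w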
ceph集群环境部署:
1.准备3台节点,所有节点都添加映射关系
[root@ceph201 ~]# cat >> /etc/hosts <<EOF
10.0.0.201 ceph201
10.0.0.202 ceph202
10.0.0.203 ceph203
EOF
2.配置ceph的yum软件源并安装ceph-deploy
(1)所有节点配置ceph的yum软件源(此处同时包含清华octopus源与阿里云jewel源,仓库文件名可自定义)
cat > /etc/yum.repos.d/oldboyedu-ceph.repo <<EOF
[oldboyedu-ceph]
name=oldboyedu linux82 ceph 2022
baseurl=https://mirrors.tuna.tsinghua.edu.cn/ceph/rpm-octopus/el7/x86_64/
gpgcheck=0
enable=1
[oldboyedu-ceph-tools]
name=oldboyedu linux82 ceph tools
baseurl=https://mirrors.tuna.tsinghua.edu.cn/ceph/rpm-octopus/el7/noarch/
gpgcheck=0
enable=1
[ceph]
name=ceph
baseurl=http://mirrors.aliyun.com/ceph/rpm-jewel/el7/x86_64/
gpgcheck=0
priority=1
[ceph-noarch]
name=cephnoarch
baseurl=http://mirrors.aliyun.com/ceph/rpm-jewel/el7/noarch/
gpgcheck=0
priority=1
[ceph-source]
name=Ceph source packages
baseurl=http://mirrors.aliyun.com/ceph/rpm-jewel/el7/SRPMS
gpgcheck=0
EOF
(3)"ceph201"节点安装"ceph-deploy"工具,用于后期部署ceph集群
root@ceph201 \~\]# sed -ri 's#(keepcache=)0#\\11#' /etc/yum.conf
\[root@ceph201 \~\]# yum -y install ceph-deploy
(4)将rpm软件源推送到其它节点
\[root@ceph201 \~\]# scp /etc/yum.repos.d/\*.repo ceph202:/etc/yum.repos.d/
\[root@ceph201 \~\]# scp /etc/yum.repos.d/\*.repo ceph203:/etc/yum.repos.d/
4."ceph201"节点ceph环境准备
(1)"ceph201"节点安装ceph环境
[root@ceph201 ~]# yum -y install ceph ceph-mon ceph-mgr ceph-radosgw ceph-mds ceph-osd
相关软件包功能说明如下:
ceph:
ceph通用模块软件包。
ceph-mon:
ceph集群的监控组件,负责维护集群状态并协调数据分布,其中mon是monitor的简写。
ceph-mgr:
管理集群状态的组件,其中mgr是manager的简写。基于该组件我们可以让zabbix来监控ceph集群哟。
ceph-radosgw:
对象存储网关,多用于ceph对象存储相关模块软件包。
ceph-mds:
ceph的文件存储相关模块软件包,即"metadata server"。
ceph-osd:
ceph的块存储相关模块软件包。
(2)将软件打包到本地并推送到其它节点
[root@ceph201 ~]# mkdir ceph-rpm
[root@ceph201 ~]# find /var/cache/yum/ -type f -name "*.rpm" | xargs mv -t ceph-rpm/
[root@ceph201 ~]# tar zcf oldboyedu-ceph.tar.gz ceph-rpm
[root@ceph201 ~]# scp oldboyedu-ceph.tar.gz ceph202:~
[root@ceph201 ~]# scp oldboyedu-ceph.tar.gz ceph203:~
(3)其它节点安装ceph环境
[root@ceph202 ~]# tar xf oldboyedu-ceph.tar.gz && cd ~/ceph-rpm && yum -y localinstall *.rpm
[root@ceph203 ~]# tar xf oldboyedu-ceph.tar.gz && cd ~/ceph-rpm && yum -y localinstall *.rpm
5."ceph201"节点初始化ceph的配置文件
(1)自定义创建ceph的配置文件目录
[root@ceph201 ~]# mkdir -pv /oldboyedu/ceph/cluster && cd /oldboyedu/ceph/cluster/
(2)所有节点安装"distribute"软件包,用于提供"pkg_resources"模块。
yum -y install gcc python-setuptools python-devel wget
wget https://pypi.python.org/packages/source/d/distribute/distribute-0.7.3.zip --no-check-certificate
unzip distribute-0.7.3.zip
cd distribute-0.7.3
python setup.py install
(3)初始化ceph的配置文件,注意观察执行命令的所在目录文件变化哟~
[root@ceph201 ~]# cd /oldboyedu/ceph/cluster/ && ceph-deploy new --public-network 10.0.0.0/24 ceph201 ceph202 ceph203
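初始化成功后,可以确认当前目录下是否已生成ceph.conf、ceph.mon.keyring等文件,并检查public_network是否写入(示意):
[root@ceph201 ~]# ls /oldboyedu/ceph/cluster/
[root@ceph201 ~]# grep public_network /oldboyedu/ceph/cluster/ceph.conf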
6."ceph201"节点安装ceph-monitor并启动ceph-mon
[root@ceph201 ~]# cd /oldboyedu/ceph/cluster/ && ceph-deploy mon create-initial
7."ceph201"节点将配置和client.admin密钥推送到指定的远程主机以便于管理集群。
[root@ceph201 ~]# cd /oldboyedu/ceph/cluster/ && ceph-deploy admin ceph201 ceph202 ceph203
8."ceph201"节点安装并启动ceph-mgr组件
[root@ceph201 ~]# cd /oldboyedu/ceph/cluster/ && ceph-deploy mgr create ceph201 ceph202 ceph203
9.卸载旧版本的ceph-deploy,然后安装指定版本,否则在做第十步时可能会出现"Could not locate executable 'ceph-disk' ..."的报错(注意哈,这个步骤需要将yum源切换为阿里源,之前是清华源)
ceph-deploy --version
yum -y remove ceph-deploy
yum -y install python-pip
pip install ceph-deploy==2.0.1 -i https://mirrors.aliyun.com/pypi/simple
ceph-deploy --version
10."ceph201"节点安装OSD设备
[root@ceph201 ~]# cd /oldboyedu/ceph/cluster
[root@ceph201 ~]# ceph-deploy osd create ceph201 --data /dev/sdb
[root@ceph201 ~]# ceph-deploy osd create ceph201 --data /dev/sdc
[root@ceph201 ~]# ceph-deploy osd create ceph202 --data /dev/sdb
[root@ceph201 ~]# ceph-deploy osd create ceph202 --data /dev/sdc
[root@ceph201 ~]# ceph-deploy osd create ceph203 --data /dev/sdb
[root@ceph201 ~]# ceph-deploy osd create ceph203 --data /dev/sdc
创建OSD时要注意的相关参数说明:
--data DATA:
指定逻辑卷或者设备的绝对路径。
--journal JOURNAL:
指定逻辑卷或者GPT分区的分区路径。生产环境强烈建议指定该参数,并推荐使用SSD设备。
因为ceph需要先预写日志而后才会真实写入数据,从而达到防止数据丢失的目的。
若不指定该参数,则日志会写入"--data"指定的路径,从而降低性能。
对于一个4T的数据盘,其日志并不会占据特别多的空间,通常情况下分配50G足以,当然,LVM是支持动态扩容的。
温馨提示:
(1)如上图所示,指定的设备各个服务器必须存在哟;
(2)如下图所示,当我们创建OSD成功后,就可以通过"ceph -s"查看集群状态了哟;
(3)如下图所示,我们可以通过"ceph osd tree"查看集群各个osd的情况;
(4)如果一个ceph集群的某个主机上有多块磁盘都需要加入osd,只需重复执行上述步骤即可;
(5)如果添加硬盘未识别可执行以下语句:
for i in `seq 0 2`; do echo "- - -" > /sys/class/scsi_host/host${i}/scan;done
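上面温馨提示中提到的几个查看命令汇总如下,OSD全部创建完成后可以逐一确认(示意):
ceph -s
ceph osd tree
ceph df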
选做部分,否则集群处于HEALTH_WARN状态,但并不影响正常使用:
11.禁用安全机制,否则会报错:"mons are allowing insecure global_id reclaim"
ceph config set mon auth_allow_insecure_global_id_reclaim false
12.集群时间同步,否则会报错"clock skew detected on mon.ceph202, mon.ceph203"
(1)所有节点安装chrony服务
[root@ceph201 ~]# yum -y install chrony
(2)备份默认配置文件
[root@ceph201 ~]# cp /etc/chrony.conf{,`date +%F`}
(3)创建配置文件
[root@ceph201 ~]# cat > /etc/chrony.conf <<'EOF'
server ceph201 iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
allow 10.0.0.0/8
local stratum 10
logdir /var/log/chrony
EOF
(4)其他2个客户端配置
cat > /etc/chrony.conf <<'EOF'
server ceph201 iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
logdir /var/log/chrony
EOF
(5)将chrony服务设置为开启自启动
systemctl enable --now chronyd
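配置完成后,可以在各节点验证时间同步是否正常(示意):
chronyc sources -v
chronyc tracking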
13.所有节点安装"pecan"模块,否则会报错"Module 'restful' has failed dependency: No module named 'pecan'"
pip3 install pecan werkzeug
reboot
14.开启池删除功能
ceph config set mon mon_allow_pool_delete true
ceph的存储池基本管理:
(1)查看现有的所有资源池
ceph osd pool ls
(2)创建资源池
ceph osd pool create oldboyedu-linux 128 128
(3)查看存储池的参数信息(这个值可能会不断的变化哟~尤其是在重命名后!)
ceph osd pool get oldboyedu-linux pg_num
(4)资源池的重命名
ceph osd pool rename oldboyedu-linux oldboyedu-linux-2021
(5)查看资源池的状态
ceph osd pool stats
(6)删除资源池(需要开启删除的功能,否则会报错哟~而且池的名字得写2次!)
ceph osd pool rm oldboyedu-linux oldboyedu-linux --yes-i-really-really-mean-it
其他使用方法:
"ceph osd pool --help"
rbd的块设备(镜像,image)管理:
(1)查看rbd的块(镜像,image)设备:
rbd list -p oldboyedu-linux
rbd ls -p oldboyedu-linux --long --format json --pretty-format # 可以指定输出的样式
(2)创建rbd的块设备:(下面两种写法等效哟)
rbd create --size 1024 --pool oldboyedu-linux linux82 # 不指定单位默认为M
rbd create --size 1024 oldboyedu-linux/linux83
rbd create -s 2G -p oldboyedu-linux linux84 # 也可以指定单位,比如G
(3)修改镜像设备的名称
rbd rename|mv -p oldboyedu-linux oldboyedu-zhibo oldboyedu-linux
rbd rename oldboyedu-linux/linux82 oldboyedu-linux/linux82-2022
rbd mv --pool oldboyedu-linux linux82-2022 linux82
(4)查看镜像的状态
rbd status -p oldboyedu-linux oldboyedu-zhibo2021
rbd status oldboyedu-linux/linux82
(5)移除镜像文件
rbd remove|rm -p oldboyedu-linux oldboyedu-zhibo2021
rbd remove oldboyedu-linux/linux83
rbd rm --pool oldboyedu-linux linux84
(6)查看rbd的详细信息:
rbd info -p oldboyedu-linux linux82 # 等效于"rbd info oldboyedu-linux/linux82"
rbd info oldboyedu-linux/linux82 --format json --pretty-format
温馨提示:
(1)如果是ceph的客户端,想要操作rbd必须安装该工具,其安装包可参考"rpm -qf /usr/bin/rbd"哟;
(2)以上参数有任何不懂的,可以直接使用"rbd help"查看相应子命令的帮助信息哟;
K8S对接ceph的rbd块存储:
1.k8s所有节点安装ceph-common软件包
(1)k8s所有节点配置ceph的yum软件源(与上文ceph集群使用的清华octopus源一致,文件名可自定义)
cat > /etc/yum.repos.d/oldboyedu-ceph.repo <<EOF
[oldboyedu-ceph]
name=oldboyedu linux82 ceph 2022
baseurl=https://mirrors.tuna.tsinghua.edu.cn/ceph/rpm-octopus/el7/x86_64/
gpgcheck=0
enable=1
[oldboyedu-ceph-tools]
name=oldboyedu linux82 ceph tools
baseurl=https://mirrors.tuna.tsinghua.edu.cn/ceph/rpm-octopus/el7/noarch/
gpgcheck=0
enable=1
EOF
(2)master节点开启rpm软件包缓存
[root@k8s151.oldboyedu.com ~]# grep keepcache /etc/yum.conf
keepcache=1
[root@k8s151.oldboyedu.com ~]#
(3)安装ceph的基础包
[root@k8s151.oldboyedu.com ~]# yum -y install ceph-common
(4)将ceph软件包打包并下发到其它k8s node节点
[root@k8s151.oldboyedu.com ~]# mkdir k8s-ceph-common
[root@k8s151.oldboyedu.com ~]# find /var/cache/yum/ -type f -name "*.rpm" | xargs mv -t k8s-ceph-common
[root@k8s151.oldboyedu.com ~]# tar zcf oldboyedu-k8s-ceph-common.tar.gz k8s-ceph-common
[root@k8s151.oldboyedu.com ~]# scp oldboyedu-k8s-ceph-common.tar.gz k8s152.oldboyedu.com:~
[root@k8s151.oldboyedu.com ~]# scp oldboyedu-k8s-ceph-common.tar.gz k8s153.oldboyedu.com:~
(5)其它所有node节点安装ceph软件包
[root@k8s152.oldboyedu.com ~]# tar xf oldboyedu-k8s-ceph-common.tar.gz && yum -y localinstall k8s-ceph-common/*.rpm
[root@k8s153.oldboyedu.com ~]# tar xf oldboyedu-k8s-ceph-common.tar.gz && yum -y localinstall k8s-ceph-common/*.rpm
(6)将ceph集群的配置文件推送到k8s所有节点
[root@ceph201 ~]# scp /etc/ceph/ceph.conf 10.0.0.151:/etc/ceph/
[root@ceph201 ~]# scp /etc/ceph/ceph.conf 10.0.0.152:/etc/ceph/
[root@ceph201 ~]# scp /etc/ceph/ceph.conf 10.0.0.153:/etc/ceph/
参考链接:
https://kubernetes.io/docs/concepts/storage/volumes/#rbd
https://github.com/kubernetes/examples/tree/master/volumes/rbd
2.使用Ceph身份验证密钥
(1)如果提供了Ceph身份验证密钥,则该密钥应首先进行base64编码,然后将编码后的字符串放入密钥yaml中。
[root@ceph201 ~]# grep key /etc/ceph/ceph.client.admin.keyring | awk '{printf "%s", $NF}' | base64
QVFBTzlpTmplSXJZSmhBQS9EUGJYSURjYmYwSHR2Z1BxQUc3MUE9PQ==
[root@ceph201 ~]#
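也可以直接用ceph auth取出key再做base64编码,效果一般是一样的(仅为示意):
[root@ceph201 ~]# ceph auth get-key client.admin | base64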
(2)创建Secret资源,注意替换key的值,要和ceph集群的key保持一致哟,上一步我已经取出来了。
[root@k8s151.oldboyedu.com ~]# cat > 01-oldboyedu-ceph-secret.yaml <<'EOF'
apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret
type: "kubernetes.io/rbd"
data:
  key: QVFBTzlpTmplSXJZSmhBQS9EUGJYSURjYmYwSHR2Z1BxQUc3MUE9PQ==
EOF
(3)在K8S集群中创建Secret资源,以便于后期Pod使用该资源。
[root@k8s151.oldboyedu.com ~]# kubectl apply -f 01-oldboyedu-ceph-secret.yaml
[root@k8s151.oldboyedu.com ~]# kubectl get secret
参考链接:
https://github.com/kubernetes/examples/blob/master/volumes/rbd/secret/ceph-secret.yaml
3.ceph集群创建k8s专用的存储池及镜像
[root@ceph201 ~]# ceph osd pool create k8s 128 128
[root@ceph201 ~]# rbd create -p k8s --size 2G --image-feature layering oldboyedu-linux
4.K8S使用ceph的rbd镜像案例
(1)创建kubernetes集群常用的资源池
[root@ceph201 ~]# ceph osd pool create k8s 128 128
pool 'k8s' created
[root@ceph201 ~]#
(2)在k8s资源池创建块设备,值得注意的是,这里通过--image-feature只启用layering特性,否则可能由于内核版本过低而无法挂载哟
[root@ceph201 ~]# rbd create -p k8s --size 1024 --image-feature layering oldboyedu-linux
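创建完成后可以确认镜像及其特性(示意):
[root@ceph201 ~]# rbd ls -p k8s
[root@ceph201 ~]# rbd info k8s/oldboyedu-linux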
(3)创建K8S的资源,将MySQL的Pod持久化到ceph集群
[root@k8s151.oldboyedu.com ~]# cat > 02-oldboyedu-mysql-deploy.yml <<EOF
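# 以下清单内容仅为参考示意: Deployment名称oldboyedu-mysql-rbd为示意,monitor地址此处假设ceph201为10.0.0.201:6789,请按实际环境调整;
# 存储池k8s、镜像oldboyedu-linux、secret名称ceph-secret均沿用上文创建的资源,rbd卷字段可参考上面的官方rbd示例链接。
apiVersion: apps/v1
kind: Deployment
metadata:
  name: oldboyedu-mysql-rbd
spec:
  replicas: 1
  selector:
    matchLabels:
      app: oldboyedu-mysql-rbd
  template:
    metadata:
      labels:
        app: oldboyedu-mysql-rbd
    spec:
      containers:
      - name: mysql
        image: k8s151.oldboyedu.com:5000/oldboyedu-db/mysql:5.7
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: '123456'
        ports:
        - containerPort: 3306
        volumeMounts:
        - name: mysql-data
          mountPath: /var/lib/mysql
      volumes:
      - name: mysql-data
        rbd:
          monitors:
          - 10.0.0.201:6789
          pool: k8s
          image: oldboyedu-linux
          user: admin
          secretRef:
            name: ceph-secret
          fsType: ext4
EOF
[root@k8s151.oldboyedu.com ~]# kubectl apply -f 02-oldboyedu-mysql-deploy.yml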