# Rook-Based Ceph Cloud-Native Storage: Deployment and Practice Guide (Part 2)

Author: 任少近


## 6 Managing Ceph Resource Objects

### 6.1 Viewing services

```shell
[root@k8s-master ~]# kubectl -n rook-ceph get services
NAME                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
rook-ceph-mgr             ClusterIP   10.110.141.201   <none>        9283/TCP            13h
rook-ceph-mgr-dashboard   ClusterIP   10.103.197.146   <none>        8443/TCP            13h
rook-ceph-mon-a           ClusterIP   10.110.163.61    <none>        6789/TCP,3300/TCP   13h
rook-ceph-mon-b           ClusterIP   10.100.49.10     <none>        6789/TCP,3300/TCP   13h
rook-ceph-mon-c           ClusterIP   10.96.193.162    <none>        6789/TCP,3300/TCP   13h
```
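
If you only want a quick look at the dashboard service listed above without exposing it (section 7.3 sets up a NodePort for that), a port-forward works; this is a sketch, and the local port 8443 is an arbitrary choice:

```shell
# Forward local port 8443 to the mgr dashboard service inside the cluster.
kubectl -n rook-ceph port-forward service/rook-ceph-mgr-dashboard 8443:8443
# Then open https://localhost:8443/ in a browser.
```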

### 6.2 Viewing Jobs

```shell
[root@k8s-master ~]# kubectl -n rook-ceph get jobs
NAME                               COMPLETIONS   DURATION   AGE
rook-ceph-osd-prepare-k8s-master   1/1           6s         11h
rook-ceph-osd-prepare-k8s-node1    1/1           7s         11h
rook-ceph-osd-prepare-k8s-node2    1/1           7s         11h
rook-ceph-osd-prepare-k8s-node3    1/1           6s         11h
```
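
These osd-prepare jobs record how Rook probed each node's disks. If an OSD never appears, their logs are the first place to look; a minimal sketch, reusing a job name from the listing above:

```shell
# Print the tail of the disk-preparation log for one node.
kubectl -n rook-ceph logs job/rook-ceph-osd-prepare-k8s-node1 --tail=20
```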

### 6.3 Viewing deployments.apps

```shell
[root@k8s-master ~]# kubectl -n rook-ceph get deployments.apps
NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
csi-cephfsplugin-provisioner          2/2     2            2           12h
csi-rbdplugin-provisioner             2/2     2            2           12h
rook-ceph-crashcollector-k8s-master   1/1     1            1           12h
rook-ceph-crashcollector-k8s-node1    1/1     1            1           12h
rook-ceph-crashcollector-k8s-node2    1/1     1            1           12h
rook-ceph-crashcollector-k8s-node3    1/1     1            1           12h
rook-ceph-mgr-a                       1/1     1            1           12h
rook-ceph-mgr-b                       1/1     1            1           12h
rook-ceph-mon-a                       1/1     1            1           12h
rook-ceph-mon-b                       1/1     1            1           12h
rook-ceph-mon-c                       1/1     1            1           12h
rook-ceph-operator                    1/1     1            1           12h
rook-ceph-osd-0                       1/1     1            1           12h
rook-ceph-osd-1                       1/1     1            1           12h
rook-ceph-osd-2                       1/1     1            1           12h
rook-ceph-osd-3                       1/1     1            1           12h
```
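
If any of these deployments fails to become Ready, the operator log usually explains why; a sketch using the rook-ceph-operator deployment from the listing:

```shell
# Follow the Rook operator log to watch reconciliation events and errors.
kubectl -n rook-ceph logs deploy/rook-ceph-operator -f
```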

### 6.4 Viewing daemonsets.apps

```shell
[root@k8s-master ~]# kubectl -n rook-ceph get daemonsets.apps
NAME               DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
csi-cephfsplugin   4         4         4       4            4           <none>          12h
csi-rbdplugin      4         4         4       4            4           <none>          12h
```
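
Each CSI daemonset should run one pod per schedulable node (four here). To confirm where the pods landed, something like the following works; the app labels are the ones Rook applies by default, so treat them as an assumption:

```shell
# List the CSI plugin pods together with the node each one runs on.
kubectl -n rook-ceph get pods -l app=csi-rbdplugin -o wide
kubectl -n rook-ceph get pods -l app=csi-cephfsplugin -o wide
```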

### 6.5 Viewing configmaps

```shell
[root@k8s-master ~]# kubectl -n rook-ceph get configmaps
NAME                           DATA   AGE
kube-root-ca.crt               1      13h
rook-ceph-csi-config           1      12h
rook-ceph-csi-mapping-config   1      12h
rook-ceph-mon-endpoints        5      12h
rook-ceph-operator-config      33     13h
rook-config-override           1      12h
```
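
Of these, rook-config-override is the one most often edited by hand: Rook merges its config key into each daemon's ceph.conf. A minimal sketch, assuming the affected daemons are restarted afterwards to pick up the change; the osd_pool_default_size value is only an example:

```shell
# Patch the override ConfigMap with an extra [global] ceph.conf setting.
kubectl -n rook-ceph patch configmap rook-config-override --type merge -p '
data:
  config: |
    [global]
    osd_pool_default_size = 3
'
```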

### 6.6 Viewing clusterroles.rbac.authorization.k8s.io

```shell
[root@k8s-master ~]# kubectl -n rook-ceph get clusterroles.rbac.authorization.k8s.io
NAME                                 CREATED AT
cephfs-csi-nodeplugin                2023-06-13T13:56:29Z
cephfs-external-provisioner-runner   2023-06-13T13:56:29Z
rbd-csi-nodeplugin                   2023-06-13T13:56:29Z
rbd-external-provisioner-runner      2023-06-13T13:56:29Z
rook-ceph-cluster-mgmt               2023-06-13T13:56:29Z
rook-ceph-global                     2023-06-13T13:56:29Z
rook-ceph-mgr-cluster                2023-06-13T13:56:29Z
rook-ceph-mgr-system                 2023-06-13T13:56:29Z
rook-ceph-object-bucket              2023-06-13T13:56:29Z
rook-ceph-osd                        2023-06-13T13:56:29Z
rook-ceph-system                     2023-06-13T13:56:29Z
```
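
To see exactly which API groups and verbs one of these roles grants, describe it; any name from the listing can be substituted:

```shell
# Show the RBAC rules carried by the OSD cluster role.
kubectl describe clusterrole rook-ceph-osd
```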

### 6.7 Viewing clusterrolebindings.rbac.authorization.k8s.io

```shell
[root@k8s-master ~]# kubectl -n rook-ceph get clusterrolebindings.rbac.authorization.k8s.io
NAME                          ROLE
cephfs-csi-nodeplugin-role    ClusterRole/cephfs-csi-nodeplugin
cephfs-csi-provisioner-role   ClusterRole/cephfs-external-provisioner-runner
rbd-csi-nodeplugin            ClusterRole/rbd-csi-nodeplugin
rbd-csi-provisioner-role      ClusterRole/rbd-external-provisioner-runner
rook-ceph-global              ClusterRole/rook-ceph-global
rook-ceph-mgr-cluster         ClusterRole/rook-ceph-mgr-cluster
rook-ceph-object-bucket       ClusterRole/rook-ceph-object-bucket
rook-ceph-osd                 ClusterRole/rook-ceph-osd
rook-ceph-system              ClusterRole/rook-ceph-system
```

### 6.8 Viewing OSD Pool Information via the cephclusters Custom Resource

If you use the Rook Ceph Operator to manage the Ceph cluster, you can also inspect Rook's custom resources to obtain information about the OSD pools:

```shell
[root@k8s-master ~]# kubectl get cephclusters.ceph.rook.io rook-ceph -o yaml
```

![image](https://i-blog.csdnimg.cn/direct/4bb7cd69f1f54256afd512ad0248fa18.png)

## 7 Accessing Ceph

### 7.1 The Toolbox Client

**Deploy it**

```shell
cd rook/deploy/examples/
kubectl apply -f toolbox.yaml
```

**Connect to the Ceph cluster**

```shell
[root@k8s-master ~]# kubectl -n rook-ceph exec -it rook-ceph-tools-7857bc9568-q9fjk -- /bin/bash
bash-4.4$ ceph -s
  cluster:
    id:     e320aa6c-0057-46ad-b2bf-5c49df8eba5a
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 13h)
    mgr: b(active, since 13h), standbys: a
    osd: 4 osds: 4 up (since 13h), 4 in (since 13h)

  data:
    pools:   1 pools, 1 pgs
    objects: 2 objects, 449 KiB
    usage:   45 MiB used, 200 GiB / 200 GiB avail
    pgs:     1 active+clean
```

### 7.2 Accessing Ceph from the K8s Nodes

**Add ceph.conf and a keyring on the node**

```shell
[root@k8s-master ~]# mkdir /etc/ceph
[root@k8s-master ~]# cd /etc/ceph
[root@k8s-master ~]# vi ceph.conf
[global]
mon_host = 10.110.163.61:6789,10.100.49.10:6789,10.96.193.162:6789

[client.admin]
keyring = /etc/ceph/keyring

[root@k8s-master ~]# vi keyring
[client.admin]
key = AQCGfYhkeMnEFRAAJnW4jUMwmJz2b1dPvdTOJg==
```

**Verify**

```shell
telnet 10.110.163.61 6789
```

Any one of the three mon service addresses from 6.1 will do.

**Add the yum repository**

```ini
[ceph]
name=ceph
baseurl=https://mirrors.aliyun.com/ceph/rpm-quincy/el8/x86_64/
enabled=1
gpgcheck=0
```

**Install ceph-common (the installation fails; see 8.1 for details)**

```shell
[root@k8s-master ~]# yum install -y ceph-common
```

**Once it succeeds, you can operate Ceph directly on the node:**

![image](https://i-blog.csdnimg.cn/direct/da4bc6c275dd4fa4a45523c3abcf1451.png)

### 7.3 Exposing a Port for Web Access

**Apply rook/deploy/examples/dashboard-external-https.yaml**

```shell
[root@k8s-master examples]# kubectl apply -f rook/deploy/examples/dashboard-external-https.yaml
rook-ceph-mgr-dashboard-external-https   NodePort   10.106.127.224   8443:31555/TCP
```

**Get the password:**

```shell
kubectl -n rook-ceph get secrets rook-ceph-dashboard-password -o jsonpath='{.data.password}' | base64 --decode
G+LIkJwXQ/E*>/P&DbzB
```

**Open the address below; the user name is admin**

https://192.168.123.194:31555/

![image](https://i-blog.csdnimg.cn/direct/a74cfa178fa848b4ba6457845b31a947.png)

### 7.4 Deleting an OSD Deployment

If `removeOSDsIfOutAndSafeToRemove: true` is set in cluster.yaml, the Rook Operator automatically cleans up the Deployment of an OSD that is out and safe to remove. The default is false.

### 7.5 Ceph Pools (Multi-Tenancy): Creating a Pool and Setting the PG Count

The pool is the unit of granularity: if you do not create or specify one, data lands in the default pool. Creating a pool requires choosing the number of placement groups (PGs). As a rule of thumb, target about 100 PGs per OSD, or use the following guidelines:

- Fewer than 5 OSDs: set pg_num to 128.
- 5 to 10 OSDs: set pg_num to 512.
- 10 to 50 OSDs: set pg_num to 4096.
- More than 50 OSDs: use pgcalc to compute the value.

A pool also needs a CRUSH rule, the policy that determines how its data is distributed across the cluster. In addition, per pool you can adjust the replica count, delete the pool, set quotas, rename it, and inspect its status; a short sketch of these operations follows.
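
The following is a minimal sketch of those pool operations from the toolbox shell, not taken from the original walkthrough: the pool name testpool and all values are hypothetical, with pg_num = 128 matching the fewer-than-5-OSDs rule for this 4-OSD cluster.

```shell
# Create a replicated pool with pg_num = pgp_num = 128 (fewer than 5 OSDs).
ceph osd pool create testpool 128 128 replicated

# Keep three replicas of every object in the pool.
ceph osd pool set testpool size 3

# Cap the pool at 10 GiB, rename it, then list pool details.
ceph osd pool set-quota testpool max_bytes 10737418240
ceph osd pool rename testpool tenant-a-pool
ceph osd pool ls detail
```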

### 7.6 Changing the Login Password

Log in to the toolbox and set a new password:

```shell
kubectl exec -it rook-ceph-tools-7857bc9568-q9fjk -- bash
bash-4.4$ echo -n '1qaz@WSX' > /tmp/password.txt
bash-4.4$ ceph dashboard ac-user-set-password admin --force-password -i /tmp/password.txt
```

Then log in again with admin/1qaz@WSX as the user name and password.

![image](https://i-blog.csdnimg.cn/direct/330aceb26013401e9a1e86c9c830e234.png)

## 8 Summary of Installation Errors

### 8.1 ceph-common Installation Error with the quincy Release

Cause: the Aliyun mirror (mirrors.aliyun.com) carries no el7 packages for the quincy release, only el8; trying the octopus release fails the same way.

```shell
--> Finished Dependency Resolution
Error: Package: 2:libcephfs2-17.2.6-0.el8.x86_64 (ceph)
           Requires: libstdc++.so.6(GLIBCXX_3.4.21)(64bit)
Error: Package: 2:ceph-common-17.2.6-0.el8.x86_64 (ceph)
           Requires: libstdc++.so.6(CXXABI_1.3.11)(64bit)
Error: Package: 2:libcephfs2-17.2.6-0.el8.x86_64 (ceph)
           Requires: libstdc++.so.6(GLIBCXX_3.4.20)(64bit)
Error: Package: 2:ceph-common-17.2.6-0.el8.x86_64 (ceph)
           Requires: libstdc++.so.6(GLIBCXX_3.4.22)(64bit)
......
......
           Requires: libstdc++.so.6(GLIBCXX_3.4.20)(64bit)
Error: Package: 2:librgw2-17.2.6-0.el8.x86_64 (ceph)
           Requires: libicuuc.so.60()(64bit)
Error: Package: 2:librgw2-17.2.6-0.el8.x86_64 (ceph)
           Requires: libstdc++.so.6(GLIBCXX_3.4.21)(64bit)
Error: Package: 2:librgw2-17.2.6-0.el8.x86_64 (ceph)
           Requires: libstdc++.so.6(CXXABI_1.3.11)(64bit)
Error: Package: 2:librgw2-17.2.6-0.el8.x86_64 (ceph)
           Requires: libthrift-0.13.0.so()(64bit)
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest
```

## 9 Troubleshooting

### 9.1 The Cluster Reports "daemons have recently crashed", health: HEALTH_WARN

```shell
bash-4.4$ ceph status
  cluster:
    id:     e320aa6c-0057-46ad-b2bf-5c49df8eba5a
    health: HEALTH_WARN
            3 mgr modules have recently crashed

  services:
    mon: 3 daemons, quorum a,b,c (age 23h)
    mgr: b(active, since 23h), standbys: a
    osd: 4 osds: 4 up (since 23h), 4 in (since 23h)

  data:
    pools:   1 pools, 1 pgs
    objects: 2 objects, 449 KiB
    usage:   45 MiB used, 200 GiB / 200 GiB avail
    pgs:     1 active+clean

# Check the detailed health information
bash-4.4$ ceph health detail
HEALTH_WARN 3 mgr modules have recently crashed
[WRN] RECENT_MGR_MODULE_CRASH: 3 mgr modules have recently crashed
```

Ceph's crash module collects information about daemon crashdumps and stores it in the Ceph cluster for later analysis.

List the crash reports:

```shell
bash-4.4$ ceph crash ls
ID                                                                ENTITY  NEW
2023-06-14T13:56:38.064890Z_75a59d8c-9c99-47af-8cef-e632d8f0a010  mgr.b   *
2023-06-14T13:56:53.252095Z_bc44e5d3-67e5-4c22-a872-e9c7f9799f55  mgr.b   *
2023-06-14T13:57:38.564803Z_1f132169-793b-4ac6-a3c7-af48c91f5365  mgr.b   *
```

Entries marked with * are new. The warning above points at mgr and osd anomalies, so next check the OSDs and mgr to see whether the warning is merely caused by crash reports that were never archived:

```shell
bash-4.4$ ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME            STATUS  REWEIGHT  PRI-AFF
-1         0.19519  root default
-5         0.04880      host k8s-master
 0    ssd  0.04880          osd.0            up   1.00000  1.00000
-3         0.04880      host k8s-node1
 1    ssd  0.04880          osd.1            up   1.00000  1.00000
-9         0.04880      host k8s-node2
 3    ssd  0.04880          osd.3            up   1.00000  1.00000
-7         0.04880      host k8s-node3
 2    ssd  0.04880          osd.2            up   1.00000  1.00000
```

The commands above show the cluster itself is fine, so the conclusion is that the unarchived crash reports caused a false alarm. Archive them:

```shell
# Method 1: suitable when only one or two reports are unarchived
# ceph crash ls
# ceph crash archive <id>

# Method 2: suitable when many reports need archiving; here we simply run:
# ceph crash archive-all
```

![image](https://i-blog.csdnimg.cn/direct/f607571b9b5b4aef929ffa099f08ca55.png)

### 9.2 OSD Down

The dashboard shows an OSD has gone down; check the logs.

![image](https://i-blog.csdnimg.cn/direct/6eb825ff00bf42168bfefcf76728c62e.png)

Running ceph osd tree shows that osd.3, on k8s-node2, is down.

![image](https://i-blog.csdnimg.cn/direct/0816c3897ceb4908833054d453a56e7a.png)

![image](https://i-blog.csdnimg.cn/direct/ee09d465d09541aeb7f352b23883610e.png)
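
A hedged sketch of how one might dig further from the Kubernetes side; the deployment name rook-ceph-osd-3 matches the listing in 6.3, but the pod state and log contents will differ per cluster:

```shell
# Check whether the OSD pod is still scheduled and running on k8s-node2.
kubectl -n rook-ceph get pods -o wide | grep rook-ceph-osd-3

# Inspect the OSD daemon log for the reason it went down.
kubectl -n rook-ceph logs deploy/rook-ceph-osd-3 --tail=100

# If the daemon merely crashed, restarting the deployment may bring it back up.
kubectl -n rook-ceph rollout restart deploy/rook-ceph-osd-3
```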
