A Guide to Deploying and Operating Ceph Cloud-Native Storage with Rook (Part 2)

Author: 任少近

6 Managing Ceph resource objects

6.1 Viewing services

[root@k8s-master ~]# kubectl -n rook-ceph get services
NAME                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
rook-ceph-mgr             ClusterIP   10.110.141.201   <none>        9283/TCP            13h
rook-ceph-mgr-dashboard   ClusterIP   10.103.197.146   <none>        8443/TCP            13h
rook-ceph-mon-a           ClusterIP   10.110.163.61    <none>        6789/TCP,3300/TCP   13h
rook-ceph-mon-b           ClusterIP   10.100.49.10     <none>        6789/TCP,3300/TCP   13h
rook-ceph-mon-c           ClusterIP   10.96.193.162    <none>        6789/TCP,3300/TCP   13h

6.2 Viewing Jobs

[root@k8s-master ~]# kubectl -n rook-ceph get jobs
NAME                               COMPLETIONS   DURATION   AGE
rook-ceph-osd-prepare-k8s-master   1/1           6s         11h
rook-ceph-osd-prepare-k8s-node1    1/1           7s         11h
rook-ceph-osd-prepare-k8s-node2    1/1           7s         11h
rook-ceph-osd-prepare-k8s-node3    1/1           6s         11h

6.3 Viewing deployments.apps

[root@k8s-master ~]# kubectl -n rook-ceph get deployments.apps
NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
csi-cephfsplugin-provisioner          2/2     2            2           12h
csi-rbdplugin-provisioner             2/2     2            2           12h
rook-ceph-crashcollector-k8s-master   1/1     1            1           12h
rook-ceph-crashcollector-k8s-node1    1/1     1            1           12h
rook-ceph-crashcollector-k8s-node2    1/1     1            1           12h
rook-ceph-crashcollector-k8s-node3    1/1     1            1           12h
rook-ceph-mgr-a                       1/1     1            1           12h
rook-ceph-mgr-b                       1/1     1            1           12h
rook-ceph-mon-a                       1/1     1            1           12h
rook-ceph-mon-b                       1/1     1            1           12h
rook-ceph-mon-c                       1/1     1            1           12h
rook-ceph-operator                    1/1     1            1           12h
rook-ceph-osd-0                       1/1     1            1           12h
rook-ceph-osd-1                       1/1     1            1           12h
rook-ceph-osd-2                       1/1     1            1           12h
rook-ceph-osd-3                       1/1     1            1           12h

6.4 Viewing daemonsets.apps

[root@k8s-master ~]# kubectl -n rook-ceph get daemonsets.apps
NAME               DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
csi-cephfsplugin   4         4         4       4            4           <none>          12h
csi-rbdplugin      4         4         4       4            4           <none>          12h

6.5 Viewing configmaps

[root@k8s-master ~]# kubectl -n rook-ceph get configmaps
NAME                           DATA   AGE
kube-root-ca.crt               1      13h
rook-ceph-csi-config           1      12h
rook-ceph-csi-mapping-config   1      12h
rook-ceph-mon-endpoints        5      12h
rook-ceph-operator-config      33     13h
rook-config-override           1      12h

6.6 Viewing clusterroles.rbac.authorization.k8s.io

[root@k8s-master ~]# kubectl -n rook-ceph get clusterroles.rbac.authorization.k8s.io
NAME                                 CREATED AT
cephfs-csi-nodeplugin                2023-06-13T13:56:29Z
cephfs-external-provisioner-runner   2023-06-13T13:56:29Z
rbd-csi-nodeplugin                   2023-06-13T13:56:29Z
rbd-external-provisioner-runner      2023-06-13T13:56:29Z
rook-ceph-cluster-mgmt               2023-06-13T13:56:29Z
rook-ceph-global                     2023-06-13T13:56:29Z
rook-ceph-mgr-cluster                2023-06-13T13:56:29Z
rook-ceph-mgr-system                 2023-06-13T13:56:29Z
rook-ceph-object-bucket              2023-06-13T13:56:29Z
rook-ceph-osd                        2023-06-13T13:56:29Z
rook-ceph-system                     2023-06-13T13:56:29Z

6.7 Viewing clusterrolebindings.rbac.authorization.k8s.io

[root@k8s-master ~]# kubectl -n rook-ceph get clusterrolebindings.rbac.authorization.k8s.io
cephfs-csi-nodeplugin-role    ClusterRole/cephfs-csi-nodeplugin
cephfs-csi-provisioner-role   ClusterRole/cephfs-external-provisioner-runner
rbd-csi-nodeplugin            ClusterRole/rbd-csi-nodeplugin
rbd-csi-provisioner-role      ClusterRole/rbd-external-provisioner-runner
rook-ceph-global              ClusterRole/rook-ceph-global
rook-ceph-mgr-cluster         ClusterRole/rook-ceph-mgr-cluster
rook-ceph-object-bucket       ClusterRole/rook-ceph-object-bucket
rook-ceph-osd                 ClusterRole/rook-ceph-osd
rook-ceph-system              ClusterRole/rook-ceph-system

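6.8 Viewing OSD pool information through cephclusters.ceph.rook.io

If the Ceph cluster is managed by the Rook Ceph Operator, you can also inspect Rook's custom resources to get information about the OSD pools:

[root@k8s-master ~]# kubectl -n rook-ceph get cephclusters.ceph.rook.io rook-ceph -o yaml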

7 Accessing Ceph

7.1 Toolbox client

Deploy the toolbox:

cd rook/deploy/examples/

kubectl apply -f toolbox.yaml

Connect to the Ceph cluster:

[root@k8s-master ~]# kubectl -n rook-ceph exec -it rook-ceph-tools-7857bc9568-q9fjk -- /bin/bash
bash-4.4$ ceph -s
  cluster:
    id:     e320aa6c-0057-46ad-b2bf-5c49df8eba5a
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 13h)
    mgr: b(active, since 13h), standbys: a
    osd: 4 osds: 4 up (since 13h), 4 in (since 13h)

  data:
    pools:   1 pools, 1 pgs
    objects: 2 objects, 449 KiB
    usage:   45 MiB used, 200 GiB / 200 GiB avail
    pgs:     1 active+clean

7.1 Accessing Ceph from a K8s node

Add ceph.conf and a keyring on the node:

[root@k8s-master]# mkdir /etc/ceph
[root@k8s-master]# cd /etc/ceph
[root@k8s-master]# vi ceph.conf
[global]
mon_host = 10.110.163.61:6789,10.100.49.10:6789,10.96.193.162:6789
[client.admin]
keyring = /etc/ceph/keyring

[root@k8s-master]# vi keyring
[client.admin]
key = AQCGfYhkeMnEFRAAJnW4jUMwmJz2b1dPvdTOJg==
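
The ceph.conf shown above can also be generated instead of typed by hand. The following is a minimal sketch, not part of the original procedure. It assumes the mon endpoints are available as a comma-separated `name=ip:port` string, which is how the `data` key of the rook-ceph-mon-endpoints ConfigMap is commonly laid out (an assumption worth checking against your own cluster). The function names `parse_mon_endpoints` and `render_ceph_conf` are hypothetical.

```python
def parse_mon_endpoints(data: str) -> list[str]:
    """Turn 'a=10.0.0.1:6789,b=10.0.0.2:6789' into ['10.0.0.1:6789', '10.0.0.2:6789'].

    The 'name=ip:port' layout is an assumption about the ConfigMap format.
    """
    addrs = []
    for item in data.split(","):
        item = item.strip()
        if not item:
            continue
        _name, sep, addr = item.partition("=")
        if not sep or not addr:
            raise ValueError(f"unexpected mon endpoint entry: {item!r}")
        addrs.append(addr)
    return addrs


def render_ceph_conf(mon_addrs: list[str], keyring_path: str = "/etc/ceph/keyring") -> str:
    """Render the same four-line ceph.conf used in this section."""
    lines = [
        "[global]",
        "mon_host = " + ",".join(mon_addrs),
        "[client.admin]",
        f"keyring = {keyring_path}",
    ]
    return "\n".join(lines) + "\n"


# The three mon service addresses from section 6.1.
endpoints = "a=10.110.163.61:6789,b=10.100.49.10:6789,c=10.96.193.162:6789"
conf = render_ceph_conf(parse_mon_endpoints(endpoints))
print(conf)
```

Writing the result to /etc/ceph/ceph.conf is left out on purpose, so the sketch can be run anywhere without touching the node.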

Verify connectivity to any one of the three mon service addresses above, for example:

telnet 10.110.163.61 6789

Add a yum repository:

[ceph]
name=ceph
baseurl=https://mirrors.aliyun.com/ceph/rpm-quincy/el8/x86_64/
enabled=1
gpgcheck=0

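Install ceph-common (the installation failed here; see 8.1 for details):

[root@k8s-master]# yum install -y ceph-common

If the installation succeeds, ceph commands can be run directly on the node.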

7.2 Exposing a port for web access

Apply dashboard-external-https.yaml from rook/deploy/examples:

[root@k8s-master examples]# kubectl apply -f dashboard-external-https.yaml

This creates a NodePort service:

rook-ceph-mgr-dashboard-external-https   NodePort    10.106.127.224   <none>        8443:31555/TCP

Get the password:

kubectl -n rook-ceph get secrets rook-ceph-dashboard-password -o jsonpath='{.data.password}' | base64 --decode > rook-ceph-dashboard-password.password

The file then contains the password:

G+LIkJwXQ/E*>/P&DbzB

Open the dashboard and log in as user admin:

https://192.168.123.194:31555/
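
The password step above pipes the Secret value through base64 because Kubernetes stores Secret data base64-encoded. As a small illustration of that decoding step only, here is a sketch with a made-up value. The function name `decode_secret_value` and the sample password are hypothetical and do not come from a real cluster.

```python
import base64


def decode_secret_value(encoded: str) -> str:
    """Decode one base64-encoded value taken from a Secret's .data map."""
    return base64.b64decode(encoded).decode("utf-8")


# Hypothetical stand-in for the output of the jsonpath query on .data.password.
sample = base64.b64encode(b"example-password").decode("ascii")
print(decode_secret_value(sample))
```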

7.3 Deleting an OSD Deployment

If removeOSDsIfOutAndSafeToRemove is set to true in cluster.yaml, the Rook Operator automatically cleans up the OSD Deployment. The default is false.

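7.4 Ceph pools (multi-tenancy): creating a pool and setting the number of PGs

Data is separated at pool granularity. If no pool is created or specified, data is stored in the default pool. Creating a pool requires setting the number of PGs. A common rule of thumb is about 100 PGs per OSD, or you can follow these rules:

Fewer than 5 OSDs: set pg_num to 128.

5 to 10 OSDs: set pg_num to 512.

10 to 50 OSDs: set pg_num to 4096.

More than 50 OSDs: use pgcalc to work out the value.

A pool also needs a CRUSH rule, which is the policy that determines how data is distributed across the cluster.

In addition, for each pool you can change the replica count, delete the pool, set quotas, rename the pool, and view its status.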
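
The PG sizing rules above can be written down as a short sketch. Two things here are assumptions rather than statements from this article: an OSD count of exactly 10 is placed in the 5 to 10 band (the text leaves the boundary ambiguous), and the "100 PGs per OSD" heuristic is interpreted as OSDs x 100 / replica count, rounded up to a power of two, which is the commonly quoted form of that heuristic. Both function names are hypothetical.

```python
from typing import Optional


def pg_num_by_osd_count(osd_count: int) -> Optional[int]:
    """Return pg_num from the fixed table above, or None when pgcalc should be used."""
    if osd_count < 1:
        raise ValueError("osd_count must be at least 1")
    if osd_count < 5:
        return 128
    if osd_count <= 10:  # assumption: exactly 10 OSDs falls in the 5 to 10 band
        return 512
    if osd_count <= 50:
        return 4096
    return None  # more than 50 OSDs: consult pgcalc


def pg_num_rule_of_thumb(osd_count: int, replicas: int = 3, pgs_per_osd: int = 100) -> int:
    """About pgs_per_osd PGs per OSD, divided by replicas, rounded up to a power of two."""
    raw = osd_count * pgs_per_osd / replicas
    power = 1
    while power < raw:
        power *= 2
    return power


# The cluster in this article has 4 OSDs.
print(pg_num_by_osd_count(4))   # 128
print(pg_num_rule_of_thumb(4))  # 256
```

The two approaches give different answers for small clusters, so treat either result as a starting point rather than a fixed value.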

7.5 Changing the login password

Enter the toolbox pod:

kubectl -n rook-ceph exec -it rook-ceph-tools-7857bc9568-q9fjk -- bash

bash-4.4$ echo -n '1qaz@WSX' > /tmp/password.txt

bash-4.4$ ceph dashboard ac-user-set-password admin --force-password -i /tmp/password.txt

Then log in to the dashboard with the new credentials, admin/1qaz@WSX.

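8 Installation errors

8.1 Installing the quincy release of ceph-common fails

Cause: the Aliyun mirror has no el7 build of the quincy packages; quincy is only available for el8. Trying the octopus release produced the same error.

--> Finished Dependency Resolution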
Error: Package: 2:libcephfs2-17.2.6-0.el8.x86_64 (ceph)
           Requires: libstdc++.so.6(GLIBCXX_3.4.21)(64bit)
Error: Package: 2:ceph-common-17.2.6-0.el8.x86_64 (ceph)
           Requires: libstdc++.so.6(CXXABI_1.3.11)(64bit)
Error: Package: 2:libcephfs2-17.2.6-0.el8.x86_64 (ceph)
           Requires: libstdc++.so.6(GLIBCXX_3.4.20)(64bit)
Error: Package: 2:ceph-common-17.2.6-0.el8.x86_64 (ceph)
           Requires: libstdc++.so.6(GLIBCXX_3.4.22)(64bit)
...
           Requires: libstdc++.so.6(GLIBCXX_3.4.20)(64bit)
Error: Package: 2:librgw2-17.2.6-0.el8.x86_64 (ceph)
           Requires: libicuuc.so.60()(64bit)
Error: Package: 2:librgw2-17.2.6-0.el8.x86_64 (ceph)
           Requires: libstdc++.so.6(GLIBCXX_3.4.21)(64bit)
Error: Package: 2:librgw2-17.2.6-0.el8.x86_64 (ceph)
           Requires: libstdc++.so.6(CXXABI_1.3.11)(64bit)
Error: Package: 2:librgw2-17.2.6-0.el8.x86_64 (ceph)
           Requires: libthrift-0.13.0.so()(64bit)
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest

9 Troubleshooting

9.1 The Ceph cluster reports "daemons have recently crashed" with health: HEALTH_WARN

bash-4.4$ ceph status
  cluster:
    id:     e320aa6c-0057-46ad-b2bf-5c49df8eba5a
    health: HEALTH_WARN
            3 mgr modules have recently crashed

  services:
    mon: 3 daemons, quorum a,b,c (age 23h)
    mgr: b(active, since 23h), standbys: a
    osd: 4 osds: 4 up (since 23h), 4 in (since 23h)

  data:
    pools:   1 pools, 1 pgs
    objects: 2 objects, 449 KiB
    usage:   45 MiB used, 200 GiB / 200 GiB avail
    pgs:     1 active+clean

# View the detailed health information
bash-4.4$ ceph health detail
HEALTH_WARN 3 mgr modules have recently crashed
[WRN] RECENT_MGR_MODULE_CRASH: 3 mgr modules have recently crashed

Ceph's crash module collects information about daemon crashdumps and stores it in the Ceph cluster for later analysis.

List the recorded crashes:

bash-4.4$ ceph crash ls
ID                                                          ENTITY  NEW
2023-06-14T13:56:38.064890Z_75a59d8c-9c99-47af-8cef-e632d8f0a010  mgr.b    *
2023-06-14T13:56:53.252095Z_bc44e5d3-67e5-4c22-a872-e9c7f9799f55  mgr.b    *
2023-06-14T13:57:38.564803Z_1f132169-793b-4ac6-a3c7-af48c91f5365  mgr.b    *
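# An asterisk marks a new (unarchived) crash. The entries above report crashes for mgr.b. Next, check the OSDs and mgr to see whether the warning is caused only by crash reports that have not been archived.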

bash-4.4$ ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME      STATUS  REWEIGHT  PRI-AFF
-1         0.19519      root default
-5         0.04880      host k8s-master
 0    ssd  0.04880          osd.0            up   1.00000     1.00000
-3         0.04880      host k8s-node1
 1    ssd  0.04880          osd.1            up   1.00000     1.00000
-9         0.04880      host k8s-node2
 3    ssd  0.04880          osd.3            up   1.00000     1.00000
-7         0.04880      host k8s-node3
 2    ssd  0.04880          osd.2            up   1.00000     1.00000

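The commands above show that the cluster itself is healthy, so the warning is a false alarm caused by crash reports that have not been archived. Archive them next:

# Method 1: suitable when only one or two crash reports are unarchived
bash-4.4$ ceph crash ls
bash-4.4$ ceph crash archive <id>
# Method 2: suitable when many crash reports are unarchived; this is the one used here
bash-4.4$ ceph crash archive-all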
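
When archiving crashes one by one, the IDs have to be picked out of the `ceph crash ls` table first. The sketch below shows one way to do that from the plain-text output, using the three entries listed earlier as sample input. The function name `new_crash_ids` is hypothetical, and the parsing assumes the column layout shown in this article (ID, ENTITY, and a trailing `*` for new crashes).

```python
def new_crash_ids(crash_ls_output: str) -> list[str]:
    """Return the IDs of crashes still marked new ('*') in 'ceph crash ls' text output."""
    ids = []
    for line in crash_ls_output.splitlines():
        fields = line.split()
        # The header ends in 'NEW' and archived rows have no trailing '*', so both are skipped.
        if len(fields) >= 3 and fields[-1] == "*":
            ids.append(fields[0])
    return ids


sample = """\
ID                                                                ENTITY  NEW
2023-06-14T13:56:38.064890Z_75a59d8c-9c99-47af-8cef-e632d8f0a010  mgr.b    *
2023-06-14T13:56:53.252095Z_bc44e5d3-67e5-4c22-a872-e9c7f9799f55  mgr.b    *
2023-06-14T13:57:38.564803Z_1f132169-793b-4ac6-a3c7-af48c91f5365  mgr.b    *
"""

# Print one archive command per new crash instead of running it.
for crash_id in new_crash_ids(sample):
    print(f"ceph crash archive {crash_id}")
```

The sketch only prints the commands, so each crash can be reviewed before it is archived.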

9.2 osd down

When an OSD goes down, start with its logs.

ceph osd tree showed that osd.3, on k8s-node2, was down.
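
Spotting a down OSD and its host in the `ceph osd tree` output can be sketched as follows. The sample text is hypothetical: it reuses the tree from section 9.1 with osd.3 changed to down to match the situation described here, since the article does not include the actual output. The function name `down_osds` is also hypothetical, and the parsing assumes the plain-text layout shown in 9.1.

```python
def down_osds(osd_tree_output: str) -> dict[str, str]:
    """Map each down OSD to the host bucket it is listed under in 'ceph osd tree' text."""
    result = {}
    current_host = None
    for line in osd_tree_output.splitlines():
        fields = line.split()
        if "host" in fields:
            idx = fields.index("host")
            if idx + 1 < len(fields):
                current_host = fields[idx + 1]
            continue
        for i, field in enumerate(fields):
            # An OSD row has the OSD name followed directly by its status.
            if field.startswith("osd.") and i + 1 < len(fields) and fields[i + 1] == "down":
                result[field] = current_host
    return result


# Hypothetical sample: the tree from 9.1 with osd.3 marked down.
sample = """\
ID  CLASS  WEIGHT   TYPE NAME            STATUS  REWEIGHT  PRI-AFF
-1         0.19519  root default
-5         0.04880      host k8s-master
 0    ssd  0.04880          osd.0            up   1.00000  1.00000
-3         0.04880      host k8s-node1
 1    ssd  0.04880          osd.1            up   1.00000  1.00000
-9         0.04880      host k8s-node2
 3    ssd  0.04880          osd.3          down   1.00000  1.00000
-7         0.04880      host k8s-node3
 2    ssd  0.04880          osd.2            up   1.00000  1.00000
"""

print(down_osds(sample))
```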
