Rook-Based Ceph Cloud-Native Storage Deployment and Practice Guide (Part 2)

Author: 任少近


6 Ceph Resource Object Management

6.1 Viewing services

[root@k8s-master ~]# kubectl -n rook-ceph get services
NAME                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
rook-ceph-mgr             ClusterIP   10.110.141.201   <none>        9283/TCP            13h
rook-ceph-mgr-dashboard   ClusterIP   10.103.197.146   <none>        8443/TCP            13h
rook-ceph-mon-a           ClusterIP   10.110.163.61    <none>        6789/TCP,3300/TCP   13h
rook-ceph-mon-b           ClusterIP   10.100.49.10     <none>        6789/TCP,3300/TCP   13h
rook-ceph-mon-c           ClusterIP   10.96.193.162    <none>        6789/TCP,3300/TCP   13h

6.2 Viewing Jobs

[root@k8s-master]#kubectl -n rook-ceph get jobs
NAME                               COMPLETIONS  DURATION   AGE
rook-ceph-osd-prepare-k8s-master   1/1          6s         11h
rook-ceph-osd-prepare-k8s-node1    1/1          7s         11h
rook-ceph-osd-prepare-k8s-node2    1/1          7s         11h
rook-ceph-osd-prepare-k8s-node3    1/1          6s         11h

6.3 Viewing deployments.apps

[root@k8s-master]# kubectl -n rook-ceph get deployments.apps
NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
csi-cephfsplugin-provisioner          2/2     2            2           12h
csi-rbdplugin-provisioner             2/2     2            2           12h
rook-ceph-crashcollector-k8s-master   1/1     1            1           12h
rook-ceph-crashcollector-k8s-node1    1/1     1            1           12h
rook-ceph-crashcollector-k8s-node2    1/1     1            1           12h
rook-ceph-crashcollector-k8s-node3    1/1     1            1           12h
rook-ceph-mgr-a                       1/1     1            1           12h
rook-ceph-mgr-b                       1/1     1            1           12h
rook-ceph-mon-a                       1/1     1            1           12h
rook-ceph-mon-b                       1/1     1            1           12h
rook-ceph-mon-c                       1/1     1            1           12h
rook-ceph-operator                    1/1     1            1           12h
rook-ceph-osd-0                       1/1     1            1           12h
rook-ceph-osd-1                       1/1     1            1           12h
rook-ceph-osd-2                       1/1     1            1           12h
rook-ceph-osd-3                       1/1     1            1           12h

6.4 Viewing daemonsets.apps

[root@k8s-master]# kubectl -n rook-ceph get daemonsets.apps
NAME               DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
csi-cephfsplugin   4         4         4       4            4           <none>          12h
csi-rbdplugin      4         4         4       4            4           <none>          12h

6.5 Viewing configmaps

[root@k8s-master]# kubectl -n rook-ceph get configmaps
NAME                           DATA   AGE
kube-root-ca.crt               1      13h
rook-ceph-csi-config           1      12h
rook-ceph-csi-mapping-config   1      12h
rook-ceph-mon-endpoints        5      12h
rook-ceph-operator-config      33     13h
rook-config-override           1      12h

6.6 Viewing clusterroles.rbac.authorization.k8s.io

[root@k8s-master]# kubectl -n rook-ceph get clusterroles.rbac.authorization.k8s.io
NAME                                 CREATED AT
cephfs-csi-nodeplugin                2023-06-13T13:56:29Z
cephfs-external-provisioner-runner   2023-06-13T13:56:29Z
rbd-csi-nodeplugin                   2023-06-13T13:56:29Z
rbd-external-provisioner-runner      2023-06-13T13:56:29Z
rook-ceph-cluster-mgmt               2023-06-13T13:56:29Z
rook-ceph-global                     2023-06-13T13:56:29Z
rook-ceph-mgr-cluster                2023-06-13T13:56:29Z
rook-ceph-mgr-system                 2023-06-13T13:56:29Z
rook-ceph-object-bucket              2023-06-13T13:56:29Z
rook-ceph-osd                        2023-06-13T13:56:29Z
rook-ceph-system                     2023-06-13T13:56:29Z

6.7 Viewing clusterrolebindings.rbac.authorization.k8s.io

[root@k8s-master]# kubectl -n rook-ceph get clusterrolebindings.rbac.authorization.k8s.io

NAME                          ROLE
cephfs-csi-nodeplugin-role    ClusterRole/cephfs-csi-nodeplugin
cephfs-csi-provisioner-role   ClusterRole/cephfs-external-provisioner-runner
rbd-csi-nodeplugin            ClusterRole/rbd-csi-nodeplugin
rbd-csi-provisioner-role      ClusterRole/rbd-external-provisioner-runner
rook-ceph-global              ClusterRole/rook-ceph-global
rook-ceph-mgr-cluster         ClusterRole/rook-ceph-mgr-cluster
rook-ceph-object-bucket       ClusterRole/rook-ceph-object-bucket
rook-ceph-osd                 ClusterRole/rook-ceph-osd
rook-ceph-system              ClusterRole/rook-ceph-system

6.8 Viewing OSD pool information via cephclusters.ceph.rook.io

If you manage the Ceph cluster with the Rook Ceph Operator, you can also inspect Rook's custom resources to obtain OSD and pool information:

[root@k8s-master ~]# kubectl -n rook-ceph get cephclusters.ceph.rook.io rook-ceph -o yaml
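
Instead of dumping the whole YAML, specific fields can be pulled out with jsonpath. This is a minimal sketch; the field paths assume the standard CephCluster spec/status layout:

# Storage/device configuration declared for the cluster
kubectl -n rook-ceph get cephclusters.ceph.rook.io rook-ceph -o jsonpath='{.spec.storage}'

# Overall Ceph health as reported in the CR status
kubectl -n rook-ceph get cephclusters.ceph.rook.io rook-ceph -o jsonpath='{.status.ceph.health}'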

7 Accessing Ceph

7.1 The Toolbox client

Deployment:

cd rook/deploy/examples/

kubectl apply -f toolbox.yaml
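
Before connecting, you can confirm the toolbox pod is running. A minimal sketch, assuming the default labels from toolbox.yaml:

kubectl -n rook-ceph get pod -l app=rook-ceph-tools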

Connect to the Ceph cluster:

[root@k8s-master ~]# kubectl -n rook-ceph exec -it rook-ceph-tools-7857bc9568-q9fjk -- /bin/bash
bash-4.4$ ceph -s
  cluster:
    id:     e320aa6c-0057-46ad-b2bf-5c49df8eba5a
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 13h)
    mgr: b(active, since 13h), standbys: a
    osd: 4 osds: 4 up (since 13h), 4 in (since 13h)

  data:
    pools:   1 pools, 1 pgs
    objects: 2 objects, 449 KiB
    usage:   45 MiB used, 200 GiB / 200 GiB avail
    pgs:     1 active+clean

7.2 Accessing Ceph from Kubernetes nodes

Add ceph.conf and a keyring on the node:

[root@k8s-master]#mkdir /etc/ceph
[root@k8s-master]#cd /etc/ceph
[root@k8s-master]#vi ceph.conf
[global]
mon_host = 10.110.163.61:6789,10.100.49.10:6789,10.96.193.162:6789
[client.admin]
keyring = /etc/ceph/keyring

[root@k8s-master]#vi keyring
[client.admin]
key = AQCGfYhkeMnEFRAAJnW4jUMwmJz2b1dPvdTOJg==
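
The mon addresses come from the services listed in 6.1, and the admin key can be read from the running cluster, for example through the toolbox. A sketch, assuming the default Rook labels and the rook-ceph-tools deployment name from toolbox.yaml:

# Mon service addresses (also stored in the rook-ceph-mon-endpoints ConfigMap)
kubectl -n rook-ceph get svc -l app=rook-ceph-mon

# Print the client.admin keyring from inside the toolbox
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph auth get client.admin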

Verification:

telnet 10.110.163.61 6789    # any one of the three mon service addresses above will do

Add a yum repository (for example, /etc/yum.repos.d/ceph.repo):

[ceph]
name=ceph
baseurl=https://mirrors.aliyun.com/ceph/rpm-quincy/el8/x86_64/
enabled=1
gpgcheck=0

Install ceph-common (the installation failed; see 8.1 for details):

[root@k8s-master]#yum install -y ceph-common

If the installation succeeds, you can operate on the cluster directly from the node, as sketched below.
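
A minimal sketch, assuming /etc/ceph/ceph.conf and /etc/ceph/keyring are in place as configured above:

ceph -s          # cluster status, same output as from the toolbox
ceph osd tree    # OSD topology
ceph df          # cluster and pool usage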

7.3 Exposing the dashboard port for web access

Apply rook/deploy/examples/dashboard-external-https.yaml:

[root@k8s-master examples]#kubectl apply -f rook/deploy/examples/dashboard-external-https.yaml

NAME                                      TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)
rook-ceph-mgr-dashboard-external-https    NodePort   10.106.127.224   <none>        8443:31555/TCP

Get the password:

kubectl -n rook-ceph get secrets rook-ceph-dashboard-password -o jsonpath='{.data.password}' | base64 --decode  > rook-ceph-dashboard-password.password

G+LIkJwXQ/E*>/P&DbzB

Open the dashboard in a browser; the username is admin:

https://192.168.123.194:31555/

7.4 Removing OSD Deployments

If removeOSDsIfOutAndSafeToRemove is set to true in cluster.yaml, the Rook Operator automatically cleans up the Deployment of an OSD that is out of the cluster and safe to remove. The default is false.
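
A minimal sketch of where this field sits in the CephCluster spec (other required fields omitted):

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  # When true, the operator cleans up the Deployment of an OSD
  # that is out of the cluster and safe to remove. Default: false.
  removeOSDsIfOutAndSafeToRemove: true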

7.5 Ceph pools (multi-tenancy): creating a pool and setting the number of PGs

Pools are the unit of granularity; if you do not create or specify one, data is stored in the default pool. Creating a pool requires setting the number of PGs. As a rule of thumb, target about 100 PGs per OSD, or follow the guidelines below (see the CLI sketch after this list):

Fewer than 5 OSDs: set pg_num to 128.

5 to 10 OSDs: set pg_num to 512.

10 to 50 OSDs: set pg_num to 4096.

More than 50 OSDs: use the pgcalc tool to calculate an appropriate value.

A pool also needs a CRUSH rule, which defines how its data is distributed across the cluster.

In addition, for a pool you can adjust the replica count, delete the pool, set pool quotas, rename the pool, and view pool status information.
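
A minimal sketch of the corresponding ceph CLI operations, run from the toolbox; the pool name mypool is a hypothetical example:

# Create a pool with 128 placement groups (pg_num / pgp_num)
ceph osd pool create mypool 128 128

# Adjust the replica count
ceph osd pool set mypool size 3

# Set a quota of 10 GiB on the pool
ceph osd pool set-quota mypool max_bytes 10737418240

# Rename the pool
ceph osd pool rename mypool mypool-new

# View pool status information
ceph osd pool ls detail

# Delete the pool (mon_allow_pool_delete must be enabled)
ceph osd pool delete mypool-new mypool-new --yes-i-really-really-mean-it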

7.6 Changing the dashboard login password

Log in to the toolbox pod: kubectl -n rook-ceph exec -it rook-ceph-tools-7857bc9568-q9fjk -- bash

bash-4.4$ echo -n '1qaz@WSX' > /tmp/password.txt

bash-4.4$ ceph dashboard ac-user-set-password admin --force-password -i /tmp/password.txt

Then log in with admin / 1qaz@WSX as the username and password.

8 Installation Error Summary

8.1 ceph-common installation error with the Quincy release

Cause: mirrors.aliyun.com has no el7 dependency packages for Quincy; Quincy packages are only provided for el8. Trying the Octopus release fails with the same error.

--> Finished Dependency Resolution
Error: Package: 2:libcephfs2-17.2.6-0.el8.x86_64 (ceph)
           Requires: libstdc++.so.6(GLIBCXX_3.4.21)(64bit)
Error: Package: 2:ceph-common-17.2.6-0.el8.x86_64 (ceph)
           Requires: libstdc++.so.6(CXXABI_1.3.11)(64bit)
Error: Package: 2:libcephfs2-17.2.6-0.el8.x86_64 (ceph)
           Requires: libstdc++.so.6(GLIBCXX_3.4.20)(64bit)
Error: Package: 2:ceph-common-17.2.6-0.el8.x86_64 (ceph)
           Requires: libstdc++.so.6(GLIBCXX_3.4.22)(64bit)
...
           Requires: libstdc++.so.6(GLIBCXX_3.4.20)(64bit)
Error: Package: 2:librgw2-17.2.6-0.el8.x86_64 (ceph)
           Requires: libicuuc.so.60()(64bit)
Error: Package: 2:librgw2-17.2.6-0.el8.x86_64 (ceph)
           Requires: libstdc++.so.6(GLIBCXX_3.4.21)(64bit)
Error: Package: 2:librgw2-17.2.6-0.el8.x86_64 (ceph)
           Requires: libstdc++.so.6(CXXABI_1.3.11)(64bit)
Error: Package: 2:librgw2-17.2.6-0.el8.x86_64 (ceph)
           Requires: libthrift-0.13.0.so()(64bit)
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest

9 Troubleshooting

9.1 Ceph cluster reports "daemons have recently crashed", health: HEALTH_WARN

bash-4.4$ ceph status
  cluster:
    id:     e320aa6c-0057-46ad-b2bf-5c49df8eba5a
    health: HEALTH_WARN
            3 mgr modules have recently crashed

  services:
    mon: 3 daemons, quorum a,b,c (age 23h)
    mgr: b(active, since 23h), standbys: a
    osd: 4 osds: 4 up (since 23h), 4 in (since 23h)

  data:
    pools:   1 pools, 1 pgs
    objects: 2 objects, 449 KiB
    usage:   45 MiB used, 200 GiB / 200 GiB avail
    pgs:     1 active+clean

# View the detailed health information
bash-4.4$ ceph health detail
HEALTH_WARN 3 mgr modules have recently crashed
[WRN] RECENT_MGR_MODULE_CRASH: 3 mgr modules have recently crashed

Ceph's crash module collects crashdump information from daemons that have crashed and stores it in the Ceph cluster for later analysis.

List the crashes:

bash-4.4$ ceph crash ls
ID                                                                 ENTITY  NEW
2023-06-14T13:56:38.064890Z_75a59d8c-9c99-47af-8cef-e632d8f0a010  mgr.b    *
2023-06-14T13:56:53.252095Z_bc44e5d3-67e5-4c22-a872-e9c7f9799f55  mgr.b    *
2023-06-14T13:57:38.564803Z_1f132169-793b-4ac6-a3c7-af48c91f5365  mgr.b    *
# Entries marked with * are the latest (unarchived). The messages above point at the mgr and osd daemons;
# next, check the osd and mgr to see whether the warning is simply caused by crashes that were never archived.

bash-4.4$ ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME      STATUS  REWEIGHT  PRI-AFF
-1         0.19519      root default
-5         0.04880      host k8s-master
 0    ssd  0.04880          osd.0            up   1.00000     1.00000
-3         0.04880      host k8s-node1
 1    ssd  0.04880          osd.1            up   1.00000     1.00000
-9         0.04880      host k8s-node2
 3    ssd  0.04880          osd.3            up   1.00000     1.00000
-7         0.04880      host k8s-node3
 2    ssd  0.04880          osd.2            up   1.00000     1.00000

The commands above show that the cluster itself is fine, so the warning is a false alarm caused by unarchived crash reports. Archive them as follows:

# Method 1: suitable when only one or two crashes are unarchived
#ceph crash ls
#ceph crash archive <id>
# Method 2: suitable when many crashes need archiving; here we simply run
#ceph crash archive-all

9.2 OSD down

When an OSD goes down, go straight to its logs.

Running ceph osd tree shows that osd.3 is down; it is located on k8s-node2.
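
A sketch of how the logs could be pulled for the failing OSD in a Rook cluster; the deployment name rook-ceph-osd-3 matches the listing in 6.3, so adjust it to your environment:

# Confirm which OSD is down
ceph osd tree | grep down

# Locate the OSD pod and read its logs
kubectl -n rook-ceph get pods -o wide | grep rook-ceph-osd-3
kubectl -n rook-ceph logs deploy/rook-ceph-osd-3 --tail=100

# The operator log often explains why an OSD was not restarted
kubectl -n rook-ceph logs deploy/rook-ceph-operator --tail=100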
