ceph -s // check the cluster status for faults
k8s70132:~$ ceph -s
  cluster:
    id:     d10b3028-b78d-4b23-bacb-ca26c0a35c99
    health: HEALTH_WARN
            1 daemons have recently crashed
  services:
    mon: 5 daemons, quorum k8sceph70201,k8sceph70202,k8sceph70203,k8sceph70204,k8sceph70205 (age 4d)
    mgr: k8sceph70204(active, since 11w), standbys: k8sceph70201, k8sceph70205
    mds: cephfs:1 {0=k8sceph70204=up:active} 2 up:standby
    osd: 23 osds: 22 up (since 4d), 22 in (since 4d)
    rgw: 3 daemons active (k8sceph70201, k8sceph70204, k8sceph70205)
  task status:
  data:
    pools:   11 pools, 281 pgs
    objects: 809.23k objects, 24 GiB
    usage:   159 GiB used, 38 TiB / 38 TiB avail
    pgs:     281 active+clean
  io:
    client:   47 KiB/s wr, 0 op/s rd, 2 op/s wr
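The health line can also be picked out of the status text by a script, which is handy for a quick check before digging further. This is a minimal sketch; `health_of` is a hypothetical helper name, not part of Ceph, and it only reads text in the layout shown above.

```shell
#!/bin/sh
# health_of (hypothetical helper): read `ceph -s` text on stdin and print
# the value after "health:" (for example HEALTH_OK or HEALTH_WARN).
health_of() {
  awk '$1 == "health:" { print $2; exit }'
}

# Usage on a live cluster (needs the ceph CLI and a client keyring):
#   ceph -s | health_of
#   ceph health detail    # lists each warning together with its details
```

`ceph health detail` expands the one-line warning shown under health, so it is usually the next command to run when the status is not HEALTH_OK.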
health reports a warning here; list the recent crashes to locate the fault:
ceph crash ls
k8s70132:~$ ceph crash ls
ID ENTITY NEW
2023-04-30T19:46:01.008208Z_26692ab3-ba90-4129-9929-2ad8f29f0acb osd.1
2023-09-03T07:42:42.451722Z_d0e7268f-0da6-4d59-b706-35c49ee8617b osd.2
2023-12-13T22:17:16.706091Z_8dbfc488-1309-4e9a-b4b9-c1eadeb3016e osd.0 *
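In the listing above, the `*` in the NEW column marks a crash that has not been archived yet, and that is the entry behind the "recently crashed" warning. The sketch below pulls those IDs out of the listing; `new_crash_ids` is a hypothetical helper name, and it assumes the three-column layout shown here.

```shell
#!/bin/sh
# new_crash_ids (hypothetical helper): read `ceph crash ls` text on stdin
# and print the ID of every row whose last column is "*" (not yet archived).
new_crash_ids() {
  awk 'NR > 1 && $NF == "*" { print $1 }'
}

# Usage on a live cluster:
#   ceph crash ls | new_crash_ids
#   ceph crash info <id>       # backtrace and metadata for one crash
#   ceph crash archive <id>    # acknowledge it; clears the health warning
#   ceph crash archive-all     # acknowledge every new crash at once
```

Archiving only acknowledges the report. It does not repair the OSD, so the daemon still has to be checked as in the following steps.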
ceph osd tree
k8s70132:~$ ceph osd tree
ID  CLASS  WEIGHT    TYPE NAME              STATUS  REWEIGHT  PRI-AFF
-1         42.02631  root default
-3         10.91606      host k8sceph70201
 0    hdd   3.63869          osd.0            down         0  1.00000
 3    hdd   3.63869          osd.3              up   1.00000  1.00000
 6    hdd   3.63869          osd.6              up   1.00000  1.00000
-5          2.72910      host k8sceph70202
 1    hdd   0.90970          osd.1              up   1.00000  1.00000
 4    hdd   0.90970          osd.4              up   1.00000  1.00000
 7    hdd   0.90970          osd.7              up   1.00000  1.00000
-7         10.91606      host k8sceph70203
 2    hdd   3.63869          osd.2              up   1.00000  1.00000
 5    hdd   3.63869          osd.5              up   1.00000  1.00000
 8    hdd   3.63869          osd.8              up   1.00000  1.00000
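On a larger cluster the down OSDs are easy to miss in the full tree. A minimal sketch of filtering them out follows; `down_osds` is a hypothetical helper name, and it assumes OSD rows have the field order ID, CLASS, WEIGHT, NAME, STATUS as in the output above.

```shell
#!/bin/sh
# down_osds (hypothetical helper): read `ceph osd tree` text on stdin and
# print the name of every OSD row whose STATUS column is "down".
# Root and host rows are skipped because their 4th field is not "osd.N".
down_osds() {
  awk '$4 ~ /^osd\.[0-9]+$/ && $5 == "down" { print $4 }'
}

# Usage on a live cluster:
#   ceph osd tree | down_osds
#   ceph osd tree down    # built-in state filter, also shows the parent hosts
```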
Log in to k8sceph70201 and check the OSD services:
sudo systemctl | grep ceph-osd
k8sceph70201:~$ sudo systemctl |grep ceph-osd
var-lib-ceph-osd-ceph\x2d0.mount loaded active mounted /var/lib/ceph/osd/ceph-0
var-lib-ceph-osd-ceph\x2d3.mount loaded active mounted /var/lib/ceph/osd/ceph-3
var-lib-ceph-osd-ceph\x2d6.mount loaded active mounted /var/lib/ceph/osd/ceph-6
● ceph-osd@0.service loaded failed failed Ceph object storage daemon osd.0
ceph-osd@3.service loaded active running Ceph object storage daemon osd.3
ceph-osd@6.service loaded active running Ceph object storage daemon osd.6
ceph-osd.target loaded active active ceph target allowing to start/stop all ceph-osd@.service instances
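Before removing the disk it is worth looking at why ceph-osd@0.service failed. The sketch below maps an OSD name to its systemd unit so the same commands work for any OSD; `osd_unit` is a hypothetical helper name, and the unit naming follows the `ceph-osd@N.service` pattern seen in the listing above.

```shell
#!/bin/sh
# osd_unit (hypothetical helper): turn "osd.0" or "0" into the matching
# systemd unit name, e.g. ceph-osd@0.service.
osd_unit() {
  printf 'ceph-osd@%s.service\n' "${1#osd.}"
}

# Usage on the OSD host:
#   sudo systemctl status "$(osd_unit osd.0)"
#   sudo journalctl -u "$(osd_unit osd.0)" -n 100 --no-pager
#   sudo systemctl list-units --state=failed 'ceph-osd@*'
```

If the journal points to I/O errors on the underlying device, replacing the disk as described next is the right path; a one-off crash may only need the service restarted.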
Remove the failed disk
// stop the service on k8sceph70201
sudo systemctl stop ceph-osd@0.service
// then run the following on the ceph client
ceph osd down osd.0
// output: osd.0 is already down.
ceph osd out osd.0
// output: osd.0 is already out.
ceph osd crush remove osd.0
// output: removed item id 0 name 'osd.0' from crush map
ceph osd rm 0
// output: removed osd.0
ceph auth del osd.0
// output: updated
ceph osd tree
// output:
ID  CLASS  WEIGHT    TYPE NAME              STATUS  REWEIGHT  PRI-AFF
-1         38.38762  root default
-3          7.27737      host k8sceph70201
 3    hdd   3.63869          osd.3              up   1.00000  1.00000
 6    hdd   3.63869          osd.6              up   1.00000  1.00000
-5          2.72910      host k8sceph70202
 1    hdd   0.90970          osd.1              up   1.00000  1.00000
 4    hdd   0.90970          osd.4              up   1.00000  1.00000
 7    hdd   0.90970          osd.7              up   1.00000  1.00000
-7         10.91606      host k8sceph70203
 2    hdd   3.63869          osd.2              up   1.00000  1.00000
 5    hdd   3.63869          osd.5              up   1.00000  1.00000
 8    hdd   3.63869          osd.8            down   1.00000  1.00000
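The client-side removal steps above can be wrapped so the sequence is reviewed before anything is changed. This is only a sketch: `remove_osd` and the `DRY_RUN` variable are hypothetical names, the commands and their order are the ones used above, and the default is to print rather than execute.

```shell
#!/bin/sh
# remove_osd (hypothetical wrapper): run the client-side removal sequence
# for one numeric OSD id. With DRY_RUN=1 (the default) the commands are
# only printed; set DRY_RUN=0 to actually execute them.
remove_osd() {
  local id="$1" run
  case "$id" in
    ''|*[!0-9]*) echo "usage: remove_osd <numeric-osd-id>" >&2; return 1 ;;
  esac
  if [ "${DRY_RUN:-1}" = "1" ]; then run="echo"; else run=""; fi
  $run ceph osd down "osd.$id"
  $run ceph osd out "osd.$id"
  $run ceph osd crush remove "osd.$id"
  $run ceph osd rm "$id"
  $run ceph auth del "osd.$id"
}

# Usage:
#   remove_osd 0              # print the five commands for osd.0
#   DRY_RUN=0 remove_osd 0    # execute them (stop ceph-osd@0 on the host first)
```

On Luminous and later releases, `ceph osd purge 0 --yes-i-really-mean-it` combines the crush removal, auth deletion, and OSD removal into one command and may be a simpler alternative.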
Add the replacement disk
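// run from the ceph-deploy admin node; ceph-deploy 2.x needs the target host, here k8sceph70201
ceph-deploy disk zap k8sceph70201 /dev/sdb
ceph-deploy --overwrite-conf osd create --data /dev/sdb k8sceph70201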
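After the new OSD is created, the osd line of `ceph -s` should show every OSD up and in again. A minimal sketch of reading those counts follows; `osd_counts` is a hypothetical helper name, and it assumes the "N osds: N up ..., N in ..." wording shown in the status output earlier.

```shell
#!/bin/sh
# osd_counts (hypothetical helper): read `ceph -s` text on stdin and print
# "total up in" taken from the "osd:" line. The number in front of the
# words "up" and "in" is used, so an optional "(since ...)" part is ignored.
osd_counts() {
  awk '$1 == "osd:" {
    total = $2
    for (i = 3; i <= NF; i++) {
      w = $i
      sub(/,$/, "", w)            # drop a trailing comma, e.g. "up,"
      if (w == "up") n_up = $(i - 1)
      if (w == "in") n_in = $(i - 1)
    }
    print total, n_up, n_in
    exit
  }'
}

# Usage on a live cluster:
#   ceph -s | osd_counts    # all three numbers should match once recovery ends
#   ceph osd tree           # the new OSD should appear under its host as "up"
```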