Problem overview:
ceph -s shows pg 10.17 as inconsistent.
pg 10.17 is on osd [3,4]; the authoritative copy is on osd.3.
ceph pg repair 10.17 cannot fix it, and /var/log/ceph/ceph-osd.3.log reports the following error:
repair 10.17 10:e889b16a:::rbd_data.88033092ad95.0000000000000012:b : is an unexpected clone
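The later steps need three values from this log line: the PG id, the object name, and the clone id. A minimal sketch of pulling them out in the shell follows. It assumes the line has been trimmed to the form quoted above (no timestamp prefix) and that the object name itself contains no colon; parse_unexpected_clone is a hypothetical helper, not part of Ceph.

```shell
# Sample line copied from the OSD log quoted above.
line='repair 10.17 10:e889b16a:::rbd_data.88033092ad95.0000000000000012:b : is an unexpected clone'

# parse_unexpected_clone <log line>  (hypothetical helper, not a Ceph tool)
# Prints "<pgid> <object name> <clone id in hex>".
parse_unexpected_clone() {
    local line=$1 pgid hobj name clone
    # Field 2 is the PG id, field 3 is the colon-separated object descriptor.
    pgid=$(printf '%s\n' "$line" | awk '{print $2}')
    hobj=$(printf '%s\n' "$line" | awk '{print $3}')
    clone=${hobj##*:}   # text after the last colon: the clone id
    name=${hobj%:*}     # strip the trailing ":<clone id>"
    name=${name##*:}    # text after the remaining last colon: the object name
    printf '%s %s %s\n' "$pgid" "$name" "$clone"
}

parse_unexpected_clone "$line"
```

For the line above this prints the PG id 10.17, the object name rbd_data.88033092ad95.0000000000000012, and the clone id b.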
Repair steps:
0. Back up the object
rados -p ceph-kvm-pool get rbd_data.88033092ad95.0000000000000012 /tmp/rbd_data.88033092ad95.0000000000000012
1. Prevent the cluster from rebalancing and pause scrubbing
ceph osd set noout
ceph osd set noscrub
ceph osd set nodeep-scrub
2. Disable recovery and backfill
ceph osd set nobackfill
ceph osd set norebalance
ceph osd set norecover
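The flags from steps 1 and 2 are set here and unset again later, so a small helper can keep the two lists in one place. This is only a sketch: osd_flags and the CEPH override variable are assumptions introduced for illustration, and the override exists so the commands can be printed instead of executed.

```shell
# CEPH is an assumption of this sketch: set CEPH="echo ceph" to print the
# commands (dry run) instead of running them against the cluster.
CEPH=${CEPH:-ceph}

# Flags from step 1 (kept until all PGs are repaired) and step 2 (per OSD).
CLUSTER_FLAGS="noout noscrub nodeep-scrub"
RECOVERY_FLAGS="nobackfill norebalance norecover"

# osd_flags <set|unset> <flag>...  (hypothetical helper)
# Applies the action to each flag in turn and stops at the first failure.
osd_flags() {
    local action=$1 flag
    shift
    for flag in "$@"; do
        $CEPH osd "$action" "$flag" || return 1
    done
}
```

Usage would be `osd_flags set $CLUSTER_FLAGS $RECOVERY_FLAGS` before stopping the OSD, and `osd_flags unset $RECOVERY_FLAGS` once it is back up.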
3. Stop the OSD
systemctl stop ceph-osd@3
4. List the snapshots (clones) of the object
ceph-objectstore-tool --pgid 10.17 \
--data-path /var/lib/ceph/osd/ceph-3/ \
--op list | grep rbd_data.88033092ad95.0000000000000012
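In the grep above, each dot in the object name is a regular-expression wildcard, so a near-identical name could also match. A stricter variant is sketched below using a fixed-string match on the "oid" field. filter_object is a hypothetical helper, and the second and third sample lines are made up for illustration in the shape of the JSON used in step 5.

```shell
OBJ='rbd_data.88033092ad95.0000000000000012'

# filter_object <name>  (hypothetical helper)
# Reads `--op list` output on stdin and keeps only the entries whose
# "oid" field is exactly <name> (fixed-string match, no regex).
filter_object() {
    grep -F "\"oid\":\"$1\""
}

# Illustrative input; only the first line comes from this note.
sample='["10.17",{"oid":"rbd_data.88033092ad95.0000000000000012","key":"","snapid":11,"hash":1452118295,"max":0,"pool":10,"namespace":"","max":0}]
["10.17",{"oid":"rbd_data.88033092ad95.0000000000000012","key":"","snapid":-2,"hash":1452118295,"max":0,"pool":10,"namespace":"","max":0}]
["10.17",{"oid":"rbd_data.88033092ad95x0000000000000012","key":"","snapid":-2,"hash":1,"max":0,"pool":10,"namespace":"","max":0}]'

printf '%s\n' "$sample" | filter_object "$OBJ"
```

Against a stopped OSD, the same helper would replace the grep at the end of the pipeline: `... --op list | filter_object "$OBJ"`.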
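5. Remove the problematic snapshot (clone)
From the log: rbd_data.88033092ad95.0000000000000012:b : is an unexpected clone
The clone id b is hexadecimal, which is 11 in decimal, so the clone to remove is the one with snapid 11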
ceph-objectstore-tool --pgid 10.17 \
--data-path /var/lib/ceph/osd/ceph-3/ \
'["10.17",{"oid":"rbd_data.88033092ad95.0000000000000012","key":"","snapid":11,"hash":1452118295,"max":0,"pool":10,"namespace":"","max":0}]' \
remove
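The hex-to-decimal conversion in step 5 can be checked in the shell rather than by hand, and the resulting snapid can be used to pick the matching entry out of the step 4 listing. A minimal sketch, in which select_clone is a hypothetical helper and not a Ceph command:

```shell
# Clone id exactly as it appears in the log line (hexadecimal).
clone_hex=b

# printf reads a 0x-prefixed argument as a hexadecimal constant.
snapid=$(printf '%d' "0x$clone_hex")
echo "snapid=$snapid"    # prints snapid=11

# select_clone <snapid>  (hypothetical helper)
# Reads `--op list` output on stdin and prints only the entries whose
# "snapid" field equals <snapid>; the trailing comma in the pattern keeps
# 11 from also matching 110, 111, and so on.
select_clone() {
    grep -F "\"snapid\":$1,"
}
```

Piping the step 4 listing through `select_clone "$snapid"` should leave exactly one JSON entry, which is the argument passed to remove above. If it leaves none or several, it is safer to stop and inspect the listing by hand.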
6. Start the OSD
systemctl start ceph-osd@3
7. Clear the recovery flags
ceph osd unset norecover
ceph osd unset norebalance
ceph osd unset nobackfill
Repeat steps 2-7 for each remaining OSD until all of them have been processed
8. Deep-scrub the PG
ceph pg deep-scrub 10.17
9. Once all inconsistent PGs have been handled, clear the remaining flags
ceph osd unset noscrub
ceph osd unset nodeep-scrub
ceph osd unset noout
Other commands:
List the PGs on the OSD (with the OSD stopped):
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3/ --type bluestore --op list-pgs
List the objects in the PG (with the OSD stopped):
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3/ --type bluestore --pgid 10.17 --op list
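The two ceph-objectstore-tool commands above list everything stored on the OSD or in the PG. To see only what scrub flagged as inconsistent, the rados tool has list-inconsistent-pg, list-inconsistent-obj and list-inconsistent-snapset subcommands, which run against the live cluster. A sketch follows, assuming the pool is ceph-kvm-pool as in the backup command of step 0; list_inconsistent and the RADOS override variable are hypothetical names used only here.

```shell
# RADOS is an assumption of this sketch: set RADOS="echo rados" to print
# the commands (dry run) instead of querying the cluster.
RADOS=${RADOS:-rados}
POOL=ceph-kvm-pool   # pool name taken from the backup command in step 0
PGID=10.17

# list_inconsistent  (hypothetical helper)
# Shows the inconsistent PGs in the pool, then the inconsistent objects
# and snapsets recorded for one PG by its most recent scrub.
list_inconsistent() {
    $RADOS list-inconsistent-pg "$POOL" &&
    $RADOS list-inconsistent-obj "$PGID" --format=json-pretty &&
    $RADOS list-inconsistent-snapset "$PGID" --format=json-pretty
}
```

The per-PG subcommands report the results of the last scrub, so they are most useful right after the deep-scrub in step 8.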