Deploying Ceph Distributed Storage on a Single Ubuntu Server

Environment

OS:Linux 5.15.0-82-generic #91-Ubuntu SMP Mon Aug 14 14:14:14 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)

Preparation

bash
# Install the GPG key
curl -fsSL http://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | sudo apt-key add -

This step differs from the official Docker instructions: the repositories recommended on the Docker site are hosted overseas and are slow to reach, so the Aliyun mirror is used here instead.

From <https://blog.csdn.net/b1134977524/article/details/120442417>


sudo add-apt-repository "deb [arch=amd64] http://mirrors.aliyun.com/docker-ce/linux/ubuntu $(lsb_release -cs) stable"
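
With the repository in place, Docker itself still needs to be installed, since cephadm requires a container runtime (Docker or Podman) on the host. A minimal sketch, assuming the standard docker-ce package names from the repository added above:

bash
# Update the package index and install Docker Engine
sudo apt-get update
sudo apt-get -y install docker-ce docker-ce-cli containerd.io
# Confirm the Docker daemon is running
sudo systemctl status docker --no-pager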

Deployment

See the official documentation: https://docs.ceph.com/en/latest/install/manual-deployment/

bash
apt -y install cephadm
cephadm bootstrap --mon-ip 192.168.1.20 --single-host-defaults
cephadm add-repo --release reef
cephadm install ceph-common
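
After the bootstrap finishes, it is worth confirming that the mon and mgr daemons came up and that the orchestrator can see the host, for example:

bash
# Cluster health and daemon summary
ceph -s
# Hosts known to the cephadm orchestrator
ceph orch host ls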

Usage

Adding OSDs

Reference: https://docs.ceph.com/en/latest/cephadm/services/osd/#cephadm-deploy-osds

Tell Ceph to consume any available and unused storage device:

List the available devices:

ceph orch device ls

Method 1

ceph orch apply osd --all-available-devices
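
Note that this creates a managed OSD service, so cephadm will keep consuming any new eligible disk automatically; per the same documentation page, that behaviour can be switched off with the unmanaged flag:

bash
# Stop cephadm from automatically creating OSDs on newly available devices
ceph orch apply osd --all-available-devices --unmanaged=true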

Create an OSD from a specific device on a specific host:

Method 2

ceph orch daemon add osd <host>:<device-path>

For example:

ceph orch daemon add osd host1:/dev/sdb

Removing an OSD

bash
# ceph orch device ls
# ceph orch osd rm 0
Scheduled OSD(s) for removal.
VG/LV for the OSDs won't be zapped (--zap wasn't passed).
Run the `ceph-volume lvm zap` command with `--destroy` against the VG/LV if you want them to be destroyed.

# The message above tells us the following:

# 1. The OSD has been scheduled for removal.
# 2. The volume group (VG) and logical volume (LV) backing the OSD will not be removed automatically, because --zap was not passed.
# 3. To destroy those VG/LVs as well, run ceph-volume lvm zap with the --destroy flag against them.
# In short: the OSD itself goes away, but the underlying VG/LV survive and must be wiped manually with ceph-volume lvm zap --destroy if a complete clean-up is wanted.

Reference: https://www.xiaowangc.com/archives/40095af5.html
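
While the removal is in progress, its state can be followed with the documented status subcommand:

bash
# Show OSDs scheduled for removal and their draining progress
ceph orch osd rm status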

bash
# cephadm shell
Inferring fsid def7078a-56ce-11ee-a479-1be9fab3deba
Inferring config /var/lib/ceph/def7078a-56ce-11ee-a479-1be9fab3deba/mon.test/config
Using ceph image with id '22cd8daf4d70' and tag 'v17' created on 2023-09-06 00:05:04 +0800 CST
quay.io/ceph/ceph@sha256:6b0a24e3146d4723700ce6579d40e6016b2c63d9bf90422653f2d4caa49be232
# ceph-volume lvm zap --destroy /dev/nvme1n1
--> Zapping: /dev/nvme1n1
--> Zapping lvm member /dev/nvme1n1. lv_path is /dev/ceph-4229b334-54f8-4b21-80ed-3733cc2d4910/osd-block-866ec7e0-8b60-469e-8124-3d770608977e
Running command: /usr/bin/dd if=/dev/zero of=/dev/ceph-4229b334-54f8-4b21-80ed-3733cc2d4910/osd-block-866ec7e0-8b60-469e-8124-3d770608977e bs=1M count=10 conv=fsync
 stderr: 10+0 records in
10+0 records out
 stderr: 10485760 bytes (10 MB, 10 MiB) copied, 0.0173921 s, 603 MB/s
--> Only 1 LV left in VG, will proceed to destroy volume group ceph-4229b334-54f8-4b21-80ed-3733cc2d4910
Running command: nsenter --mount=/rootfs/proc/1/ns/mnt --ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net --uts=/rootfs/proc/1/ns/uts /sbin/vgremove -v -f ceph-4229b334-54f8-4b21-80ed-3733cc2d4910
 stderr: Removing ceph--4229b334--54f8--4b21--80ed--3733cc2d4910-osd--block--866ec7e0--8b60--469e--8124--3d770608977e (253:3)
 stderr: Archiving volume group "ceph-4229b334-54f8-4b21-80ed-3733cc2d4910" metadata (seqno 5).
 stderr: Releasing logical volume "osd-block-866ec7e0-8b60-469e-8124-3d770608977e"
 stderr: Creating volume group backup "/etc/lvm/backup/ceph-4229b334-54f8-4b21-80ed-3733cc2d4910" (seqno 6).
 stdout: Logical volume "osd-block-866ec7e0-8b60-469e-8124-3d770608977e" successfully removed
 stderr: Removing physical volume "/dev/nvme1n1" from volume group "ceph-4229b334-54f8-4b21-80ed-3733cc2d4910"
 stdout: Volume group "ceph-4229b334-54f8-4b21-80ed-3733cc2d4910" successfully removed
Running command: nsenter --mount=/rootfs/proc/1/ns/mnt --ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net --uts=/rootfs/proc/1/ns/uts /sbin/pvremove -v -f -f /dev/nvme1n1
 stdout: Labels on physical volume "/dev/nvme1n1" successfully wiped.
Running command: /usr/bin/dd if=/dev/zero of=/dev/nvme1n1 bs=1M count=10 conv=fsync
 stderr: 10+0 records in
10+0 records out
 stderr: 10485760 bytes (10 MB, 10 MiB) copied, 0.0312679 s, 335 MB/s
--> Zapping successful for: <Raw Device: /dev/nvme1n1>


# ceph osd tree
ID  CLASS  WEIGHT  TYPE NAME      STATUS  REWEIGHT  PRI-AFF
-1              0  root default
-3              0      host test
# ceph status
  cluster:
    id:     def7078a-56ce-11ee-a479-1be9fab3deba
    health: HEALTH_WARN
            mon test is low on available space
            OSD count 0 < osd_pool_default_size 2

  services:
    mon: 1 daemons, quorum test (age 24h)
    mgr: test.ksgjsf(active, since 23h), standbys: test.lizbxa
    osd: 0 osds: 0 up (since 5m), 0 in (since 3h)

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:
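
At this point the zapped device appears as available again in ceph orch device ls. If it is meant to be reused, it can be re-added as an OSD; a sketch using the host and device from the session above:

bash
# Re-create an OSD on the freshly zapped device (host "test" in this setup)
ceph orch daemon add osd test:/dev/nvme1n1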

Troubleshooting

1. An SSH port other than 22 caused the Ceph installation to fail

bash
/usr/bin/ceph: stderr Error EINVAL: Traceback (most recent call last):
/usr/bin/ceph: stderr   File "/usr/share/ceph/mgr/mgr_module.py", line 1756, in _handle_command
/usr/bin/ceph: stderr     return self.handle_command(inbuf, cmd)
/usr/bin/ceph: stderr   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 171, in handle_command
/usr/bin/ceph: stderr     return dispatch[cmd['prefix']].call(self, cmd, inbuf)
/usr/bin/ceph: stderr   File "/usr/share/ceph/mgr/mgr_module.py", line 462, in call
/usr/bin/ceph: stderr     return self.func(mgr, **kwargs)
/usr/bin/ceph: stderr   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 107, in <lambda>
/usr/bin/ceph: stderr     wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)  # noqa: E731
/usr/bin/ceph: stderr   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 96, in wrapper
/usr/bin/ceph: stderr     return func(*args, **kwargs)
/usr/bin/ceph: stderr   File "/usr/share/ceph/mgr/orchestrator/module.py", line 356, in _add_host
/usr/bin/ceph: stderr     return self._apply_misc([s], False, Format.plain)
/usr/bin/ceph: stderr   File "/usr/share/ceph/mgr/orchestrator/module.py", line 1092, in _apply_misc
/usr/bin/ceph: stderr     raise_if_exception(completion)
/usr/bin/ceph: stderr   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 225, in raise_if_exception
/usr/bin/ceph: stderr     e = pickle.loads(c.serialized_exception)
/usr/bin/ceph: stderr TypeError: __init__() missing 2 required positional arguments: 'hostname' and 'addr'
/usr/bin/ceph: stderr
ERROR: Failed to add host <test>: Failed command: /usr/bin/docker run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint /usr/bin/ceph --init -e CONTAINER_IMAGE=quay.io/ceph/ceph:v17 -e NODE_NAME=test -e CEPH_USE_RANDOM_NONCE=1 -v /var/log/ceph/29dba80c-5146-11ee-a479-1be9fab3deba:/var/log/ceph:z -v /tmp/ceph-tmphvmle3fb:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmph9if039g:/etc/ceph/ceph.conf:z quay.io/ceph/ceph:v17 orch host add test 192.168.1.20

Temporarily change the SSH port to 22; after Ceph is installed, change it back to the original port (e.g. 2002) and run the following commands to put that port into cephadm's ssh_config:

bash
# ceph cephadm get-ssh-config > ssh_config
# vi ssh_config
Host *
  User root
  StrictHostKeyChecking no
  Port                  2002
  UserKnownHostsFile /dev/null
  ConnectTimeout=30

# ceph cephadm set-ssh-config -i ssh_config
# ceph health detail

Solved with the help of the following:

cephadm bootstrap fails with custom ssh port

From https://tracker.ceph.com/issues/48158
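
Alternatively, the custom port can be handed to cephadm at bootstrap time, which avoids switching the SSH port back and forth; a sketch assuming the --ssh-config option of cephadm bootstrap and the ssh_config file shown above:

bash
# Bootstrap with an ssh_config that already contains "Port 2002"
cephadm bootstrap --mon-ip 192.168.1.20 --single-host-defaults --ssh-config ssh_config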

2. Forgot the initial dashboard password

bash
# Create a file containing the new password
cat >/opt/secretkey<<EOF 
123123
EOF

ceph dashboard ac-user-set-password admin -i /opt/secretkey --force-password

Reference: https://i4t.com/6075.html
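
After resetting the password, the dashboard URL can be confirmed before logging in again:

bash
# Lists the endpoints exposed by mgr modules, including the dashboard URL
ceph mgr services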

3. Error after restarting with systemctl restart ceph.target:

monclient(hunting): authenticate timed out after 300

bash
systemctl --all | grep mon

This showed:

bash
ceph-def7078a-56ce-11ee-a479-1be9fab3deba@mon.test.service   loaded    failed   failed    Ceph mon.test for def7078a-56ce-11ee-a479-1be9f

Next, check the journal for that unit:

bash
journalctl -xeu ceph-def7078a-56ce-11ee-a479-1be9fab3deba@mon.test.service

Which revealed:

bash
error: monitor data filesystem reached concerning levels of available storage space (available: 3% 7.9 GiB

After freeing up disk space:

bash
systemctl reset-failed  ceph-def7078a-56ce-11ee-a479-1be9fab3deba@mon.test.service
systemctl status ceph-def7078a-56ce-11ee-a479-1be9fab3deba@mon.test.service
journalctl -xeu ceph-def7078a-56ce-11ee-a479-1be9fab3deba@mon.test.service
systemctl restart  ceph-def7078a-56ce-11ee-a479-1be9fab3deba@mon.test.service
systemctl status ceph-def7078a-56ce-11ee-a479-1be9fab3deba@mon.test.service
ceph -s

Reference: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/4EJN52JDTGI4D2PSQYHJEK5PZ5RQWM2H/

Going further, the critical threshold can be lowered from the default 5% to 3%:

bash
ceph config set mon mon_data_avail_crit 3
ceph config show mon.test mon_data_avail_crit

Replace mon.test in the commands above with the actual monitor name.
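
To find the correct monitor name, the monitors can be listed first; the names shown are used as mon.<name>:

bash
# Show the monitors in quorum and their names
ceph mon stat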
