Deploying Ceph Distributed Storage on a Single Ubuntu Server

Environment

OS:Linux 5.15.0-82-generic #91-Ubuntu SMP Mon Aug 14 14:14:14 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)

Preparation

bash
# Install the GPG key
curl -fsSL http://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | sudo apt-key add -

This step differs from the official Docker instructions: the repositories recommended on the Docker site are hosted overseas and are slow to reach, so the Aliyun mirror is used here instead.

From <https://blog.csdn.net/b1134977524/article/details/120442417>


sudo add-apt-repository "deb [arch=amd64] http://mirrors.aliyun.com/docker-ce/linux/ubuntu $(lsb_release -cs) stable"
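
With the repository in place, Docker itself still needs to be installed, since cephadm requires a container runtime (Docker or Podman) on the host. A minimal sketch, assuming the standard docker-ce package names from the repository added above:

bash
# Update the package index and install Docker Engine
sudo apt-get update
sudo apt-get -y install docker-ce docker-ce-cli containerd.io
# Confirm the Docker daemon is running
sudo systemctl status docker --no-pager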

Deployment

See the official documentation: https://docs.ceph.com/en/latest/install/manual-deployment/

bash
apt -y install cephadm
cephadm bootstrap --mon-ip 192.168.1.20 --single-host-defaults
cephadm add-repo --release reef
cephadm install ceph-common
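
After the bootstrap finishes, it is worth confirming that the mon and mgr daemons came up and that the orchestrator can see the host, for example:

bash
# Cluster health and daemon summary
ceph -s
# Hosts known to the cephadm orchestrator
ceph orch host ls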

Usage

Adding OSDs

Reference: https://docs.ceph.com/en/latest/cephadm/services/osd/#cephadm-deploy-osds

Tell Ceph to consume any available and unused storage device:

List the available devices:

ceph orch device ls

Method 1

ceph orch apply osd --all-available-devices
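
Note that this creates a managed OSD service, so cephadm will keep consuming any new eligible disk automatically; per the same documentation page, that behaviour can be switched off with the unmanaged flag:

bash
# Stop cephadm from automatically creating OSDs on newly available devices
ceph orch apply osd --all-available-devices --unmanaged=true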

Create an OSD from a specific device on a specific host:

Method 2

ceph orch daemon add osd <host>:<device-path>

For example:

ceph orch daemon add osd host1:/dev/sdb

Removing an OSD

bash
# ceph orch device ls
# ceph orch osd rm 0
Scheduled OSD(s) for removal.
VG/LV for the OSDs won't be zapped (--zap wasn't passed).
Run the `ceph-volume lvm zap` command with `--destroy` against the VG/LV if you want them to be destroyed.

# The message above tells us the following:

# 1. The OSD has been scheduled for removal.
# 2. The volume group (VG) and logical volume (LV) backing the OSD will not be removed automatically, because --zap was not passed.
# 3. To destroy those VG/LVs as well, run ceph-volume lvm zap with the --destroy flag against them.
# In short: the OSD itself goes away, but the underlying VG/LV survive and must be wiped manually with ceph-volume lvm zap --destroy if a complete clean-up is wanted.

Reference: https://www.xiaowangc.com/archives/40095af5.html
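
While the removal is in progress, its state can be followed with the documented status subcommand:

bash
# Show OSDs scheduled for removal and their draining progress
ceph orch osd rm status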

bash
# cephadm shell
Inferring fsid def7078a-56ce-11ee-a479-1be9fab3deba
Inferring config /var/lib/ceph/def7078a-56ce-11ee-a479-1be9fab3deba/mon.test/config
Using ceph image with id '22cd8daf4d70' and tag 'v17' created on 2023-09-06 00:05:04 +0800 CST
quay.io/ceph/ceph@sha256:6b0a24e3146d4723700ce6579d40e6016b2c63d9bf90422653f2d4caa49be232
# ceph-volume lvm zap --destroy /dev/nvme1n1
--> Zapping: /dev/nvme1n1
--> Zapping lvm member /dev/nvme1n1. lv_path is /dev/ceph-4229b334-54f8-4b21-80ed-3733cc2d4910/osd-block-866ec7e0-8b60-469e-8124-3d770608977e
Running command: /usr/bin/dd if=/dev/zero of=/dev/ceph-4229b334-54f8-4b21-80ed-3733cc2d4910/osd-block-866ec7e0-8b60-469e-8124-3d770608977e bs=1M count=10 conv=fsync
 stderr: 10+0 records in
10+0 records out
 stderr: 10485760 bytes (10 MB, 10 MiB) copied, 0.0173921 s, 603 MB/s
--> Only 1 LV left in VG, will proceed to destroy volume group ceph-4229b334-54f8-4b21-80ed-3733cc2d4910
Running command: nsenter --mount=/rootfs/proc/1/ns/mnt --ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net --uts=/rootfs/proc/1/ns/uts /sbin/vgremove -v -f ceph-4229b334-54f8-4b21-80ed-3733cc2d4910
 stderr: Removing ceph--4229b334--54f8--4b21--80ed--3733cc2d4910-osd--block--866ec7e0--8b60--469e--8124--3d770608977e (253:3)
 stderr: Archiving volume group "ceph-4229b334-54f8-4b21-80ed-3733cc2d4910" metadata (seqno 5).
 stderr: Releasing logical volume "osd-block-866ec7e0-8b60-469e-8124-3d770608977e"
 stderr: Creating volume group backup "/etc/lvm/backup/ceph-4229b334-54f8-4b21-80ed-3733cc2d4910" (seqno 6).
 stdout: Logical volume "osd-block-866ec7e0-8b60-469e-8124-3d770608977e" successfully removed
 stderr: Removing physical volume "/dev/nvme1n1" from volume group "ceph-4229b334-54f8-4b21-80ed-3733cc2d4910"
 stdout: Volume group "ceph-4229b334-54f8-4b21-80ed-3733cc2d4910" successfully removed
Running command: nsenter --mount=/rootfs/proc/1/ns/mnt --ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net --uts=/rootfs/proc/1/ns/uts /sbin/pvremove -v -f -f /dev/nvme1n1
 stdout: Labels on physical volume "/dev/nvme1n1" successfully wiped.
Running command: /usr/bin/dd if=/dev/zero of=/dev/nvme1n1 bs=1M count=10 conv=fsync
 stderr: 10+0 records in
10+0 records out
 stderr: 10485760 bytes (10 MB, 10 MiB) copied, 0.0312679 s, 335 MB/s
--> Zapping successful for: <Raw Device: /dev/nvme1n1>


# ceph osd tree
ID  CLASS  WEIGHT  TYPE NAME      STATUS  REWEIGHT  PRI-AFF
-1              0  root default
-3              0      host test
# ceph status
  cluster:
    id:     def7078a-56ce-11ee-a479-1be9fab3deba
    health: HEALTH_WARN
            mon test is low on available space
            OSD count 0 < osd_pool_default_size 2

  services:
    mon: 1 daemons, quorum test (age 24h)
    mgr: test.ksgjsf(active, since 23h), standbys: test.lizbxa
    osd: 0 osds: 0 up (since 5m), 0 in (since 3h)

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:
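
At this point the zapped device appears as available again in ceph orch device ls. If it is meant to be reused, it can be re-added as an OSD; a sketch using the host and device from the session above:

bash
# Re-create an OSD on the freshly zapped device (host "test" in this setup)
ceph orch daemon add osd test:/dev/nvme1n1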

Troubleshooting

1. An SSH port other than 22 caused the Ceph installation to fail

bash
/usr/bin/ceph: stderr Error EINVAL: Traceback (most recent call last):
/usr/bin/ceph: stderr   File "/usr/share/ceph/mgr/mgr_module.py", line 1756, in _handle_command
/usr/bin/ceph: stderr     return self.handle_command(inbuf, cmd)
/usr/bin/ceph: stderr   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 171, in handle_command
/usr/bin/ceph: stderr     return dispatch[cmd['prefix']].call(self, cmd, inbuf)
/usr/bin/ceph: stderr   File "/usr/share/ceph/mgr/mgr_module.py", line 462, in call
/usr/bin/ceph: stderr     return self.func(mgr, **kwargs)
/usr/bin/ceph: stderr   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 107, in <lambda>
/usr/bin/ceph: stderr     wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)  # noqa: E731
/usr/bin/ceph: stderr   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 96, in wrapper
/usr/bin/ceph: stderr     return func(*args, **kwargs)
/usr/bin/ceph: stderr   File "/usr/share/ceph/mgr/orchestrator/module.py", line 356, in _add_host
/usr/bin/ceph: stderr     return self._apply_misc([s], False, Format.plain)
/usr/bin/ceph: stderr   File "/usr/share/ceph/mgr/orchestrator/module.py", line 1092, in _apply_misc
/usr/bin/ceph: stderr     raise_if_exception(completion)
/usr/bin/ceph: stderr   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 225, in raise_if_exception
/usr/bin/ceph: stderr     e = pickle.loads(c.serialized_exception)
/usr/bin/ceph: stderr TypeError: __init__() missing 2 required positional arguments: 'hostname' and 'addr'
/usr/bin/ceph: stderr
ERROR: Failed to add host <test>: Failed command: /usr/bin/docker run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint /usr/bin/ceph --init -e CONTAINER_IMAGE=quay.io/ceph/ceph:v17 -e NODE_NAME=test -e CEPH_USE_RANDOM_NONCE=1 -v /var/log/ceph/29dba80c-5146-11ee-a479-1be9fab3deba:/var/log/ceph:z -v /tmp/ceph-tmphvmle3fb:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmph9if039g:/etc/ceph/ceph.conf:z quay.io/ceph/ceph:v17 orch host add test 192.168.1.20

Temporarily change the SSH port to 22; after Ceph is installed, change it back to the original port (e.g. 2002) and run the following commands to put that port into cephadm's ssh_config:

bash
# ceph cephadm get-ssh-config > ssh_config
# vi ssh_config
Host *
  User root
  StrictHostKeyChecking no
  Port                  2002
  UserKnownHostsFile /dev/null
  ConnectTimeout=30

# ceph cephadm set-ssh-config -i ssh_config
# ceph health detail

Solved with the help of the following:

cephadm bootstrap fails with custom ssh port

From https://tracker.ceph.com/issues/48158
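
Alternatively, the custom port can be handed to cephadm at bootstrap time, which avoids switching the SSH port back and forth; a sketch assuming the --ssh-config option of cephadm bootstrap and the ssh_config file shown above:

bash
# Bootstrap with an ssh_config that already contains "Port 2002"
cephadm bootstrap --mon-ip 192.168.1.20 --single-host-defaults --ssh-config ssh_config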

2. Forgot the initial dashboard password

bash
# Create a file containing the new password
cat >/opt/secretkey<<EOF 
123123
EOF

ceph dashboard ac-user-set-password admin -i /opt/secretkey --force-password

Reference: https://i4t.com/6075.html
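
After resetting the password, the dashboard URL can be confirmed before logging in again:

bash
# Lists the endpoints exposed by mgr modules, including the dashboard URL
ceph mgr services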

3. Error after restarting with systemctl restart ceph.target:

monclient(hunting): authenticate timed out after 300

bash
systemctl --all | grep mon

This showed:

bash
ceph-def7078a-56ce-11ee-a479-1be9fab3deba@mon.test.service   loaded    failed   failed    Ceph mon.test for def7078a-56ce-11ee-a479-1be9f

Next, check the journal for that unit:

bash
journalctl -xeu ceph-def7078a-56ce-11ee-a479-1be9fab3deba@mon.test.service

Which revealed:

bash
error: monitor data filesystem reached concerning levels of available storage space (available: 3% 7.9 GiB

After freeing up disk space:

bash
systemctl reset-failed  ceph-def7078a-56ce-11ee-a479-1be9fab3deba@mon.test.service
systemctl status ceph-def7078a-56ce-11ee-a479-1be9fab3deba@mon.test.service
journalctl -xeu ceph-def7078a-56ce-11ee-a479-1be9fab3deba@mon.test.service
systemctl restart  ceph-def7078a-56ce-11ee-a479-1be9fab3deba@mon.test.service
systemctl status ceph-def7078a-56ce-11ee-a479-1be9fab3deba@mon.test.service
ceph -s

Reference: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/4EJN52JDTGI4D2PSQYHJEK5PZ5RQWM2H/

Going further, the critical threshold can be lowered from the default 5% to 3%:

bash
ceph config set mon mon_data_avail_crit 3
ceph config show mon.test mon_data_avail_crit

Replace mon.test in the commands above with the actual monitor name.
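
To find the correct monitor name, the monitors can be listed first; the names shown are used as mon.<name>:

bash
# Show the monitors in quorum and their names
ceph mon stat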
