This is not the officially recommended installation method; the official recommendation is cephadm.
Server preparation
OS | Hostname | Public IP | Cluster IP |
---|---|---|---|
Rocky9.2 | node1 | 192.168.202.129 | 192.168.142.128 |
Rocky9.2 | node2 | 192.168.202.130 | 192.168.142.129 |
Rocky9.2 | node3 | 192.168.202.131 | 192.168.142.130 |
Rocky9.2 | node4 | 192.168.202.132 | 192.168.142.131 |
Rocky9.2 | node5 | 192.168.202.134 | 192.168.142.132 |
Each server has the following disks:
sh
[root@node2 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sr0 11:0 1 1.5G 0 rom
nvme0n1 259:0 0 60G 0 disk
├─nvme0n1p1 259:1 0 1G 0 part /boot
└─nvme0n1p2 259:2 0 59G 0 part
├─rl_192-root 253:0 0 57G 0 lvm /
└─rl_192-swap 253:1 0 2G 0 lvm [SWAP]
nvme0n2 259:3 0 20G 0 disk
nvme0n3 259:4 0 20G 0 disk
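Disable the firewall and SELinux
sh
# Stop firewalld and remove it from startup
systemctl disable firewalld --now
# Disable SELinux permanently (takes effect after the next reboot)
sed -i 's#SELINUX=enforcing#SELINUX=disabled#g' /etc/selinux/config
# Switch SELinux to permissive mode for the current boot
setenforce 0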
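Ports used by each Ceph component
Service | Port | Description |
---|---|---|
Monitor | 6789/TCP, 3300/TCP | Communication within the Ceph cluster |
Manager | 7000/TCP | Communication with the Ceph Manager dashboard |
Manager | 8003/TCP | Communication with the Ceph Manager RESTful API over HTTPS |
Manager | 9283/TCP | Communication with the Ceph Manager Prometheus plugin |
OSD | 6800-7300/TCP | Each OSD uses three ports in this range: one on the public network to talk to clients and monitors, one on the cluster network to send data to other OSDs, and one on the cluster network to send heartbeat packets |
RADOS Gateway | 7480/TCP (configurable) | RADOS Gateway uses 7480/TCP by default; the port can be changed, for example to 80/TCP or 443/TCP |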
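If disabling firewalld entirely is not acceptable, the ports in the table could be opened per role instead. The following is a minimal sketch under that assumption: `ceph_firewall_cmds` is a hypothetical helper name, and it only prints the `firewall-cmd` calls rather than running them. The monitor ports 3300 and 6789 are the ones `ceph-mon` is seen listening on in the `ss` output later in this document.

```sh
# Print (without running) the firewall-cmd calls that open the TCP ports
# for one Ceph role. ceph_firewall_cmds is a hypothetical helper name.
ceph_firewall_cmds() {
    case "$1" in
        mon) cfc_ports="3300 6789" ;;
        mgr) cfc_ports="7000 8003 9283" ;;
        osd) cfc_ports="6800-7300" ;;
        rgw) cfc_ports="7480" ;;
        *)   echo "unknown role: $1" >&2; return 1 ;;
    esac
    for cfc_port in $cfc_ports; do
        echo "firewall-cmd --permanent --add-port=${cfc_port}/tcp"
    done
    # Permanent rules only become active after a reload.
    echo "firewall-cmd --reload"
}

# Example: show what would be run on a monitor node.
ceph_firewall_cmds mon
```

After reviewing the printed commands, they could be applied on a node with something like `ceph_firewall_cmds osd | sh`.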
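Server clocks must be synchronized
Time synchronization matters for a cluster: if the clocks drift apart, the Ceph cluster can easily report an inconsistent health state.
sh
# Enable and start chronyd for automatic time synchronization
systemctl enable chronyd --now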
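To confirm that a node is actually in sync, the offset reported by `chronyc tracking` can be compared with a limit. This is a sketch only: `clock_offset_ok` is a hypothetical helper name, the limit of 0.5 seconds mirrors the `mon_clock_drift_allowed` value used in all.yml in this document, and the sample line is an illustrative value, not a measurement.

```sh
# Read `chronyc tracking` output on stdin and succeed only if the
# "System time" offset is below the given limit in seconds.
# clock_offset_ok is a hypothetical helper name.
clock_offset_ok() {
    awk -v limit="$1" '
        /^System time/ { offset = $4; found = 1 }   # 4th field is the offset in seconds
        END { exit !(found && offset + 0 < limit + 0) }
    '
}

# Illustrative sample line in the format printed by `chronyc tracking`.
sample='System time     : 0.000012345 seconds slow of NTP time'
if printf '%s\n' "$sample" | clock_offset_ok 0.5; then
    echo "clock offset within limit"
fi
```

On a real node the call would be `chronyc tracking | clock_offset_ok 0.5`.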
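Pre-installation preparation
In this example node1 serves as the control node.
sh
# Install the extra epel-release repository; some packages Ceph depends on are provided there
dnf install epel-release -y
Clone ceph-ansible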
sh
git clone https://github.com/ceph/ceph-ansible.git
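Install the third-party collections required by ceph-ansible
sh
# Rocky 9 does not ship pip3 by default, so install it first
dnf install python3-pip -y
# Point pip at a mirror inside China
pip3 config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
# Install the Python dependencies (Ansible among them) from inside the cloned repository
cd ceph-ansible
pip3 install -r requirements.txt
# requirements.txt pins the Ansible version, so Ansible does not need to be installed separately
# Install the third-party collections required by ceph-ansible
ansible-galaxy install -r requirements.yml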
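Set up passwordless SSH from node1 to all hosts
sh
# Generate a key pair on node1 first (ssh-keygen) if none exists, then copy the public key to every node
ssh-copy-id root@192.168.202.129
ssh-copy-id root@192.168.202.130
ssh-copy-id root@192.168.202.131
ssh-copy-id root@192.168.202.132
ssh-copy-id root@192.168.202.134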
Define the Ansible inventory
Note: the group names in the inventory must be mons and mgrs, because the ceph-ansible playbooks refer to exactly these names (see, for example, site.yml.sample).
sh
# cat /etc/ansible/hosts
[mons]
192.168.202.129
192.168.202.130
192.168.202.131
[mgrs]
192.168.202.129
192.168.202.130
192.168.202.131
# If the dashboard is enabled, a monitoring host group is required, and monitoring must not be deployed on a node that already runs a mgr
[monitoring]
192.168.202.134
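The rule that a monitoring host must not also be a mgr host can be checked before running the playbook. The following is a sketch under stated assumptions: `inventory_overlap` is a hypothetical helper name, and it only understands a simple INI-style inventory with one host per line, as used here.

```sh
# Print every host listed in both the [mgrs] and [monitoring] groups of an
# INI-style Ansible inventory. No output means there is no overlap.
# inventory_overlap is a hypothetical helper name.
inventory_overlap() {
    awk '
        /^\[/      { group = $1; next }   # remember the current group header
        /^[ \t]*#/ { next }               # skip comment lines
        NF == 0    { next }               # skip blank lines
        group == "[mgrs]"       { mgr[$1] = 1 }
        group == "[monitoring]" { mon[$1] = 1 }
        END { for (h in mon) if (h in mgr) print h }
    ' "$1" | sort
}

# Usage: inventory_overlap /etc/ansible/hosts
```

With the inventory shown above the function prints nothing, since 192.168.202.134 appears only under [monitoring].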
Copy the Ansible entry-point playbook
sh
[root@node1 ceph-ansible]# cp site.yml.sample site.yml
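Edit the variable files
Only the OSDs depend closely on the hardware, so copying osds.yml alone is enough; the other files are optional.
sh
# Change into the group_vars directory
[root@node1 group_vars]# pwd
/root/ceph-ansible/group_vars
# Copy the sample files to .yml files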
[root@node1 group_vars]# cp mons.yml.sample mons.yml
[root@node1 group_vars]# cp mgrs.yml.sample mgrs.yml
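Edit the Ceph cluster configuration file
all.yml is the cluster-wide configuration file. It defines where the Ceph packages are installed from, the yum repository, and a number of Ceph settings.
sh
# Copy the configuration file from its sample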
[root@node1 group_vars]# pwd
/root/ceph-ansible/group_vars
[root@node1 group_vars]# cp all.yml.sample all.yml
# Edit all.yml
[root@node1 group_vars]# grep -vE '^$|^#' all.yml
---
dummy:
#fetch_directory: ~/ceph-ansible-keys # directory the key files are copied to
ntp_service_enabled: false # do not manage NTP here; time sync was already configured on the hosts
ceph_origin: repository # install Ceph from a package repository
ceph_repository: community # use the community repository
ceph_mirror: http://mirrors.aliyun.com/ceph
ceph_stable_key: http://mirrors.aliyun.com/ceph/keys/release.asc
ceph_stable_release: reef # Ceph release to install; it must match the checked-out ceph-ansible git branch, since each branch can only install specific Ceph releases
ceph_stable_repo: "{{ ceph_mirror }}/rpm-{{ ceph_stable_release }}"
rbd_cache: "true"
rbd_cache_writethrough_until_flush: "false"
rbd_client_directories: false # this will create rbd_client_log_path and rbd_client_admin_socket_path directories with proper permissions
dashboard_enabled: false
monitor_interface: ens160 # interface the monitors listen on
journal_size: 5120 # OSD journal size in MB
public_network: 192.168.202.0/24 # network used by clients
cluster_network: 192.168.142.0/24 # cluster (replication) network
ceph_conf_overrides:
  global:
    mon_osd_allow_primary_affinity: 1
    mon_clock_drift_allowed: 0.5
    osd_pool_default_size: 2
    osd_pool_default_min_size: 1
    mon_pg_warn_min_per_osd: 0
    mon_pg_warn_max_per_osd: 0
    mon_pg_warn_max_object_skew: 0
  client:
    rbd_default_features: 1
Run Ansible
sh
# Without -e yes_i_know=true the playbook refuses to run and prompts you to migrate to the newer cephadm tool
ansible-playbook site.yml -e yes_i_know=true
Check the installation result
sh
[root@node1 ~]# ceph -s
cluster:
id: a82e91cb-dedc-4115-b44a-2bc8b7190afe
health: HEALTH_WARN
mons are allowing insecure global_id reclaim
1 daemons have recently crashed
6 mgr modules have recently crashed
OSD count 0 < osd_pool_default_size 2
services:
mon: 3 daemons, quorum node1,node2,node3 (age 38m)
mgr: node3(active, since 36m), standbys: node2, node1
osd: 0 osds: 0 up, 0 in
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 B
usage: 0 B used, 0 B / 0 B avail
pgs:
Add and edit the OSD variable file
sh
# Copy osds.yml from its sample
[root@node1 ceph-ansible]# cp group_vars/osds.yml.sample group_vars/osds.yml
# Edit osds.yml
[root@node1 ceph-ansible]# grep -vE '^$|^#' group_vars/osds.yml
---
dummy:
devices:
- /dev/nvme0n2
- /dev/nvme0n3
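Before listing devices in osds.yml it helps to confirm which disks are really unused. The sketch below assumes the input comes from `lsblk -rno NAME,TYPE,PKNAME`; `unused_disks` is a hypothetical helper name. It only looks for disks without partitions or other child devices, so a disk carrying a filesystem directly on the whole device would still be reported and needs a manual check.

```sh
# Read `lsblk -rno NAME,TYPE,PKNAME` output on stdin and print whole disks
# that have no partitions or other child devices (candidate OSD devices).
# unused_disks is a hypothetical helper name.
unused_disks() {
    awk '
        $2 == "disk" { disk[$1] = 1 }     # remember every whole disk
        NF >= 3      { parent[$3] = 1 }   # remember every device that has a child
        END { for (d in disk) if (!(d in parent)) print "/dev/" d }
    ' | sort
}

# Usage on a node: lsblk -rno NAME,TYPE,PKNAME | unused_disks
```

For the disk layout shown at the top of this document, this would be expected to list /dev/nvme0n2 and /dev/nvme0n3, the two devices configured above.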
Define the Ansible inventory for the OSDs
sh
[root@node1 ceph-ansible]# cat /etc/ansible/hosts
[mons]
192.168.202.129
192.168.202.130
192.168.202.131
[mgrs]
192.168.202.129
192.168.202.130
192.168.202.131
[osds]
192.168.202.129
192.168.202.130
192.168.202.131
192.168.202.132
Run ansible-playbook
sh
ansible-playbook site.yml -e yes_i_know=true
Check the result
sh
[root@node1 ceph-ansible]# ceph status
cluster:
id: a82e91cb-dedc-4115-b44a-2bc8b7190afe
health: HEALTH_WARN
mons are allowing insecure global_id reclaim
1 daemons have recently crashed
9 mgr modules have recently crashed
services:
mon: 3 daemons, quorum node1,node2,node3 (age 2m)
mgr: node3(active, since 112s), standbys: node1, node2
osd: 8 osds: 8 up (since 46s), 8 in (since 57s)
data:
pools: 1 pools, 1 pgs
objects: 2 objects, 449 KiB
usage: 213 MiB used, 160 GiB / 160 GiB avail
pgs: 1 active+clean
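The 160 GiB reported above is raw capacity. As a rough back-of-the-envelope sketch, assuming every pool uses the replica count `osd_pool_default_size: 2` from all.yml and ignoring BlueStore overhead and the full-ratio safety margins, the usable capacity is about half of that:

```sh
# Rough capacity estimate for this lab cluster.
# Inputs come from this document: 4 OSD hosts x 2 disks of 20 GiB each,
# and osd_pool_default_size: 2 in all.yml.
osd_count=8
disk_gib=20
replica_size=2

raw_gib=$((osd_count * disk_gib))        # total raw capacity
usable_gib=$((raw_gib / replica_size))   # every object is stored replica_size times

echo "raw=${raw_gib}GiB usable~=${usable_gib}GiB"
```

Pools created with a different size, or erasure-coded pools, would change this ratio.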
Check the addresses and ports the server listens on
sh
[root@node1 ceph-ansible]# ss -tunlp
Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
udp UNCONN 0 0 127.0.0.1:323 0.0.0.0:* users:(("chronyd",pid=800,fd=5))
udp UNCONN 0 0 [::1]:323 [::]:* users:(("chronyd",pid=800,fd=6))
tcp LISTEN 0 512 192.168.202.129:6789 0.0.0.0:* users:(("ceph-mon",pid=37151,fd=28))
tcp LISTEN 0 512 192.168.202.129:6802 0.0.0.0:* users:(("ceph-osd",pid=47203,fd=22))
tcp LISTEN 0 512 192.168.202.129:6803 0.0.0.0:* users:(("ceph-osd",pid=47203,fd=23))
tcp LISTEN 0 512 192.168.202.129:6800 0.0.0.0:* users:(("ceph-osd",pid=47203,fd=18))
tcp LISTEN 0 512 192.168.202.129:6801 0.0.0.0:* users:(("ceph-osd",pid=47203,fd=19))
tcp LISTEN 0 512 192.168.202.129:6806 0.0.0.0:* users:(("ceph-osd",pid=48625,fd=22))
tcp LISTEN 0 512 192.168.202.129:6807 0.0.0.0:* users:(("ceph-osd",pid=48625,fd=23))
tcp LISTEN 0 512 192.168.202.129:6804 0.0.0.0:* users:(("ceph-osd",pid=48625,fd=18))
tcp LISTEN 0 512 192.168.202.129:6805 0.0.0.0:* users:(("ceph-osd",pid=48625,fd=19))
tcp LISTEN 0 512 192.168.202.129:3300 0.0.0.0:* users:(("ceph-mon",pid=37151,fd=27))
tcp LISTEN 0 128 0.0.0.0:22 0.0.0.0:* users:(("sshd",pid=813,fd=3))
tcp LISTEN 0 512 192.168.142.128:6802 0.0.0.0:* users:(("ceph-osd",pid=47203,fd=24))
tcp LISTEN 0 512 192.168.142.128:6803 0.0.0.0:* users:(("ceph-osd",pid=47203,fd=25))
tcp LISTEN 0 512 192.168.142.128:6800 0.0.0.0:* users:(("ceph-osd",pid=47203,fd=20))
tcp LISTEN 0 512 192.168.142.128:6801 0.0.0.0:* users:(("ceph-osd",pid=47203,fd=21))
tcp LISTEN 0 512 192.168.142.128:6806 0.0.0.0:* users:(("ceph-osd",pid=48625,fd=24))
tcp LISTEN 0 512 192.168.142.128:6807 0.0.0.0:* users:(("ceph-osd",pid=48625,fd=25))
tcp LISTEN 0 512 192.168.142.128:6804 0.0.0.0:* users:(("ceph-osd",pid=48625,fd=20))
tcp LISTEN 0 512 192.168.142.128:6805 0.0.0.0:* users:(("ceph-osd",pid=48625,fd=21))
tcp LISTEN 0 128 [::]:22 [::]:* users:(("sshd",pid=813,fd=4))
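The listing shows the OSDs bound to both the public network (192.168.202.0/24) and the cluster network (192.168.142.0/24). A quick way to summarize this is to count OSD listeners per local address. This is a sketch only: `osd_listeners_by_ip` is a hypothetical helper name, and it assumes the column layout of `ss -tunlp`, where the local address is the fifth field.

```sh
# Read `ss -tunlp` output on stdin and print, per local IP address, how many
# listening sockets belong to ceph-osd processes.
# osd_listeners_by_ip is a hypothetical helper name.
osd_listeners_by_ip() {
    awk '
        /ceph-osd/ {
            addr = $5                    # "Local Address:Port" column
            sub(/:[0-9]+$/, "", addr)    # drop the trailing :port
            count[addr]++
        }
        END { for (a in count) print a, count[a] }
    ' | sort
}

# Usage on a node: ss -tunlp | osd_listeners_by_ip
```

An OSD host whose cluster-network address is missing from the result would suggest that `cluster_network` in all.yml does not match the host's interfaces.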
Install the client
node5 is used as the client.
Edit the clients variable file
sh
# Copy clients.yml from its sample
[root@node1 ceph-ansible]# cp group_vars/clients.yml.sample group_vars/clients.yml
# Edit clients.yml so that it reads as follows
[root@node1 ceph-ansible]# grep -vE '^$|^#' group_vars/clients.yml
---
dummy:
copy_admin_key: true # copy the admin key to the client; without the key, authentication fails
Update the Ansible inventory
sh
[root@node1 ceph-ansible]# cat /etc/ansible/hosts
[mons]
192.168.202.129
192.168.202.130
192.168.202.131
[mgrs]
192.168.202.129
192.168.202.130
192.168.202.131
[osds]
192.168.202.129
192.168.202.130
192.168.202.131
192.168.202.132
[clients]
192.168.202.134
Run ansible-playbook
sh
ansible-playbook site.yml -e yes_i_know=true
Problems encountered
Problem 1
sh
fatal: [192.168.202.131]: FAILED! => changed=false
attempts: 3
failures: []
msg: |-
Depsolve Error occurred:
Problem 1: conflicting requests
- nothing provides libtcmalloc.so.4()(64bit) needed by ceph-common-2:17.2.6-0.el9.x86_64
- nothing provides libthrift-0.14.0.so()(64bit) needed by ceph-common-2:17.2.6-0.el9.x86_64
- nothing provides liboath.so.0()(64bit) needed by ceph-common-2:17.2.6-0.el9.x86_64
- nothing provides liboath.so.0(LIBOATH_1.10.0)(64bit) needed by ceph-common-2:17.2.6-0.el9.x86_64
- nothing provides liboath.so.0(LIBOATH_1.2.0)(64bit) needed by ceph-common-2:17.2.6-0.el9.x86_64
Problem 2: conflicting requests
- nothing provides libtcmalloc.so.4()(64bit) needed by ceph-mon-2:17.2.6-0.el9.x86_64
rc: 1
results: []
fatal: [192.168.202.130]: FAILED! => changed=false
attempts: 3
failures: []
msg: |-
Depsolve Error occurred:
Problem 1: conflicting requests
- nothing provides libtcmalloc.so.4()(64bit) needed by ceph-common-2:17.2.6-0.el9.x86_64
- nothing provides libthrift-0.14.0.so()(64bit) needed by ceph-common-2:17.2.6-0.el9.x86_64
- nothing provides liboath.so.0()(64bit) needed by ceph-common-2:17.2.6-0.el9.x86_64
- nothing provides liboath.so.0(LIBOATH_1.10.0)(64bit) needed by ceph-common-2:17.2.6-0.el9.x86_64
- nothing provides liboath.so.0(LIBOATH_1.2.0)(64bit) needed by ceph-common-2:17.2.6-0.el9.x86_64
Problem 2: conflicting requests
- nothing provides libtcmalloc.so.4()(64bit) needed by ceph-mon-2:17.2.6-0.el9.x86_64
rc: 1
results: []
fatal: [192.168.202.129]: FAILED! => changed=false
attempts: 3
failures: []
msg: |-
Depsolve Error occurred:
Problem 1: conflicting requests
- nothing provides libthrift-0.14.0.so()(64bit) needed by ceph-common-2:17.2.6-0.el9.x86_64
Problem 2: package ceph-mon-2:17.2.6-0.el9.x86_64 requires ceph-base = 2:17.2.6-0.el9, but none of the providers can be installed
- package ceph-base-2:17.2.6-0.el9.x86_64 requires librgw2 = 2:17.2.6-0.el9, but none of the providers can be installed
- conflicting requests
- nothing provides libthrift-0.14.0.so()(64bit) needed by librgw2-2:17.2.6-0.el9.x86_64
rc: 1
results: []
Solution: install the EPEL repository
sh
dnf install epel-release -y
Problem 2
sh
TASK [ceph-infra : install firewalld python binding] *********************************************************************************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: AttributeError: module 'lib' has no attribute 'OpenSSL_add_all_algorithms'
fatal: [192.168.202.129]: FAILED! => changed=false
module_stderr: |-
Shared connection to 192.168.202.129 closed.
module_stdout: |-
Traceback (most recent call last):
File "/root/.ansible/tmp/ansible-tmp-1694568395.6913803-54958-96914655721864/AnsiballZ_dnf.py", line 107, in <module>
_ansiballz_main()
File "/root/.ansible/tmp/ansible-tmp-1694568395.6913803-54958-96914655721864/AnsiballZ_dnf.py", line 99, in _ansiballz_main
invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)
File "/root/.ansible/tmp/ansible-tmp-1694568395.6913803-54958-96914655721864/AnsiballZ_dnf.py", line 47, in invoke_module
runpy.run_module(mod_name='ansible.modules.dnf', init_globals=dict(_module_fqn='ansible.modules.dnf', _modlib_path=modlib_path),
File "/usr/lib64/python3.9/runpy.py", line 225, in run_module
return _run_module_code(code, init_globals, run_name, mod_spec)
File "/usr/lib64/python3.9/runpy.py", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/usr/lib64/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/tmp/ansible_ansible.legacy.dnf_payload_o8_ihnkm/ansible_ansible.legacy.dnf_payload.zip/ansible/modules/dnf.py", line 359, in <module>
File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 664, in _load_unlocked
File "<frozen importlib._bootstrap>", line 627, in _load_backward_compatible
File "<frozen zipimport>", line 259, in load_module
File "/tmp/ansible_ansible.legacy.dnf_payload_o8_ihnkm/ansible_ansible.legacy.dnf_payload.zip/ansible/module_utils/urls.py", line 115, in <module>
File "/usr/lib/python3.9/site-packages/urllib3/contrib/pyopenssl.py", line 50, in <module>
import OpenSSL.SSL
File "/usr/lib/python3.9/site-packages/OpenSSL/__init__.py", line 8, in <module>
from OpenSSL import crypto, SSL
File "/usr/lib/python3.9/site-packages/OpenSSL/crypto.py", line 3279, in <module>
_lib.OpenSSL_add_all_algorithms()
AttributeError: module 'lib' has no attribute 'OpenSSL_add_all_algorithms'
msg: |-
MODULE FAILURE
See stdout/stderr for the exact error
rc: 1
Solution: upgrade the pyOpenSSL library
sh
python3 -m pip install --upgrade pyOpenSSL
Problem 3
sh
TASK [ceph-mgr : add modules to ceph-mgr] ********************************************************************************************************
failed: [192.168.202.131 -> 192.168.202.129] (item=dashboard) => changed=true
ansible_loop_var: item
cmd:
- ceph
- -n
- client.admin
- -k
- /etc/ceph/ceph.client.admin.keyring
- --cluster
- ceph
- mgr
- module
- enable
- dashboard
delta: '0:00:00.210799'
end: '2023-09-13 09:32:06.953160'
item: dashboard
rc: 2
start: '2023-09-13 09:32:06.742361'
stderr: 'Error ENOENT: module ''dashboard'' reports that it cannot run on the active manager daemon: PyO3 modules may only be initialized once per interpreter process (pass --force to force enablement)'
stderr_lines: <omitted>
stdout: ''
stdout_lines: <omitted>
Problem 4
sh
[WARNING]: log file at /root/ansible/ansible.log is not writeable and we cannot create it, aborting
[DEPRECATION WARNING]: "include" is deprecated, use include_tasks/import_tasks instead. This feature will be removed in version 2.16. Deprecation
warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
ERROR! couldn't resolve module/action 'openstack.config_template.config_template'. This often indicates a misspelling, missing collection, or incorrect module path.
The error appears to be in '/root/ceph-ansible/roles/ceph-config/tasks/main.yml': line 137, column 3, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- name: "generate {{ cluster }}.conf configuration file"
^ here
We could be wrong, but this one looks like it might be an issue with
missing quotes. Always quote template expression brackets when they
start a value. For instance:
with_items:
- {{ foo }}
Should be written as:
with_items:
- "{{ foo }}"
Solution: install the third-party collections required by ceph-ansible
sh
# Install the third-party collections required by ceph-ansible
ansible-galaxy install -r requirements.yml
Problem 5
sh
TASK [ceph-validate : fail if monitoring group doesn't exist] ************************************************************************************
fatal: [192.168.202.131]: FAILED! => changed=false
msg: you must add a monitoring group and add at least one node.
fatal: [192.168.202.129]: FAILED! => changed=false
msg: you must add a monitoring group and add at least one node.
fatal: [192.168.202.130]: FAILED! => changed=false
msg: you must add a monitoring group and add at least one node.
Solution 1: disable the dashboard
sh
# Add the following parameter to group_vars/all.yml
dashboard_enabled: false