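1. Environment Planning
1.1 Node Information
| Role | Hostname | IP Address | Operating System | Purpose |
|---|---|---|---|---|
| Ansible control node | ansible-master | 192.168.194.15 | RHEL 10.1 | Install Ansible, run Playbooks |
| Docker target node | docker-node-01 | 192.168.194.16 | RHEL 10.1 | Install Docker CE |
| Docker target node | docker-node-02 | 192.168.194.17 | RHEL 10.1 | Install Docker CE |
1.2 Prerequisites
- All nodes run a Minimal Install of RHEL 10.1
- All nodes have a static IP address and a hostname configured (section 2.1 shows the commands)
- The control node can log in to every target node over SSH without a password (section 2.2 shows the commands)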
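2. Control Node Environment Preparation
2.1 Configure Hostnames and hosts Resolution
Run each hostname and IP command on the node it belongs to (one hostnamectl line and one nmcli line per node). Do not run all three on the control node:
bash
# Set the hostname: run only the line that matches the current node
sudo hostnamectl set-hostname ansible-master    # on 192.168.194.15
sudo hostnamectl set-hostname docker-node-01    # on 192.168.194.16
sudo hostnamectl set-hostname docker-node-02    # on 192.168.194.17
# Set the static IP: again, run only the line that matches the current node
nmcli c m ens160 ipv4.addresses 192.168.194.15/24 ipv4.gateway 192.168.194.2 ipv4.dns 223.5.5.5 ipv4.method manual connection.autoconnect yes
nmcli c m ens160 ipv4.addresses 192.168.194.16/24 ipv4.gateway 192.168.194.2 ipv4.dns 223.5.5.5 ipv4.method manual connection.autoconnect yes
nmcli c m ens160 ipv4.addresses 192.168.194.17/24 ipv4.gateway 192.168.194.2 ipv4.dns 223.5.5.5 ipv4.method manual connection.autoconnect yes
nmcli c up ens160
# Append name resolution for all nodes to /etc/hosts (needed on the control node at minimum)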
sudo tee -a /etc/hosts <<EOF
192.168.194.15 ansible-master
192.168.194.16 docker-node-01
192.168.194.17 docker-node-02
EOF
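Because `tee -a` appends unconditionally, running the step above twice leaves duplicate entries in /etc/hosts. The following is a minimal sketch of an idempotent alternative; the helper name `add_host_entry` and the `HOSTS_FILE` variable are hypothetical, and the demonstration writes to a scratch file rather than the real /etc/hosts.

```shell
#!/usr/bin/env bash
# add_host_entry FILE IP NAME: append "IP NAME" only if NAME is not already
# present as a whole field on a non-comment line of FILE.
add_host_entry() {
  local file="$1" ip="$2" name="$3"
  if ! grep -Eq "^[^#]*[[:space:]]${name}([[:space:]]|\$)" "$file" 2>/dev/null; then
    printf '%s %s\n' "$ip" "$name" >> "$file"
  fi
}

# Demonstration against a scratch file; set HOSTS_FILE=/etc/hosts (as root) for real use.
HOSTS_FILE="${HOSTS_FILE:-$(mktemp)}"
for pass in 1 2; do   # the second pass adds nothing
  add_host_entry "$HOSTS_FILE" 192.168.194.15 ansible-master
  add_host_entry "$HOSTS_FILE" 192.168.194.16 docker-node-01
  add_host_entry "$HOSTS_FILE" 192.168.194.17 docker-node-02
done
cat "$HOSTS_FILE"
```

After both passes the file still holds exactly one line per node, so the step can be repeated safely when a node is rebuilt.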
2.2 Generate a Key Pair on the Ansible Host
bash
[root@ansible .ssh]# ssh-keygen -t rsa
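Press Enter at each prompt to accept the defaults (default key path, empty passphrase). Once the key has been generated, copy it to each target node: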
bash
[root@ansible-master ~]# ssh-copy-id docker-node-01
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host 'docker-node-01 (192.168.194.16)' can't be established.
ED25519 key fingerprint is SHA256:cOcQwno9t4p5xBrZ5NwA0FVKmSFGPea8xF6/Xlgl8CQ.
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@docker-node-01's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'docker-node-01'"
and check to make sure that only the key(s) you wanted were added.
[root@ansible-master ~]# ssh-copy-id docker-node-02
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host 'docker-node-02 (192.168.194.17)' can't be established.
ED25519 key fingerprint is SHA256:cOcQwno9t4p5xBrZ5NwA0FVKmSFGPea8xF6/Xlgl8CQ.
This host key is known by the following other names/addresses:
~/.ssh/known_hosts:1: docker-node-01
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@docker-node-02's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'docker-node-02'"
and check to make sure that only the key(s) you wanted were added.
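The interactive session above can also be scripted. The sketch below is one possible approach, not the method used in this guide: it swaps the RSA key for an ed25519 key, and the names `KEY`, `DRY_RUN`, `run`, and `distribute_key` are hypothetical. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Target nodes come from the node table; KEY is an assumed path for a new ed25519 key.
NODES="docker-node-01 docker-node-02"
KEY="${KEY:-$HOME/.ssh/id_ed25519}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# run CMD...: echo the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

distribute_key() {
  local node
  # Create the key pair non-interactively (empty passphrase) if it does not exist yet.
  [ -f "$KEY" ] || run ssh-keygen -q -t ed25519 -N "" -f "$KEY"
  for node in $NODES; do
    # ssh-copy-id still asks for the root password of each node once.
    run ssh-copy-id -i "$KEY.pub" "root@$node"
    # BatchMode forbids password prompts, so this succeeds only if key login works.
    run ssh -o BatchMode=yes "root@$node" true
  done
}

distribute_key
```

Running it with `DRY_RUN=0` executes the commands; the final `ssh -o BatchMode=yes` call per node is a quick check that passwordless login really works before Ansible depends on it.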
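3. Install Ansible
Ansible only needs to be installed on the control node.
3.1 Method 1: Install from the EPEL Repository (Recommended)
RHEL 10 requires the EPEL 10 repository:
bash
# Install the EPEL repository from the Aliyun mirror (the epel-release version in the URL changes over time; adjust it if the download returns 404)
dnf install https://mirrors.aliyun.com/epel/10/Everything/x86_64/Packages/e/epel-release-10-7.el10_1.noarch.rpm -y
# Install Ansible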
dnf install -y ansible-core ansible-collection-ansible-posix
3.2 Method 2: Install with pip (Latest Version)
bash
sudo dnf install -y python3-pip
pip3 install --user ansible
Verify the installation:
bash
ansible --version
# Expected output includes:
# ansible [core x.x.x]
# python version = 3.12.x
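The ansible-core version matters later, because collections declare which core versions they support. The sketch below shows one way to extract the version from `ansible --version` and compare it with a minimum; the helper names and the `2.17.0` threshold are illustrative assumptions, not values taken from this guide.

```shell
#!/usr/bin/env bash
# core_version: read `ansible --version` output on stdin, print the ansible-core version.
core_version() {
  sed -n '1s/.*\[core \([0-9][0-9.]*\)].*/\1/p'
}

# version_at_least HAVE WANT: succeed if HAVE >= WANT (dotted versions, numeric per field).
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -t. -k1,1n -k2,2n -k3,3n | head -n1)" = "$2" ]
}

# Demonstration with a sample first line; on the control node use:
#   have="$(ansible --version | core_version)"
have="$(echo 'ansible [core 2.16.14]' | core_version)"
echo "$have"                                   # prints 2.16.14
if version_at_least "$have" 2.17.0; then       # 2.17.0 is an illustrative threshold
  echo "core is new enough"
else
  echo "core is older than 2.17.0"             # this branch runs for 2.16.14
fi
```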
4. Create the Ansible Project Structure
bash
mkdir -p ~/ansible-docker-deploy/{inventory,playbooks,roles}
cd ~/ansible-docker-deploy
Final directory layout:
ansible-docker-deploy/
├── ansible.cfg # Ansible configuration for this project
├── inventory/
│ └── hosts # Host inventory
├── playbooks/
│ └── deploy-docker.yml # Main Playbook
└── roles/ # (Optional) Role directory
5. Configure Ansible
5.1 Write ansible.cfg
Create ansible.cfg in the project root:
bash
cat > ~/ansible-docker-deploy/ansible.cfg <<EOF
[defaults]
inventory = ./inventory/hosts
host_key_checking = False
remote_user = root
gathering = smart
retry_files_enabled = False
stdout_callback = yaml

[privilege_escalation]
become = True
become_user = root
become_method = sudo
EOF
5.2 Write the Host Inventory
bash
cat > ~/ansible-docker-deploy/inventory/hosts <<EOF
[docker]
docker-node-01 ansible_host=192.168.194.16
docker-node-02 ansible_host=192.168.194.17
[docker:vars]
ansible_user=root
ansible_become=yes
EOF
5.3 Verify Connectivity
bash
cd ~/ansible-docker-deploy
ansible all -m ping
Expected output:
yaml
docker-node-01 | SUCCESS => {
"ansible_facts": {
"discovered_interpreter_python": "/usr/bin/python3"
},
"changed": false,
"ping": "pong"
}
docker-node-02 | SUCCESS => {
"ansible_facts": {
"discovered_interpreter_python": "/usr/bin/python3"
},
"changed": false,
"ping": "pong"
}
6. Write the Docker Deployment Playbook
6.1 Complete Playbook
Create playbooks/deploy-docker.yml:
bash
sudo tee ~/ansible-docker-deploy/playbooks/deploy-docker.yml > /dev/null <<'EOF'
---
- name: 通过阿里云镜像源在 Red Hat 10.1 上部署 Docker CE
  hosts: docker
  gather_facts: yes
  become: yes
  vars:
    docker_version: "latest"
    docker_data_root: "/var/lib/docker"
    docker_log_max_size: "100m"
    docker_log_max_file: "3"
    docker_registry_mirrors:
      - "https://mirror.ccs.tencentyun.com"
      - "https://registry.cn-hangzhou.aliyuncs.com"
  tasks:
    - name: 卸载旧版本 Docker 及相关包
      ansible.builtin.dnf:
        name:
          - docker
          - docker-client
          - docker-client-latest
          - docker-common
          - docker-latest
          - docker-latest-logrotate
          - docker-logrotate
          - docker-engine
          - podman
          - runc
          - containerd
          - containerd.io
        state: absent
        autoremove: yes
      ignore_errors: yes
    - name: 安装必要系统依赖
      ansible.builtin.dnf:
        name:
          - dnf-plugins-core
          - device-mapper-persistent-data
          - lvm2
          - curl
          - ca-certificates
          - tar
          - iptables
          - net-tools
          - socat
        state: present
    - name: 添加 Docker 官方 GPG 密钥
      ansible.builtin.rpm_key:
        key: https://download.docker.com/linux/rhel/gpg
        state: present
    - name: 配置阿里云 Docker CE 仓库
      ansible.builtin.yum_repository:
        name: docker-ce-stable-aliyun
        description: Docker CE Stable - Aliyun Mirror for RHEL 10
        baseurl: https://mirrors.aliyun.com/docker-ce/linux/centos/10/$basearch/stable
        enabled: yes
        gpgcheck: yes
        gpgkey: https://download.docker.com/linux/rhel/gpg
        module_hotfixes: true
    - name: 清理并重建 DNF 缓存
      shell: |
        dnf clean all
        dnf makecache --timer
      args:
        executable: /bin/bash
      changed_when: false
    - name: 安装 Docker CE 及相关组件
      ansible.builtin.dnf:
        name:
          - docker-ce
          - docker-ce-cli
          - containerd.io
          - docker-buildx-plugin
          - docker-compose-plugin
        state: latest
        allow_downgrade: yes
      register: docker_install_result
      retries: 5
      delay: 10
      until: docker_install_result is succeeded
    - name: 创建 Docker 配置目录
      ansible.builtin.file:
        path: /etc/docker
        state: directory
        mode: "0755"
    - name: 生成 Docker daemon.json 配置文件
      ansible.builtin.copy:
        content: |
          {
            "data-root": "{{ docker_data_root }}",
            "log-driver": "json-file",
            "log-opts": {
              "max-size": "{{ docker_log_max_size }}",
              "max-file": "{{ docker_log_max_file }}"
            },
            "registry-mirrors": {{ docker_registry_mirrors | to_json }},
            "exec-opts": ["native.cgroupdriver=systemd"],
            "storage-driver": "overlay2",
            "live-restore": true,
            "iptables": true,
            "ip-forward": true,
            "max-concurrent-downloads": 10,
            "max-concurrent-uploads": 5
          }
        dest: /etc/docker/daemon.json
        owner: root
        group: root
        mode: "0644"
    - name: 加载 overlay 和 br_netfilter 内核模块
      community.general.modprobe:
        name: "{{ item }}"
        state: present
      loop:
        - overlay
        - br_netfilter
    - name: 持久化内核模块配置
      ansible.builtin.copy:
        content: |
          overlay
          br_netfilter
        dest: /etc/modules-load.d/docker.conf
        owner: root
        group: root
        mode: "0644"
    - name: 配置 sysctl 网络参数
      ansible.posix.sysctl:
        name: "{{ item.key }}"
        value: "{{ item.value }}"
        state: present
        reload: yes
        sysctl_set: yes
      loop:
        - { key: "net.bridge.bridge-nf-call-iptables", value: "1" }
        - { key: "net.bridge.bridge-nf-call-ip6tables", value: "1" }
        - { key: "net.ipv4.ip_forward", value: "1" }
        - { key: "net.ipv4.conf.all.forwarding", value: "1" }
    - name: 开放 Docker 所需端口
      ansible.posix.firewalld:
        port: "{{ item }}"
        permanent: yes
        state: enabled
        immediate: yes
      loop:
        - 2375/tcp
        - 2376/tcp
        - 2377/tcp
        - 7946/tcp
        - 7946/udp
        - 4789/udp
      when: ansible_facts.services['firewalld.service'] is defined and ansible_facts.services['firewalld.service'].state == 'running'
      ignore_errors: yes
    - name: 重载 systemd 配置
      ansible.builtin.systemd:
        daemon_reload: yes
    - name: 启动 Docker 并设置开机自启
      ansible.builtin.systemd:
        name: docker
        state: started
        enabled: yes
    - name: 等待 Docker Socket 就绪
      ansible.builtin.wait_for:
        path: /var/run/docker.sock
        state: present
        timeout: 30
        delay: 2
    - name: 将 root 用户加入 docker 组
      ansible.builtin.user:
        name: root
        groups: docker
        append: yes
    - name: 将 ansible 用户也加入 docker 组
      ansible.builtin.user:
        name: "{{ ansible_user }}"
        groups: docker
        append: yes
      when: ansible_user != 'root'
    - name: 检查 Docker 客户端版本
      ansible.builtin.command: docker --version
      register: docker_version_output
      changed_when: false
    - name: 检查 Docker 服务端详细信息
      ansible.builtin.shell: |
        docker info 2>/dev/null | grep -E "Server Version|Storage Driver|Registry Mirrors" | sed 's/^ *//'
      register: docker_info_output
      changed_when: false
    - name: 运行 hello-world 测试容器
      ansible.builtin.command:
        cmd: timeout 60 docker run --rm hello-world
      register: hello_world_output
      changed_when: false
      ignore_errors: yes
    - name: 输出部署结果摘要
      ansible.builtin.debug:
        msg:
          - "🎉 Docker 部署完成!"
          - "📌 客户端版本: {{ docker_version_output.stdout }}"
          - "📌 服务端信息:"
          - "{{ docker_info_output.stdout_lines }}"
          - "📌 测试容器: {{ '✅ 成功' if hello_world_output.rc == 0 else '⚠️ 失败(可能网络问题,可稍后重试)' }}"
          - "📌 镜像加速器: {{ docker_registry_mirrors | join(', ') }}"
EOF
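6.2 Playbook Execution Overview
| Step | Action | Description |
|---|---|---|
| Step 1 | Remove old versions | Clean up leftover Docker/Podman packages |
| Step 2 | Install dependencies | dnf-plugins-core, lvm2, etc. |
| Step 3 | Add the repository | Docker CE YUM repository + GPG key |
| Step 4 | Install Docker | docker-ce + CLI + compose plugin |
| Step 5 | Configure daemon.json | Logging, storage, registry mirrors |
| Step 6 | Kernel parameters | overlay + bridge forwarding |
| Step 7 | Firewall | Open Docker/Swarm ports |
| Step 8 | Start the service | Start + enable at boot |
| Step 9 | User permissions | Add the ansible user to the docker group |
| Step 10 | Verification | Version check + hello-world test |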
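The firewall step is guarded by a `when` condition on `ansible_facts.services`, and that dictionary is only populated after `ansible.builtin.service_facts` has run; regular fact gathering does not fill it. The sketch below writes a small helper playbook that gathers service facts and reports the firewalld state, which is one way to see whether the firewall step can ever match. The function name, the file name `check-firewalld.yml`, and its task names are hypothetical; only the `docker` inventory group comes from this guide.

```shell
#!/usr/bin/env bash
# write_firewalld_check DIR: create DIR/playbooks/check-firewalld.yml (hypothetical helper).
write_firewalld_check() {
  local dir="$1"
  mkdir -p "$dir/playbooks"
  cat > "$dir/playbooks/check-firewalld.yml" <<'EOF'
---
- name: Report firewalld state on the Docker nodes
  hosts: docker
  gather_facts: false
  tasks:
    - name: Populate ansible_facts.services
      ansible.builtin.service_facts:

    - name: Show whether firewalld is running
      ansible.builtin.debug:
        msg: "firewalld state: {{ (ansible_facts.services['firewalld.service'] | default({})).state | default('not installed') }}"
EOF
}

# Usage on the control node:
#   write_firewalld_check ~/ansible-docker-deploy
#   cd ~/ansible-docker-deploy && ansible-playbook playbooks/check-firewalld.yml
```

Placing the same `service_facts` task ahead of the firewall task in the main Playbook would let its condition evaluate against real data. Note that 2375/tcp is the unencrypted Docker API port, so it is worth deciding deliberately whether it should be opened at all.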
7. Install the Required Collections
Before running the Playbook, install the Ansible Collections it depends on:
bash
ansible-galaxy collection install ansible.posix community.general
Verify the installed Collections:
bash
ansible-galaxy collection list | grep -E "ansible.posix|community.general"
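8. Run the Deployment
8.1 Check Mode (Dry Run)
Run the Playbook in check mode first to preview the changes it would make. Check mode still connects to the hosts and is not a pure syntax check; ansible-playbook --syntax-check validates the syntax only: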
bash
cd ~/ansible-docker-deploy
ansible-playbook playbooks/deploy-docker.yml --check
# The run fails:
[root@ansible-master ansible-docker-deploy]# ansible-playbook playbooks/deploy-docker.yml --check
[WARNING]: Collection community.general does not support Ansible version 2.16.14
ERROR! [DEPRECATED]: community.general.yaml has been removed. The plugin has been superseded by the option `result_format=yaml` in callback plugin ansible.builtin.default from ansible-core 2.13 onwards. This feature was removed from community.general in version 12.0.0. Please update your playbooks.
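## A minor environment conflict: the installed community.general is too new. ansible.cfg sets stdout_callback = yaml, and that callback plugin was removed in community.general 12.0.0
## Downgrade the collection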
[root@ansible-master ansible-docker-deploy]# ansible-galaxy collection install community.general:9.5.0
Starting galaxy collection install process
[WARNING]: Collection community.general does not support Ansible version 2.16.14
Process install dependency map
Starting collection install process
Downloading https://galaxy.ansible.com/api/v3/plugin/ansible/content/published/collections/artifacts/community-general-9.5.0.tar.gz to /root/.ansible/tmp/ansible-local-3446c0kh8jpz/tmpn8z_v25q/community-general-9.5.0-5gk99d9f
Installing 'community.general:9.5.0' to '/root/.ansible/collections/ansible_collections/community/general'
community.general:9.5.0 was installed successfully
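Downgrading the collection is one fix. The error message itself points at another: let the built-in default callback render YAML. The sketch below rewrites ansible.cfg accordingly. It assumes the ini key for that option is `callback_result_format` under `[defaults]` (ansible-core 2.13 or later); confirm with `ansible-doc -t callback ansible.builtin.default` before relying on it. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# use_builtin_yaml_callback CFG: replace "stdout_callback = yaml" in CFG with the
# built-in default callback plus YAML result formatting. Other lines are kept as-is.
use_builtin_yaml_callback() {
  local cfg="$1" tmp
  tmp="$(mktemp)"
  awk '
    /^stdout_callback[ \t]*=[ \t]*yaml[ \t]*$/ {
      print "stdout_callback = default"
      print "callback_result_format = yaml"   # assumed ini key, see lead-in
      next
    }
    { print }
  ' "$cfg" > "$tmp" && cat "$tmp" > "$cfg"
  rm -f "$tmp"
}

# Usage on the control node:
#   use_builtin_yaml_callback ~/ansible-docker-deploy/ansible.cfg
```

With this change the Playbook no longer depends on community.general for its output format, although the collection is still needed for the `community.general.modprobe` task.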
# Verify again
[root@ansible-master ansible-docker-deploy]# ansible-playbook playbooks/deploy-docker.yml --check
PLAY [通过阿里云镜像源在 Red Hat 10.1 上部署 Docker CE] *******************************************************************************************************************************
TASK [Gathering Facts] ****************************************************************************************************************************************************************
ok: [docker-node-01]
ok: [docker-node-02]
TASK [卸载旧版本 Docker 及相关包] *****************************************************************************************************************************************************
ok: [docker-node-02]
ok: [docker-node-01]
TASK [安装必要系统依赖] ***************************************************************************************************************************************************************
changed: [docker-node-01]
changed: [docker-node-02]
TASK [添加 Docker 官方 GPG 密钥] ******************************************************************************************************************************************************
fatal: [docker-node-02]: FAILED! => changed=false
msg: 'failed to fetch key at https://download.docker.com/linux/rhel/gpg , error was: Request failed: <urlopen error [Errno 104] Connection reset by peer>'
fatal: [docker-node-01]: FAILED! => changed=false
msg: 'failed to fetch key at https://download.docker.com/linux/rhel/gpg , error was: Request failed: <urlopen error [Errno 104] Connection reset by peer>'
PLAY RECAP ****************************************************************************************************************************************************************************
docker-node-01 : ok=3 changed=1 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
docker-node-02 : ok=3 changed=1 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
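# download.docker.com is unreachable from the target nodes, so download the GPG key from the Aliyun mirror instead (recommended)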
[root@ansible-master ansible-docker-deploy]# curl -o /tmp/docker-gpg https://mirrors.aliyun.com/docker-ce/linux/rhel/gpg
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 1627 100 1627 0 0 18619 0 --:--:-- --:--:-- --:--:-- 18701
# Verify the download
[root@ansible-master ansible-docker-deploy]# file /tmp/docker-gpg
# Expected output: /tmp/docker-gpg: PGP public key block Public-Key (old)
/tmp/docker-gpg: PGP public key block Public-Key (old)
[root@ansible-master ansible-docker-deploy]#
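# Change the task that fetched the key online so that it reads the key from a file, which removes the dependency on the target hosts' network access.
# The following command edits the Playbook in place: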
[root@ansible-master ansible-docker-deploy]# sed -i '/ansible.builtin.rpm_key:/,/state: present/ {
s|key: https://download.docker.com/linux/rhel/gpg|key: /tmp/docker-gpg|
/state: present/a\ file: /tmp/docker-gpg
}' ~/ansible-docker-deploy/playbooks/deploy-docker.yml
# Then distribute the downloaded key file to /tmp on every target host:
[root@ansible-master ansible-docker-deploy]# ansible docker -m copy -a "src=/tmp/docker-gpg dest=/tmp/docker-gpg" --become
docker-node-01 | CHANGED => {
"ansible_facts": {
"discovered_interpreter_python": "/usr/bin/python3"
},
"changed": true,
"checksum": "f271af33d2a950ca31782debd8fabe5fee77b176",
"dest": "/tmp/docker-gpg",
"gid": 0,
"group": "root",
"md5sum": "72ccb88a3e48418a07d01f7f9aeb45b1",
"mode": "0644",
"owner": "root",
"secontext": "unconfined_u:object_r:admin_home_t:s0",
"size": 1627,
"src": "/root/.ansible/tmp/ansible-tmp-1777969259.351255-3765-95521442440540/source",
"state": "file",
"uid": 0
}
docker-node-02 | CHANGED => {
"ansible_facts": {
"discovered_interpreter_python": "/usr/bin/python3"
},
"changed": true,
"checksum": "f271af33d2a950ca31782debd8fabe5fee77b176",
"dest": "/tmp/docker-gpg",
"gid": 0,
"group": "root",
"md5sum": "72ccb88a3e48418a07d01f7f9aeb45b1",
"mode": "0644",
"owner": "root",
"secontext": "unconfined_u:object_r:admin_home_t:s0",
"size": 1627,
"src": "/root/.ansible/tmp/ansible-tmp-1777969259.3707564-3766-221830823470738/source",
"state": "file",
"uid": 0
}
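# The sed edit above breaks the YAML file (it appends a file: parameter that rpm_key does not accept and disturbs the indentation), so overwrite the Playbook with a corrected version: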
cat > ~/ansible-docker-deploy/playbooks/deploy-docker.yml <<'PLAYBOOK_EOF'
---
- name: 通过阿里云镜像源在 Red Hat 10.1 上部署 Docker CE
  hosts: docker
  gather_facts: yes
  become: yes
  vars:
    docker_version: "latest"
    docker_data_root: "/var/lib/docker"
    docker_log_max_size: "100m"
    docker_log_max_file: "3"
    docker_registry_mirrors:
      - "https://mirror.ccs.tencentyun.com"
      - "https://registry.cn-hangzhou.aliyuncs.com"
  tasks:
    - name: 卸载旧版本 Docker 及相关包
      ansible.builtin.dnf:
        name:
          - docker
          - docker-client
          - docker-client-latest
          - docker-common
          - docker-latest
          - docker-latest-logrotate
          - docker-logrotate
          - docker-engine
          - podman
          - runc
          - containerd
          - containerd.io
        state: absent
        autoremove: yes
      ignore_errors: yes
    - name: 安装必要系统依赖
      ansible.builtin.dnf:
        name:
          - dnf-plugins-core
          - device-mapper-persistent-data
          - lvm2
          - curl
          - ca-certificates
          - tar
          - iptables
          - net-tools
          - socat
        state: present
    - name: 添加 Docker 官方 GPG 密钥(从本地文件)
      ansible.builtin.rpm_key:
        key: /tmp/docker-gpg
        state: present
    - name: 配置阿里云 Docker CE 仓库
      ansible.builtin.yum_repository:
        name: docker-ce-stable-aliyun
        description: Docker CE Stable - Aliyun Mirror for RHEL 10
        baseurl: https://mirrors.aliyun.com/docker-ce/linux/centos/10/$basearch/stable
        enabled: yes
        gpgcheck: yes
        gpgkey: file:///tmp/docker-gpg
        module_hotfixes: true
    - name: 清理并重建 DNF 缓存
      shell: |
        dnf clean all
        dnf makecache --timer
      args:
        executable: /bin/bash
      changed_when: false
    - name: 安装 Docker CE 及相关组件
      ansible.builtin.dnf:
        name:
          - docker-ce
          - docker-ce-cli
          - containerd.io
          - docker-buildx-plugin
          - docker-compose-plugin
        state: latest
        allow_downgrade: yes
      register: docker_install_result
      retries: 5
      delay: 10
      until: docker_install_result is succeeded
    - name: 创建 Docker 配置目录
      ansible.builtin.file:
        path: /etc/docker
        state: directory
        mode: "0755"
    - name: 生成 Docker daemon.json 配置文件
      ansible.builtin.copy:
        content: |
          {
            "data-root": "{{ docker_data_root }}",
            "log-driver": "json-file",
            "log-opts": {
              "max-size": "{{ docker_log_max_size }}",
              "max-file": "{{ docker_log_max_file }}"
            },
            "registry-mirrors": {{ docker_registry_mirrors | to_json }},
            "exec-opts": ["native.cgroupdriver=systemd"],
            "storage-driver": "overlay2",
            "live-restore": true,
            "iptables": true,
            "ip-forward": true,
            "max-concurrent-downloads": 10,
            "max-concurrent-uploads": 5
          }
        dest: /etc/docker/daemon.json
        owner: root
        group: root
        mode: "0644"
      notify: restart docker
    - name: 加载 overlay 和 br_netfilter 内核模块
      community.general.modprobe:
        name: "{{ item }}"
        state: present
      loop:
        - overlay
        - br_netfilter
    - name: 持久化内核模块配置
      ansible.builtin.copy:
        content: |
          overlay
          br_netfilter
        dest: /etc/modules-load.d/docker.conf
        owner: root
        group: root
        mode: "0644"
    - name: 配置 sysctl 网络参数
      ansible.posix.sysctl:
        name: "{{ item.key }}"
        value: "{{ item.value }}"
        state: present
        reload: yes
        sysctl_set: yes
      loop:
        - { key: "net.bridge.bridge-nf-call-iptables", value: "1" }
        - { key: "net.bridge.bridge-nf-call-ip6tables", value: "1" }
        - { key: "net.ipv4.ip_forward", value: "1" }
        - { key: "net.ipv4.conf.all.forwarding", value: "1" }
    - name: 开放 Docker 所需端口
      ansible.posix.firewalld:
        port: "{{ item }}"
        permanent: yes
        state: enabled
        immediate: yes
      loop:
        - 2375/tcp
        - 2376/tcp
        - 2377/tcp
        - 7946/tcp
        - 7946/udp
        - 4789/udp
      when: ansible_facts.services['firewalld.service'] is defined and ansible_facts.services['firewalld.service'].state == 'running'
      ignore_errors: yes
    - name: 重载 systemd 配置
      ansible.builtin.systemd:
        daemon_reload: yes
    - name: 启动 Docker 并设置开机自启
      ansible.builtin.systemd:
        name: docker
        state: started
        enabled: yes
    - name: 等待 Docker Socket 就绪
      ansible.builtin.wait_for:
        path: /var/run/docker.sock
        state: present
        timeout: 30
        delay: 2
    - name: 将 root 用户加入 docker 组
      ansible.builtin.user:
        name: root
        groups: docker
        append: yes
    - name: 将 ansible 用户也加入 docker 组
      ansible.builtin.user:
        name: "{{ ansible_user }}"
        groups: docker
        append: yes
      when: ansible_user != 'root'
    - name: 检查 Docker 客户端版本
      ansible.builtin.command: docker --version
      register: docker_version_output
      changed_when: false
    - name: 检查 Docker 服务端详细信息
      ansible.builtin.shell: |
        docker info 2>/dev/null | grep -E "Server Version|Storage Driver|Registry Mirrors" | sed 's/^ *//'
      register: docker_info_output
      changed_when: false
    - name: 运行 hello-world 测试容器
      ansible.builtin.command:
        cmd: timeout 60 docker run --rm hello-world
      register: hello_world_output
      changed_when: false
      ignore_errors: yes
    - name: 输出部署结果摘要
      ansible.builtin.debug:
        msg:
          - "🎉 Docker 部署完成!"
          - "📌 客户端版本: {{ docker_version_output.stdout }}"
          - "📌 服务端信息:"
          - "{{ docker_info_output.stdout_lines }}"
          - "📌 测试容器: {{ '✅ 成功' if hello_world_output.rc == 0 else '⚠️ 失败(可能网络问题,可稍后重试)' }}"
          - "📌 镜像加速器: {{ docker_registry_mirrors | join(', ') }}"
PLAYBOOK_EOF
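The rewritten Playbook above notifies `restart docker` on the daemon.json task but defines no handler with that name, and Ansible refuses to run a task whose notified handler is missing. One option is to define the handler rather than drop the notification, so that later edits to daemon.json restart Docker automatically. The sketch below appends a `handlers:` section; it assumes the file contains a single play whose keys (`hosts`, `vars`, `tasks`) are indented two spaces, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# add_restart_handler PLAYBOOK: append a play-level handlers section, at most once.
# Assumes a single play whose top-level keys are indented two spaces.
add_restart_handler() {
  local playbook="$1"
  if ! grep -q '^  handlers:' "$playbook"; then
    cat >> "$playbook" <<'EOF'

  handlers:
    - name: restart docker
      ansible.builtin.systemd:
        name: docker
        state: restarted
EOF
  fi
}

# Usage on the control node:
#   add_restart_handler ~/ansible-docker-deploy/playbooks/deploy-docker.yml
#   ansible-playbook playbooks/deploy-docker.yml --syntax-check
```

Handlers run at the end of the play and only when the notifying task reports a change, so an unchanged daemon.json does not trigger a restart.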
8.2 Run for Real
bash
[root@ansible-master ansible-docker-deploy]# ansible-playbook playbooks/deploy-docker.yml
PLAY [通过阿里云镜像源在 Red Hat 10.1 上部署 Docker CE] *******************************************************************************************************************************
TASK [Gathering Facts] ****************************************************************************************************************************************************************
ok: [docker-node-02]
ok: [docker-node-01]
TASK [卸载旧版本 Docker 及相关包] *****************************************************************************************************************************************************
ok: [docker-node-02]
ok: [docker-node-01]
TASK [安装必要系统依赖] ***************************************************************************************************************************************************************
changed: [docker-node-02]
changed: [docker-node-01]
TASK [添加 Docker 官方 GPG 密钥(从本地文件)] ****************************************************************************************************************************************
changed: [docker-node-02]
changed: [docker-node-01]
TASK [配置阿里云 Docker CE 仓库] ******************************************************************************************************************************************************
changed: [docker-node-02]
changed: [docker-node-01]
TASK [清理并重建 DNF 缓存] ************************************************************************************************************************************************************
ok: [docker-node-02]
ok: [docker-node-01]
TASK [安装 Docker CE 及相关组件] ******************************************************************************************************************************************************
changed: [docker-node-01]
changed: [docker-node-02]
TASK [创建 Docker 配置目录] ***********************************************************************************************************************************************************
ok: [docker-node-01]
ok: [docker-node-02]
TASK [生成 Docker daemon.json 配置文件] ***********************************************************************************************************************************************
ERROR! The requested handler 'restart docker' was not found in either the main handlers list nor in the listening handlers list
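# This error occurs because the Playbook uses notify: restart docker but defines no matching handler. Removing the notify line is enough for this first deployment, because Docker is only started after daemon.json has been written; later changes to daemon.json will then need a manual restart of Docker.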
[root@ansible-master ansible-docker-deploy]# sed -i '/notify: restart docker/d' ~/ansible-docker-deploy/playbooks/deploy-docker.yml
# Run again, resuming from the task that failed
[root@ansible-master ansible-docker-deploy]# ansible-playbook playbooks/deploy-docker.yml --start-at-task="生成 Docker daemon.json 配置文件"
PLAY [通过阿里云镜像源在 Red Hat 10.1 上部署 Docker CE] *******************************************************************************************************************************
TASK [Gathering Facts] ****************************************************************************************************************************************************************
ok: [docker-node-01]
ok: [docker-node-02]
TASK [生成 Docker daemon.json 配置文件] ***********************************************************************************************************************************************
ok: [docker-node-01]
ok: [docker-node-02]
TASK [加载 overlay 和 br_netfilter 内核模块] ******************************************************************************************************************************************
changed: [docker-node-01] => (item=overlay)
changed: [docker-node-02] => (item=overlay)
changed: [docker-node-01] => (item=br_netfilter)
changed: [docker-node-02] => (item=br_netfilter)
TASK [持久化内核模块配置] *************************************************************************************************************************************************************
changed: [docker-node-02]
changed: [docker-node-01]
TASK [配置 sysctl 网络参数] ***********************************************************************************************************************************************************
changed: [docker-node-02] => (item={'key': 'net.bridge.bridge-nf-call-iptables', 'value': '1'})
changed: [docker-node-01] => (item={'key': 'net.bridge.bridge-nf-call-iptables', 'value': '1'})
changed: [docker-node-01] => (item={'key': 'net.bridge.bridge-nf-call-ip6tables', 'value': '1'})
changed: [docker-node-02] => (item={'key': 'net.bridge.bridge-nf-call-ip6tables', 'value': '1'})
ok: [docker-node-02] => (item={'key': 'net.ipv4.ip_forward', 'value': '1'})
ok: [docker-node-01] => (item={'key': 'net.ipv4.ip_forward', 'value': '1'})
changed: [docker-node-02] => (item={'key': 'net.ipv4.conf.all.forwarding', 'value': '1'})
changed: [docker-node-01] => (item={'key': 'net.ipv4.conf.all.forwarding', 'value': '1'})
TASK [开放 Docker 所需端口] ***********************************************************************************************************************************************************
skipping: [docker-node-01] => (item=2375/tcp)
skipping: [docker-node-01] => (item=2376/tcp)
skipping: [docker-node-01] => (item=2377/tcp)
skipping: [docker-node-01] => (item=7946/tcp)
skipping: [docker-node-01] => (item=7946/udp)
skipping: [docker-node-02] => (item=2375/tcp)
skipping: [docker-node-01] => (item=4789/udp)
skipping: [docker-node-01]
skipping: [docker-node-02] => (item=2376/tcp)
skipping: [docker-node-02] => (item=2377/tcp)
skipping: [docker-node-02] => (item=7946/tcp)
skipping: [docker-node-02] => (item=7946/udp)
skipping: [docker-node-02] => (item=4789/udp)
skipping: [docker-node-02]
TASK [重载 systemd 配置] **************************************************************************************************************************************************************
ok: [docker-node-02]
ok: [docker-node-01]
TASK [启动 Docker 并设置开机自启] *****************************************************************************************************************************************************
changed: [docker-node-02]
changed: [docker-node-01]
TASK [等待 Docker Socket 就绪] ********************************************************************************************************************************************************
ok: [docker-node-02]
ok: [docker-node-01]
TASK [将 root 用户加入 docker 组] *****************************************************************************************************************************************************
changed: [docker-node-01]
changed: [docker-node-02]
TASK [将 ansible 用户也加入 docker 组] ************************************************************************************************************************************************
skipping: [docker-node-01]
skipping: [docker-node-02]
TASK [检查 Docker 客户端版本] *********************************************************************************************************************************************************
ok: [docker-node-01]
ok: [docker-node-02]
TASK [检查 Docker 服务端详细信息] *****************************************************************************************************************************************************
ok: [docker-node-01]
ok: [docker-node-02]
TASK [运行 hello-world 测试容器] ******************************************************************************************************************************************************
fatal: [docker-node-01]: FAILED! => changed=false
cmd:
- timeout
- '60'
- docker
- run
- --rm
- hello-world
delta: '0:00:15.903577'
end: '2026-05-05 16:26:35.113201'
msg: non-zero return code
rc: 125
start: '2026-05-05 16:26:19.209624'
stderr: |-
Unable to find image 'hello-world:latest' locally
docker: Error response from daemon: Get "https://registry-1.docker.io/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Run 'docker run --help' for more information
stderr_lines: <omitted>
stdout: ''
stdout_lines: <omitted>
...ignoring
fatal: [docker-node-02]: FAILED! => changed=false
cmd:
- timeout
- '60'
- docker
- run
- --rm
- hello-world
delta: '0:00:15.940468'
end: '2026-05-05 16:26:35.170722'
msg: non-zero return code
rc: 125
start: '2026-05-05 16:26:19.230254'
stderr: |-
Unable to find image 'hello-world:latest' locally
docker: Error response from daemon: Get "https://registry-1.docker.io/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Run 'docker run --help' for more information
stderr_lines: <omitted>
stdout: ''
stdout_lines: <omitted>
...ignoring
TASK [输出部署结果摘要] ***************************************************************************************************************************************************************
ok: [docker-node-01] =>
msg:
- "\U0001F389 Docker 部署完成!"
- "\U0001F4CC 客户端版本: Docker version 29.4.2, build 055a478"
- "\U0001F4CC 服务端信息:"
- - 'Server Version: 29.4.2'
- 'Storage Driver: overlay2'
- 'Registry Mirrors:'
- "\U0001F4CC 测试容器: ⚠️ 失败(可能网络问题,可稍后重试)"
- "\U0001F4CC 镜像加速器: https://mirror.ccs.tencentyun.com, https://registry.cn-hangzhou.aliyuncs.com"
ok: [docker-node-02] =>
msg:
- "\U0001F389 Docker 部署完成!"
- "\U0001F4CC 客户端版本: Docker version 29.4.2, build 055a478"
- "\U0001F4CC 服务端信息:"
- - 'Server Version: 29.4.2'
- 'Storage Driver: overlay2'
- 'Registry Mirrors:'
- "\U0001F4CC 测试容器: ⚠️ 失败(可能网络问题,可稍后重试)"
- "\U0001F4CC 镜像加速器: https://mirror.ccs.tencentyun.com, https://registry.cn-hangzhou.aliyuncs.com"
PLAY RECAP ****************************************************************************************************************************************************************************
docker-node-01 : ok=13 changed=5 unreachable=0 failed=0 skipped=2 rescued=0 ignored=1
docker-node-02 : ok=13 changed=5 unreachable=0 failed=0 skipped=2 rescued=0 ignored=1
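Docker has been deployed successfully! 🎉
The final hello-world test failed because the image could not be pulled (the request to registry-1.docker.io timed out), but the core components are working:
- ✅ Docker 29.4.2 is installed and running
- ✅ The overlay2 storage driver is active
- ✅ Registry mirrors are configured (Tencent Cloud + Aliyun)
- ✅ Both machines report failed=0, with no hard errors
Two details in the output deserve a follow-up. The pull fell through to registry-1.docker.io, which suggests that neither configured mirror served the image from this network. The firewall task was skipped on both nodes because ansible_facts.services is only populated by ansible.builtin.service_facts, which this Playbook never runs.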
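To narrow down the failed image pull, it can help to probe each registry endpoint from a Docker node. The sketch below requests the registry API root (`/v2/`) and classifies the HTTP status; the function names are hypothetical. A "reachable" verdict only shows that a registry API answered. It does not prove that the endpoint mirrors Docker Hub images.

```shell
#!/usr/bin/env bash
# classify_registry_status CODE: interpret the HTTP status of GET <registry>/v2/.
classify_registry_status() {
  case "$1" in
    200|401) echo "reachable" ;;        # 401 only means the registry wants a token
    000)     echo "unreachable" ;;      # curl reports 000 when no HTTP response arrived
    *)       echo "unexpected:$1" ;;
  esac
}

# probe_registry URL: print the URL and a verdict; gives up after 10 seconds.
probe_registry() {
  local code
  code="$(curl -s -o /dev/null -m 10 -w '%{http_code}' "$1/v2/")"
  echo "$1 $(classify_registry_status "$code")"
}

# Usage on a Docker node:
#   probe_registry https://mirror.ccs.tencentyun.com
#   probe_registry https://registry.cn-hangzhou.aliyuncs.com
#   probe_registry https://registry-1.docker.io
```

If every endpoint is unreachable, the problem is network access from the node; if the mirrors answer but pulls still fall through to Docker Hub, the mirror list in daemon.json is the part to revisit.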
8.3 Validate on a Single Node First
If the environment has several target nodes, it is a good idea to validate on one of them first:
bash
[root@ansible-master ansible-docker-deploy]# ansible-playbook playbooks/deploy-docker.yml --limit docker-node-01
PLAY [通过阿里云镜像源在 Red Hat 10.1 上部署 Docker CE] *******************************************************************************************************************************
TASK [Gathering Facts] ****************************************************************************************************************************************************************
ok: [docker-node-01]
TASK [卸载旧版本 Docker 及相关包] *****************************************************************************************************************************************************
changed: [docker-node-01]
TASK [安装必要系统依赖] ***************************************************************************************************************************************************************
ok: [docker-node-01]
TASK [添加 Docker 官方 GPG 密钥(从本地文件)] ****************************************************************************************************************************************
ok: [docker-node-01]
TASK [配置阿里云 Docker CE 仓库] ******************************************************************************************************************************************************
ok: [docker-node-01]
TASK [清理并重建 DNF 缓存] ************************************************************************************************************************************************************
ok: [docker-node-01]
TASK [安装 Docker CE 及相关组件] ******************************************************************************************************************************************************
changed: [docker-node-01]
TASK [创建 Docker 配置目录] ***********************************************************************************************************************************************************
ok: [docker-node-01]
TASK [生成 Docker daemon.json 配置文件] ***********************************************************************************************************************************************
ok: [docker-node-01]
TASK [加载 overlay 和 br_netfilter 内核模块] ******************************************************************************************************************************************
ok: [docker-node-01] => (item=overlay)
ok: [docker-node-01] => (item=br_netfilter)
TASK [持久化内核模块配置] *************************************************************************************************************************************************************
ok: [docker-node-01]
TASK [配置 sysctl 网络参数] ***********************************************************************************************************************************************************
ok: [docker-node-01] => (item={'key': 'net.bridge.bridge-nf-call-iptables', 'value': '1'})
ok: [docker-node-01] => (item={'key': 'net.bridge.bridge-nf-call-ip6tables', 'value': '1'})
ok: [docker-node-01] => (item={'key': 'net.ipv4.ip_forward', 'value': '1'})
ok: [docker-node-01] => (item={'key': 'net.ipv4.conf.all.forwarding', 'value': '1'})
TASK [开放 Docker 所需端口] ***********************************************************************************************************************************************************
skipping: [docker-node-01] => (item=2375/tcp)
skipping: [docker-node-01] => (item=2376/tcp)
skipping: [docker-node-01] => (item=2377/tcp)
skipping: [docker-node-01] => (item=7946/tcp)
skipping: [docker-node-01] => (item=7946/udp)
skipping: [docker-node-01] => (item=4789/udp)
skipping: [docker-node-01]
TASK [重载 systemd 配置] **************************************************************************************************************************************************************
ok: [docker-node-01]
TASK [启动 Docker 并设置开机自启] *****************************************************************************************************************************************************
changed: [docker-node-01]
TASK [等待 Docker Socket 就绪] ********************************************************************************************************************************************************
ok: [docker-node-01]
TASK [将 root 用户加入 docker 组] *****************************************************************************************************************************************************
ok: [docker-node-01]
TASK [将 ansible 用户也加入 docker 组] ************************************************************************************************************************************************
skipping: [docker-node-01]
TASK [检查 Docker 客户端版本] *********************************************************************************************************************************************************
ok: [docker-node-01]
TASK [检查 Docker 服务端详细信息] *****************************************************************************************************************************************************
ok: [docker-node-01]
TASK [运行 hello-world 测试容器] ******************************************************************************************************************************************************
fatal: [docker-node-01]: FAILED! => changed=false
cmd:
- timeout
- '60'
- docker
- run
- --rm
- hello-world
delta: '0:00:15.870846'
end: '2026-05-05 16:30:31.808594'
msg: non-zero return code
rc: 125
start: '2026-05-05 16:30:15.937748'
stderr: |-
Unable to find image 'hello-world:latest' locally
docker: Error response from daemon: Get "https://registry-1.docker.io/v2/": context deadline exceeded
Run 'docker run --help' for more information
stderr_lines: <omitted>
stdout: ''
stdout_lines: <omitted>
...ignoring
TASK [输出部署结果摘要] ***************************************************************************************************************************************************************
ok: [docker-node-01] =>
msg:
- "\U0001F389 Docker 部署完成!"
- "\U0001F4CC 客户端版本: Docker version 29.4.2, build 055a478"
- "\U0001F4CC 服务端信息:"
- - 'Server Version: 29.4.2'
- 'Storage Driver: overlay2'
- 'Registry Mirrors:'
- "\U0001F4CC 测试容器: ⚠️ 失败(可能网络问题,可稍后重试)"
- "\U0001F4CC 镜像加速器: https://mirror.ccs.tencentyun.com, https://registry.cn-hangzhou.aliyuncs.com"
PLAY RECAP ****************************************************************************************************************************************************************************
docker-node-01 : ok=20 changed=3 unreachable=0 failed=0 skipped=2 rescued=0 ignored=1
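In the recap above, failed=0 does not mean every task succeeded: the hello-world task exited with rc 125 and was only counted under ignored=1. The following is a minimal sketch, assuming the ansible-playbook output has been saved to a file (for example with tee), that lists hosts whose recap line has a non-zero unreachable, failed, or ignored counter. recap_problems is a hypothetical helper name, not part of Ansible.

```shell
#!/usr/bin/env bash
# recap_problems (hypothetical helper): reads ansible-playbook output on stdin
# and prints "<host> <counter>=<n>" for every non-zero unreachable, failed,
# or ignored counter found after the PLAY RECAP header.
recap_problems() {
  awk '
    /^PLAY RECAP/ { in_recap = 1; next }
    in_recap && NF {
      host = $1
      for (i = 2; i <= NF; i++) {
        split($i, kv, "=")
        if ((kv[1] == "unreachable" || kv[1] == "failed" || kv[1] == "ignored") && kv[2] + 0 > 0)
          printf "%s %s=%s\n", host, kv[1], kv[2]
      }
    }
  '
}

# Usage (assumes the run was saved with: ansible-playbook ... | tee deploy.log):
#   recap_problems < deploy.log
```

For the run shown above this would report the ignored hello-world failure on docker-node-01, which is easy to miss when only the failed column is checked.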
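8.4 Common execution options
| Option | Effect |
|---|---|
| --check | Dry run; no changes are applied |
| --diff | Show the content changes made to files |
| --limit <host> | Restrict the run to the given host(s) |
| -v / -vv / -vvv | Increase output verbosity |
| -e "variable=value" | Pass extra variables |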
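These options are often combined. The sketch below wraps a dry run with a diff for a single host; preview_deploy and the ANSIBLE_PLAYBOOK override variable are hypothetical names introduced only for illustration, while the playbook path and the flags are the ones used in this guide.

```shell
#!/usr/bin/env bash
# preview_deploy HOST (hypothetical wrapper): show what the playbook would
# change on HOST without applying anything.
# ANSIBLE_PLAYBOOK may be overridden (e.g. for testing); it defaults to the real CLI.
preview_deploy() {
  local host="$1"
  "${ANSIBLE_PLAYBOOK:-ansible-playbook}" playbooks/deploy-docker.yml \
    --check --diff --limit "$host"
}

# Usage: preview_deploy docker-node-01
```

Note that tasks based on the command or shell modules are generally skipped in check mode, so the verification tasks at the end of the playbook may not run during a preview.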
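8.5 Configure registry mirrors reachable from mainland China
bash
# Note: tee overwrites any existing /etc/docker/daemon.json, including the one
# generated by the Playbook; merge the keys by hand if those settings must be kept.
sudo mkdir -p /etc/docker
sudo tee /etc/docker/daemon.json <<-'EOF'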
{
"registry-mirrors": [
"https://docker.m.daocloud.io",
"https://ccr.ccs.tencentyun.com",
"https://docker.1ms.run",
"https://hub.xdark.top",
"https://dhub.kubesre.xyz",
"https://docker.kejilion.pro",
"https://docker.xuanyuan.me",
"https://docker.hlmirror.com",
"https://run-docker.cn",
"https://docker.sunzishaokao.com",
"https://image.cloudlayer.icu"
]
}
EOF
# Reload the systemd configuration
sudo systemctl daemon-reload
# Restart Docker so that the new mirror list takes effect
sudo systemctl restart docker
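After the restart it is worth confirming that the daemon actually loaded the mirror list. The following is a minimal sketch, assuming the plain-text docker info output lists the mirrors as indented URLs under a "Registry Mirrors:" line (the exact layout may differ between Docker versions). list_mirrors is a hypothetical helper.

```shell
#!/usr/bin/env bash
# list_mirrors (hypothetical helper): reads `docker info` text on stdin and
# prints one configured registry mirror URL per line.
list_mirrors() {
  awk '
    /^[ \t]*Registry Mirrors:/ { in_block = 1; next }
    in_block && /^[ \t]*https?:\/\// { sub(/^[ \t]+/, ""); print; next }
    { in_block = 0 }
  '
}

# Usage on a target node:
#   docker info 2>/dev/null | list_mirrors
```

An empty result suggests that daemon.json was not picked up, for example because of a JSON syntax error or because the daemon was not restarted.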
Test pulling an image:
bash
[root@docker-node-01 ~]# docker pull mysql:8.4.8
8.4.8: Pulling from library/mysql
bb5107df7baa: Pull complete
de0e913f3fda: Pull complete
20e9fc37ca7f: Pull complete
eea3d1b96bc6: Pull complete
4c72b297542d: Pull complete
c02b00ae5ee3: Pull complete
5d786b1680fa: Pull complete
120c8f895019: Pull complete
a955715b8cd4: Pull complete
57ccccafb277: Pull complete
Digest: sha256:2952e3be7807f06fc18de50b3ea1a632d5c70d63482ff7d7376fe3aa8999babf
Status: Downloaded newer image for mysql:8.4.8
docker.io/library/mysql:8.4.8
[root@docker-node-01 ~]# docker images
i Info → U In Use
IMAGE ID DISK USAGE CONTENT SIZE EXTRA
mysql:8.4.8 7791889374e7 790MB 0B
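9. Verify the deployment
9.1 Verify manually on a target node
SSH into the target node:
bash
# Check the Docker version
docker --version
# Check the Docker service status
sudo systemctl status docker
# Show detailed Docker information
docker info
# Run a test container
docker run --rm hello-world
The hello-world output should include: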
Hello from Docker!
This message shows that your installation appears to be working correctly.
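When the test container fails only because the registry is temporarily unreachable, as in the "context deadline exceeded" error in the playbook run, retrying a few times is usually enough. The following is a minimal sketch of a generic retry helper; retry is a hypothetical function name, and the attempt count and delay in the usage line are arbitrary example values.

```shell
#!/usr/bin/env bash
# retry ATTEMPTS DELAY_SECONDS COMMAND... (hypothetical helper):
# rerun COMMAND until it succeeds or ATTEMPTS runs have failed.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    # Sleep only between attempts, not after the last one.
    if [ "$i" -lt "$attempts" ]; then sleep "$delay"; fi
    i=$((i + 1))
  done
  return 1
}

# Usage: retry 3 10 docker run --rm hello-world
```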
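9.2 Batch verification with Ansible ad-hoc commands
bash
# Check the Docker version on all nodes
ansible docker -m command -a "docker --version"
# Check the Docker service status on all nodes
ansible docker -m command -a "systemctl is-active docker"
# Check the Docker server version on all nodes (the braces are escaped so that Ansible passes {{.ServerVersion}} through to Docker)
ansible docker -m command -a "docker info --format '{{ '{{' }}.ServerVersion{{ '}}' }}'"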
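10. Post-deployment configuration (optional)
10.1 Configure the Docker Compose command alias
bash
# Check whether the docker compose plugin is installed
ansible docker -m command -a "docker compose version"
Docker Compose V2 is installed as a plugin, so docker compose can be used directly (note the space in the middle, not a hyphen).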
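10.2 Configure log rotation (already in the Playbook)
The Playbook's daemon.json already configures the JSON File log driver with the following parameters:
- max-size: 100m (each log file is capped at 100 MB)
- max-file: 3 (at most 3 log files are kept)
Adjust these two parameters as needed.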
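Together the two parameters bound the disk space that one container's logs can occupy: max-size multiplied by max-file. The following small sketch shows that arithmetic; log_cap_mb is a hypothetical helper and assumes max-size is expressed in megabytes.

```shell
#!/usr/bin/env bash
# log_cap_mb SIZE_MB FILES (hypothetical helper): upper bound, in MB, of the
# json-file log space one container can use.
log_cap_mb() {
  local size_mb="$1" files="$2"
  echo $(( size_mb * files ))
}

log_cap_mb 100 3   # prints 300
```

With the Playbook's values each container can hold roughly 300 MB of logs, so a host running 20 containers should budget about 6 GB for them.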
10.3 Configure Docker Swarm (if a cluster is needed)
bash
# Initialize Swarm on the manager node (docker-node-01, 192.168.194.16)
ansible docker-node-01 -m command -a "docker swarm init --advertise-addr 192.168.194.16"
# Get the worker join token
# docker swarm join-token worker
# Join the other node to the Swarm
# ansible docker-node-02 -m command -a "docker swarm join --token <TOKEN> 192.168.194.16:2377"
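The token and join steps can be chained once the manager has been initialized. The following is a minimal sketch, assuming root SSH access from the control node to docker-node-01 as set up in section 2.2. swarm_join_cmd is a hypothetical helper that only builds the join command, and the -q option makes docker swarm join-token print just the token.

```shell
#!/usr/bin/env bash
# swarm_join_cmd TOKEN MANAGER_IP (hypothetical helper): print the command a
# worker node has to run to join the Swarm on the default port 2377.
swarm_join_cmd() {
  local token="$1" manager_ip="$2"
  printf 'docker swarm join --token %s %s:2377\n' "$token" "$manager_ip"
}

# Example wiring (commented out; run from the control node after swarm init,
# passing docker-node-01's address from section 1.1):
#   TOKEN=$(ssh docker-node-01 docker swarm join-token -q worker)
#   ansible docker-node-02 -m command -a "$(swarm_join_cmd "$TOKEN" 192.168.194.16)"
```

Keeping the command construction in one place avoids copying the token by hand between terminals.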
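11. Troubleshooting common problems
11.1 GPG key verification fails
Symptom: a GPG error is reported while downloading the Docker repository metadata.
bash
# Import the GPG key manually
sudo rpm --import https://download.docker.com/linux/rhel/gpg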
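11.2 RHEL 10 repository compatibility
Symptom: the $releasever variable on RHEL 10 causes the Docker CE repository to return 404.
At the time of writing, Docker has not published a dedicated RHEL 10 repository. The Playbook currently uses the RHEL 9 repository path as a compatibility workaround. If you run into package dependency problems, you can try the CentOS Stream 10 repository instead:
bash
# Alternative: add Docker's official CentOS repository
sudo dnf config-manager --add-repo \
https://download.docker.com/linux/centos/docker-ce.repo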
11.3 Podman conflicts
RHEL 10 may come with Podman preinstalled. The first task of the Playbook already removes Podman. If a conflict still occurs:
bash
sudo dnf remove -y podman podman-docker
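11.4 Kernel module fails to load
Symptom: modprobe overlay fails.
bash
# Check whether the overlay module is currently loaded
lsmod | grep overlay
# If it is missing and cannot be loaded, update the kernel and reboot
sudo dnf update -y kernel
sudo reboot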
11.5 firewalld rules not taking effect
Symptom: containers cannot reach each other over the network.
bash
# Check whether firewalld is running
sudo systemctl status firewalld
# Open the ports manually
sudo firewall-cmd --permanent --add-port=2375-2377/tcp
sudo firewall-cmd --permanent --add-port=7946/tcp
sudo firewall-cmd --permanent --add-port=7946/udp
sudo firewall-cmd --permanent --add-port=4789/udp
sudo firewall-cmd --reload
# Or disable firewalld entirely (test environments only)
sudo systemctl stop firewalld && sudo systemctl disable firewalld
11.6 cgroup v2 compatibility
RHEL 10 uses cgroup v2 by default. Confirm that Docker uses systemd as its cgroup driver:
bash
docker info | grep -i cgroup
# Expected: Cgroup Driver: systemd
The Playbook's daemon.json already sets "native.cgroupdriver=systemd".
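The check can also be turned into a pass/fail test, for example for a post-deployment script. The following is a minimal sketch, assuming docker info prints a line of the form "Cgroup Driver: <name>". cgroup_driver_is_systemd is a hypothetical helper.

```shell
#!/usr/bin/env bash
# cgroup_driver_is_systemd (hypothetical helper): reads `docker info` text on
# stdin and returns 0 only if the reported cgroup driver is systemd.
cgroup_driver_is_systemd() {
  local driver
  driver=$(awk -F': *' '/^[ \t]*Cgroup Driver:/ { print $2; exit }')
  [ "$driver" = "systemd" ]
}

# Usage on a target node:
#   docker info 2>/dev/null | cgroup_driver_is_systemd && echo "cgroup driver OK"
```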
12. Uninstall Docker
To remove Docker completely:
bash
# Uninstall in bulk with an Ansible ad-hoc command
ansible docker -m dnf -a "name=docker-ce,docker-ce-cli,containerd.io,docker-buildx-plugin,docker-compose-plugin state=absent autoremove=yes" -b
# Remove the data directories (use with caution: this deletes all images, containers, and volumes)
ansible docker -m file -a "path=/var/lib/docker state=absent" -b
ansible docker -m file -a "path=/etc/docker state=absent" -b