Deploying Docker on Red Hat 10 with Ansible

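1. Environment Planning

1.1 Node Information

Role | Hostname | IP address | Operating system | Purpose
Ansible control node | ansible-master | 192.168.194.15 | RHEL 10.1 | Install Ansible, run playbooks
Docker target node | docker-node-01 | 192.168.194.16 | RHEL 10.1 | Install Docker CE
Docker target node | docker-node-02 | 192.168.194.17 | RHEL 10.1 | Install Docker CE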

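1.2 Prerequisites

  • All nodes run RHEL 10.1 installed as a Minimal Install
  • All nodes have a static IP address and a hostname configured
  • The control node can log in to every target node over SSH without a password (set up in section 2.2)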

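2. Control Node Preparation

2.1 Configure Hostnames and hosts Resolution

Run each hostname and IP command on the node it belongs to, not all of them on the control node:

# Set the hostname (run only the line that matches the current node)
sudo hostnamectl set-hostname ansible-master    # on 192.168.194.15
sudo hostnamectl set-hostname docker-node-01    # on 192.168.194.16
sudo hostnamectl set-hostname docker-node-02    # on 192.168.194.17

# Set the static IP (again, only the line for the current node); ens160 is the connection name in this environment
nmcli c m ens160 ipv4.addresses 192.168.194.15/24 ipv4.gateway 192.168.194.2 ipv4.dns 223.5.5.5 ipv4.method manual connection.autoconnect yes    # ansible-master
nmcli c m ens160 ipv4.addresses 192.168.194.16/24 ipv4.gateway 192.168.194.2 ipv4.dns 223.5.5.5 ipv4.method manual connection.autoconnect yes    # docker-node-01
nmcli c m ens160 ipv4.addresses 192.168.194.17/24 ipv4.gateway 192.168.194.2 ipv4.dns 223.5.5.5 ipv4.method manual connection.autoconnect yes    # docker-node-02
nmcli c up ens160

# Append entries for all nodes to /etc/hosts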
sudo tee -a /etc/hosts <<EOF
192.168.194.15  ansible-master
192.168.194.16 docker-node-01
192.168.194.17 docker-node-02
EOF
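Because `tee -a` appends unconditionally, running the block above a second time leaves duplicate lines in /etc/hosts. A minimal sketch of an idempotent variant follows. The helper name `add_host_entry` and the scratch file used for the demo are illustrative assumptions, not part of the original procedure.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append "IP NAME" to a hosts file only if NAME has no entry yet.
add_host_entry() {
  local file="$1" ip="$2" name="$3"
  # Match NAME as a whole whitespace-delimited field on a non-comment line.
  if ! grep -Eq "^[^#]*[[:space:]]${name}([[:space:]]|\$)" "$file" 2>/dev/null; then
    printf '%s %s\n' "$ip" "$name" >> "$file"
  fi
}

# Demo against a scratch file; point HOSTS_FILE at /etc/hosts (as root) for real use.
HOSTS_FILE="$(mktemp)"
for round in 1 2; do   # the second round must not add duplicates
  add_host_entry "$HOSTS_FILE" 192.168.194.15 ansible-master
  add_host_entry "$HOSTS_FILE" 192.168.194.16 docker-node-01
  add_host_entry "$HOSTS_FILE" 192.168.194.17 docker-node-02
done
cat "$HOSTS_FILE"
```

The scratch file ends up with exactly three lines even though every entry is submitted twice.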

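2.2 Generate an SSH Key Pair on the Ansible Host

[root@ansible-master ~]# ssh-keygen -t rsa

Press Enter at every prompt to accept the defaults (default key path, empty passphrase). Once the key pair is generated, copy the public key to each target node:

[root@ansible-master ~]# ssh-copy-id docker-node-01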
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host 'docker-node-01 (192.168.194.16)' can't be established.
ED25519 key fingerprint is SHA256:cOcQwno9t4p5xBrZ5NwA0FVKmSFGPea8xF6/Xlgl8CQ.
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@docker-node-01's password:

Number of key(s) added: 1

Now try logging into the machine, with: "ssh 'docker-node-01'"
and check to make sure that only the key(s) you wanted were added.

[root@ansible-master ~]# ssh-copy-id docker-node-02
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host 'docker-node-02 (192.168.194.17)' can't be established.
ED25519 key fingerprint is SHA256:cOcQwno9t4p5xBrZ5NwA0FVKmSFGPea8xF6/Xlgl8CQ.
This host key is known by the following other names/addresses:
    ~/.ssh/known_hosts:1: docker-node-01
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@docker-node-02's password:

Number of key(s) added: 1

Now try logging into the machine, with: "ssh 'docker-node-02'"
and check to make sure that only the key(s) you wanted were added.
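Before moving on, it helps to confirm that the key login works with no prompt at all, since Ansible cannot answer a password prompt. The sketch below uses the standard OpenSSH options `BatchMode` and `ConnectTimeout`; the function names and the `SSH_BIN` override are assumptions added for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if key-based login works without any prompt.
# SSH_BIN is an assumption added here so the ssh client can be swapped out when testing.
SSH_BIN="${SSH_BIN:-ssh}"

check_passwordless_ssh() {
  local host="$1"
  # BatchMode=yes makes ssh fail instead of asking for a password.
  "$SSH_BIN" -o BatchMode=yes -o ConnectTimeout=5 "$host" true
}

verify_nodes() {
  local host failed=0
  for host in "$@"; do
    if check_passwordless_ssh "$host"; then
      echo "OK   $host"
    else
      echo "FAIL $host"
      failed=1
    fi
  done
  return "$failed"
}

# Usage on the control node:
#   verify_nodes docker-node-01 docker-node-02
```

`verify_nodes` returns non-zero if any host fails, so it can gate a later script step.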

3. Install Ansible

Ansible only needs to be installed on the control node.

3.1 Method 1: Install from the EPEL Repository (recommended)

RHEL 10 needs the EPEL 10 repository:

# Install the EPEL repository package from the Aliyun mirror in China
dnf install https://mirrors.aliyun.com/epel/10/Everything/x86_64/Packages/e/epel-release-10-7.el10_1.noarch.rpm -y

# Install Ansible
dnf install -y ansible-core ansible-collection-ansible-posix

3.2 Method 2: Install with pip (to get the latest version)

sudo dnf install -y python3-pip
pip3 install --user ansible

Verify the installation:

ansible --version
# Expected output includes:
# ansible [core x.x.x]
# python version = 3.12.x

4. Create the Ansible Project Structure

mkdir -p ~/ansible-docker-deploy/{inventory,playbooks,roles}
cd ~/ansible-docker-deploy

Final directory layout:

ansible-docker-deploy/
├── ansible.cfg              # Ansible configuration for this project
├── inventory/
│   └── hosts                # Host inventory
├── playbooks/
│   └── deploy-docker.yml    # Main playbook
└── roles/                   # (Optional) roles directory

5. Configure Ansible

5.1 Write ansible.cfg

Create ansible.cfg in the project root. The become settings belong in the [privilege_escalation] section; Ansible does not read them from [defaults].

cat > ~/ansible-docker-deploy/ansible.cfg <<EOF
[defaults]
inventory      = ./inventory/hosts
host_key_checking = False
remote_user    = root
gathering      = smart
retry_files_enabled = False
stdout_callback = yaml

[privilege_escalation]
become         = True
become_user    = root
become_method  = sudo
EOF

5.2 Write the Host Inventory

cat > ~/ansible-docker-deploy/inventory/hosts <<EOF
[docker]
docker-node-01 ansible_host=192.168.194.16
docker-node-02 ansible_host=192.168.194.17

[docker:vars]
ansible_user=root
ansible_become=yes
EOF

5.3 Verify Connectivity

cd ~/ansible-docker-deploy
ansible all -m ping

Expected output:

docker-node-01 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "ping": "pong"
}
docker-node-02 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "ping": "pong"
}

6. Write the Docker Deployment Playbook

6.1 Complete Playbook

Create playbooks/deploy-docker.yml:

sudo tee ~/ansible-docker-deploy/playbooks/deploy-docker.yml > /dev/null <<'EOF'
---
- name: 通过阿里云镜像源在 Red Hat 10.1 上部署 Docker CE
  hosts: docker
  gather_facts: yes
  become: yes

  vars:
    docker_version: "latest"
    docker_data_root: "/var/lib/docker"
    docker_log_max_size: "100m"
    docker_log_max_file: "3"
    docker_registry_mirrors:
      - "https://mirror.ccs.tencentyun.com"
      - "https://registry.cn-hangzhou.aliyuncs.com"

  tasks:
    - name: 卸载旧版本 Docker 及相关包
      ansible.builtin.dnf:
        name:
          - docker
          - docker-client
          - docker-client-latest
          - docker-common
          - docker-latest
          - docker-latest-logrotate
          - docker-logrotate
          - docker-engine
          - podman
          - runc
          - containerd
          - containerd.io
        state: absent
        autoremove: yes
      ignore_errors: yes

    - name: 安装必要系统依赖
      ansible.builtin.dnf:
        name:
          - dnf-plugins-core
          - device-mapper-persistent-data
          - lvm2
          - curl
          - ca-certificates
          - tar
          - iptables
          - net-tools
          - socat
        state: present

    - name: 添加 Docker 官方 GPG 密钥
      ansible.builtin.rpm_key:
        key: https://download.docker.com/linux/rhel/gpg
        state: present

    - name: 配置阿里云 Docker CE 仓库
      ansible.builtin.yum_repository:
        name: docker-ce-stable-aliyun
        description: Docker CE Stable - Aliyun Mirror for RHEL 10
        baseurl: https://mirrors.aliyun.com/docker-ce/linux/centos/10/$basearch/stable
        enabled: yes
        gpgcheck: yes
        gpgkey: https://download.docker.com/linux/rhel/gpg
        module_hotfixes: true

    - name: 清理并重建 DNF 缓存
      shell: |
        dnf clean all
        dnf makecache --timer
      args:
        executable: /bin/bash
      changed_when: false

    - name: 安装 Docker CE 及相关组件
      ansible.builtin.dnf:
        name:
          - docker-ce
          - docker-ce-cli
          - containerd.io
          - docker-buildx-plugin
          - docker-compose-plugin
        state: latest
        allow_downgrade: yes
      register: docker_install_result
      retries: 5
      delay: 10
      until: docker_install_result is succeeded

    - name: 创建 Docker 配置目录
      ansible.builtin.file:
        path: /etc/docker
        state: directory
        mode: "0755"

    - name: 生成 Docker daemon.json 配置文件
      ansible.builtin.copy:
        content: |
          {
            "data-root": "{{ docker_data_root }}",
            "log-driver": "json-file",
            "log-opts": {
              "max-size": "{{ docker_log_max_size }}",
              "max-file": "{{ docker_log_max_file }}"
            },
            "registry-mirrors": {{ docker_registry_mirrors | to_json }},
            "exec-opts": ["native.cgroupdriver=systemd"],
            "storage-driver": "overlay2",
            "live-restore": true,
            "iptables": true,
            "ip-forward": true,
            "max-concurrent-downloads": 10,
            "max-concurrent-uploads": 5
          }
        dest: /etc/docker/daemon.json
        owner: root
        group: root
        mode: "0644"

    - name: 加载 overlay 和 br_netfilter 内核模块
      community.general.modprobe:
        name: "{{ item }}"
        state: present
      loop:
        - overlay
        - br_netfilter

    - name: 持久化内核模块配置
      ansible.builtin.copy:
        content: |
          overlay
          br_netfilter
        dest: /etc/modules-load.d/docker.conf
        owner: root
        group: root
        mode: "0644"

    - name: 配置 sysctl 网络参数
      ansible.posix.sysctl:
        name: "{{ item.key }}"
        value: "{{ item.value }}"
        state: present
        reload: yes
        sysctl_set: yes
      loop:
        - { key: "net.bridge.bridge-nf-call-iptables", value: "1" }
        - { key: "net.bridge.bridge-nf-call-ip6tables", value: "1" }
        - { key: "net.ipv4.ip_forward", value: "1" }
        - { key: "net.ipv4.conf.all.forwarding", value: "1" }

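    # NOTE: ansible_facts.services is only populated after ansible.builtin.service_facts has run.
    # This playbook never calls it, so the condition below is false and the task is skipped (see the 8.2 output).
    # Port 2375/tcp is the unencrypted Docker API port; open it only if remote access is really required.
    - name: 开放 Docker 所需端口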
      ansible.posix.firewalld:
        port: "{{ item }}"
        permanent: yes
        state: enabled
        immediate: yes
      loop:
        - 2375/tcp
        - 2376/tcp
        - 2377/tcp
        - 7946/tcp
        - 7946/udp
        - 4789/udp
      when: ansible_facts.services['firewalld.service'] is defined and ansible_facts.services['firewalld.service'].state == 'running'
      ignore_errors: yes

    - name: 重载 systemd 配置
      ansible.builtin.systemd:
        daemon_reload: yes

    - name: 启动 Docker 并设置开机自启
      ansible.builtin.systemd:
        name: docker
        state: started
        enabled: yes

    - name: 等待 Docker Socket 就绪
      ansible.builtin.wait_for:
        path: /var/run/docker.sock
        state: present
        timeout: 30
        delay: 2

    - name: 将 root 用户加入 docker 组
      ansible.builtin.user:
        name: root
        groups: docker
        append: yes

    - name: 将 ansible 用户也加入 docker 组
      ansible.builtin.user:
        name: "{{ ansible_user }}"
        groups: docker
        append: yes
      when: ansible_user != 'root'

    - name: 检查 Docker 客户端版本
      ansible.builtin.command: docker --version
      register: docker_version_output
      changed_when: false

    - name: 检查 Docker 服务端详细信息
      ansible.builtin.shell: |
        docker info 2>/dev/null | grep -E "Server Version|Storage Driver|Registry Mirrors" | sed 's/^ *//'
      register: docker_info_output
      changed_when: false

    - name: 运行 hello-world 测试容器
      ansible.builtin.command:
        cmd: timeout 60 docker run --rm hello-world
      register: hello_world_output
      changed_when: false
      ignore_errors: yes

    - name: 输出部署结果摘要
      ansible.builtin.debug:
        msg:
          - "🎉 Docker 部署完成!"
          - "📌 客户端版本: {{ docker_version_output.stdout }}"
          - "📌 服务端信息:"
          - "{{ docker_info_output.stdout_lines }}"
          - "📌 测试容器: {{ '✅ 成功' if hello_world_output.rc == 0 else '⚠️  失败(可能网络问题,可稍后重试)' }}"
          - "📌 镜像加速器: {{ docker_registry_mirrors | join(', ') }}"
EOF
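The `docker info | grep` check in the playbook matches only the `Registry Mirrors:` header line. `docker info` prints the mirror URLs on the indented lines below that header, which is why the summary in section 8.2 shows an empty `Registry Mirrors:` entry. A minimal sketch of a filter that keeps those URL lines follows. The function name `extract_mirrors` is hypothetical, and the sample text is an illustrative assumption of the usual `docker info` layout, not output captured from the nodes in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read `docker info` text on stdin and print the mirror URLs
# listed under the "Registry Mirrors:" header.
extract_mirrors() {
  awk '
    /^[ \t]*Registry Mirrors:/ { in_block = 1; next }
    in_block && /^[ \t]*https?:\/\// { sub(/^[ \t]+/, ""); print; next }
    { in_block = 0 }
  '
}

# Illustrative sample in the usual `docker info` layout (an assumption, see above).
sample_info=' Storage Driver: overlay2
 Registry Mirrors:
  https://mirror.ccs.tencentyun.com/
  https://registry.cn-hangzhou.aliyuncs.com/
 Live Restore Enabled: true'

printf '%s\n' "$sample_info" | extract_mirrors

# On a target node:  docker info 2>/dev/null | extract_mirrors
```

If the filter prints nothing on a real node, the daemon has not loaded any mirrors from daemon.json.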

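6.2 Playbook Steps at a Glance

Step | Action | Notes
1 | Remove old versions | Cleans up leftover Docker/Podman packages
2 | Install dependencies | dnf-plugins-core, lvm2, and others
3 | Add the repository | Docker CE YUM repository + GPG key
4 | Install Docker | docker-ce + CLI + compose plugin
5 | Configure daemon.json | Logging, storage, registry mirrors
6 | Kernel parameters | overlay module + bridge forwarding
7 | Firewall | Opens the Docker/Swarm ports
8 | Start the service | Start now + enable at boot
9 | User permissions | Adds the ansible user to the docker group
10 | Verify | Version check + hello-world test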
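The daemon.json step builds a JSON file from template variables, so one stray quote or comma in a value yields a file that dockerd cannot parse. The sketch below shows one way to check a rendered file before restarting Docker. It assumes `python3` is on PATH; the helper name `validate_json` and the scratch files are hypothetical, and the sample uses only a subset of the playbook's keys with their default values.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: return 0 only if the file parses as JSON.
# Assumes python3 is on PATH.
validate_json() {
  python3 -m json.tool "$1" > /dev/null 2>&1
}

good="$(mktemp)"
bad="$(mktemp)"

# A subset of the playbook's daemon.json keys with its default values filled in.
cat > "$good" <<'JSON'
{
  "data-root": "/var/lib/docker",
  "log-driver": "json-file",
  "log-opts": { "max-size": "100m", "max-file": "3" },
  "registry-mirrors": ["https://mirror.ccs.tencentyun.com"],
  "storage-driver": "overlay2",
  "live-restore": true
}
JSON

# A trailing comma is a common hand-editing mistake that JSON does not allow.
printf '{ "live-restore": true, }\n' > "$bad"

validate_json "$good" && echo "good file: valid"
validate_json "$bad"  || echo "bad file: invalid"
```

On a target node the same function can be pointed at /etc/docker/daemon.json.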

7. Install the Required Collections

Before running the playbook, install the Ansible collections it depends on:

ansible-galaxy collection install ansible.posix community.general

Verify the installed collections:

ansible-galaxy collection list | grep -E "ansible.posix|community.general"
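Installing without a version pulls whatever release is newest, which can be too new for the installed ansible-core. One way to keep the set reproducible is a requirements file, sketched below. The file name `requirements.yml` and the `PROJECT_DIR` variable are assumptions for illustration; 9.5.0 is the community.general version this guide ends up using in section 8.1.

```shell
#!/usr/bin/env bash
# Sketch: pin collection versions in a requirements file (the file name is an assumption).
# PROJECT_DIR defaults to a scratch directory so the demo does not touch a real project.
PROJECT_DIR="${PROJECT_DIR:-$(mktemp -d)}"

cat > "$PROJECT_DIR/requirements.yml" <<'EOF'
---
collections:
  - name: ansible.posix
  - name: community.general
    version: "9.5.0"   # the version this guide falls back to in section 8.1
EOF

cat "$PROJECT_DIR/requirements.yml"

# Install from the file on the control node:
#   ansible-galaxy collection install -r "$PROJECT_DIR/requirements.yml"
```

Keeping the file in the project directory means a rebuilt control node gets the same collection versions.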

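8. Run the Deployment

8.1 Check Mode (Dry Run)

Run the playbook in check mode first to confirm that it parses and that the predicted changes look right. (For a pure syntax check, ansible-playbook also offers --syntax-check.)

cd ~/ansible-docker-deploy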
ansible-playbook playbooks/deploy-docker.yml --check

# The first attempt fails:
[root@ansible-master ansible-docker-deploy]# ansible-playbook playbooks/deploy-docker.yml --check
[WARNING]: Collection community.general does not support Ansible version 2.16.14
ERROR! [DEPRECATED]: community.general.yaml has been removed. The plugin has been superseded by the option `result_format=yaml` in callback plugin ansible.builtin.default from ansible-core 2.13 onwards. This feature was removed from community.general in version 12.0.0. Please update your playbooks.
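## Cause: ansible.cfg sets stdout_callback = yaml, and that plugin was removed in community.general 12.0.0.
## The installed collection is too new for ansible-core 2.16.14, so install an older release: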
[root@ansible-master ansible-docker-deploy]# ansible-galaxy collection install community.general:9.5.0
Starting galaxy collection install process
[WARNING]: Collection community.general does not support Ansible version 2.16.14
Process install dependency map
Starting collection install process
Downloading https://galaxy.ansible.com/api/v3/plugin/ansible/content/published/collections/artifacts/community-general-9.5.0.tar.gz to /root/.ansible/tmp/ansible-local-3446c0kh8jpz/tmpn8z_v25q/community-general-9.5.0-5gk99d9f
Installing 'community.general:9.5.0' to '/root/.ansible/collections/ansible_collections/community/general'
community.general:9.5.0 was installed successfully

# Run the check again
[root@ansible-master ansible-docker-deploy]# ansible-playbook playbooks/deploy-docker.yml --check

PLAY [通过阿里云镜像源在 Red Hat 10.1 上部署 Docker CE] *******************************************************************************************************************************

TASK [Gathering Facts] ****************************************************************************************************************************************************************
ok: [docker-node-01]
ok: [docker-node-02]

TASK [卸载旧版本 Docker 及相关包] *****************************************************************************************************************************************************
ok: [docker-node-02]
ok: [docker-node-01]

TASK [安装必要系统依赖] ***************************************************************************************************************************************************************
changed: [docker-node-01]
changed: [docker-node-02]

TASK [添加 Docker 官方 GPG 密钥] ******************************************************************************************************************************************************
fatal: [docker-node-02]: FAILED! => changed=false
  msg: 'failed to fetch key at https://download.docker.com/linux/rhel/gpg , error was: Request failed: <urlopen error [Errno 104] Connection reset by peer>'
fatal: [docker-node-01]: FAILED! => changed=false
  msg: 'failed to fetch key at https://download.docker.com/linux/rhel/gpg , error was: Request failed: <urlopen error [Errno 104] Connection reset by peer>'

PLAY RECAP ****************************************************************************************************************************************************************************
docker-node-01             : ok=3    changed=1    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0
docker-node-02             : ok=3    changed=1    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0

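# The target nodes cannot reach download.docker.com, so download the GPG key from the Aliyun mirror instead (recommended)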
[root@ansible-master ansible-docker-deploy]# curl -o /tmp/docker-gpg https://mirrors.aliyun.com/docker-ce/linux/rhel/gpg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1627  100  1627    0     0  18619      0 --:--:-- --:--:-- --:--:-- 18701

# Confirm the download succeeded
[root@ansible-master ansible-docker-deploy]# file /tmp/docker-gpg
# Expected output: /tmp/docker-gpg: PGP public key block Public-Key (old)
/tmp/docker-gpg: PGP public key block Public-Key (old)
[root@ansible-master ansible-docker-deploy]#

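# Change the task that fetches the key online so that it reads the local file; the target hosts then need no access to download.docker.com.
# The following command edits the playbook in place: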
[root@ansible-master ansible-docker-deploy]# sed -i '/ansible.builtin.rpm_key:/,/state: present/ {
  s|key: https://download.docker.com/linux/rhel/gpg|key: /tmp/docker-gpg|
  /state: present/a\    file: /tmp/docker-gpg
}' ~/ansible-docker-deploy/playbooks/deploy-docker.yml

# Then distribute the downloaded key file to /tmp on every target host:
[root@ansible-master ansible-docker-deploy]# ansible docker -m copy -a "src=/tmp/docker-gpg dest=/tmp/docker-gpg" --become
docker-node-01 | CHANGED => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": true,
    "checksum": "f271af33d2a950ca31782debd8fabe5fee77b176",
    "dest": "/tmp/docker-gpg",
    "gid": 0,
    "group": "root",
    "md5sum": "72ccb88a3e48418a07d01f7f9aeb45b1",
    "mode": "0644",
    "owner": "root",
    "secontext": "unconfined_u:object_r:admin_home_t:s0",
    "size": 1627,
    "src": "/root/.ansible/tmp/ansible-tmp-1777969259.351255-3765-95521442440540/source",
    "state": "file",
    "uid": 0
}
docker-node-02 | CHANGED => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": true,
    "checksum": "f271af33d2a950ca31782debd8fabe5fee77b176",
    "dest": "/tmp/docker-gpg",
    "gid": 0,
    "group": "root",
    "md5sum": "72ccb88a3e48418a07d01f7f9aeb45b1",
    "mode": "0644",
    "owner": "root",
    "secontext": "unconfined_u:object_r:admin_home_t:s0",
    "size": 1627,
    "src": "/root/.ansible/tmp/ansible-tmp-1777969259.3707564-3766-221830823470738/source",
    "state": "file",
    "uid": 0
}

# The sed edit above breaks the YAML formatting, so overwrite the playbook with the full corrected version
cat > ~/ansible-docker-deploy/playbooks/deploy-docker.yml <<'PLAYBOOK_EOF'
---
- name: 通过阿里云镜像源在 Red Hat 10.1 上部署 Docker CE
  hosts: docker
  gather_facts: yes
  become: yes

  vars:
    docker_version: "latest"
    docker_data_root: "/var/lib/docker"
    docker_log_max_size: "100m"
    docker_log_max_file: "3"
    docker_registry_mirrors:
      - "https://mirror.ccs.tencentyun.com"
      - "https://registry.cn-hangzhou.aliyuncs.com"

  tasks:
    - name: 卸载旧版本 Docker 及相关包
      ansible.builtin.dnf:
        name:
          - docker
          - docker-client
          - docker-client-latest
          - docker-common
          - docker-latest
          - docker-latest-logrotate
          - docker-logrotate
          - docker-engine
          - podman
          - runc
          - containerd
          - containerd.io
        state: absent
        autoremove: yes
      ignore_errors: yes

    - name: 安装必要系统依赖
      ansible.builtin.dnf:
        name:
          - dnf-plugins-core
          - device-mapper-persistent-data
          - lvm2
          - curl
          - ca-certificates
          - tar
          - iptables
          - net-tools
          - socat
        state: present

    - name: 添加 Docker 官方 GPG 密钥(从本地文件)
      ansible.builtin.rpm_key:
        key: /tmp/docker-gpg
        state: present

    - name: 配置阿里云 Docker CE 仓库
      ansible.builtin.yum_repository:
        name: docker-ce-stable-aliyun
        description: Docker CE Stable - Aliyun Mirror for RHEL 10
        baseurl: https://mirrors.aliyun.com/docker-ce/linux/centos/10/$basearch/stable
        enabled: yes
        gpgcheck: yes
        gpgkey: file:///tmp/docker-gpg
        module_hotfixes: true

    - name: 清理并重建 DNF 缓存
      shell: |
        dnf clean all
        dnf makecache --timer
      args:
        executable: /bin/bash
      changed_when: false

    - name: 安装 Docker CE 及相关组件
      ansible.builtin.dnf:
        name:
          - docker-ce
          - docker-ce-cli
          - containerd.io
          - docker-buildx-plugin
          - docker-compose-plugin
        state: latest
        allow_downgrade: yes
      register: docker_install_result
      retries: 5
      delay: 10
      until: docker_install_result is succeeded

    - name: 创建 Docker 配置目录
      ansible.builtin.file:
        path: /etc/docker
        state: directory
        mode: "0755"

    - name: 生成 Docker daemon.json 配置文件
      ansible.builtin.copy:
        content: |
          {
            "data-root": "{{ docker_data_root }}",
            "log-driver": "json-file",
            "log-opts": {
              "max-size": "{{ docker_log_max_size }}",
              "max-file": "{{ docker_log_max_file }}"
            },
            "registry-mirrors": {{ docker_registry_mirrors | to_json }},
            "exec-opts": ["native.cgroupdriver=systemd"],
            "storage-driver": "overlay2",
            "live-restore": true,
            "iptables": true,
            "ip-forward": true,
            "max-concurrent-downloads": 10,
            "max-concurrent-uploads": 5
          }
        dest: /etc/docker/daemon.json
        owner: root
        group: root
        mode: "0644"
      notify: restart docker

    - name: 加载 overlay 和 br_netfilter 内核模块
      community.general.modprobe:
        name: "{{ item }}"
        state: present
      loop:
        - overlay
        - br_netfilter

    - name: 持久化内核模块配置
      ansible.builtin.copy:
        content: |
          overlay
          br_netfilter
        dest: /etc/modules-load.d/docker.conf
        owner: root
        group: root
        mode: "0644"

    - name: 配置 sysctl 网络参数
      ansible.posix.sysctl:
        name: "{{ item.key }}"
        value: "{{ item.value }}"
        state: present
        reload: yes
        sysctl_set: yes
      loop:
        - { key: "net.bridge.bridge-nf-call-iptables", value: "1" }
        - { key: "net.bridge.bridge-nf-call-ip6tables", value: "1" }
        - { key: "net.ipv4.ip_forward", value: "1" }
        - { key: "net.ipv4.conf.all.forwarding", value: "1" }

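    # NOTE: ansible_facts.services is only populated after ansible.builtin.service_facts has run.
    # This playbook never calls it, so the condition below is false and the task is skipped (see the 8.2 output).
    # Port 2375/tcp is the unencrypted Docker API port; open it only if remote access is really required.
    - name: 开放 Docker 所需端口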
      ansible.posix.firewalld:
        port: "{{ item }}"
        permanent: yes
        state: enabled
        immediate: yes
      loop:
        - 2375/tcp
        - 2376/tcp
        - 2377/tcp
        - 7946/tcp
        - 7946/udp
        - 4789/udp
      when: ansible_facts.services['firewalld.service'] is defined and ansible_facts.services['firewalld.service'].state == 'running'
      ignore_errors: yes

    - name: 重载 systemd 配置
      ansible.builtin.systemd:
        daemon_reload: yes

    - name: 启动 Docker 并设置开机自启
      ansible.builtin.systemd:
        name: docker
        state: started
        enabled: yes

    - name: 等待 Docker Socket 就绪
      ansible.builtin.wait_for:
        path: /var/run/docker.sock
        state: present
        timeout: 30
        delay: 2

    - name: 将 root 用户加入 docker 组
      ansible.builtin.user:
        name: root
        groups: docker
        append: yes

    - name: 将 ansible 用户也加入 docker 组
      ansible.builtin.user:
        name: "{{ ansible_user }}"
        groups: docker
        append: yes
      when: ansible_user != 'root'

    - name: 检查 Docker 客户端版本
      ansible.builtin.command: docker --version
      register: docker_version_output
      changed_when: false

    - name: 检查 Docker 服务端详细信息
      ansible.builtin.shell: |
        docker info 2>/dev/null | grep -E "Server Version|Storage Driver|Registry Mirrors" | sed 's/^ *//'
      register: docker_info_output
      changed_when: false

    - name: 运行 hello-world 测试容器
      ansible.builtin.command:
        cmd: timeout 60 docker run --rm hello-world
      register: hello_world_output
      changed_when: false
      ignore_errors: yes

    - name: 输出部署结果摘要
      ansible.builtin.debug:
        msg:
          - "🎉 Docker 部署完成!"
          - "📌 客户端版本: {{ docker_version_output.stdout }}"
          - "📌 服务端信息:"
          - "{{ docker_info_output.stdout_lines }}"
          - "📌 测试容器: {{ '✅ 成功' if hello_world_output.rc == 0 else '⚠️  失败(可能网络问题,可稍后重试)' }}"
          - "📌 镜像加速器: {{ docker_registry_mirrors | join(', ') }}"
PLAYBOOK_EOF
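The callback error earlier in this section states that the community.general.yaml plugin was superseded by the `result_format=yaml` option of the built-in default callback (ansible-core 2.13 and later). An alternative to pinning an older community.general is therefore to switch ansible.cfg to the built-in callback. The trimmed sketch below shows only the [defaults] keys; `callback_result_format` is the ini key for that option, and the scratch directory `CFG_DIR` is an assumption for the demo.

```shell
#!/usr/bin/env bash
# Sketch: ansible.cfg that gets YAML-formatted results from the built-in default callback
# instead of the removed community.general.yaml plugin.
CFG_DIR="${CFG_DIR:-$(mktemp -d)}"   # scratch location; use the project root for real

cat > "$CFG_DIR/ansible.cfg" <<'EOF'
[defaults]
inventory              = ./inventory/hosts
host_key_checking      = False
remote_user            = root
gathering              = smart
retry_files_enabled    = False
stdout_callback        = default
callback_result_format = yaml
EOF

# Show the two lines that replace "stdout_callback = yaml".
grep -E '^(stdout_callback|callback_result_format)' "$CFG_DIR/ansible.cfg"
```

This keeps the YAML-style task output without depending on which community.general release is installed.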

8.2 Run the Playbook for Real

[root@ansible-master ansible-docker-deploy]# ansible-playbook playbooks/deploy-docker.yml

PLAY [通过阿里云镜像源在 Red Hat 10.1 上部署 Docker CE] *******************************************************************************************************************************

TASK [Gathering Facts] ****************************************************************************************************************************************************************
ok: [docker-node-02]
ok: [docker-node-01]

TASK [卸载旧版本 Docker 及相关包] *****************************************************************************************************************************************************
ok: [docker-node-02]
ok: [docker-node-01]

TASK [安装必要系统依赖] ***************************************************************************************************************************************************************
changed: [docker-node-02]
changed: [docker-node-01]

TASK [添加 Docker 官方 GPG 密钥(从本地文件)] ****************************************************************************************************************************************
changed: [docker-node-02]
changed: [docker-node-01]

TASK [配置阿里云 Docker CE 仓库] ******************************************************************************************************************************************************
changed: [docker-node-02]
changed: [docker-node-01]

TASK [清理并重建 DNF 缓存] ************************************************************************************************************************************************************
ok: [docker-node-02]
ok: [docker-node-01]

TASK [安装 Docker CE 及相关组件] ******************************************************************************************************************************************************
changed: [docker-node-01]
changed: [docker-node-02]

TASK [创建 Docker 配置目录] ***********************************************************************************************************************************************************
ok: [docker-node-01]
ok: [docker-node-02]

TASK [生成 Docker daemon.json 配置文件] ***********************************************************************************************************************************************
ERROR! The requested handler 'restart docker' was not found in either the main handlers list nor in the listening handlers list

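# This error occurs because the playbook uses notify: restart docker but defines no matching handler.
# The notify line can be removed safely here: Docker is only started later in the same run, so it reads the new daemon.json at startup.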
[root@ansible-master ansible-docker-deploy]# sed -i '/notify: restart docker/d' ~/ansible-docker-deploy/playbooks/deploy-docker.yml

# Run again, resuming from the task that failed
[root@ansible-master ansible-docker-deploy]# ansible-playbook playbooks/deploy-docker.yml --start-at-task="生成 Docker daemon.json 配置文件"

PLAY [通过阿里云镜像源在 Red Hat 10.1 上部署 Docker CE] *******************************************************************************************************************************

TASK [Gathering Facts] ****************************************************************************************************************************************************************
ok: [docker-node-01]
ok: [docker-node-02]

TASK [生成 Docker daemon.json 配置文件] ***********************************************************************************************************************************************
ok: [docker-node-01]
ok: [docker-node-02]

TASK [加载 overlay 和 br_netfilter 内核模块] ******************************************************************************************************************************************
changed: [docker-node-01] => (item=overlay)
changed: [docker-node-02] => (item=overlay)
changed: [docker-node-01] => (item=br_netfilter)
changed: [docker-node-02] => (item=br_netfilter)

TASK [持久化内核模块配置] *************************************************************************************************************************************************************
changed: [docker-node-02]
changed: [docker-node-01]

TASK [配置 sysctl 网络参数] ***********************************************************************************************************************************************************
changed: [docker-node-02] => (item={'key': 'net.bridge.bridge-nf-call-iptables', 'value': '1'})
changed: [docker-node-01] => (item={'key': 'net.bridge.bridge-nf-call-iptables', 'value': '1'})
changed: [docker-node-01] => (item={'key': 'net.bridge.bridge-nf-call-ip6tables', 'value': '1'})
changed: [docker-node-02] => (item={'key': 'net.bridge.bridge-nf-call-ip6tables', 'value': '1'})
ok: [docker-node-02] => (item={'key': 'net.ipv4.ip_forward', 'value': '1'})
ok: [docker-node-01] => (item={'key': 'net.ipv4.ip_forward', 'value': '1'})
changed: [docker-node-02] => (item={'key': 'net.ipv4.conf.all.forwarding', 'value': '1'})
changed: [docker-node-01] => (item={'key': 'net.ipv4.conf.all.forwarding', 'value': '1'})

TASK [开放 Docker 所需端口] ***********************************************************************************************************************************************************
skipping: [docker-node-01] => (item=2375/tcp)
skipping: [docker-node-01] => (item=2376/tcp)
skipping: [docker-node-01] => (item=2377/tcp)
skipping: [docker-node-01] => (item=7946/tcp)
skipping: [docker-node-01] => (item=7946/udp)
skipping: [docker-node-02] => (item=2375/tcp)
skipping: [docker-node-01] => (item=4789/udp)
skipping: [docker-node-01]
skipping: [docker-node-02] => (item=2376/tcp)
skipping: [docker-node-02] => (item=2377/tcp)
skipping: [docker-node-02] => (item=7946/tcp)
skipping: [docker-node-02] => (item=7946/udp)
skipping: [docker-node-02] => (item=4789/udp)
skipping: [docker-node-02]

TASK [重载 systemd 配置] **************************************************************************************************************************************************************
ok: [docker-node-02]
ok: [docker-node-01]

TASK [启动 Docker 并设置开机自启] *****************************************************************************************************************************************************
changed: [docker-node-02]
changed: [docker-node-01]

TASK [等待 Docker Socket 就绪] ********************************************************************************************************************************************************
ok: [docker-node-02]
ok: [docker-node-01]

TASK [将 root 用户加入 docker 组] *****************************************************************************************************************************************************
changed: [docker-node-01]
changed: [docker-node-02]

TASK [将 ansible 用户也加入 docker 组] ************************************************************************************************************************************************
skipping: [docker-node-01]
skipping: [docker-node-02]

TASK [检查 Docker 客户端版本] *********************************************************************************************************************************************************
ok: [docker-node-01]
ok: [docker-node-02]

TASK [检查 Docker 服务端详细信息] *****************************************************************************************************************************************************
ok: [docker-node-01]
ok: [docker-node-02]

TASK [运行 hello-world 测试容器] ******************************************************************************************************************************************************
fatal: [docker-node-01]: FAILED! => changed=false
  cmd:
  - timeout
  - '60'
  - docker
  - run
  - --rm
  - hello-world
  delta: '0:00:15.903577'
  end: '2026-05-05 16:26:35.113201'
  msg: non-zero return code
  rc: 125
  start: '2026-05-05 16:26:19.209624'
  stderr: |-
    Unable to find image 'hello-world:latest' locally
    docker: Error response from daemon: Get "https://registry-1.docker.io/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

    Run 'docker run --help' for more information
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>
...ignoring
fatal: [docker-node-02]: FAILED! => changed=false
  cmd:
  - timeout
  - '60'
  - docker
  - run
  - --rm
  - hello-world
  delta: '0:00:15.940468'
  end: '2026-05-05 16:26:35.170722'
  msg: non-zero return code
  rc: 125
  start: '2026-05-05 16:26:19.230254'
  stderr: |-
    Unable to find image 'hello-world:latest' locally
    docker: Error response from daemon: Get "https://registry-1.docker.io/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

    Run 'docker run --help' for more information
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>
...ignoring

TASK [输出部署结果摘要] ***************************************************************************************************************************************************************
ok: [docker-node-01] =>
  msg:
  - "\U0001F389 Docker 部署完成!"
  - "\U0001F4CC 客户端版本: Docker version 29.4.2, build 055a478"
  - "\U0001F4CC 服务端信息:"
  - - 'Server Version: 29.4.2'
    - 'Storage Driver: overlay2'
    - 'Registry Mirrors:'
  - "\U0001F4CC 测试容器: ⚠️  失败(可能网络问题,可稍后重试)"
  - "\U0001F4CC 镜像加速器: https://mirror.ccs.tencentyun.com, https://registry.cn-hangzhou.aliyuncs.com"
ok: [docker-node-02] =>
  msg:
  - "\U0001F389 Docker 部署完成!"
  - "\U0001F4CC 客户端版本: Docker version 29.4.2, build 055a478"
  - "\U0001F4CC 服务端信息:"
  - - 'Server Version: 29.4.2'
    - 'Storage Driver: overlay2'
    - 'Registry Mirrors:'
  - "\U0001F4CC 测试容器: ⚠️  失败(可能网络问题,可稍后重试)"
  - "\U0001F4CC 镜像加速器: https://mirror.ccs.tencentyun.com, https://registry.cn-hangzhou.aliyuncs.com"

PLAY RECAP ****************************************************************************************************************************************************************************
docker-node-01             : ok=13   changed=5    unreachable=0    failed=0    skipped=2    rescued=0    ignored=1
docker-node-02             : ok=13   changed=5    unreachable=0    failed=0    skipped=2    rescued=0    ignored=1

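Docker has been deployed successfully.

Although the final hello-world test could not pull its image because of a network problem (the request to registry-1.docker.io timed out), the core components are working:

  • Docker 29.4.2 is installed and running
  • The overlay2 storage driver is in use
  • Registry mirrors are configured (Tencent Cloud + Alibaba Cloud)
  • Both hosts report failed=0 in the play recap; the hello-world failure is counted under ignored=1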
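
The difference between failed and ignored can be checked mechanically. A minimal sketch, assuming the PLAY RECAP line format shown above; `recap_field` and `recap_ok` are hypothetical helper names, not part of Ansible:

```shell
# recap_field LINE KEY: print the value of KEY (e.g. failed, ignored)
# from one PLAY RECAP line. Hypothetical helper for illustration only.
recap_field() {
    printf '%s\n' "$1" | tr -s ' ' '\n' | sed -n "s/^$2=//p"
}

# recap_ok LINE: succeed only if the host was reachable and had no failed tasks.
recap_ok() {
    [ "$(recap_field "$1" failed)" = "0" ] && [ "$(recap_field "$1" unreachable)" = "0" ]
}

line='docker-node-01 : ok=13 changed=5 unreachable=0 failed=0 skipped=2 rescued=0 ignored=1'
recap_ok "$line" && echo "ignored tasks: $(recap_field "$line" ignored)"   # prints: ignored tasks: 1
```

A non-zero ignored count means a task failed but the play continued past it, so it is worth reading the task output even when failed=0.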

8.3 Validate on a single node first

If the environment has several target nodes, validate on one of them first:

bash
[root@ansible-master ansible-docker-deploy]# ansible-playbook playbooks/deploy-docker.yml --limit docker-node-01

PLAY [通过阿里云镜像源在 Red Hat 10.1 上部署 Docker CE] *******************************************************************************************************************************

TASK [Gathering Facts] ****************************************************************************************************************************************************************
ok: [docker-node-01]

TASK [卸载旧版本 Docker 及相关包] *****************************************************************************************************************************************************
changed: [docker-node-01]

TASK [安装必要系统依赖] ***************************************************************************************************************************************************************
ok: [docker-node-01]

TASK [添加 Docker 官方 GPG 密钥(从本地文件)] ****************************************************************************************************************************************
ok: [docker-node-01]

TASK [配置阿里云 Docker CE 仓库] ******************************************************************************************************************************************************
ok: [docker-node-01]

TASK [清理并重建 DNF 缓存] ************************************************************************************************************************************************************
ok: [docker-node-01]

TASK [安装 Docker CE 及相关组件] ******************************************************************************************************************************************************
changed: [docker-node-01]

TASK [创建 Docker 配置目录] ***********************************************************************************************************************************************************
ok: [docker-node-01]

TASK [生成 Docker daemon.json 配置文件] ***********************************************************************************************************************************************
ok: [docker-node-01]

TASK [加载 overlay 和 br_netfilter 内核模块] ******************************************************************************************************************************************
ok: [docker-node-01] => (item=overlay)
ok: [docker-node-01] => (item=br_netfilter)

TASK [持久化内核模块配置] *************************************************************************************************************************************************************
ok: [docker-node-01]

TASK [配置 sysctl 网络参数] ***********************************************************************************************************************************************************
ok: [docker-node-01] => (item={'key': 'net.bridge.bridge-nf-call-iptables', 'value': '1'})
ok: [docker-node-01] => (item={'key': 'net.bridge.bridge-nf-call-ip6tables', 'value': '1'})
ok: [docker-node-01] => (item={'key': 'net.ipv4.ip_forward', 'value': '1'})
ok: [docker-node-01] => (item={'key': 'net.ipv4.conf.all.forwarding', 'value': '1'})

TASK [开放 Docker 所需端口] ***********************************************************************************************************************************************************
skipping: [docker-node-01] => (item=2375/tcp)
skipping: [docker-node-01] => (item=2376/tcp)
skipping: [docker-node-01] => (item=2377/tcp)
skipping: [docker-node-01] => (item=7946/tcp)
skipping: [docker-node-01] => (item=7946/udp)
skipping: [docker-node-01] => (item=4789/udp)
skipping: [docker-node-01]

TASK [重载 systemd 配置] **************************************************************************************************************************************************************
ok: [docker-node-01]

TASK [启动 Docker 并设置开机自启] *****************************************************************************************************************************************************
changed: [docker-node-01]

TASK [等待 Docker Socket 就绪] ********************************************************************************************************************************************************
ok: [docker-node-01]

TASK [将 root 用户加入 docker 组] *****************************************************************************************************************************************************
ok: [docker-node-01]

TASK [将 ansible 用户也加入 docker 组] ************************************************************************************************************************************************
skipping: [docker-node-01]

TASK [检查 Docker 客户端版本] *********************************************************************************************************************************************************
ok: [docker-node-01]

TASK [检查 Docker 服务端详细信息] *****************************************************************************************************************************************************
ok: [docker-node-01]

TASK [运行 hello-world 测试容器] ******************************************************************************************************************************************************
fatal: [docker-node-01]: FAILED! => changed=false
  cmd:
  - timeout
  - '60'
  - docker
  - run
  - --rm
  - hello-world
  delta: '0:00:15.870846'
  end: '2026-05-05 16:30:31.808594'
  msg: non-zero return code
  rc: 125
  start: '2026-05-05 16:30:15.937748'
  stderr: |-
    Unable to find image 'hello-world:latest' locally
    docker: Error response from daemon: Get "https://registry-1.docker.io/v2/": context deadline exceeded

    Run 'docker run --help' for more information
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>
...ignoring

TASK [输出部署结果摘要] ***************************************************************************************************************************************************************
ok: [docker-node-01] =>
  msg:
  - "\U0001F389 Docker 部署完成!"
  - "\U0001F4CC 客户端版本: Docker version 29.4.2, build 055a478"
  - "\U0001F4CC 服务端信息:"
  - - 'Server Version: 29.4.2'
    - 'Storage Driver: overlay2'
    - 'Registry Mirrors:'
  - "\U0001F4CC 测试容器: ⚠️  失败(可能网络问题,可稍后重试)"
  - "\U0001F4CC 镜像加速器: https://mirror.ccs.tencentyun.com, https://registry.cn-hangzhou.aliyuncs.com"

PLAY RECAP ****************************************************************************************************************************************************************************
docker-node-01             : ok=20   changed=3    unreachable=0    failed=0    skipped=2    rescued=0    ignored=1

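8.4 Common execution options

Option                 Purpose
--check                Dry run; no changes are actually applied
--diff                 Show the changes made to file contents
--limit <host>         Restrict the run to the given host
-v / -vv / -vvv        Increase output verbosity
-e "variable=value"    Pass extra variables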
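
These options can be combined. A minimal sketch of a dry run restricted to one node, using the playbook path from section 8.3; `preview_deploy` is a hypothetical wrapper name:

```shell
# preview_deploy HOST: dry-run the deployment Playbook against a single host
# and show the file changes it would make. Nothing is modified on the target.
preview_deploy() {
    ansible-playbook playbooks/deploy-docker.yml \
        --check --diff --limit "$1" -v
}

# Usage from the ansible-docker-deploy directory:
#   preview_deploy docker-node-01
```

In check mode, tasks that depend on an earlier change (for example starting a service whose package was not really installed) may fail or be skipped, so a clean dry run is a hint rather than a guarantee.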

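8.5 Configure registry mirrors reachable from mainland China

bash
# Note: tee overwrites any existing /etc/docker/daemon.json, including the one
# generated by the Playbook; merge the settings if they need to be kept.
sudo mkdir -p /etc/docker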
sudo tee /etc/docker/daemon.json <<-'EOF'
{
 "registry-mirrors": [
  "https://docker.m.daocloud.io",
  "https://ccr.ccs.tencentyun.com",
  "https://docker.1ms.run",
  "https://hub.xdark.top",
  "https://dhub.kubesre.xyz",
  "https://docker.kejilion.pro",
  "https://docker.xuanyuan.me",
  "https://docker.hlmirror.com",
  "https://run-docker.cn",
  "https://docker.sunzishaokao.com",
  "https://image.cloudlayer.icu"
 ]
}
EOF
# Reload systemd unit files
sudo systemctl daemon-reload
# Restart Docker so the new daemon.json takes effect
sudo systemctl restart docker
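
A syntax error in daemon.json prevents the Docker daemon from starting, so it can help to validate the file before restarting. A minimal sketch, assuming python3 is available on the node (Ansible-managed hosts normally have it); `validate_daemon_json` is a hypothetical helper name:

```shell
# validate_daemon_json [FILE]: succeed only if FILE parses as JSON.
# Defaults to /etc/docker/daemon.json.
validate_daemon_json() {
    python3 -m json.tool "${1:-/etc/docker/daemon.json}" > /dev/null
}

# Restart Docker only when the configuration is valid JSON:
#   validate_daemon_json && sudo systemctl restart docker
```

This only checks JSON syntax; it does not check that the keys are options the daemon understands.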

Test pulling an image:

bash
[root@docker-node-01 ~]# docker pull mysql:8.4.8
8.4.8: Pulling from library/mysql
bb5107df7baa: Pull complete
de0e913f3fda: Pull complete
20e9fc37ca7f: Pull complete
eea3d1b96bc6: Pull complete
4c72b297542d: Pull complete
c02b00ae5ee3: Pull complete
5d786b1680fa: Pull complete
120c8f895019: Pull complete
a955715b8cd4: Pull complete
57ccccafb277: Pull complete
Digest: sha256:2952e3be7807f06fc18de50b3ea1a632d5c70d63482ff7d7376fe3aa8999babf
Status: Downloaded newer image for mysql:8.4.8
docker.io/library/mysql:8.4.8
[root@docker-node-01 ~]# docker images
                                                                                                                                                                   i Info →   U  In Use
IMAGE         ID             DISK USAGE   CONTENT SIZE   EXTRA
mysql:8.4.8   7791889374e7        790MB             0B
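
A successful pull alone does not show whether the daemon actually loaded the mirror list. A minimal sketch that extracts the mirrors from docker info output, assuming the usual layout in which each mirror URL is printed on its own indented line under the Registry Mirrors: heading; `list_mirrors` is a hypothetical helper name:

```shell
# list_mirrors: read `docker info` text on stdin and print the URLs listed
# under the "Registry Mirrors:" heading, one per line.
list_mirrors() {
    awk '
        /Registry Mirrors:/ { in_block = 1; next }
        in_block && $1 ~ /^https?:\/\// { print $1; next }
        { in_block = 0 }
    '
}

# Usage on a node:
#   docker info 2>/dev/null | list_mirrors
```

An empty result suggests that daemon.json was not picked up, for example because Docker was not restarted after the file changed.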

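9. Verify the deployment

9.1 Manual verification on a target node

Log in to a target node over SSH and run:

bash
# Check the Docker version
docker --version

# Check the Docker service status
sudo systemctl status docker

# Show detailed Docker information
docker info

# Run a test container
docker run --rm hello-world

The hello-world output should include: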

Hello from Docker!
This message shows that your installation appears to be working correctly.
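
Because the playbook runs above failed this step on a registry timeout, retrying with a pause between attempts can be useful when checking by hand. A minimal sketch; `retry` and `hello_ok` are hypothetical helpers, and the attempt count and delay in the usage line are arbitrary:

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND...: run COMMAND until it succeeds or
# ATTEMPTS runs have failed. Returns 0 on success, 1 otherwise.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        "$@" && return 0
        echo "attempt $n/$attempts failed" >&2
        [ "$n" -lt "$attempts" ] && sleep "$delay"
        n=$((n + 1))
    done
    return 1
}

# hello_ok: succeed only if the test container prints the expected greeting.
hello_ok() {
    docker run --rm hello-world 2>&1 | grep -q 'Hello from Docker!'
}

# Usage on a node:
#   retry 3 10 hello_ok && echo "hello-world OK"
```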

9.2 Batch verification with Ansible ad-hoc commands

bash
# Check the Docker version on all nodes
ansible docker -m command -a "docker --version"

# Check the Docker service state on all nodes
ansible docker -m command -a "systemctl is-active docker"

# Print the Docker server version on all nodes (the Go template braces are escaped for Jinja2)
ansible docker -m command -a "docker info --format '{{ '{{' }}.ServerVersion{{ '}}' }}'"
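
The doubled braces in the last command are there because Ansible runs ad-hoc arguments through Jinja2, which would otherwise try to evaluate the Go template itself. A minimal sketch that builds the escaped form for any docker info field; `go_field` is a hypothetical helper, and the field names Driver and CgroupDriver in the usage line are assumed to be the docker info template fields for the storage and cgroup drivers:

```shell
# go_field NAME: print the Go template {{.NAME}} escaped for Jinja2, so that
# Ansible passes the braces through to docker unchanged.
go_field() {
    printf "{{ '{{' }}.%s{{ '}}' }}" "$1"
}

# Usage from the control node (storage driver and cgroup driver per node):
#   ansible docker -m command -a "docker info --format '$(go_field Driver) $(go_field CgroupDriver)'"
```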

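10. Post-deployment configuration (optional)

10.1 Configure the Docker Compose command alias

bash
# Check that the docker compose plugin is installed
ansible docker -m command -a "docker compose version"

Docker Compose V2 is installed as a plugin, so it can be used directly as docker compose (note the space, not a hyphen).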
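
Scripts written for Compose V1 still call docker-compose with a hyphen. A minimal sketch of a wrapper script that forwards to the plugin; the helper name `install_compose_shim` and the target directory /usr/local/bin are assumptions, not something the Playbook sets up:

```shell
# install_compose_shim DIR: create DIR/docker-compose, a small wrapper that
# forwards all arguments to the Compose V2 plugin ("docker compose").
install_compose_shim() {
    cat > "$1/docker-compose" <<'EOF'
#!/bin/sh
exec docker compose "$@"
EOF
    chmod +x "$1/docker-compose"
}

# Usage on a node (as root):
#   install_compose_shim /usr/local/bin
```

A wrapper script is used instead of a shell alias because aliases are not expanded in non-interactive scripts.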

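10.2 Log rotation (already included in the Playbook)

The daemon.json generated by the Playbook already configures the json-file log driver with these parameters:

  • max-size: 100m (each log file is at most 100 MB)
  • max-file: 3 (at most 3 log files are kept)

Adjust both parameters as needed.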
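
Together the two settings bound the log space of a single container: max-size × max-file = 100 MB × 3 = 300 MB. A minimal sketch of the same arithmetic for a whole node; `log_budget_mb` is a hypothetical helper and the container count is an example value:

```shell
# log_budget_mb MAX_SIZE_MB MAX_FILE CONTAINERS: upper bound, in MB, on the
# disk space used by json-file container logs on one node.
log_budget_mb() {
    echo $(( $1 * $2 * $3 ))
}

log_budget_mb 100 3 20   # 20 containers -> prints 6000
```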

10.3 Configure Docker Swarm (if a cluster is needed)

bash
# Initialize the Swarm on the manager node (192.168.194.16 is docker-node-01)
ansible docker-node-01 -m command -a "docker swarm init --advertise-addr 192.168.194.16"

# Get the worker join token
# docker swarm join-token worker

# Join the Swarm from the other nodes
# ansible docker-node-02 -m command -a "docker swarm join --token <TOKEN> 192.168.194.16:2377"
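
The commented steps can be chained so the token never has to be copied by hand. A minimal sketch that uses the passwordless SSH access from section 2.2 rather than ansible; `swarm_join_worker` is a hypothetical helper, the manager is assumed to be initialized already, and 192.168.194.16 in the usage line is docker-node-01's address from section 1.1:

```shell
# swarm_join_worker MANAGER_HOST MANAGER_IP WORKER_HOST
# Fetch the worker join token from the manager (-q prints only the token),
# then run the join command on the worker.
swarm_join_worker() {
    token=$(ssh "$1" docker swarm join-token -q worker) || return 1
    [ -n "$token" ] || return 1
    ssh "$3" docker swarm join --token "$token" "$2:2377"
}

# Usage from the control node:
#   swarm_join_worker docker-node-01 192.168.194.16 docker-node-02
```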

11. Troubleshooting

11.1 GPG key verification fails

Symptom: a GPG error is reported while downloading the Docker repository metadata.

bash
# Import the GPG key manually
sudo rpm --import https://download.docker.com/linux/rhel/gpg

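11.2 RHEL 10 repository compatibility

Symptom: on RHEL 10 the $releasever variable expands to a path that does not exist in the Docker CE repository, so the repository returns 404.

At the time of writing, Docker has not published a dedicated RHEL 10 repository. The current Playbook uses the RHEL 9 repository path as a compatibility workaround. If package dependency problems appear, try the CentOS Stream 10 repository instead:

bash
# Alternative: add Docker's official CentOS repository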
sudo dnf config-manager --add-repo \
  https://download.docker.com/linux/centos/docker-ce.repo
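
Before switching repositories, it can help to confirm which release paths actually exist. A minimal sketch that probes the repository metadata with curl; `repo_url` and `repo_status` are hypothetical helpers, and the <base>/<releasever>/<arch>/stable layout is an assumption about how the Docker dnf repositories are organized:

```shell
# repo_url BASE RELEASEVER [ARCH]: print the URL of the repo metadata index.
repo_url() {
    echo "$1/$2/${3:-x86_64}/stable/repodata/repomd.xml"
}

# repo_status BASE RELEASEVER [ARCH]: print the HTTP status code for that URL
# (200 means the repository exists, 404 means it does not).
repo_status() {
    curl -s -o /dev/null -w '%{http_code}\n' "$(repo_url "$@")"
}

# Usage:
#   repo_status https://download.docker.com/linux/rhel 10
#   repo_status https://download.docker.com/linux/centos 10
```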

11.3 Podman conflicts

RHEL 10 ships with Podman preinstalled by default. The first step of the Playbook already removes Podman. If a conflict remains:

bash
sudo dnf remove -y podman podman-docker

11.4 Kernel module fails to load

Symptom: modprobe overlay fails.

bash
# Check whether the overlay module is currently loaded
lsmod | grep overlay

# If it is missing, update the kernel and reboot
sudo dnf update -y kernel
sudo reboot
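
lsmod only lists modules that are currently loaded, so an empty result does not by itself mean the kernel lacks overlay support. A minimal sketch that separates the cases before resorting to a kernel update; `module_state` is a hypothetical helper:

```shell
# module_state NAME: print "loaded" if the module is in lsmod, "available" if
# it is not loaded but modinfo can find it, and "missing" otherwise.
module_state() {
    if lsmod | awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'; then
        echo loaded
    elif modinfo "$1" > /dev/null 2>&1; then
        echo available
    else
        echo missing
    fi
}

# Usage on a node:
#   module_state overlay
#   module_state br_netfilter
```

A module reported as available can normally be loaded with sudo modprobe; only the missing case points at the kernel package.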

11.5 firewalld rules not taking effect

Symptom: containers cannot reach each other over the network.

bash
# Check whether firewalld is running
sudo systemctl status firewalld

# Open the ports manually
sudo firewall-cmd --permanent --add-port=2375-2377/tcp
sudo firewall-cmd --permanent --add-port=7946/tcp
sudo firewall-cmd --permanent --add-port=7946/udp
sudo firewall-cmd --permanent --add-port=4789/udp
sudo firewall-cmd --reload

# Or disable firewalld entirely (test environments only)
sudo systemctl stop firewalld && sudo systemctl disable firewalld
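
After the reload, the open ports can be compared against the required list. A minimal sketch; `missing_ports` is a hypothetical helper, and it only matches entries exactly as firewall-cmd --list-ports prints them (a range such as 2375-2377/tcp counts as one entry):

```shell
# missing_ports PORT...: print each PORT (e.g. 7946/udp) that does not appear
# in the default zone's open-port list. Prints nothing when all are open.
missing_ports() {
    open=$(sudo firewall-cmd --list-ports)
    for p in "$@"; do
        case " $open " in
            *" $p "*) ;;
            *) echo "$p" ;;
        esac
    done
}

# Usage on a node:
#   missing_ports 2375-2377/tcp 7946/tcp 7946/udp 4789/udp
```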

11.6 cgroup v2 compatibility

RHEL 10 uses cgroup v2 by default. Confirm that Docker uses systemd as its cgroup driver:

bash
docker info | grep -i cgroup
# Expected: Cgroup Driver: systemd

The daemon.json generated by the Playbook already sets "native.cgroupdriver=systemd".
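
Whether the host itself runs cgroup v2 can be read from the filesystem type mounted at /sys/fs/cgroup. A minimal sketch; `cgroup_version` is a hypothetical helper that takes the filesystem type as an argument so the mapping is easy to see:

```shell
# cgroup_version FSTYPE: map the filesystem type of /sys/fs/cgroup to a
# cgroup version. cgroup2fs is the unified (v2) hierarchy; tmpfs is what a
# v1 or hybrid layout reports.
cgroup_version() {
    case "$1" in
        cgroup2fs) echo v2 ;;
        tmpfs)     echo v1 ;;
        *)         echo unknown ;;
    esac
}

# Usage on a node:
#   cgroup_version "$(stat -fc %T /sys/fs/cgroup)"
```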


12. Uninstall Docker

To remove Docker completely:

bash
# Uninstall on all nodes with an Ansible ad-hoc command
ansible docker -m dnf -a "name=docker-ce,docker-ce-cli,containerd.io,docker-buildx-plugin,docker-compose-plugin state=absent autoremove=yes" -b

# Remove the data directories (use with caution: this deletes all images, containers, and volumes)
ansible docker -m file -a "path=/var/lib/docker state=absent" -b
ansible docker -m file -a "path=/etc/docker state=absent" -b