09-03 Tuesday: Ansible deployment and node management

Time  Version  Author  Description
2024-09-03 10:08:58  V0.1  宋全恒  Created the document

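Introduction

First, find a jump host from which every machine is reachable. Then build the environment around Ansible so that a command can be run on all nodes in one step. The main tasks are to mount the NAS server on all 10 nodes and to add our Harbor server to each of them.

About Ansible

Ansible (ansible/ansible at v2.17.3) is an automation tool that manages multiple nodes and handles tasks such as running commands, mounting filesystems, and copying files. It is well suited to managing a cluster.

Commonly used modules are listed below:

The 10 GPU nodes

Batch execution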

for ip in $(seq 64 73); do ssh root@10.107.204.$ip "systemctl restart docker"; done
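
The one-liner above stops giving useful feedback as soon as one node is down. As a minimal sketch, the same loop can be wrapped in a function that reports which nodes failed and supports a dry run. The function name run_on_nodes and the DRY_RUN variable are hypothetical; BatchMode and ConnectTimeout are standard ssh options.

```shell
# run_on_nodes CMD [FIRST] [LAST]: run CMD as root on 10.107.204.FIRST..LAST over ssh.
# With DRY_RUN set, only print the ssh commands that would be executed.
run_on_nodes() {
  cmd="$1"; first="${2:-64}"; last="${3:-73}"
  failed=0
  for ip in $(seq "$first" "$last"); do
    host="10.107.204.$ip"
    if [ -n "${DRY_RUN:-}" ]; then
      echo "ssh root@$host \"$cmd\""
    elif ! ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$host" "$cmd"; then
      # BatchMode avoids hanging on a password prompt; report the node and continue
      echo "FAILED: $host" >&2
      failed=$((failed + 1))
    fi
  done
  # Succeed only if every node succeeded
  [ "$failed" -eq 0 ]
}
```

For example, `DRY_RUN=1 run_on_nodes "systemctl restart docker"` prints the ten ssh commands without connecting to any node.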

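Result

After setup, a conda virtual environment named ansible exists on server 42 under the user yuzailiang. Activating it enables batch operations on the GPU nodes.

Deployment steps

Create the conda environment and install Ansible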

(ansible) yuzailiang@ubuntu:~$ cat update_harbor.yml 
---
- name: Update Docker daemon configuration and ensure valid JSON
  hosts: gpus
  become: yes
  tasks:
    - name: Install Python if not installed
      ansible.builtin.package:
        name: python3
        state: present

    - name: Ensure /etc/docker/daemon.json exists
      ansible.builtin.file:
        path: /etc/docker/daemon.json
        state: touch

    - name: Read existing daemon.json
      ansible.builtin.slurp:
        path: /etc/docker/daemon.json
      register: daemon_json_content

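    - name: Decode JSON
      ansible.builtin.set_fact:
        # A freshly touched file is empty, so fall back to an empty object before parsing
        daemon_json: "{{ daemon_json_content['content'] | b64decode | trim | default('{}', true) | from_json }}"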

    # unique keeps the list free of duplicates when the playbook is run again
    - name: Ensure insecure-registries contains the new registry
      ansible.builtin.set_fact:
        updated_daemon_json: >-
          {{
            daemon_json | combine({
              'insecure-registries': ((daemon_json['insecure-registries'] | default([])) + ['10.200.88.53']) | unique
            })
          }}

    - name: Write updated daemon.json
      ansible.builtin.copy:
        dest: /etc/docker/daemon.json
        content: "{{ updated_daemon_json | to_nice_json }}"
        backup: yes
        mode: '0644'

    - name: Validate JSON syntax
      ansible.builtin.command:
        cmd: 'python3 -m json.tool /etc/docker/daemon.json'
      register: validation_result
      # A non-zero exit code fails the task, so Docker is never restarted with a broken config
      failed_when: validation_result.rc != 0

    - name: Print validation result
      ansible.builtin.debug:
        msg: "JSON validation result: {{ validation_result.stdout }}"

    - name: Restart Docker service
      ansible.builtin.service:
        name: docker
        state: restarted

    - name: Log in to Docker registry
      ansible.builtin.command:
        cmd: docker login 10.200.88.53 --username dros_admin --password 'Dros@zjgxn&07101604'
      ignore_errors: yes

Configuring Ansible

Create the inventory (node list)

[operator]
10.107.204.64

[framework]
10.107.204.65

[model]
10.107.204.66
10.107.204.67
10.107.204.68
10.107.204.69

[compile]
10.107.204.70

[abstract]
10.107.204.71

[communication]
10.107.204.72
10.107.204.73

# New group that includes all the groups
[gpus:children]
operator
framework
model
compile
abstract
communication

We can go one step further and give these IPs aliases, which makes them easier to work with.

(ansible) yuzailiang@ubuntu:~$ sudo vim /etc/ansible/hosts 

[operator]
10.107.204.64

[framework]
10.107.204.65

[model]
10.107.204.66
10.107.204.67
10.107.204.68
10.107.204.69

[compile]
10.107.204.70

[hardware]
10.107.204.71

[communication]
10.107.204.72
10.107.204.73

# New group that includes all the groups
[gpus:children]
operator
framework
model
compile
hardware
communication

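# Aliases for all nodes. Each alias is a separate inventory host from the bare IP
# listed above, so every machine appears in gpus twice; to avoid running tasks
# twice per node, use the alias lines in place of the bare IPs in the groups above.
[gpus]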
gpu1 ansible_host=10.107.204.64
gpu2 ansible_host=10.107.204.65
gpu3 ansible_host=10.107.204.66
gpu4 ansible_host=10.107.204.67
gpu5 ansible_host=10.107.204.68
gpu6 ansible_host=10.107.204.69
gpu7 ansible_host=10.107.204.70
gpu8 ansible_host=10.107.204.71
gpu9 ansible_host=10.107.204.72
gpu10 ansible_host=10.107.204.73
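
The alias lines follow a regular pattern, so they do not have to be typed by hand. A minimal sketch that prints them for the 64 to 73 range used here; the function name gen_aliases is hypothetical.

```shell
# gen_aliases: print "gpuN ansible_host=10.107.204.X" for X = 64..73, N = 1..10
gen_aliases() {
  n=1
  for ip in $(seq 64 73); do
    echo "gpu${n} ansible_host=10.107.204.${ip}"
    n=$((n + 1))
  done
}
```

The output can be pasted into the inventory file, or redirected into a scratch file and reviewed first.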

Copy the public key for passwordless SSH

(ansible) yuzailiang@ubuntu:~/Shell$ bash copy_pub.sh 
Copying public key to root@10.107.204.64...
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/yuzailiang/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh -o 'StrictHostKeyChecking=no' 'root@10.107.204.64'"
and check to make sure that only the key(s) you wanted were added.

Successfully copied public key to 10.107.204.64
Copying public key to root@10.107.204.65...
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/yuzailiang/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh -o 'StrictHostKeyChecking=no' 'root@10.107.204.65'"
and check to make sure that only the key(s) you wanted were added.

Successfully copied public key to 10.107.204.65
Copying public key to root@10.107.204.66...
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/yuzailiang/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed

/usr/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.
		(if you think this is a mistake, you may want to use -f option)

Successfully copied public key to 10.107.204.66
Copying public key to root@10.107.204.67...
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/yuzailiang/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh -o 'StrictHostKeyChecking=no' 'root@10.107.204.67'"
and check to make sure that only the key(s) you wanted were added.

Successfully copied public key to 10.107.204.67
Copying public key to root@10.107.204.68...
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/yuzailiang/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh -o 'StrictHostKeyChecking=no' 'root@10.107.204.68'"
and check to make sure that only the key(s) you wanted were added.

Successfully copied public key to 10.107.204.68
Copying public key to root@10.107.204.69...
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/yuzailiang/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh -o 'StrictHostKeyChecking=no' 'root@10.107.204.69'"
and check to make sure that only the key(s) you wanted were added.

Successfully copied public key to 10.107.204.69
Copying public key to root@10.107.204.70...
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/yuzailiang/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh -o 'StrictHostKeyChecking=no' 'root@10.107.204.70'"
and check to make sure that only the key(s) you wanted were added.

Successfully copied public key to 10.107.204.70
Copying public key to root@10.107.204.71...
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/yuzailiang/.ssh/id_rsa.pub"

The script can be refined further so that it is easy to reuse:

(ansible) yuzailiang@ubuntu:~/Shell$ cat copy_pub.sh 
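#!/bin/bash

# Check arguments
if [ $# -ne 3 ]; then
  echo "Usage: $0 <base IP> <start> <end>"
  echo "Example: $0 10.107.204 72 73"
  exit 1
fi

# Read arguments
BASE_IP="$1."
START_IP=$2
END_IP=$3

# SSH user (named SSH_USER so the shell's own USER variable is left alone)
SSH_USER="root"

# SSH password
PASSWORD="qsgctys@05980"

# Public key path
PUB_KEY_PATH="$HOME/.ssh/id_rsa.pub"

# Check that sshpass is installed
if ! command -v sshpass &> /dev/null; then
  echo "sshpass is not installed; please install it first."
  exit 1
fi

# Check that the public key exists
if [ ! -f "$PUB_KEY_PATH" ]; then
  echo "SSH public key not found; generate one or specify the correct path."
  exit 1
fi

# Loop over the IP range and copy the public key
for i in $(seq "$START_IP" "$END_IP"); do
  FULL_IP="$BASE_IP$i"
  echo "Copying public key to $SSH_USER@$FULL_IP..."

  # Pass the password through sshpass and copy the public key
  if sshpass -p "$PASSWORD" ssh-copy-id -i "$PUB_KEY_PATH" -o StrictHostKeyChecking=no "$SSH_USER@$FULL_IP"; then
    echo "Successfully copied public key to $FULL_IP"
  else
    echo "Could not connect to $FULL_IP, skipping..."
  fi
done

echo "All operations complete."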
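
copy_pub.sh hands its start and end arguments straight to seq without checking them. As a minimal sketch, an argument check along the following lines could run before the loop; the function name valid_range is hypothetical and not part of the original script.

```shell
# valid_range START END: succeed only if both values are integers in 0..255 and START <= END
valid_range() {
  for n in "$1" "$2"; do
    case "$n" in
      ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit character
    esac
    [ "$n" -le 255 ] || return 1
  done
  [ "$1" -le "$2" ]
}
```

Inside the script it could be called as `valid_range "$START_IP" "$END_IP" || { echo "invalid range" >&2; exit 1; }`.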

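Configure the remote user in /etc/ansible/ansible.cfg

The local user is yuzailiang, while the remote machines are operated as root, so the private key and the remote user have to be tied together in the configuration: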

(ansible) yuzailiang@ubuntu:~/Shell$ sudo cat /etc/ansible/ansible.cfg 
[defaults]
remote_user = root
private_key_file = ~/.ssh/id_rsa
interpreter_python = auto

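The last line, interpreter_python = auto, was added to suppress the interpreter discovery warning. Note that auto still prints that warning; auto_silent is the value that silences it.

To use this Ansible environment, log in to server 42 as yuzailiang and activate the ansible conda environment; after that the node groups can be operated on directly.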
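
Editing /etc/ansible/ansible.cfg requires sudo. Ansible also reads an ansible.cfg from the current working directory, so as a sketch the same settings could live next to the playbooks instead. The helper name write_cfg is hypothetical, and auto_silent is used here in place of the original auto so that the discovery warning is not printed.

```shell
# write_cfg DIR: write a project-local ansible.cfg into DIR
write_cfg() {
  cat > "$1/ansible.cfg" <<'EOF'
[defaults]
remote_user = root
private_key_file = ~/.ssh/id_rsa
interpreter_python = auto_silent
EOF
}
```

After `write_cfg .`, a quick connectivity check from that directory would be `ansible gpus -m ansible.builtin.ping`.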

Editing hosts with a playbook

Create the playbook file

(ansible) yuzailiang@ubuntu:~$ cat update_hosts.yml 
---
- name: Ensure /etc/hosts contains NAS entry
  hosts: gpus  # target group name
  become: yes  # escalate privileges to edit /etc/hosts
  tasks:
    - name: Check if /etc/hosts contains NAS entry
      ansible.builtin.lineinfile:
        path: /etc/hosts
        line: "10.15.35.70 NAS"
        state: present
        backup: yes  # optional: back up the file
      tags: hosts

Run the playbook against the model group only:

(ansible) yuzailiang@ubuntu:~$ ansible-playbook  update_hosts.yml -l model

PLAY [Ensure /etc/hosts contains NAS entry] **********************************************************************************************************************************************************************

TASK [Gathering Facts] *******************************************************************************************************************************************************************************************
[WARNING]: Platform linux on host 10.107.204.67 is using the discovered Python interpreter at /usr/bin/python3.8, but future installation of another Python interpreter could change the meaning of that path.
See https://docs.ansible.com/ansible-core/2.17/reference_appendices/interpreter_discovery.html for more information.
ok: [10.107.204.67]
[WARNING]: Platform linux on host 10.107.204.69 is using the discovered Python interpreter at /usr/bin/python3.8, but future installation of another Python interpreter could change the meaning of that path.
See https://docs.ansible.com/ansible-core/2.17/reference_appendices/interpreter_discovery.html for more information.
ok: [10.107.204.69]
[WARNING]: Platform linux on host 10.107.204.68 is using the discovered Python interpreter at /usr/bin/python3.8, but future installation of another Python interpreter could change the meaning of that path.
See https://docs.ansible.com/ansible-core/2.17/reference_appendices/interpreter_discovery.html for more information.
ok: [10.107.204.68]
[WARNING]: Platform linux on host 10.107.204.66 is using the discovered Python interpreter at /usr/bin/python3.8, but future installation of another Python interpreter could change the meaning of that path.
See https://docs.ansible.com/ansible-core/2.17/reference_appendices/interpreter_discovery.html for more information.
ok: [10.107.204.66]

TASK [Check if /etc/hosts contains NAS entry] ********************************************************************************************************************************************************************
changed: [10.107.204.67]
changed: [10.107.204.66]
changed: [10.107.204.68]
changed: [10.107.204.69]

PLAY RECAP *******************************************************************************************************************************************************************************************************
10.107.204.66              : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
10.107.204.67              : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
10.107.204.68              : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
10.107.204.69              : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
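
The lineinfile task reports changed on this first run because the line was missing; a second run would report ok. As a rough shell sketch of that idempotent behaviour (the function name ensure_line is hypothetical, and this is an illustration, not how the module is implemented):

```shell
# ensure_line FILE LINE: append LINE to FILE unless an identical line is already present
ensure_line() {
  if grep -qxF -- "$2" "$1" 2>/dev/null; then
    echo "ok"          # the exact line already exists, nothing to do
  else
    printf '%s\n' "$2" >> "$1"
    echo "changed"     # the line was appended
  fi
}
```

Calling `ensure_line /etc/hosts "10.15.35.70 NAS"` twice would print changed and then ok, leaving a single entry in the file.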

Mounting the NAS

Create the playbook ensure_mounts.yml

(ansible) yuzailiang@ubuntu:~$ cat ensure_mounts.yml 
---
- name: Ensure directories and mounts are configured
  hosts: all  # or a specific group such as 'gpus'
  become: yes  # escalate privileges to create directories, edit /etc/fstab, and mount
  tasks:
    - name: Ensure directories exist
      ansible.builtin.file:
        path: "{{ item }}"
        state: directory
        mode: '0755'
      loop:
        - /mnt/nas_v1
        - /mnt/nas_v2
        - /mnt/self-define

    - name: Ensure fstab contains necessary entries
      ansible.builtin.lineinfile:
        path: /etc/fstab
        line: "{{ item }}"
        state: present
        backup: yes  # optional: back up the file
      loop:
        - "nas:/volume1/1 /mnt/nas_v1 nfs defaults 0 0"
        - "nas:/volume1/1/self-define /mnt/self-define nfs defaults 0 0"
        - "nas:/volume2/2 /mnt/nas_v2 nfs defaults 0 0"

    # state: mounted mounts the filesystem now and also records the entry in /etc/fstab
    - name: Ensure all filesystems are mounted
      ansible.builtin.mount:
        path: "{{ item.path }}"
        src: "{{ item.src }}"
        fstype: "{{ item.fstype }}"
        opts: "{{ item.opts }}"
        state: mounted
      loop:
        - { path: "/mnt/nas_v1", src: "nas:/volume1/1", fstype: "nfs", opts: "defaults" }
        - { path: "/mnt/self-define", src: "nas:/volume1/1/self-define", fstype: "nfs", opts: "defaults" }
        - { path: "/mnt/nas_v2", src: "nas:/volume2/2", fstype: "nfs", opts: "defaults" }

Running it

Run the playbook above to create the directories, update /etc/fstab, and perform the mounts.

(ansible) yuzailiang@ubuntu:~$ ansible-playbook ensure_mounts.yml -l gpus

PLAY [Ensure directories and mounts are configured] **************************************************************************************************************************************************************

TASK [Gathering Facts] *******************************************************************************************************************************************************************************************
[WARNING]: Platform linux on host 10.107.204.67 is using the discovered Python interpreter at /usr/bin/python3.8, but future installation of another Python interpreter could change the meaning of that path.
See https://docs.ansible.com/ansible-core/2.17/reference_appendices/interpreter_discovery.html for more information.
ok: [10.107.204.67]
[WARNING]: Platform linux on host 10.107.204.68 is using the discovered Python interpreter at /usr/bin/python3.8, but future installation of another Python interpreter could change the meaning of that path.
See https://docs.ansible.com/ansible-core/2.17/reference_appendices/interpreter_discovery.html for more information.
ok: [10.107.204.68]
[WARNING]: Platform linux on host 10.107.204.64 is using the discovered Python interpreter at /usr/bin/python3.8, but future installation of another Python interpreter could change the meaning of that path.
See https://docs.ansible.com/ansible-core/2.17/reference_appendices/interpreter_discovery.html for more information.
ok: [10.107.204.64]
[WARNING]: Platform linux on host 10.107.204.65 is using the discovered Python interpreter at /usr/bin/python3.8, but future installation of another Python interpreter could change the meaning of that path.
See https://docs.ansible.com/ansible-core/2.17/reference_appendices/interpreter_discovery.html for more information.
ok: [10.107.204.65]
[WARNING]: Platform linux on host 10.107.204.66 is using the discovered Python interpreter at /usr/bin/python3.8, but future installation of another Python interpreter could change the meaning of that path.
See https://docs.ansible.com/ansible-core/2.17/reference_appendices/interpreter_discovery.html for more information.
ok: [10.107.204.66]
[WARNING]: Platform linux on host 10.107.204.69 is using the discovered Python interpreter at /usr/bin/python3.8, but future installation of another Python interpreter could change the meaning of that path.
See https://docs.ansible.com/ansible-core/2.17/reference_appendices/interpreter_discovery.html for more information.
ok: [10.107.204.69]
[WARNING]: Platform linux on host 10.107.204.72 is using the discovered Python interpreter at /usr/bin/python3.8, but future installation of another Python interpreter could change the meaning of that path.
See https://docs.ansible.com/ansible-core/2.17/reference_appendices/interpreter_discovery.html for more information.
ok: [10.107.204.72]
[WARNING]: Platform linux on host 10.107.204.70 is using the discovered Python interpreter at /usr/bin/python3.8, but future installation of another Python interpreter could change the meaning of that path.
See https://docs.ansible.com/ansible-core/2.17/reference_appendices/interpreter_discovery.html for more information.
ok: [10.107.204.70]
[WARNING]: Platform linux on host 10.107.204.73 is using the discovered Python interpreter at /usr/bin/python3.8, but future installation of another Python interpreter could change the meaning of that path.
See https://docs.ansible.com/ansible-core/2.17/reference_appendices/interpreter_discovery.html for more information.
ok: [10.107.204.73]
fatal: [10.107.204.71]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host 10.107.204.71 port 22: Connection timed out", "unreachable": true}

TASK [Ensure directories exist] **********************************************************************************************************************************************************************************
ok: [10.107.204.68] => (item=/mnt/nas_v1)
ok: [10.107.204.65] => (item=/mnt/nas_v1)
changed: [10.107.204.66] => (item=/mnt/nas_v1)
ok: [10.107.204.64] => (item=/mnt/nas_v1)
ok: [10.107.204.67] => (item=/mnt/nas_v1)
ok: [10.107.204.68] => (item=/mnt/nas_v2)
ok: [10.107.204.65] => (item=/mnt/nas_v2)
changed: [10.107.204.66] => (item=/mnt/nas_v2)
ok: [10.107.204.67] => (item=/mnt/nas_v2)
ok: [10.107.204.64] => (item=/mnt/nas_v2)
ok: [10.107.204.68] => (item=/mnt/self-define)
ok: [10.107.204.65] => (item=/mnt/self-define)
ok: [10.107.204.64] => (item=/mnt/self-define)
ok: [10.107.204.67] => (item=/mnt/self-define)
ok: [10.107.204.66] => (item=/mnt/self-define)
ok: [10.107.204.69] => (item=/mnt/nas_v1)
ok: [10.107.204.70] => (item=/mnt/nas_v1)
ok: [10.107.204.73] => (item=/mnt/nas_v1)
ok: [10.107.204.72] => (item=/mnt/nas_v1)
ok: [10.107.204.69] => (item=/mnt/nas_v2)
ok: [10.107.204.70] => (item=/mnt/nas_v2)
ok: [10.107.204.72] => (item=/mnt/nas_v2)
ok: [10.107.204.73] => (item=/mnt/nas_v2)
ok: [10.107.204.69] => (item=/mnt/self-define)
ok: [10.107.204.70] => (item=/mnt/self-define)
ok: [10.107.204.72] => (item=/mnt/self-define)
ok: [10.107.204.73] => (item=/mnt/self-define)

TASK [Ensure fstab contains necessary entries] *******************************************************************************************************************************************************************
ok: [10.107.204.64] => (item=nas:/volume1/1 /mnt/nas_v1 nfs defaults 0 0)
ok: [10.107.204.67] => (item=nas:/volume1/1 /mnt/nas_v1 nfs defaults 0 0)
ok: [10.107.204.68] => (item=nas:/volume1/1 /mnt/nas_v1 nfs defaults 0 0)
ok: [10.107.204.66] => (item=nas:/volume1/1 /mnt/nas_v1 nfs defaults 0 0)
ok: [10.107.204.65] => (item=nas:/volume1/1 /mnt/nas_v1 nfs defaults 0 0)
ok: [10.107.204.64] => (item=nas:/volume1/1/self-define /mnt/self-define nfs defaults 0 0)
ok: [10.107.204.67] => (item=nas:/volume1/1/self-define /mnt/self-define nfs defaults 0 0)
ok: [10.107.204.66] => (item=nas:/volume1/1/self-define /mnt/self-define nfs defaults 0 0)
ok: [10.107.204.65] => (item=nas:/volume1/1/self-define /mnt/self-define nfs defaults 0 0)
ok: [10.107.204.68] => (item=nas:/volume1/1/self-define /mnt/self-define nfs defaults 0 0)
ok: [10.107.204.64] => (item=nas:/volume2/2 /mnt/nas_v2 nfs defaults 0 0)
ok: [10.107.204.67] => (item=nas:/volume2/2 /mnt/nas_v2 nfs defaults 0 0)
ok: [10.107.204.65] => (item=nas:/volume2/2 /mnt/nas_v2 nfs defaults 0 0)
ok: [10.107.204.66] => (item=nas:/volume2/2 /mnt/nas_v2 nfs defaults 0 0)
ok: [10.107.204.68] => (item=nas:/volume2/2 /mnt/nas_v2 nfs defaults 0 0)
ok: [10.107.204.69] => (item=nas:/volume1/1 /mnt/nas_v1 nfs defaults 0 0)
ok: [10.107.204.70] => (item=nas:/volume1/1 /mnt/nas_v1 nfs defaults 0 0)
ok: [10.107.204.73] => (item=nas:/volume1/1 /mnt/nas_v1 nfs defaults 0 0)
ok: [10.107.204.72] => (item=nas:/volume1/1 /mnt/nas_v1 nfs defaults 0 0)
ok: [10.107.204.69] => (item=nas:/volume1/1/self-define /mnt/self-define nfs defaults 0 0)
ok: [10.107.204.70] => (item=nas:/volume1/1/self-define /mnt/self-define nfs defaults 0 0)
ok: [10.107.204.72] => (item=nas:/volume1/1/self-define /mnt/self-define nfs defaults 0 0)
ok: [10.107.204.73] => (item=nas:/volume1/1/self-define /mnt/self-define nfs defaults 0 0)
ok: [10.107.204.69] => (item=nas:/volume2/2 /mnt/nas_v2 nfs defaults 0 0)
ok: [10.107.204.70] => (item=nas:/volume2/2 /mnt/nas_v2 nfs defaults 0 0)
ok: [10.107.204.72] => (item=nas:/volume2/2 /mnt/nas_v2 nfs defaults 0 0)
ok: [10.107.204.73] => (item=nas:/volume2/2 /mnt/nas_v2 nfs defaults 0 0)

TASK [Ensure all filesystems are mounted] ************************************************************************************************************************************************************************
ok: [10.107.204.66] => (item={'path': '/mnt/nas_v1', 'src': 'nas:/volume1/1', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.65] => (item={'path': '/mnt/nas_v1', 'src': 'nas:/volume1/1', 'fstype': 'nfs', 'opts': 'defaults'})
ok: [10.107.204.66] => (item={'path': '/mnt/self-define', 'src': 'nas:/volume1/1/self-define', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.67] => (item={'path': '/mnt/nas_v1', 'src': 'nas:/volume1/1', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.64] => (item={'path': '/mnt/nas_v1', 'src': 'nas:/volume1/1', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.68] => (item={'path': '/mnt/nas_v1', 'src': 'nas:/volume1/1', 'fstype': 'nfs', 'opts': 'defaults'})
ok: [10.107.204.66] => (item={'path': '/mnt/nas_v2', 'src': 'nas:/volume2/2', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.65] => (item={'path': '/mnt/self-define', 'src': 'nas:/volume1/1/self-define', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.67] => (item={'path': '/mnt/self-define', 'src': 'nas:/volume1/1/self-define', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.64] => (item={'path': '/mnt/self-define', 'src': 'nas:/volume1/1/self-define', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.68] => (item={'path': '/mnt/self-define', 'src': 'nas:/volume1/1/self-define', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.65] => (item={'path': '/mnt/nas_v2', 'src': 'nas:/volume2/2', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.67] => (item={'path': '/mnt/nas_v2', 'src': 'nas:/volume2/2', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.64] => (item={'path': '/mnt/nas_v2', 'src': 'nas:/volume2/2', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.69] => (item={'path': '/mnt/nas_v1', 'src': 'nas:/volume1/1', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.68] => (item={'path': '/mnt/nas_v2', 'src': 'nas:/volume2/2', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.69] => (item={'path': '/mnt/self-define', 'src': 'nas:/volume1/1/self-define', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.70] => (item={'path': '/mnt/nas_v1', 'src': 'nas:/volume1/1', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.72] => (item={'path': '/mnt/nas_v1', 'src': 'nas:/volume1/1', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.73] => (item={'path': '/mnt/nas_v1', 'src': 'nas:/volume1/1', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.69] => (item={'path': '/mnt/nas_v2', 'src': 'nas:/volume2/2', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.70] => (item={'path': '/mnt/self-define', 'src': 'nas:/volume1/1/self-define', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.72] => (item={'path': '/mnt/self-define', 'src': 'nas:/volume1/1/self-define', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.73] => (item={'path': '/mnt/self-define', 'src': 'nas:/volume1/1/self-define', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.70] => (item={'path': '/mnt/nas_v2', 'src': 'nas:/volume2/2', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.72] => (item={'path': '/mnt/nas_v2', 'src': 'nas:/volume2/2', 'fstype': 'nfs', 'opts': 'defaults'})
changed: [10.107.204.73] => (item={'path': '/mnt/nas_v2', 'src': 'nas:/volume2/2', 'fstype': 'nfs', 'opts': 'defaults'})

PLAY RECAP *******************************************************************************************************************************************************************************************************
10.107.204.64              : ok=4    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
10.107.204.65              : ok=4    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
10.107.204.66              : ok=4    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
10.107.204.67              : ok=4    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
10.107.204.68              : ok=4    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
10.107.204.69              : ok=4    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
10.107.204.70              : ok=4    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
10.107.204.71              : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0   
10.107.204.72              : ok=4    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
10.107.204.73              : ok=4    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0  
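
The recap shows that 10.107.204.71 was unreachable, so it is worth checking the mounts on a node directly once it is back. A minimal sketch that reads the nfs entries from an fstab-style file and checks each mount point; the function names are hypothetical, and mountpoint is the util-linux command.

```shell
# nfs_mount_points FILE: print the mount point (field 2) of every nfs entry in FILE
nfs_mount_points() {
  awk '$1 !~ /^#/ && $3 == "nfs" { print $2 }' "$1"
}

# check_nfs_mounts FILE: report each nfs mount point as mounted or MISSING
check_nfs_mounts() {
  rc=0
  for mp in $(nfs_mount_points "$1"); do
    if mountpoint -q "$mp"; then
      echo "mounted: $mp"
    else
      echo "MISSING: $mp"
      rc=1
    fi
  done
  return "$rc"
}
```

On a node, `check_nfs_mounts /etc/fstab` lists /mnt/nas_v1, /mnt/self-define, and /mnt/nas_v2 with their state and returns non-zero if any of them is not mounted.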

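Harbor setup

If /etc/docker/daemon.json on some nodes already contains an "insecure-registries" entry and you want to add a new registry address without dropping the existing ones, the update must not overwrite the existing configuration. Ansible's blockinfile module looks like a candidate because it preserves the rest of the file, but it wraps the inserted block in marker comment lines, and JSON has no comment syntax, so the result is not valid JSON. The update_harbor.yml playbook that was actually run, listed earlier under the deployment steps, therefore reads the file, merges the new registry into the parsed JSON, and writes it back.

First attempt with blockinfile (kept for reference; its output fails the JSON validation)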

---
- name: Update Docker daemon configuration and login to repository if needed
  hosts: gpus
  become: yes
  tasks:
    - name: Install python3
      ansible.builtin.package:
        name: python3
        state: present

    - name: Ensure /etc/docker/daemon.json exists
      ansible.builtin.file:
        path: /etc/docker/daemon.json
        state: touch

    - name: Add new registry to /etc/docker/daemon.json
      ansible.builtin.blockinfile:
        path: /etc/docker/daemon.json
        block: |
          {
            "insecure-registries": ["10.200.88.53"]
          }
        marker: "# {mark} ANSIBLE MANAGED BLOCK"
        create: yes
        backup: yes
        mode: '0644'
        validate: 'python3 -m json.tool %s > /dev/null'

    - name: Restart Docker service
      ansible.builtin.service:
        name: docker
        state: restarted

    - name: Check if Docker is already logged in
      # The pipe needs a shell; the command module does not interpret "|"
      ansible.builtin.shell:
        cmd: docker info | grep "Username:"
      register: docker_login_status
      changed_when: false
      ignore_errors: yes

    - name: Log in to Docker registry if not already logged in
      ansible.builtin.command:
        cmd: docker login 10.200.88.53 --username dros_admin --password 'Dros@zjgxn&07101604'
      when: docker_login_status.rc != 0
      ignore_errors: yes

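Playbook walkthrough

The tasks in update_harbor.yml do the following.

Ensure Python is installed

  • Make sure Python 3 is installed on the node, because json.tool requires it.

Ensure /etc/docker/daemon.json exists

  • Make sure the file exists, even if it is empty.

Read the existing daemon.json

  • Use the slurp module to read the current contents of the JSON file.

Decode JSON

  • Decode the base64-encoded content that slurp returns and parse it into a JSON object.

Ensure the new registry address is included

  • Update the JSON object so that insecure-registries contains the new registry address.

Write the updated daemon.json

  • Write the updated JSON to /etc/docker/daemon.json, keeping a backup.

Validate JSON syntax

  • Check that the JSON file is syntactically valid.

Restart the Docker service

  • Restart Docker so that it picks up the new configuration.

Log in to the Docker registry

  • Try to log in to the Docker registry; a failed login does not abort the playbook.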
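
The read, merge, and write steps above can be sketched outside Ansible as well. The following is a minimal illustration, assuming python3 is available on the node; the function name add_registry is hypothetical, and this is not the playbook's own implementation.

```shell
# add_registry FILE REGISTRY: merge REGISTRY into "insecure-registries", keeping all other keys
add_registry() {
  python3 - "$1" "$2" <<'EOF'
import json
import sys

path, registry = sys.argv[1], sys.argv[2]

# A missing or empty file is treated as an empty JSON object
try:
    with open(path) as f:
        text = f.read().strip()
except FileNotFoundError:
    text = ""
config = json.loads(text) if text else {}

# Append the registry only if it is not already listed
registries = config.setdefault("insecure-registries", [])
if registry not in registries:
    registries.append(registry)

with open(path, "w") as f:
    json.dump(config, f, indent=4)
    f.write("\n")
EOF
}
```

Running `add_registry /etc/docker/daemon.json 10.200.88.53` more than once leaves a single entry for the registry and does not touch the other settings in the file.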

Running it

(ansible) yuzailiang@ubuntu:~$ ansible-playbook update_harbor.yml

TASK [Restart Docker service] ************************************************************************************************************************************************************************************
changed: [10.107.204.65]
changed: [10.107.204.66]
changed: [10.107.204.64]
changed: [10.107.204.68]
changed: [10.107.204.67]
changed: [10.107.204.72]
changed: [10.107.204.69]
changed: [10.107.204.70]
changed: [10.107.204.73]

TASK [Log in to Docker registry] *********************************************************************************************************************************************************************************
changed: [10.107.204.64]
changed: [10.107.204.67]
changed: [10.107.204.66]
changed: [10.107.204.65]
changed: [10.107.204.68]
changed: [10.107.204.69]
changed: [10.107.204.72]
changed: [10.107.204.70]
changed: [10.107.204.73]

PLAY RECAP *******************************************************************************************************************************************************************************************************
10.107.204.64              : ok=11   changed=5    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
10.107.204.65              : ok=11   changed=5    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
10.107.204.66              : ok=11   changed=5    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
10.107.204.67              : ok=11   changed=5    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
10.107.204.68              : ok=11   changed=5    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
10.107.204.69              : ok=11   changed=5    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
10.107.204.70              : ok=11   changed=5    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
10.107.204.71              : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0   
10.107.204.72              : ok=11   changed=5    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
10.107.204.73              : ok=11   changed=5    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   

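Summary

This article used Ansible, through the ansible and ansible-playbook commands, to set up and use an environment for automated cluster management. Automating the Harbor registry configuration, the NAS mounts, the hosts update, and similar tasks makes all GPU nodes easier to use. Ansible is a very good tool for this kind of automation.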
