Error Handling in Ansible

Environment

  • Management node: Ubuntu 22.04
  • Control node: CentOS 8
  • Ansible: 2.15.6

ignore_errors

Use ignore_errors: true to make Ansible ignore an error (that is, a task whose result is failed) and continue with the next task:

---
- hosts: all
  tasks:
    - name: task1
      shell: cat /tmp/abc.txt
      ignore_errors: true

    - name: task2
      debug:
        msg: "hello"

The output is as follows:

TASK [task1] ***************************************************************************************
fatal: [192.168.1.55]: FAILED! => {"changed": true, "cmd": "cat /tmp/abc.txt", "delta": "0:00:00.002192", "end": "2023-11-25 16:53:58.063148", "msg": "non-zero return code", "rc": 1, "start": "2023-11-25 16:53:58.060956", "stderr": "cat: /tmp/abc.txt: No such file or directory", "stderr_lines": ["cat: /tmp/abc.txt: No such file or directory"], "stdout": "", "stdout_lines": []}
...ignoring

TASK [task2] ***************************************************************************************
ok: [192.168.1.55] => {
    "msg": "hello"
}

PLAY RECAP *****************************************************************************************
192.168.1.55               : ok=3    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=1  
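
An ignored failure is still recorded in the task result, so a later task can react to it. The following is a minimal sketch rather than part of the original example: the variable name cat_result and the follow-up debug task are assumptions chosen for illustration.

```yaml
---
- hosts: all
  tasks:
    - name: task1
      shell: cat /tmp/abc.txt
      register: cat_result        # hypothetical variable name
      ignore_errors: true

    # Runs only on hosts where task1 failed (and the failure was ignored)
    - name: report the ignored failure
      debug:
        msg: "task1 failed with rc={{ cat_result.rc }}"
      when: cat_result is failed
```

The `is failed` test reads the registered result, so the play can continue and still branch on whether task1 actually worked.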

ignore_unreachable

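If a target node is unreachable (to reproduce this you can, for example, edit /etc/ansible/hosts so that it lists a host that cannot be reached):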

---
- hosts: all
  gather_facts: false # skip fact gathering so the error does not occur there
  tasks:
    - name: task1
      shell: cat /tmp/abc.txt
      ignore_errors: true

    - name: task2
      debug:
        msg: "hello"

The output is as follows:

TASK [task1] ***************************************************************************************
fatal: [192.168.1.56]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host 192.168.1.56 port 22: No route to host", "unreachable": true}

PLAY RECAP *****************************************************************************************
192.168.1.56               : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0   

As shown, ignore_errors: true has no effect in this example, because the result is unreachable, not failed. ignore_errors: true only applies to failed results.

To ignore unreachable errors, use ignore_unreachable: true instead:

---
- hosts: all
  gather_facts: false
  tasks:
    - name: task1
      shell: cat /tmp/abc.txt
      ignore_unreachable: true

    - name: task2
      debug:
        msg: "hello"

The output is as follows:

TASK [task1] ***************************************************************************************
fatal: [192.168.1.56]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host 192.168.1.56 port 22: No route to host", "unreachable": true}
...ignoring

TASK [task2] ***************************************************************************************
ok: [192.168.1.56] => {
    "msg": "hello"
}

PLAY RECAP *****************************************************************************************
192.168.1.56               : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=1 

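As shown, ignore_unreachable: true makes Ansible ignore the unreachable error.

Note: task2 succeeds here only because debug runs on the control node and does not need to connect to the target node. If task2 must also reach the target node, it fails as unreachable: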

---
- hosts: all
  gather_facts: false
  tasks:
    - name: task1
      shell: cat /tmp/abc.txt
      ignore_unreachable: true

    - name: task2
      shell: cat /tmp/def.txt

The output is as follows:

TASK [task1] ***************************************************************************************
fatal: [192.168.1.56]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host 192.168.1.56 port 22: No route to host", "unreachable": true}
...ignoring

TASK [task2] ***************************************************************************************
fatal: [192.168.1.56]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host 192.168.1.56 port 22: No route to host", "unreachable": true}

PLAY RECAP *****************************************************************************************
192.168.1.56               : ok=1    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=1

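As shown, ignore_unreachable: true only takes effect within the scope where it is set (here, task1 only).

To cover every task, move ignore_unreachable: true up one level, to the play: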

---
- hosts: all
  gather_facts: false
  ignore_unreachable: true
  tasks:
    - name: task1
      shell: cat /tmp/abc.txt

    - name: task2
      shell: cat /tmp/def.txt

The output is as follows:

TASK [task1] ***************************************************************************************
fatal: [192.168.1.56]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host 192.168.1.56 port 22: No route to host", "unreachable": true}
...ignoring

TASK [task2] ***************************************************************************************
fatal: [192.168.1.56]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host 192.168.1.56 port 22: No route to host", "unreachable": true}
...ignoring

PLAY RECAP *****************************************************************************************
192.168.1.56               : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=2   
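
Between the task level and the play level there is a third scope: ignore_unreachable is also accepted as a block keyword, so it can be applied to a group of tasks. The sketch below assumes that grouping is what you want; task3 and /tmp/ghi.txt are hypothetical and only mark where the scope ends.

```yaml
---
- hosts: all
  gather_facts: false
  tasks:
    - name: tasks that tolerate unreachable hosts
      ignore_unreachable: true    # inherited by every task inside this block
      block:
        - name: task1
          shell: cat /tmp/abc.txt

        - name: task2
          shell: cat /tmp/def.txt

    # Outside the block: an unreachable host is reported as usual here
    - name: task3
      shell: cat /tmp/ghi.txt
```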

force_handlers

By default, handlers run only after all tasks in the play have finished. If task1 notifies handler1 but the following task2 fails, the handler does not run.

---
- hosts: all
  #force_handlers: True
  tasks:
    - name: task1
      shell: cat /tmp/a.txt
      notify: handler1

    - name: task2
      shell: cat /tmp/abc.txt

  handlers:
    - name: handler1
      debug:
        msg: "OK"

The output is as follows:

TASK [task1] ***************************************************************************************
changed: [192.168.1.55]

TASK [task2] ***************************************************************************************
fatal: [192.168.1.55]: FAILED! => {"changed": true, "cmd": "cat /tmp/abc.txt", "delta": "0:00:00.003114", "end": "2023-11-25 17:29:23.501341", "msg": "non-zero return code", "rc": 1, "start": "2023-11-25 17:29:23.498227", "stderr": "cat: /tmp/abc.txt: No such file or directory", "stderr_lines": ["cat: /tmp/abc.txt: No such file or directory"], "stdout": "", "stdout_lines": []}

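As shown, the handler did not run.

Setting force_handlers: True forces notified handlers to run even if a later task fails.

With force_handlers: True enabled (uncomment the line in the playbook above), the output is as follows: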

TASK [task1] ***************************************************************************************
changed: [192.168.1.55]

TASK [task2] ***************************************************************************************
fatal: [192.168.1.55]: FAILED! => {"changed": true, "cmd": "cat /tmp/abc.txt", "delta": "0:00:00.002423", "end": "2023-11-25 17:30:09.107674", "msg": "non-zero return code", "rc": 1, "start": "2023-11-25 17:30:09.105251", "stderr": "cat: /tmp/abc.txt: No such file or directory", "stderr_lines": ["cat: /tmp/abc.txt: No such file or directory"], "stdout": "", "stdout_lines": []}

RUNNING HANDLER [handler1] *************************************************************************
ok: [192.168.1.55] => {
    "msg": "OK"
}

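Note: you can also use meta: flush_handlers to run the handlers that have been notified so far immediately, instead of waiting for the end of the play.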
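
As a minimal sketch of the flush_handlers approach, the playbook from this section can be extended with a meta task between task1 and task2. The task name "flush handlers" is an assumption added for readability; everything else is taken from the example above.

```yaml
---
- hosts: all
  tasks:
    - name: task1
      shell: cat /tmp/a.txt
      notify: handler1

    # Run the handlers notified so far, before task2 gets a chance to fail
    - name: flush handlers
      meta: flush_handlers

    - name: task2
      shell: cat /tmp/abc.txt

  handlers:
    - name: handler1
      debug:
        msg: "OK"
```

With this layout handler1 runs right after task1, so a failure in task2 no longer prevents it from running.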

failed_when

You can define your own failure condition: register the task's result in a variable, then test that result, for example:

---
- hosts: all
  tasks:
    - name: task1
      shell: cat /tmp/a.txt
      register: result
      failed_when: "'OK' not in result.stdout"

  • If /tmp/a.txt on the target machine contains OK, the task is treated as successful:

TASK [task1] ***************************************************************************************
changed: [192.168.1.55]

PLAY RECAP *****************************************************************************************
192.168.1.55               : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   

  • If /tmp/a.txt on the target machine does not contain OK, the task is treated as failed:

TASK [task1] ***************************************************************************************
fatal: [192.168.1.55]: FAILED! => {"changed": true, "cmd": "cat /tmp/a.txt", "delta": "0:00:00.004289", "end": "2023-11-25 17:42:22.907553", "failed_when_result": true, "msg": "", "rc": 0, "start": "2023-11-25 17:42:22.903264", "stderr": "", "stderr_lines": [], "stdout": "aaaaa\nb\nccccc", "stdout_lines": ["aaaaa", "b", "ccccc"]}

PLAY RECAP *****************************************************************************************
192.168.1.55               : ok=1    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0  

If there are several conditions that must all hold (a logical AND), write them as a list:

---
- hosts: all
  tasks:
    - name: task1
      shell: cat /tmp/a.txt
      register: result
      failed_when:
        - "'OK' not in result.stdout"
        - "'SUCCESS' not in result.stdout"

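Here the task is treated as failed only if /tmp/a.txt on the target machine contains neither OK nor SUCCESS.

You can also write this on one line as failed_when: xxxxx and xxxxx

Likewise, if the conditions are alternatives (a logical OR), use failed_when: xxxxx or xxxxx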
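
To make the one-line and/or forms concrete, here is a sketch that combines the return code with the command output. The specific conditions and the second task are assumptions for illustration. Note that once failed_when is set, it decides whether the task fails, so the return code has to be checked explicitly if it still matters.

```yaml
---
- hosts: all
  tasks:
    # OR: fail if the command itself failed, or if the marker is missing
    - name: task1
      shell: cat /tmp/a.txt
      register: result
      failed_when: "result.rc != 0 or 'OK' not in result.stdout"

    # AND: tolerate a missing file, but fail on any other non-zero exit
    - name: task2
      shell: cat /tmp/abc.txt
      register: result2           # hypothetical variable name
      failed_when: "result2.rc != 0 and 'No such file' not in result2.stderr"
```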

Note: if the expression is too long, you can split it across lines with > (YAML's folded block scalar syntax):

---
- hosts: all
  tasks:
    - name: task1
      shell: cat /tmp/a.txt
      register: result
      failed_when: >
        'OK' not in result.stdout or
        'SUCCESS' not in result.stdout

changed_when

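Similar to failed_when, changed_when lets you define the condition under which a task is reported as changed. This is especially useful with handlers, because a handler is triggered only when the notifying task reports a change.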

---
- hosts: all
  tasks:
    - name: task1
      shell: cat /tmp/a.txt
      register: result
      changed_when: >
        'OK' in result.stdout or
        'SUCCESS' in result.stdout
      notify: handler1

  handlers:
    - name: handler1
      debug:
        msg: "The result is OK"

In this example, handler1 is triggered when /tmp/a.txt on the target machine contains OK or SUCCESS.
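
A common companion use of changed_when is the constant false. As the earlier outputs show, shell reports "changed": true even for a plain cat, so a read-only command can be marked as never changing anything. This is a minimal sketch reusing the task from this section:

```yaml
---
- hosts: all
  tasks:
    - name: task1
      shell: cat /tmp/a.txt
      register: result
      changed_when: false   # read-only command: never report "changed"
```

With this setting the task shows up as ok rather than changed, and it would not trigger any handler it notifies.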

Forcing shell/command to succeed

Suppose /tmp/b.sh on the target machine contains the following:

#!/bin/bash

echo "bad"

exit 1

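The script exits with a non-zero status, so Ansible treats the task as failed. To have it treated as successful, one option is ignore_errors: true (the task is still reported as failed, but the failure is ignored); another is to make the shell command itself return 0, for example: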

---
- hosts: all
  tasks:
    - name: task1
      shell: /tmp/b.sh || /bin/true

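Note: /tmp/b.sh || /bin/true is evaluated by the shell as a whole. /bin/true is a standard system program (not a script) that does nothing and exits with status 0, so the combined command always returns 0: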

➜  ~ /tmp/b.sh || /bin/true
bad

➜  ~ echo $?
0
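
A third option, sketched here as an alternative to the || /bin/true trick, is to keep the real return code and tell Ansible which codes count as success with failed_when. The accepted list [0, 1] is an assumption that matches the exit 1 in /tmp/b.sh.

```yaml
---
- hosts: all
  tasks:
    - name: task1
      shell: /tmp/b.sh
      register: result
      # Treat exit codes 0 and 1 as success; anything else still fails
      failed_when: "result.rc not in [0, 1]"
```

Unlike || /bin/true, this keeps result.rc available to later tasks and still fails on unexpected exit codes.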

References

  • https://docs.ansible.com/ansible/latest/playbook_guide/playbooks_error_handling.html