Ansible Playbook Keywords | Quick Start | Tutorial with Examples

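I. [Preface]

1. Ramblings

I have been planning a short series of articles, and after some thought Ansible seemed the topic most worth writing down.

First, the learning curve is gentle. The documentation does not scare you off at first sight, so the early payoff is very high.

Second, almost everyone ends up needing it. Whatever direction you work in, once the number of machines grows you will very likely meet a tool of this kind. Some companies use alternatives such as SaltStack or Puppet, but the ideas are similar, and those tools are easy to pick up after learning Ansible.

Finally, it needs few prerequisites and can be put to use in a short time. That also means you must be especially careful when using it, because a single command can cause damage on a large scale.

These notes are second-hand material at heart: a record of personal experience and a clumsy imitation of the official documentation. If they happen to help you, that is a welcome bonus.

Official site: Ansible Documentation

Prerequisites: YAML (required); shell, INI, Jinja2 (a rough working knowledge is enough)

PS: every keyword listed in the official documentation is covered here. Your support is appreciated.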

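2. Introduction

Ansible is a tool for managing machines in bulk: one or more control nodes drive the managed machines. It is written in Python and communicates mainly over SSH, with other protocols as supplements. It supports the mainstream Linux distributions and can also manage Windows (ansible.windows) and macOS. The name comes from the 1966 novel "Rocannon's World", and the later, well-known "Ender's Game" (a novel that was also adapted into a film) pays homage to the same word: the ansible. In that story the author takes the usual shortcut of "adding a piece of lore" and invents the ansible, a fictional device that solves the latency problem of the protagonist commanding starships in battle from far away.

Ansible playbooks are the most commonly used and the most powerful feature: you build one or more playbooks plus templates to carry out a series of complex tasks. Because playbooks are written mainly in YAML, the barrier is low. A day with the documentation is enough to build a basic multi-task playbook, and combined with ansible-vault and ansible-pull (or some other code hosting) it is easy to set up secure and efficient task management.

Before starting, of course, the basic software installation and host discovery need to be in place.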

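II. [Getting Started]

Ansible keywords fall into four categories: PLAY, ROLE, BLOCK and TASK, which match the overall structure of a playbook. Briefly: a PLAY is a section that contains one or more tasks; a ROLE can be thought of as something like a class, used to separate different functions, so a role usually holds a lot of content; a BLOCK is also a group of tasks, but adds features such as error handling; a TASK is the smallest unit Ansible executes, comparable to a single statement that cannot be split any further.

Keywords can be thought of as the command statements of a playbook.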
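
To make the four levels concrete, here is a minimal sketch of how they nest inside one playbook. The group name web, the role name common and all task names are placeholders made up for illustration; the role would have to exist for this to run.

```yaml
# Hypothetical layout: "web" and "common" are placeholder names.
- name: One play                          # PLAY level
  hosts: web
  become: true                            # a play keyword, inherited by everything below
  roles:
    - role: common                        # ROLE level
  tasks:
    - name: A block groups several tasks  # BLOCK level
      block:
        - name: A single task             # TASK level
          debug:
            msg: "hello"
      rescue:
        - name: Runs only if a task in the block fails
          debug:
            msg: "recovering"
```

A keyword set at an outer level (here become) applies to the inner levels unless one of them overrides it.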

(1). PLAY

1. any_errors_fatal

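When this keyword is true, a failure of any task on any host is treated as fatal: the play stops on all hosts.

- name: Example of any_errors_fatal    # play name; the leading '- ' marks one PLAY, roughly a block of tasks
  hosts: all                           # hosts this play targets
  any_errors_fatal: true
  tasks:                               # the actual tasks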
    - name: Fail the command
      command: /bin/false

2. become

Runs tasks with privilege escalation. become is a boolean; when true, sudo is used by default, which requires sudoers to be configured on the target machines beforehand.

- name: Example of become
  hosts: all
  become: true
  tasks:
    - name: Run a privileged command
      command: systemctl restart nginx

3. become_exe

A supplement to become: the path to the privilege-escalation executable (a file, not a directory).

- name: Example of become_exe
  hosts: all
  become: true
  become_exe: /usr/bin/sudo
  tasks:
    - name: Run a privileged command
      command: systemctl restart nginx

4. become_flags

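Flags passed to the privilege-escalation program. On Linux they are the same as the sudo options, and the default for sudo is "-H -S -n". For the Windows runas method, support was added in Ansible 2.5 and the accepted values are logon_type and logon_flags.

- name: Example of become_flags
  hosts: all
  become: true
  become_flags: '-E'
  tasks:
    - name: Run a privileged command with environment preservation
      shell: echo "$MY_ENV_VAR"   # shell is used so the variable is expanded on the target
      environment:
        MY_ENV_VAR: "Example"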

5. become_method

Sets the privilege-escalation method, for example sudo or su.

- name: Escalate privileges with sudo
  hosts: all
  become: true
  become_method: sudo
  tasks:
    - name: Run a command as root
      command: whoami

6. become_user

The user to become after escalation. The remote user must have permission to switch to that user.

- name: Switch to a specific user
  hosts: all
  become: true
  become_user: someuser
  tasks:
    - name: Run a command as that user
      command: whoami

7. check_mode

Dry-run mode: when set to true, Ansible only checks what would change and does not actually make the change.

- name: Check mode example
  hosts: all
  tasks:
    - name: Try to install vim
      yum:
        name: vim
        state: present
      check_mode: true

8. collections

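Collections are one of Ansible's more powerful features. Think of them as importing packages: you look in Ansible's large community for the functionality you want. A collection typically contains roles, modules, plugins and playbooks, and you can use whichever parts you need.

The snippet below uses a collection named my_namespace.my_collection, which contains a module named my_module and a role named my_role.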

---
- name: Example playbook using a collection
  hosts: all
  collections:
    - my_namespace.my_collection

  tasks:
    - name: Use a module from the collection
      my_module:
        parameter1: value1
        parameter2: value2

  roles:
    - my_role

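9. connection

This keyword sets the connection type for different kinds of machines, for example ssh or winrm; community plugins also cover things such as SNMP and message buses. (I have a question here: devices that speak SNMP are usually simple ones, so what is Ansible typically used for on them?)

- name: Connect over ssh
  hosts: all
  connection: ssh
  tasks:
    - name: Run a remote command
      command: echo "Hello, SSH!"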

10. debugger

This keyword is for debugging. Here the debugger is set to start when a task fails.

- name: Use the debugger
  hosts: localhost
  tasks:
    - name: Try a command that may fail
      command: /bin/false
      debugger: on_failed

11. diff

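Similar to the Linux diff command: when enabled, the task shows the differences it made to files.

- name: Show the diff of file changes
  hosts: localhost
  tasks:
    - name: Make sure the text file contains a given line
      lineinfile:
        path: /tmp/testfile.txt
        line: 'Ansible is awesome'
      diff: yes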

12. environment

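Sets environment variables for the tasks.

- name: Set environment variables
  hosts: localhost
  tasks:
    - name: Print the environment variable
      shell: echo "$MY_VAR"   # shell is used so the variable is expanded
      environment:
        MY_VAR: "Hello, Ansible!"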

13. fact_path

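When running a playbook, Ansible normally starts with an automatic information-gathering step called "Gathering Facts". The information collected includes the host's network configuration, operating system type, hardware specifications and so on, and it is available as variables in the rest of the playbook. Users can also define custom facts, and fact_path is the directory in which those custom fact files are kept.

Note that the related keyword gather_facts must not be false for this to matter. Fortunately facts are gathered by default.

- name: Custom fact path
  hosts: localhost
  fact_path: /custom/fact/path
  tasks:
    - name: Gather facts
      setup: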

14. gather_facts

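Before a playbook runs, Ansible gathers state information about the machines. The information collected includes the host's network configuration, operating system type, hardware specifications and so on, and it is available as variables in the rest of the playbook. In addition to the facts Ansible gathers by default, users can produce extra facts with custom fact files or scripts. These are normally placed in /etc/ansible/facts.d (the default fact_path), need a file name ending in .fact, and must return JSON (static files may also use INI format).

- name: Do not gather facts
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Only run the task
      debug:
        msg: "Skipping fact gathering"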
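
As a small illustration of custom facts, the sketch below reads one back. It assumes that the managed host already has a file /etc/ansible/facts.d/app.fact containing JSON such as {"version": "1.2"}; the file name and its content are made up for this example.

```yaml
# Assumption: /etc/ansible/facts.d/app.fact exists on the target and holds
# JSON like {"version": "1.2"} (hypothetical file and content).
- name: Read a custom local fact
  hosts: all
  gather_facts: true
  tasks:
    - name: Show what was collected from app.fact
      debug:
        var: ansible_local.app
```

Custom facts end up under ansible_local, keyed by the file name without the .fact suffix.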

15. gather_subset

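As a child I often heard adults say that the world is not black and white. I understood it, but it made me angry because it sounded so cynical. Now that I have grown up I find that sometimes you really have no choice. Fact gathering is like that too: it does not have to be all or nothing, you can gather only part of the machine information.

- name: Gather a specific subset of facts
  hosts: localhost
  gather_subset:
    - network
    - hardware
  tasks:
    - name: Show the gathered network and hardware facts
      debug:
        var: ansible_facts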

16. gather_timeout

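Sets the timeout for fact gathering. The default is 10 seconds when it is not set. I have seen reports from people abroad that this keyword did not take effect for them, and some suggest trying the environment variable ANSIBLE_GATHER_TIMEOUT=60 instead.

- name: Set the fact gathering timeout
  hosts: localhost
  gather_facts: true
  gather_timeout: 30
  tasks:
    - name: Print a simple message
      debug:
        msg: "Fact gathering with custom timeout"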

17. force_handlers

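Normally, when a task fails on a host, handlers that were already notified do not run on that host. With force_handlers: true they run anyway, which can be used for recovery. It works together with the handlers section.

- name: Force handlers to run
  hosts: localhost
  force_handlers: true
  tasks:
    - name: A task that changes something and notifies the handler
      command: echo "config updated"
      notify: restart nginx

    - name: A later task that fails
      command: /bin/false

  handlers:
    - name: restart nginx
      command: echo "Restarting Nginx"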

18. hosts

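Very useful: it selects the hosts or host groups a play runs on. Host groups are normally defined ahead of time in the inventory, either an inventory file passed with -i or Ansible's default inventory file. Here the tasks run on the webservers group.

- name: Run on a specific host group
  hosts: webservers
  tasks:
    - name: Install nginx on all web servers
      yum:
        name: nginx
        state: latest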

19. ignore_errors

As the name says, it ignores errors: a failed task does not stop the play.

- name: Ignore errors
  hosts: localhost
  tasks:
    - name: Try a command that may fail
      command: /bin/false
      ignore_errors: true

20. ignore_unreachable

As the name says, it ignores hosts that are unreachable.

- name: Ignore unreachable hosts
  hosts: all
  ignore_unreachable: true
  tasks:
    - name: Try to connect to every host
      ping:

21. max_fail_percentage

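Sets a failure threshold. When a play runs across many hosts and the percentage of failed hosts exceeds the threshold, the play is aborted.

- name: Set a failure percentage threshold
  hosts: all
  max_fail_percentage: 30
  tasks:
    - name: Run a task that may fail
      command: /bin/false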

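22. module_defaults

Sets default parameter values for modules, such as the package to install or the path to copy to. In my opinion, overusing it hurts readability.

- name: Set module default parameters
  hosts: localhost
  module_defaults:
    yum:
      name: nginx
      state: latest
  tasks:
    - name: Install the package given by the defaults
      yum: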

23. name

You have seen this one many times already: it simply gives the play or task a name.

- name: Name the play and its tasks
  hosts: localhost
  tasks:
    - name: Print a message
      debug:
        msg: "Example message"

24. no_log

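A boolean. When true, the task's input and output are hidden from the console output and from the log, which is useful for sensitive data. Note that Ansible does not write a log file until logging has been configured (log_path); see Logging Ansible output (Ansible Community Documentation).

- name: Hide sensitive information
  hosts: localhost
  tasks:
    - name: Run a command whose output is sensitive
      shell: echo "sensitive data"
      no_log: true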

25. order

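Controls the order in which the hosts are processed: inventory (the default, the order given in the inventory), reverse_inventory (the reverse of the inventory order), sorted (alphabetical by host name), reverse_sorted (reverse alphabetical) and shuffle (random).

- name: Host execution order
  hosts: all
  order: shuffle
  tasks:
    - name: Print the host name
      debug:
        msg: "The host name is {{ inventory_hostname }}"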

26. port

Sets the default port used to connect to the remote hosts.

- name: Connect on a custom port
  hosts: all
  port: 2222
  tasks:
    - name: Check that the hosts answer on that port
      ping:

27. post_tasks

As the name says, these are the tasks that come last: they run after roles and the regular tasks.

- name: Use post_tasks
  hosts: localhost
  tasks:
    - name: Main task
      debug:
        msg: "Running the main task"
  post_tasks:
    - name: Post task
      debug:
        msg: "Running the post task"

28. pre_tasks

The tasks that come first: a list of tasks executed before roles.

- name: Use pre_tasks
  hosts: localhost
  pre_tasks:
    - name: Run the pre task
      debug:
        msg: "This task runs before roles"
  roles:
    - myrole
  tasks:
    - name: Main task
      debug:
        msg: "Running the main task"

29. remote_user

Sets the user used to log in to the remote hosts.

- name: Set the remote user
  hosts: all
  remote_user: admin
  tasks:
    - name: Run a remote command
      command: whoami

30. roles

Lists the roles to import into the play.

- name: Apply a role
  hosts: localhost
  roles:
    - role: example_role
      vars:
        role_var: value

31. run_once

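Marks a task that runs only once: it is executed on the first host of the current batch only, and its result is applied to all the other hosts.

- name: Run a task only once
  hosts: all
  tasks:
    - name: Executed on a single host only
      command: echo "one-time step"
      run_once: true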

32. serial

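Controls the batch size. It can be a percentage such as 10% or a number: the play runs on that many hosts at a time, then moves on to the next batch.

- name: Use serial to run in batches
  hosts: all
  serial: 2
  tasks:
    - name: Run batch by batch
      debug:
        msg: "Running on two hosts at a time"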
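
serial also accepts a list of batch sizes, which is handy for a cautious rollout. The following is a minimal sketch; the batch sizes and the task are arbitrary choices for illustration.

```yaml
- name: Rolling run in growing batches
  hosts: all
  serial:
    - 1        # first batch: a single canary host
    - "10%"    # second batch: 10% of the hosts in the play
    - "50%"    # remaining batches: 50% each until every host is done
  tasks:
    - name: Show which host is in the current batch
      debug:
        msg: "Now running on {{ inventory_hostname }}"
```

The last entry in the list is reused for all remaining batches.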

33. strategy

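The execution strategy. Commonly mentioned choices are linear, free and mitogen.

linear:
- linear is the default strategy. Each task is run on all hosts of the current batch (in parallel, up to the forks limit), and Ansible waits until every host has finished that task before it starts the next one.
- This suits scenarios where tasks must proceed in lockstep, for example when hosts depend on each other or a specific task order has to be guaranteed.
free:
- With the free strategy each host works through its tasks as fast as it can, without waiting for the other hosts. It has been available since Ansible 2.0.
- This suits cases where tasks should finish as quickly as possible and there is no ordering requirement between hosts. With a large number of hosts, free can improve throughput.
mitogen:
- mitogen is a third-party optimization that does not ship with Ansible. According to its project, it speeds up runs by reusing connections and Python interpreters on the targets, which noticeably cuts task start-up time and network overhead.
- This suits large environments and performance-sensitive scenarios, where it can speed up task execution and reduce system load.
- Note that the mitogen strategy requires installing the mitogen plugin and may need some extra changes to your Ansible configuration.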

- hosts: all
  strategy: linear
  tasks:
    - name: Task 1
      debug:
        msg: "This is task 1"
    - name: Task 2
      debug:
        msg: "This is task 2"
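
For contrast with the linear example, here is a minimal sketch that uses strategy: free, under which a host that has finished a task moves on without waiting for the others. The sleep is only there to make the effect visible.

```yaml
- name: Let every host proceed at its own pace
  hosts: all
  strategy: free
  tasks:
    - name: Slow step
      command: sleep 5
      changed_when: false   # sleeping does not change the host

    - name: Next step, started by each host as soon as it is ready
      debug:
        msg: "{{ inventory_hostname }} reached the second task"
```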

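34. tags

Very useful: pass a tag on the command line, for example ansible-playbook playbook.yaml --tags 'thistag', and only the tasks carrying that tag are executed.

- name: Use tags
  hosts: all
  tasks:
    - name: Task selected by --tags special
      debug:
        msg: "This task carries the tag 'special'"
      tags:
        - special

    - name: Untagged task
      debug:
        msg: "This task runs when no --tags filter is given"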
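
Ansible also reserves two special tag names, always and never. The sketch below shows both; the tag name debug on the second task is an arbitrary choice for this example.

```yaml
- name: Special tags
  hosts: all
  tasks:
    - name: Runs even when --tags selects some other tag
      debug:
        msg: "tagged always"
      tags:
        - always

    - name: Skipped unless --tags debug (or --tags never) is requested
      debug:
        msg: "tagged never"
      tags:
        - never
        - debug
```

A task tagged always is only skipped when it is excluded explicitly with --skip-tags always.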

35. tasks

The main task list. It runs after pre_tasks and roles, and before post_tasks.

- name: Run the main tasks
  hosts: localhost
  tasks:
    - name: Print a message
      debug:
        msg: "A task in the main task list"

    - name: Check disk space
      command: df -h
      register: disk_space

    - name: Show the disk space information
      debug:
        msg: "{{ disk_space.stdout }}"
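
The order pre_tasks, roles, tasks, post_tasks is fixed by Ansible and does not depend on how the sections are arranged in the file. The following minimal sketch deliberately writes them backwards to show this; roles are left out so that it runs without any role on disk.

```yaml
- name: Section order is fixed regardless of file layout
  hosts: localhost
  gather_facts: false
  post_tasks:
    - name: Third
      debug:
        msg: "post_tasks run last"
  tasks:
    - name: Second
      debug:
        msg: "tasks run after pre_tasks (and after roles, if any)"
  pre_tasks:
    - name: First
      debug:
        msg: "pre_tasks run first"
```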

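36. throttle

Caps how many hosts run the task at the same time, which helps keep resource usage under control.

- name: Limit concurrency with throttle
  hosts: all
  tasks:
    - name: Task with limited concurrency
      command: echo "Running task with throttle"
      throttle: 2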

37. timeout

A time limit in seconds; a task that exceeds it is treated as failed.

- name: Task timeout setting
  hosts: localhost
  tasks:
    - name: A task that may run for a long time
      command: sleep 5
      timeout: 2

38. vars

Defines variables. Once set, they take effect at the level where they are defined.

- name: Define and use variables
  hosts: localhost
  vars:
    my_var: "Hello, Ansible"
  tasks:
    - name: Use the defined variable
      debug:
        msg: "{{ my_var }}"

39. vars_files

Large numbers of variables can be defined in files instead, normally YAML or JSON.

- name: Load variables from an external file
  hosts: localhost
  vars_files:
    - vars/extra_vars.yml
  tasks:
    - name: Use the externally defined variable
      debug:
        msg: "{{ external_var }}"

(2). ROLE

In addition to the play keywords, roles also accept the following.

1. when

This is simply an if statement.

- hosts: all
  roles:
    - role: configure-nginx
      when: "'webserver' in group_names"


- hosts: all
  roles:
    - role: deploy-app
      when: "ansible_os_family == 'RedHat' and ansible_distribution_major_version|int >= 7"

(3). BLOCK

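In addition to the keywords above, blocks also have the following.

1. always

The always keyword is important for error handling and flow control in Ansible. It is usually used together with block and rescue, and makes sure the listed tasks are executed whether or not the preceding tasks succeeded.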

- hosts: all
  tasks:
    - name: Attempt to execute a command
      block:
        - command: /bin/false
      rescue:
        - debug:
            msg: "The command failed, handling error..."
      always:
        - debug:
            msg: "This task runs no matter what happened before."

2. delegate_facts

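delegate_facts is an option used together with delegate_to. It decides whether the facts gathered by the task are assigned to the delegated-to host instead of the host the task is currently being run for.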

- hosts: server1
  tasks:
    - name: Gather facts from another host
      setup:
      delegate_to: server2
      delegate_facts: true

3. delegate_to

The delegate_to keyword hands a task over to another host for execution instead of running it on the current target host.

- hosts: webserver
  tasks:
    - name: Restart database server
      service:
        name: postgresql
        state: restarted
      delegate_to: dbserver

4. notify

The notify keyword triggers handlers. When a task that reports a change carries a notify entry, the matching task defined in the handlers section is run.

- hosts: all
  tasks:
    - name: Update the web server configuration
      template:
        src: /srv/httpd/templates/httpd.conf.j2
        dest: /etc/httpd/conf/httpd.conf
      notify:
        - restart apache

  handlers:
    - name: restart apache
      service:
        name: httpd
        state: restarted
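# In this example, if the configuration file httpd.conf is updated, a handler is triggered to restart the Apache service.

# These keywords extend Ansible's flexibility and control, allowing task execution and application logic to be managed more precisely.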

(4). TASK

In addition to the keywords above, tasks also have the following.

1. action

Directly specifies the module to run.

- hosts: all
  tasks:
    - name: Shutdown remote machine
      action: command /sbin/shutdown -t now

2. args

Specifies the task's arguments explicitly as a dictionary.

- hosts: all
  tasks:
    - name: Create a directory
      args:
        chdir: /tmp
        creates: test_directory   # the task is skipped when this path already exists
      command: mkdir test_directory

3. async

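Sets the maximum time (in seconds) an asynchronous task may run, so that the task runs in the background. Some tasks can simply be pushed to the background; running them in sequence would hold up the tasks behind them, so async lets the run progress faster.

- name: Run a long-running task asynchronously
  shell: "some_long_running_task"
  async: 300    # time limit of 300 s
  poll: 0
  register: async_result

- name: Check the status of the async task
  async_status:   # the async_status module polls the async job until it has finished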
    jid: "{{ async_result.ansible_job_id }}"
  register: async_status_result
  until: async_status_result.finished
  retries: 60
  delay: 10

4. changed_when

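Defines the condition under which a task is reported as changed. At the end of every playbook run there is a recap with counters such as changed=3 failed=1, and this keyword controls what is counted as changed.

changed_when takes a conditional expression. When the expression evaluates to true the task is marked as "changed", otherwise as "unchanged".

- hosts: all
  tasks:
    - name: Check if a file exists
      command: ls /path/to/file
      changed_when: false   # a read-only command never changes the host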

5. delay

Used in retry logic: the time to wait (in seconds) before each retry.

- hosts: all
  tasks:
    - name: Wait for web service to start
      uri:
        url: http://example.com
        status_code: 200
      register: result
      until: result.status == 200
      retries: 5
      delay: 10

6. local_action

Runs the action on the control node; equivalent to delegate_to: localhost.

- hosts: all
  tasks:
    - name: Gather local facts
      local_action: setup

7. loop

Repeats a task, with item holding the current element: item is testuser1 on the first pass and testuser2 on the second.

- hosts: all
  tasks:
    - name: Add several users
      user:
        name: "{{ item }}"
        state: present
        groups: "wheel"
      loop:
        - testuser1
        - testuser2

8. loop_control

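Similar to Python's enumerate. index_var names the variable that holds the loop index; loop_var names the variable that holds the current element (replacing the default item).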

- name: Loop example
  hosts: localhost
  tasks:
    - name: Print loop index and item
      debug:
        msg: "Item {{ my_item }} at index {{ my_index }}"
      loop: 
        - item1
        - item2
        - item3
      loop_control:
        label: "{{ my_item }}"
        index_var: my_index
        loop_var: my_item

9. poll

Sets the polling interval for an asynchronous task; 0 means no polling. Here the async task has a time limit of 14400 s and is polled every 10 s.

- hosts: all
  tasks:
    - name: Run a background job
      command: /usr/bin/some_long_job --option
      async: 14400
      poll: 10

10. register

Captures the result of a task, such as a command's output, in a variable.

- hosts: all
  tasks:
    - command: echo "Hello"
      register: echo_output
    - debug:
        msg: "The output was '{{ echo_output.stdout }}'"

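11. retries and until

until defines the condition that ends the retry loop: the task is repeated until the condition becomes true. It is usually used together with retries and delay.

retries is used together with until and sets the maximum number of retries.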

- hosts: all
  tasks:
    - name: Check web service
      uri:
        url: http://example.com
        return_content: yes
      register: webpage
      until: webpage.status == 200
      retries: 5
      delay: 2

12. with_<lookup_plugin>

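Also known as with_X, this is a form of loop. Many variants are provided officially:

with_list (loop over a list as given),
with_items (flatten one level, then loop),
with_indexed_items (loop with an index),
with_flattened (flatten the list completely),
with_together (walk several lists in parallel, taking one column at a time),
with_dict (iterate over a dictionary),
with_sequence (generate a number sequence and iterate over it),
with_subelements (walk a list and a sub-list inside each element; the elements must be dictionaries, otherwise it raises an error),
with_nested/with_cartesian (Cartesian product),
with_random_choice (pick one at random).

See Loops (Ansible Community Documentation).

The with_X family alone could fill a separate article, so here is just one simple example.

# A simple three-element loop over an explicit list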
- name: Print list items
  debug:
    msg: "{{ item }}"
  with_list:
    - one
    - two
    - three

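One point is worth noting: with_list, with_items and with_flattened differ slightly and are easy to confuse. with_list loops over the list as given, so each top-level element is taken as is; with_items unpacks the first level of nesting; with_flattened flattens completely and walks every level. I wrote a test case for this that is worth a look.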
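
The author's own test case is not reproduced in this text. The following is a minimal sketch of such a comparison; the variable name nested and its values are made up for illustration.

```yaml
- name: Compare with_list, with_items and with_flattened
  hosts: localhost
  gather_facts: false
  vars:
    nested:               # hypothetical test data with two levels of nesting
      - [1, 2]
      - [3, [4, 5]]
  tasks:
    - name: with_list keeps every top-level element as it is
      debug:
        var: item         # two iterations: [1, 2] and [3, [4, 5]]
      with_list: "{{ nested }}"

    - name: with_items flattens exactly one level
      debug:
        var: item         # four iterations: 1, 2, 3 and [4, 5]
      with_items: "{{ nested }}"

    - name: with_flattened flattens every level
      debug:
        var: item         # five iterations: 1, 2, 3, 4 and 5
      with_flattened: "{{ nested }}"
```

Running it with ansible-playbook and comparing the three debug outputs makes the difference easy to see.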

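III. [Summary]

This article has gone through all the keywords of Ansible playbooks; the official documentation lists just these. Once you understand the keywords and the structure of a playbook, you can start writing playbooks.

IV. [References]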