Chapter 8 RH294 RHEL Automation with Ansible

ONLY FOR SELF STUDY, NO COMMERCIAL USAGE!!!

- Chapter 8. Troubleshooting Ansible
  - Troubleshooting Playbooks
    - Debugging Playbooks
    - Examining Values of Variables with the Debug Module
    - Reviewing Playbooks for Errors
      - Checking Playbook Syntax for Problems
      - Checking a Given Task in a Playbook
      - Checking Playbooks for Issues and Following Good Practices
    - Reviewing Playbook Artifacts and Log Files
      - Playbook Artifacts from Automation Content Navigator
      - Logging Output to a Text File
    - References
    - Example
  - Troubleshooting Ansible Managed Hosts
    - Troubleshooting Connections
      - Problems Authenticating to Managed Hosts
      - Problems with Name or Address Resolution
      - Problems with Privilege Escalation
      - Problems with Python on Managed Hosts
    - Using Check Mode as a Testing Tool
    - Testing with Modules
    - Running Ad Hoc Commands with Ansible
      - Testing Managed Hosts Using Ad Hoc Commands
    - References
    - Example
  - Chapter 8 Example

Chapter 8. Troubleshooting Ansible

Troubleshooting Playbooks

Debugging Playbooks

The output provided by the ansible-navigator run command is a good starting point for troubleshooting issues with your plays and the hosts on which they run.

You can increase the verbosity of the output by adding one or more -v options. The ansible-navigator run -v command provides additional debugging information, with up to four total levels.

Table 8.1. Verbosity Configuration

| Option | Description |
| --- | --- |
| `-v` | The output data is displayed. |
| `-vv` | Both the output and input data are displayed. |
| `-vvv` | Includes information about connections to managed hosts. |
| `-vvvv` | Includes additional information, such as the scripts that are executed on each remote host, and the user that is executing each script. |

Examining Values of Variables with the Debug Module

You can use the ansible.builtin.debug module to provide insight into what is happening in the play. You can create a task that uses this module to display the value for a given variable at a specific point in the play.

The following examples use the msg and var settings inside ansible.builtin.debug tasks. This first example displays the value at run time of the ansible_facts['memfree_mb'] fact as part of a message printed to the output of ansible-navigator run.

- name: Display free memory
  ansible.builtin.debug:
    msg: "Free memory for this system is {{ ansible_facts['memfree_mb'] }}"

This second example displays the value of the output variable.

- name: Display the "output" variable
  ansible.builtin.debug:
    var: output
    verbosity: 2

The verbosity parameter controls when the ansible.builtin.debug module is executed. The value correlates to the number of -v options that are specified when the playbook is run. For example, if -vv is specified, and verbosity is set to 2 for a task, then that task is included in the debug output. The default value of the verbosity parameter is 0.
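
As a rough sketch of how the verbosity parameter interacts with the -v options, the hypothetical play below registers a command result in a variable named output (matching the earlier example) and prints it at two different thresholds. The uname -r command, the task names, and the play name are illustrative assumptions, not part of the course material.

```yaml
---
- name: Illustrate the verbosity parameter of the debug module
  hosts: all
  gather_facts: false

  tasks:
    - name: Capture the kernel release (read-only command)
      ansible.builtin.command: uname -r
      register: output
      changed_when: false

    # verbosity defaults to 0, so this message is always printed.
    - name: Display a short summary
      ansible.builtin.debug:
        msg: "Kernel release is {{ output['stdout'] }}"

    # Printed only when the playbook is run with -vv or higher.
    - name: Display the full registered structure
      ansible.builtin.debug:
        var: output
        verbosity: 2
```

When run without any -v options, the last task should be reported as skipped because its verbosity threshold is not met; with -vv or more, the whole registered structure is printed.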

Reviewing Playbooks for Errors

Several kinds of issues can occur during a playbook run. Many relate to the syntax of the playbook or of the templates it uses. Others stem from connectivity problems with the managed hosts (for example, an error in the host name of a managed host in the inventory file).

A number of tools are available that you can use to review your playbook for syntax errors and other problems before you run it.

Checking Playbook Syntax for Problems

The ansible-navigator run command accepts the --syntax-check option, which tests your playbook for syntax errors instead of actually running it.

It is a good practice to validate the syntax of your playbook before using it or if you are having problems with it.

[student@demo ~]$ ansible-navigator run \
> -m stdout playbook.yml --syntax-check

Checking a Given Task in a Playbook

You can use the ansible-navigator run command with the --step option to step through a playbook, one task at a time.

The ansible-navigator run --step command interactively prompts for confirmation that you want each task to run. Press Y to confirm that you want the task to run, N to skip the task, or C to continue running the remaining tasks.

[student@demo ~]$ ansible-navigator run \
> -m stdout playbook.yml --step --pae false

PLAY [Managing errors playbook] **********************************************
Perform task: TASK: Gathering Facts (N)o/(y)es/(c)ontinue:

Because Ansible prompts you for input when you use the --step option, you must disable playbook artifacts (--pae false) and use standard output mode (-m stdout), as the preceding command does.

You can also start running a playbook from a specific task by using the --start-at-task option. Provide the name of a task as an argument to the ansible-navigator run --start-at-task command.

For example, suppose that your playbook contains a task named Ensure {{ web_service }} is started. Use the following command to run the playbook starting at that task:

[student@demo ~]$ ansible-navigator run \
> -m stdout playbook.yml --start-at-task "Ensure {{ web_service }} is started"

You can use the ansible-navigator run --list-tasks command to list the task names in your playbook.

Checking Playbooks for Issues and Following Good Practices

One of the best ways to make playbooks easier to debug is to follow good practices when writing them in the first place. Some recommended practices for playbook development include:

- Use a concise description of the play's or task's purpose to name plays and tasks. The play name or task name is displayed when the playbook is executed. This also helps document what each play or task is supposed to accomplish, and possibly why it is needed.
- Use comments to add additional inline documentation about tasks.
- Make effective use of vertical white space. In general, organize task attributes vertically to make them easier to read.
- Consistent horizontal indentation is critical. Use spaces, not tabs, to avoid indentation errors. Set up your text editor to insert spaces when you press the Tab key to make this easier.
- Try to keep the playbook as simple as possible. Only use the features that you need.

See Good Practices for Ansible.
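
The practices listed above can be sketched in a short play. This is only an illustration: the webservers inventory group and the httpd package and service are assumptions chosen for the example.

```yaml
---
# Hypothetical play illustrating the practices listed above.
- name: Ensure the web server is installed and running
  hosts: webservers        # assumed inventory group
  become: true

  tasks:
    # The package must be present before the service can be managed.
    - name: Ensure httpd is installed
      ansible.builtin.dnf:
        name: httpd
        state: present

    # One attribute per line, indented with spaces only.
    - name: Ensure httpd is started and enabled at boot
      ansible.builtin.service:
        name: httpd
        state: started
        enabled: true
```

Each play and task name states its purpose, so the output of a run documents what was attempted and makes a failing task easy to locate.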

To help you follow good practices like these, Red Hat Ansible Automation Platform 2 provides a tool, ansible-lint, that uses a set of predefined rules to look for possible issues with your playbook. Not all the issues that it reports break your playbook, but a reported issue might indicate the presence of a more serious error.

Important:

The ansible-lint command is a Technology Preview in Red Hat Ansible Automation Platform 2.2. Red Hat does not yet fully support this tool; for details, see the Knowledgebase article "What does a "Technology Preview" feature mean?".

For example, assume that you have the following playbook, site.yml:

- name: Configure servers with Ansible tools
  hosts: all  #(1)

  tasks:
    - name: Make sure tools are installed
      package: #(2)
        name:  #(3)
          - ansible-doc
          - ansible-navigator  #(4)

Run the ansible-lint site.yml command to validate it. You might get the following output as a result:

WARNING  Overriding detected file kind 'yaml' with 'playbook' for given positional argument: site.yml
WARNING  Listing 4 violation(s) that are fatal
yaml: trailing spaces (trailing-spaces) 
site.yml:2

fqcn-builtins: Use FQCN for builtin actions. 
site.yml:5 Task/Handler: Make sure tools are installed

yaml: trailing spaces (trailing-spaces) 
site.yml:7

yaml: too many blank lines (1 > 0) (empty-lines) 
site.yml:10

You can skip specific rules or tags by adding them to your configuration file:
# .config/ansible-lint.yml
warn_list:  # or 'skip_list' to silence them completely
  - fqcn-builtins  # Use FQCN for builtin actions.
  - yaml  # Violations reported by yamllint.

Finished with 4 failure(s), 0 warning(s) on 1 files.

This run of ansible-lint found four style issues:

1	Line 2 of the playbook (`hosts: all`) apparently has trailing white space, detected by the `yaml` rule. It is not a problem with the playbook directly, but many developers prefer not to have trailing white space in files stored in version control to avoid unnecessary differences as files are edited.
2	Line 5 of the playbook (`package:`) does not use a FQCN for the module name on that task. It should be `ansible.builtin.package:` instead. This was detected by the `fqcn-builtins` rule.
3	Line 7 of the playbook also apparently has trailing white space.
4	The playbook ends with one or more blank lines, detected by the `yaml` rule.

The ansible-lint tool uses a local configuration file, which is either the .ansible-lint or .config/ansible-lint.yml file in the current directory. You can edit this configuration file to convert rule failures to warnings (by adding them as a list to the warn_list directive) or skip the checks entirely (by adding them as a list to the skip_list directive).
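
A minimal sketch of such a configuration file follows, using the two rule names that appear in the output above. Which rule to demote and which to skip is an arbitrary choice made for illustration, and rule names can differ between ansible-lint versions.

```yaml
# .ansible-lint (or .config/ansible-lint.yml) in the project directory
warn_list:
  - yaml            # yamllint findings are still reported, but only as warnings

skip_list:
  - fqcn-builtins   # this check is not run at all
```

Rules in warn_list still appear in the report, which keeps them visible, whereas rules in skip_list disappear from the output entirely.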

If you have a syntax error in the playbook, ansible-lint reports it just like ansible-navigator run --syntax-check does.

After you correct these style issues, the ansible-lint site.yml report is as follows:

WARNING  Overriding detected file kind 'yaml' with 'playbook' for given positional argument: site.yml

This is an advisory message that you can ignore, and the lack of other output indicates that ansible-lint did not detect any other style issues.

For more information on ansible-lint, see https://docs.ansible.com/lint.html and the ansible-lint --help command.

Important:

The ansible-lint command evaluates your playbook based on the software on your workstation. It does not use the automation execution environment container that is used by ansible-navigator.

The ansible-navigator command has an experimental lint option that runs ansible-lint in your automation execution environment, but the ansible-lint tool needs to be installed inside the automation execution environment's container image for the option to work. This is currently not the case with the default execution environment. You need a custom execution environment to run ansible-navigator lint at this time.

In addition, the version of ansible-lint provided with Red Hat Ansible Automation Platform 2.2 assumes that your playbooks are using Ansible Core 2.13, which is the version currently used by the default execution environment. It does not support earlier Ansible 2.9 playbooks.
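
If a custom execution environment that includes ansible-lint is available, one way to make ansible-navigator use it is through the project's ansible-navigator.yml settings file. The sketch below assumes a hypothetical image name, registry.example.com/custom-ee-with-lint:latest; the image and pull keys should be checked against the settings documentation for the installed ansible-navigator version.

```yaml
# ansible-navigator.yml in the project directory
ansible-navigator:
  execution-environment:
    # Hypothetical custom image that has ansible-lint installed.
    image: registry.example.com/custom-ee-with-lint:latest
    pull:
      policy: missing   # pull the image only if it is not already present locally
```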

Reviewing Playbook Artifacts and Log Files

Red Hat Ansible Automation Platform can log the output of playbook runs that you make from the command line in a number of different ways.

- ansible-navigator can produce playbook artifacts that store information about runs of playbooks in JSON format.
- You can log information about playbook runs to a text file in a location on the system to which you can write.

Playbook Artifacts from Automation Content Navigator

The ansible-navigator command produces playbook artifact files by default each time you use it to run a playbook. These files record information about the playbook run, and can be used to review the results of the run when it completes, to troubleshoot issues, or be kept for compliance purposes.

Each playbook artifact file is named based on the name of the playbook you ran, followed by the word artifact, and then the time stamp of when the playbook was run, ending with the .json file extension.

For example, if you run the command ansible-navigator run site.yml at 20:00 UTC on July 22, 2022, the resulting file name of the artifact file could be:

site-artifact-2022-07-22T20:00:04.019343+00:00.json

You can review the contents of these files with the ansible-navigator replay command. If you include the -m stdout option, then the output of the playbook run is printed to your terminal as if it had just run. However, if you omit that option, you can examine the results of the run interactively.

For example, you run the following playbook, site.yml, and it fails but you do not know why. You run ansible-navigator run site.yml --syntax-check and the ansible-lint command, but neither command reports any issues.

- name: Configure servers with Ansible tools
  hosts: all

  tasks:
    - name: Make sure tools are installed
      ansible.builtin.package:
        name:
          - ansible-doc
          - ansible-navigator

To troubleshoot further, you run ansible-navigator replay in interactive mode on the resulting artifact file, which opens the following output in your terminal:

Figure 8.1: Initial replay screen

If you enter :0 to view the play, the following output is printed:

Figure 8.2: Play results by machine and task

It looks like the task Make sure tools are installed failed on both the server-1.example.com and server-2.example.com hosts. By entering :2, you can look at the failure for the server-2.example.com host:

Figure 8.3: Task results for a specific machine

The task is attempting to use the ansible.builtin.package module to install the ansible-doc package, and that package is not available in the RPM package repositories used by the server-2.example.com host, so the task failed. (You might discover that the ansible-doc command is now provided as part of the ansible-navigator RPM package as the ansible-navigator doc command, and changing the task accordingly fixes the problem.)

Another useful thing to know is that you can look at the results of a successful Gathering Facts task and the debugging output includes the values of all the facts that were gathered:

Figure 8.4: Task results for successful fact gathering

This can help you debug issues involving Ansible facts without adding a task to the play that uses the ansible.builtin.debug module to print out fact values.

You might not want to save playbook artifacts for several reasons:

- You are concerned about sensitive information being saved in the log file.
- You need to provide interactive input, such as a password, to ansible-navigator for some reason.
- You do not want the files to clutter up the project directory.

You can keep the files from being generated by creating an ansible-navigator.yml file in the project directory that disables the playbook artifacts:

ansible-navigator:
  playbook-artifact:
    enable: false
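
If clutter is the only concern, an alternative to disabling artifacts is to write them to a subdirectory. The following is a sketch under assumptions: the save-as key and the {playbook_dir}, {playbook_name}, and {time_stamp} placeholders are taken from the ansible-navigator settings documentation and should be verified against the installed version, and the artifacts directory name is hypothetical.

```yaml
# ansible-navigator.yml in the project directory
ansible-navigator:
  playbook-artifact:
    enable: true
    # Hypothetical location: keep artifacts out of the top of the project.
    save-as: "{playbook_dir}/artifacts/{playbook_name}-artifact-{time_stamp}.json"
```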

Logging Output to a Text File

Ansible provides a built-in logging infrastructure that can be configured through the log_path parameter in the [defaults] section of the ansible.cfg configuration file, or through the $ANSIBLE_LOG_PATH environment variable. The environment variable takes precedence over the configuration file if both are configured. If a logging path is configured, then Ansible stores output from ansible-navigator commands as text in the specified file. This mechanism also works with earlier tools such as ansible-playbook.

If you configure Ansible to write log files to the /var/log directory, then Red Hat recommends that you configure logrotate to manage them.

References

- Configuring Ansible (Ansible Documentation)
- ansible.builtin.debug module: Print statements during execution (Ansible Documentation)
- Tips and tricks (Ansible Documentation)
- Good Practices for Ansible
- Ansible Lint Documentation

Example

Use the commands described in this section to find the errors in the following playbook.

[student@workstation troubleshoot-playbook]$ ll
total 12
-rw-r--r--. 1 student student   78 Sep 17 10:44 inventory
-rw-r--r--. 1 student student  517 Sep 17 10:44 samba.conf.j2
-rw-r--r--. 1 student student 1131 Sep 17 10:44 samba.yml
[student@workstation troubleshoot-playbook]$ cat inventory 
[samba_servers]
servera.lab.example.com

[mailrelay]
servera.lab.example.com


[student@workstation troubleshoot-playbook]$ cat samba.conf.j2 
# {{ random_var }}
    [global]
      workgroup = KAMANSI
      server string = Samba Server Version %v
      log file = /var/log/samba/log.%m
      max log size = 50
      security = user
      passdb backend = tdbsam
      load printers = yes
      cups options = raw
    [homes]
      comment = Home Directories
      browseable = no
      writable = yes
    [printers]
      comment = All Printers
      path = /var/spool/samba
      browseable = no
      guest ok = no
      writable = no
      printable = yes



# Look carefully at the playbook: it contains several errors.
[student@workstation troubleshoot-playbook]$ cat samba.yml 
---
- name: Install a samba server
  hosts: samba_servers
  user: devops
  become: true
  vars:
    install_state: installed
    random_var: This is colon: test

  tasks:
    - name: Install samba
      ansible.builtin.dnf:
        name: samba
        state: {{ install_state }}

    - name: Install firewalld
      ansible.builtin.dnf:
        name: firewalld
        state: installed

    - name: Debug install_state variable
      ansible.builtin.debug:
        msg: "The state for the samba service is {{ install_state }}"

    - name: Start firewalld
      ansible.builtin.service:
        name: firewalld
        state: started
        enabled: true

    - name: Configure firewall for samba
      ansible.posix.firewalld:
        state: enabled
        permanent: true
        immediate: true
        service: samba

     - name: Deliver samba config
       ansible.builtin.template:
         src: samba.j2
         dest: /etc/samba/smb.conf
         owner: root
         group: root
         mode: 0644

    - name: Start samba
      ansible.builtin.service:
        name: smb
        state: started
        enabled: true

Create a file named ansible.cfg in the current directory. Configure the log_path parameter to write Ansible logs to the /home/student/troubleshoot-playbook/ansible.log file. Configure the inventory parameter to use the /home/student/troubleshoot-playbook/inventory file deployed by the lab script.

The completed ansible.cfg file should contain the following:

[defaults]
log_path = /home/student/troubleshoot-playbook/ansible.log
inventory = /home/student/troubleshoot-playbook/inventory

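After fixing all the errors, the playbook looks as follows. Four corrections were needed: the random_var value is quoted because it contains a colon followed by a space; the Jinja2 expression for the state of the Install samba task is quoted; the Deliver samba config task is indented to match the other tasks; and the template src is changed to samba.conf.j2, the file that actually exists in the project directory.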

---
- name: Install a samba server
  hosts: samba_servers
  user: devops
  become: true
  vars:
    install_state: installed
    random_var: "This is colon: test"

  tasks:
    - name: Install samba
      ansible.builtin.dnf:
        name: samba
        state: "{{ install_state }}"

    - name: Install firewalld
      ansible.builtin.dnf:
        name: firewalld
        state: installed

    - name: Debug install_state variable
      ansible.builtin.debug:
        msg: "The state for the samba service is {{ install_state }}"

    - name: Start firewalld
      ansible.builtin.service:
        name: firewalld
        state: started
        enabled: true

    - name: Configure firewall for samba
      ansible.posix.firewalld:
        state: enabled
        permanent: true
        immediate: true
        service: samba

    - name: Deliver samba config
      ansible.builtin.template:
        src: samba.conf.j2
        dest: /etc/samba/smb.conf
        owner: root
        group: root
        mode: 0644

    - name: Start samba
      ansible.builtin.service:
        name: smb
        state: started
        enabled: true

Troubleshooting Ansible Managed Hosts

Troubleshooting Connections

Problems Authenticating to Managed Hosts

You might see "permission denied" errors in the following situations:

- You try to connect as the wrong remote_user for your authentication credentials.
- You connect as the correct remote_user but the authentication credentials are missing or incorrect.

For example, you might see the following output when running a playbook that is designed to connect to the remote root user account:

[student@controlnode ~]$ ansible-navigator run \
> -m stdout playbook.yml

PLAY [Install a samba server] **************************************************

TASK [Gathering Facts] *********************************************************
fatal: [host.lab.example.com]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: developer@host: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).", "unreachable": true}

PLAY RECAP *********************************************************************
host.lab.example.com    : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0
Please review the log for errors.

In this case, ansible-navigator is trying to connect as the developer user account, according to the preceding output. One reason this might happen is if ansible.cfg has been configured in the project to set the remote_user to the developer user instead of the root user.

Another reason you could see a "permission denied" error like this is if you do not have the correct SSH keys set up, or did not provide the correct password for that user.

[root@controlnode ~]# ansible-navigator run \
> -m stdout playbook.yml

TASK [Gathering Facts] *********************************************************
fatal: [host.lab.example.com]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: root@host: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).", "unreachable": true}

PLAY RECAP *********************************************************************
host    : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0

Please review the log for errors.

In the preceding example, the playbook is attempting to connect to the host machine as the root user but the SSH key for the root user on the controlnode machine has not been added to the authorized_keys file for the root user on the host machine.
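
One way to rule out an unintended remote_user is to state it explicitly in a small test play and check only the connection. The sketch below assumes an account named devops, the one used in the earlier example playbook; substitute the account whose SSH key is actually installed on the managed hosts.

```yaml
---
- name: Confirm that the expected account can log in
  hosts: all
  remote_user: devops      # assumed account; overrides remote_user from ansible.cfg
  gather_facts: false

  tasks:
    - name: Verify the SSH connection and authentication
      ansible.builtin.ping:
```

If this play succeeds while the real playbook reports "Permission denied", the difference between the two remote_user values is a likely place to look.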

Problems with Name or Address Resolution

A more subtle problem has to do with inventory settings. For a complex server with multiple network addresses, you might need to use a particular address or DNS name when connecting to that system. For better readability, you might not want to use that address as the machine's inventory name. You can set a host inventory variable, ansible_host, that overrides the inventory name with a different name or IP address, which Ansible then uses to connect to that host. This variable could be set in the host_vars file or directory for that host, or could be set in the inventory file itself.

For example, the following inventory entry configures Ansible to connect to 192.0.2.4 when processing the web4.phx.example.com host:

web4.phx.example.com ansible_host=192.0.2.4

This is a useful way to control how Ansible connects to managed hosts. However, it can also cause problems if the value of ansible_host is incorrect.
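
The same setting can also be expressed in a YAML-format inventory. The following sketch is an equivalent of the entry above; the inventory.yml file name is an assumption.

```yaml
# inventory.yml: YAML-format equivalent of the INI-style entry above
all:
  hosts:
    web4.phx.example.com:
      # Address that Ansible connects to instead of the inventory name.
      ansible_host: 192.0.2.4
```

When a connection fails unexpectedly, a task that uses ansible.builtin.debug with var: ansible_host is a quick way to confirm which address Ansible is using for each host.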

Problems with Privilege Escalation

If your playbook connects as a remote_user and then uses privilege escalation to become the root user (or some other user), make sure that become is set properly, and that you are using the correct value for the become_user directive. The setting for become_user is root by default.

If the remote user needs to provide a sudo password, you should confirm that you are providing the correct sudo password, and that sudo on the managed host is configured correctly.

[user@controlnode ~]$ ansible-navigator run \
> -m stdout playbook.yml

TASK [Gathering Facts] *********************************************************
fatal: [host]: FAILED! => {"msg": "Missing sudo password"}

PLAY RECAP *********************************************************************
host             : ok=0    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0

Please review the log for errors.

In the preceding example, the playbook is attempting to run sudo on the host machine but it fails. The remote_user is not set up to run sudo commands without a password on the host machine. Either sudo on the host machine is not properly configured, or it is supposed to require a sudo password and you neglected to provide one when running the playbook.
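
To check the privilege escalation settings in isolation, a small test play that states every relevant directive explicitly can help. This is a sketch: the devops account is an assumption, and whoami is used only because it is a harmless read-only command.

```yaml
---
- name: Verify privilege escalation settings
  hosts: all
  remote_user: devops      # assumed account used for the SSH connection
  become: true             # enable privilege escalation
  become_method: sudo      # the default method, stated explicitly
  become_user: root        # the default target user, stated explicitly
  gather_facts: false

  tasks:
    - name: Report the effective user after escalation
      ansible.builtin.command: whoami
      register: effective_user
      changed_when: false

    - name: Show the effective user
      ansible.builtin.debug:
        var: effective_user['stdout']
```

If sudo on the managed host requires a password that is not supplied, the first task is expected to fail with the same "Missing sudo password" message shown above.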

Important:

Normally, ansible-navigator runs as root inside its automation execution environment. However, the root user in the container has access to SSH keys provided by the user that ran ansible-navigator on the workstation. This can be slightly confusing when you are trying to debug remote_user and become directives, especially if you are used to the earlier ansible-playbook command that runs as the user on the workstation.

Problems with Python on Managed Hosts

For normal operation, Ansible requires a Python interpreter to be installed on managed hosts running Linux. Ansible attempts to locate a Python interpreter on each Linux managed host the first time a module is run on that host.

[user@controlnode ~]$ ansible-navigator run \
> -m stdout playbook.yml

TASK [Gathering Facts] *********************************************************
fatal: [host]: FAILED! => {"ansible_facts": {}, "changed": false, "failed_modules": {"ansible.legacy.setup": {"ansible_facts": {"discovered_interpreter_python": "/usr/bin/python"}, "failed": true, "module_stderr": "Shared connection to host closed.\r\n", "module_stdout": "/bin/sh: 1: /usr/bin/python: not found\r\n", "msg": "The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error", "rc": 127, "warnings": ["No python interpreters found for host host (tried ['python3.10', 'python3.9', 'python3.8', 'python3.7', 'python3.6', 'python3.5', '/usr/bin/python3', '/usr/libexec/platform-python', 'python2.7', 'python2.6', '/usr/bin/python', 'python'])"]}}, "msg": "The following modules failed to execute: ansible.legacy.setup\n"}

PLAY RECAP *********************************************************************
host    : ok=0    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0
Please review the log for errors.
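
When a failure like the one above appears, it can help to look for an interpreter without relying on one. The sketch below uses the ansible.builtin.raw module, which sends a command over the connection without requiring Python on the managed host. The python3 name being searched for is an assumption about what the host might provide.

```yaml
---
- name: Look for a Python interpreter without depending on one
  hosts: all
  gather_facts: false      # fact gathering itself needs Python

  tasks:
    - name: Search the PATH for python3
      ansible.builtin.raw: command -v python3
      register: python_search
      changed_when: false
      failed_when: false   # a missing interpreter should not stop the play

    - name: Show the path that was found, if any
      ansible.builtin.debug:
        var: python_search['stdout_lines']
```

If an interpreter exists in a nonstandard location, setting the ansible_python_interpreter inventory variable to its path for that host is one way to point Ansible at it.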

Using Check Mode as a Testing Tool

You can use the ansible-navigator run --check command to run "smoke tests" on a playbook. This option runs the playbook, connecting to the managed hosts normally but without making changes to them.

If a module used within the playbook supports "check mode", then the changes that would have been made to the managed hosts are displayed but not performed. If check mode is not supported by a module, then ansible-navigator does not display the predicted changes, but the module still takes no action.

[student@demo ~]$ ansible-navigator run \
> -m stdout playbook.yml --check

Important:

The ansible-navigator run --check command might not work properly if your tasks use conditionals. One reason for this might be that the conditionals depend on some preceding task in the play actually running so that the condition evaluates correctly.

You can force tasks to always run in check mode or to always run normally with the check_mode setting. If a task has check_mode: true set, it always runs in its check mode and does not perform any action, even if you do not pass the --check option to ansible-navigator. Likewise, if a task has check_mode: false set, it always runs normally, even if you pass --check to ansible-navigator.

The following task always runs in check mode, and does not make changes to managed hosts.

  tasks:
    - name: task always in check mode
      ansible.builtin.shell: uname -a
      check_mode: true

The following task always runs normally, even when started with ansible-navigator run --check.

  tasks:
    - name: task always runs even in check mode
      ansible.builtin.shell: uname -a
      check_mode: false

This can be useful because you can run most of a playbook normally and test individual tasks with check_mode: true. Many plays use facts or set variables to conditionally run tasks. Conditional tasks might fail if a fact or variable is undefined, due to the task that collects them or sets them not executing on a managed node. You can use check_mode: false on tasks that gather facts or set variables but do not otherwise change the managed node. This enables the play to proceed further when using --check mode.
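
The pattern described above might look like the following sketch. The getenforce command and the selinux_mode variable name are assumptions chosen for illustration; the point is that the read-only task runs even under --check, so the conditional that follows has a value to test.

```yaml
---
- name: Keep a read-only lookup working in check mode
  hosts: all
  gather_facts: false

  tasks:
    - name: Read the current SELinux mode (does not change the host)
      ansible.builtin.command: getenforce
      register: selinux_mode
      changed_when: false
      check_mode: false    # always run, so the variable below is populated

    - name: Report hosts that are not enforcing
      ansible.builtin.debug:
        msg: "SELinux mode is {{ selinux_mode['stdout'] }}"
      when: selinux_mode['stdout'] != 'Enforcing'
```

Without check_mode: false, the command task would be skipped under --check and the registered variable would have no stdout key for the condition to test.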

A task can determine if the playbook is running in check mode by testing the value of the magic variable ansible_check_mode. This Boolean variable is set to true if the playbook is running in check mode.
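
As a minimal sketch, the following play branches on ansible_check_mode. The messages and task names are illustrative only.

```yaml
---
- name: Branch on whether the play is running in check mode
  hosts: all
  gather_facts: false

  tasks:
    - name: Note that changes are only being predicted
      ansible.builtin.debug:
        msg: "Check mode is active; no changes are applied."
      when: ansible_check_mode

    - name: Run a step that only makes sense in a normal run
      ansible.builtin.debug:
        msg: "Normal run; a post-deployment step would go here."
      when: not ansible_check_mode
```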

Warning:

Tasks that have check_mode: false set run even when the playbook is run with ansible-navigator run --check. Therefore, you cannot trust that the --check option makes no changes to managed hosts, without inspecting the playbook and any roles or tasks associated with it.

Note:

If you have older playbooks that use always_run: true to force tasks to run normally even in check mode, you need to replace that code with check_mode: false in Ansible 2.6 and later.

The ansible-navigator command also provides a --diff option. This option reports the changes made to the template files on managed hosts. If used with the --check option, those changes are displayed in the command's output but not actually made.

[student@demo ~]$ ansible-navigator run \
> -m stdout playbook.yml --check --diff

Testing with Modules

Some modules can provide additional information about the status of a managed host. The following list includes some Ansible modules that can be used to test and debug issues on managed hosts.

The ansible.builtin.uri module provides a way to verify that a RESTful API is returning the required content.

  tasks:
    - ansible.builtin.uri:
        url: http://api.myapp.example.com
        return_content: true
      register: apiresponse

    - ansible.builtin.fail:
        msg: 'version was not provided'
      when: "'version' not in apiresponse.content"

The ansible.builtin.script module runs a script on managed hosts, and fails if the return code for that script is nonzero. The script must exist in the Ansible project and is transferred to and run on the managed hosts.

  tasks:
    - ansible.builtin.script: scripts/check_free_memory --min 2G

The ansible.builtin.stat module gathers facts for a file much like the stat command. You can use it to register a variable and then test to determine if the file exists or to get other information about the file. If the file does not exist, the ansible.builtin.stat task does not fail, but its registered variable reports false for the ['stat']['exists'] key.

In this example, an application is still running if /var/run/app.lock exists, in which case the play should abort.

  tasks:
    - name: Check if /var/run/app.lock exists
      ansible.builtin.stat:
        path: /var/run/app.lock
      register: lock

    - name: Fail if the application is running
      ansible.builtin.fail:
      when: lock['stat']['exists']

The ansible.builtin.assert module is an alternative to the ansible.builtin.fail module. The ansible.builtin.assert module supports a that option that takes a list of conditionals. If any of those conditionals are false, the task fails. You can use the success_msg and fail_msg options to customize the message it prints if it reports success or failure.

The following example repeats the preceding one, but uses ansible.builtin.assert instead of the ansible.builtin.fail module:


  tasks:
    - name: Check if /var/run/app.lock exists
      ansible.builtin.stat:
        path: /var/run/app.lock
      register: lock

    - name: Fail if the application is running
      ansible.builtin.assert:
        that:
          - not lock['stat']['exists']
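
The same tasks can also set the success_msg and fail_msg options described above. This is a sketch only; the message wording is illustrative and not taken from the course material.

```yaml
  tasks:
    - name: Check if /var/run/app.lock exists
      ansible.builtin.stat:
        path: /var/run/app.lock
      register: lock

    - name: Fail if the application is running
      ansible.builtin.assert:
        that:
          - not lock['stat']['exists']
        # Illustrative messages; adjust the wording to your environment.
        fail_msg: "/var/run/app.lock exists, so the application is still running"
        success_msg: "No lock file found, safe to continue"
```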

Running Ad Hoc Commands with Ansible

An ad hoc command is a way of executing a single Ansible task quickly, one that you do not need to save to run again later. Ad hoc commands are simple, one-line operations that can be run without writing a playbook.

Ad hoc commands do not run inside an automation execution environment container. Instead, they run using Ansible software, roles, and collections installed directly on your workstation. To use ad hoc Ansible Core 2.13 commands, you need to install the ansible-core RPM package on your workstation.

Use the ansible command to run ad hoc commands:


[user@controlnode ~]$ ansible host-pattern -m module [-a 'module arguments'] \
> [-i inventory]

The *host-pattern* argument is used to specify the managed hosts against which the ad hoc command should be run. The -i option is used to specify a different inventory location to use from the default in the current Ansible configuration file. The -m option specifies the module that Ansible should run on the targeted hosts. The -a option takes a list of arguments for the module as a quoted string.

Note:

If you use the ansible command but do not specify a module with the -m option, the ansible.builtin.command module is used by default. It is always best to specify the module you intend to use, even if you intend to use the ansible.builtin.command module.

Ansible ad hoc commands can be useful, but should be kept to troubleshooting and one-time use cases. For example, if you are aware of multiple pending network changes, it is more efficient to create a playbook with an ansible.builtin.ping task that you can run multiple times, compared to typing out a one-time use ad hoc command multiple times.
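
A reusable connectivity check of that kind could look like the following sketch. The file name ping-check.yml, the play name, and the choice of hosts: all are assumptions made for illustration. The second task repeats the ping with become enabled, so that a privilege escalation problem shows up as a separate failure.

```yaml
# ping-check.yml (hypothetical file name)
---
- name: Check connectivity during network changes
  hosts: all
  gather_facts: false
  tasks:
    - name: Verify the connection and Python on the managed host
      ansible.builtin.ping:

    - name: Verify that privilege escalation works
      ansible.builtin.ping:
      become: true
```

Setting gather_facts to false keeps each run short, because the only work done on the managed hosts is the two ping tasks.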

Testing Managed Hosts Using Ad Hoc Commands


$ ansible [pattern] -m [module] -a "[module options]"

The -a option accepts module options either in key=value syntax or, for more complex option structures, as a JSON string that starts with { and ends with }. Patterns and modules are described in more detail in the Ansible documentation.

The following examples illustrate some tests that can be made on a managed host using ad hoc commands.

You have used the ansible.builtin.ping module to test whether you can connect to managed hosts. Depending on the options that you pass, you can also use it to test whether privilege escalation and credentials are correctly configured.


[student@demo ~]$ ansible demohost -m ansible.builtin.ping
demohost | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "ping": "pong"
}
[student@demo ~]$ ansible demohost -m ansible.builtin.ping --become
demohost | FAILED! => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "module_stderr": "sudo: a password is required\n",
    "module_stdout": "",
    "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error",
    "rc": 1
}

This example returns the current available space on the disks configured on the demohost managed host. That can be useful to confirm that the file system on the managed host is not full.


[student@demo ~]$ ansible demohost -m ansible.builtin.command -a 'df'

This example returns the current available free memory on the demohost managed host.


[student@demo ~]$ ansible demohost -m ansible.builtin.command -a 'free -m'
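
If these checks need to be repeated, the gathered facts offer similar information in a playbook. The following sketch assumes that the memfree_mb and mounts facts are available on the managed host; both thresholds are hypothetical values chosen for illustration.

```yaml
---
- name: Check free memory and root file system space
  hosts: demohost
  gather_facts: true
  vars:
    min_free_mb: 256                  # hypothetical threshold
    min_root_free_bytes: 1073741824   # hypothetical threshold (1 GiB)
  tasks:
    - name: Assert that enough memory is free
      ansible.builtin.assert:
        that:
          - ansible_facts['memfree_mb'] >= min_free_mb
        fail_msg: "Only {{ ansible_facts['memfree_mb'] }} MB of memory is free"

    - name: Assert that the root file system has space left
      ansible.builtin.assert:
        that:
          - root_mount['size_available'] >= min_root_free_bytes
        fail_msg: "Root file system has {{ root_mount['size_available'] }} bytes available"
      vars:
        # Select the entry for the / mount point from the mounts fact.
        root_mount: "{{ ansible_facts['mounts'] | selectattr('mount', 'equalto', '/') | first }}"
```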


# Rebooting servers
$ ansible atlanta -a "/sbin/reboot"
# Rebooting usually requires privilege escalation. Connect as username and run the command as root with the --become option:
$ ansible atlanta -a "/sbin/reboot" -f 10 -u username --become [--ask-become-pass]

# Managing files
$ ansible webservers -m ansible.builtin.file -a "dest=/srv/foo/b.txt mode=600 owner=mdehaan group=mdehaan"
# create directory
$ ansible webservers -m ansible.builtin.file -a "dest=/path/to/c mode=755 owner=mdehaan group=mdehaan state=directory"
# delete files
$ ansible webservers -m ansible.builtin.file -a "dest=/path/to/c state=absent"

# Managing packages (set state to present, latest, or absent as needed)
$ ansible webservers -m ansible.builtin.yum -a "name=acme-1.5 state=present"

# Managing users and groups
$ ansible all -m ansible.builtin.user -a "name=foo password=<encrypted password here>"
$ ansible all -m ansible.builtin.user -a "name=foo state=absent"

# Managing services (set state to started, restarted, or stopped as needed)
$ ansible webservers -m ansible.builtin.service -a "name=httpd state=started"

# Gathering facts
$ ansible all -m ansible.builtin.setup

# Check mode
$ ansible all -m ansible.builtin.copy -a "content=foo dest=/root/bar.txt" -C
# With check mode enabled (-C or --check), Ansible does not actually create or update /root/bar.txt on any remote system.

# Run 'systemctl status httpd' on the webservers group as the devops user, using -b (or --become) to gain root privileges
[student@workstation troubleshoot-review]$ ansible webservers -u devops -b \
> -m ansible.builtin.command -a 'systemctl status httpd'

References

Check Mode ("Dry Run") --- Ansible Documentation

Testing Strategies --- Ansible Documentation

Example

Run the samba.yml playbook. The first task fails with an error related to an SSH connection problem.


# samba.yml
---
- name: Install a samba server
  hosts: samba_servers
  user: devops
  become: true
  vars:
    install_state: installed
    random_var: "This is colon: test"

  tasks:
    - name: Install samba
      ansible.builtin.dnf:
        name: samba
        state: "{{ install_state }}"

    - name: Install firewalld
      ansible.builtin.dnf:
        name: firewalld
        state: installed

    - name: Debug install_state variable
      ansible.builtin.debug:
        msg: "The state for the samba service is {{ install_state }}"

    - name: Start firewalld
      ansible.builtin.service:
        name: firewalld
        state: started
        enabled: true

    - name: Configure firewall for samba
      ansible.posix.firewalld:
        state: enabled
        permanent: true
        immediate: true
        service: samba

    - name: Deliver samba config
      ansible.builtin.template:
        src: samba.conf.j2
        dest: /etc/samba/smb.conf
        owner: root
        group: root
        mode: 0644

    - name: Start samba
      ansible.builtin.service:
        name: smb
        state: started
        enabled: true


[student@workstation troubleshoot-host]$ ansible-navigator run \
> -m stdout samba.yml

PLAY [Install a samba server] **************************************************

TASK [Gathering Facts] *********************************************************
fatal: [servera.lab.exammple.com]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host servera.lab.exammple.com port 22: Connection timed out", "unreachable": true}

PLAY RECAP *********************************************************************
servera.lab.exammple.com   : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0
Please review the log for errors.

Make sure that you can connect to the servera.lab.example.com managed host as the devops user using SSH, and that the correct SSH keys are in place. Log off again when you have finished.
```
[student@workstation troubleshoot-host]$ ssh devops@servera.lab.example.com
...output omitted...
[devops@servera ~]$ exit
logout
Connection to servera.lab.example.com closed.
```
That is working normally.
Test to see if you can run modules on the servera.lab.example.com managed host by using an ad hoc command that runs the ansible.builtin.ping module.
```
[student@workstation troubleshoot-host]$ ansible servera.lab.example.com \
> -m ansible.builtin.ping
servera.lab.example.com | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "ping": "pong"
}
```
The preceding output shows that the ad hoc command also works and connects to the managed host successfully.

This should suggest to you that the problem is not with the SSH configuration and credentials, or with the ad hoc command that you used. So the question now is why the ad hoc command worked and the ansible-navigator command did not. There might be a problem with the play in the playbook, or with the inventory.

Rerun the samba.yml playbook with -vvvv to get more information about the run. An error is issued because the servera.lab.example.com managed host is not reachable.


[student@workstation troubleshoot-host]$ ansible-navigator run \
> -m stdout -vvvv samba.yml
ansible-playbook [core 2.13.0]
...output omitted...

PLAYBOOK: samba.yml ************************************************************
Positional arguments: /home/student/troubleshoot-host/samba.yml
verbosity: 4
connection: smart
timeout: 10
become_method: sudo
tags: ('all',)
inventory: ('/home/student/troubleshoot-host/inventory',)
forks: 5
1 plays in /home/student/troubleshoot-host/samba.yml

PLAY [Install a samba server] **************************************************

TASK [Gathering Facts] *********************************************************
task path: /home/student/troubleshoot-host/samba.yml:2
<servera.lab.exammple.com> ESTABLISH SSH CONNECTION FOR USER: devops
...output omitted...
fatal: [servera.lab.exammple.com]: UNREACHABLE! => {
    "changed": false,
    "msg": "Failed to connect to the host via ssh: OpenSSH_8.0p1, OpenSSL 1.1.1k  FIPS 25 Mar 2021\r\ndebug1: Reading configuration data /home/runner/.ssh/config\r\ndebug1: /home/runner/.ssh/config line 1: Applying options for *\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug3: /etc/ssh/ssh_config line 52: Including file /etc/ssh/ssh_config.d/05-redhat.conf depth 0\r\ndebug1: 
...output omitted...
debug1: Connecting to servera.lab.exammple.com [3.130.253.23] port 22.\r\ndebug2: fd 3 setting O_NONBLOCK\r\ndebug1: connect to address 3.130.253.23 port 22: Connection timed out\r\ndebug1: Connecting to servera.lab.exammple.com [3.130.204.160] port 22.\r\ndebug2: fd 3 setting O_NONBLOCK\r\ndebug1: connect to address 3.130.204.160 port 22: Connection timed out\r\nssh: connect to host servera.lab.exammple.com port 22: Connection timed out",
    "unreachable": true
}

PLAY RECAP *********************************************************************
servera.lab.exammple.com   : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0
Please review the log for errors.

Investigate the inventory file for errors.

If you look at the [samba_servers] group, servera.lab.example.com is misspelled (with an extra m). Correct this error as shown below:


[student@workstation troubleshoot-host]$ cat inventory 
[samba_servers]
servera.lab.exammple.com   # bad here

[mailrelay]
servera.lab.example.com

# After the correction:
[samba_servers]
servera.lab.example.com
...output omitted...

Run the playbook again and all tasks should succeed.
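
A misspelled inventory name like this one can also be caught before the real playbook runs. The following sketch targets the same samba_servers group and checks each inventory name against the expected domain. The .lab.example.com suffix is an assumption based on the classroom host names, and the play itself is not part of the lab. Because fact gathering is disabled and ansible.builtin.assert is evaluated on the control node, this play should not need to connect to the managed hosts.

```yaml
---
- name: List the hosts that a play would target
  hosts: samba_servers
  gather_facts: false
  tasks:
    - name: Flag inventory names outside the expected domain
      ansible.builtin.assert:
        that:
          # Assumed domain suffix for the classroom environment.
          - inventory_hostname.endswith('.lab.example.com')
        fail_msg: "{{ inventory_hostname }} is outside lab.example.com, check the inventory"
        success_msg: "{{ inventory_hostname }} looks correct"
```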

Chapter 8 Example

Instructions

In the /home/student/troubleshoot-review directory, there is a playbook named secure-web.yml. This playbook contains one play that is supposed to set up Apache HTTPD with TLS/SSL for hosts in the webservers group. The serverb.lab.example.com node is supposed to be the only host in the webservers group right now. Ansible can connect to that host using the remote devops account and SSH keys that have already been set up. That user can also become root on the managed host without a sudo password.

Unfortunately, several problems exist that you need to fix before you can run the playbook successfully.


[student@workstation troubleshoot-review]$ ll
total 20
-rw-r--r--. 1 student student   33 Sep 20 10:15 ansible.cfg
-rw-r--r--. 1 student student   21 Sep 20 10:15 index.html
-rw-r--r--. 1 student student   74 Sep 20 10:15 inventory
-rw-r--r--. 1 student student 2674 Sep 20 10:15 secure-web.yml
-rw-r--r--. 1 student student  604 Sep 20 10:15 vhosts.conf

[student@workstation troubleshoot-review]$ cat ansible.cfg 
[defaults]
inventory = inventory

[student@workstation troubleshoot-review]$ cat index.html 
This is a test page.

[student@workstation troubleshoot-review]$ cat inventory 
[webservers]
serverb.lab.example.com ansible_host=serverc.lab.example.com

[student@workstation troubleshoot-review]$ cat secure-web.yml 
---
# start of secure web server playbook
- name: Create secure web service
  hosts: webservers
  remote_user: students
  vars:
    random_var: This is colon: test
    rule:
      - http
      - https

  tasks:
    - block:
        - name: Install web server packages
          ansible.builtin.dnf:
            name: {{ item }}
            state: latest
          notify:
            - Restart services
          loop:
            - httpd
            - mod_ssl

        - name: Install httpd config files
          ansible.builtin.copy:
            src: vhosts.conf
            dest: /etc/httpd/conf.d/vhosts.conf
            backup: true
            owner: root
            group: root
            mode: 0644
          register: vhosts_config
          notify:
            - Restart services

        - name: Create ssl certificate
          ansible.builtin.command: openssl req -new -nodes -x509 -subj "/C=US/ST=North Carolina/L=Raleigh/O=Example Inc/CN=serverb.lab.example.com" -days 120 -keyout /etc/pki/tls/private/serverb.lab.example.com.key -out /etc/pki/tls/certs/serverb.lab.example.com.crt -extensions v3_ca
          args:
            creates: /etc/pki/tls/certs/serverb.lab.example.com.crt

         - name: Start and enable web services
           ansible.builtin.service:
             name: httpd
             state: started
             enabled: true

        - name: Open ports for http and https
          ansible.posix.firewalld:
            service: "{{ item }}"
            immediate: true
            permanent: true
            state: enabled
          loop: "{{ rule }}"

        - name: Deliver content
          ansible.builtin.copy:
            dest: /var/www/vhosts/serverb-secure/
            src: index.html

        - name: Check httpd syntax
          ansible.builtin.command: /sbin/httpd -t
          register: httpd_conf_syntax
          failed_when: "'Syntax OK' not in httpd_conf_syntax.stderr"

        - name: Httpd_conf_syntax variable
          ansible.builtin.debug:
            msg: "The httpd_conf_syntax variable value is {{ httpd_conf_syntax }}"

        - name: Check httpd status
          ansible.builtin.command: systemctl is-active httpd
          register: httpd_status
          changed_when: httpd_status.rc != 0
          notify:
            - Restart services

      rescue:
        - name: Recover original httpd config
          ansible.builtin.file:
            path: /etc/httpd/conf.d/vhosts.conf
            state: absent
          notify:
            - Restart services

  handlers:
    - name: Restart services
      ansible.builtin.service:
        name: httpd
        state: restarted

# end of secure web play
[student@workstation troubleshoot-review]$ 
[student@workstation troubleshoot-review]$ cat vhosts.conf 
<VirtualHost serverb.lab.example.com>
    ServerAdmin webmaster@foob.example.com
    ServerName serverb.lab.example.com
    ErrorLog logs/serverb-ssl.error.log
    CustomLog logs/serverb-secure.common.log common
    DocumentRoot /var/www/vhosts/serverb-secure/

    SSLEngine On
    SSLCertificateFile /etc/pki/tls/certs/serverb.lab.example.com.crt
    SSLCertificateKeyFile /etc/pki/tls/private/serverb.lab.example.com.key

    <Directory /var/www/vhosts/serverb-secure>
        Options +Indexes +followsymlinks +includes
        Order allow,deny
        allow from all
    </Directory>
</VirtualHost>

[student@workstation troubleshoot-review]$

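Running the playbook surfaces several errors, one after another. The corrections are: remove the ansible_host=serverc.lab.example.com override from the inventory; change remote_user from students to devops and add become: true to the play; quote the random_var value because it contains a colon; quote "{{ item }}" in the package name; and fix the indentation of the "Start and enable web services" task.

After all errors are corrected, the inventory and the playbook look like this: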


[student@workstation troubleshoot-review]$ cat inventory 
[webservers]
serverb.lab.example.com 


[student@workstation troubleshoot-review]$ cat secure-web.yml 
---
# start of secure web server playbook
- name: Create secure web service
  hosts: webservers
  remote_user: devops
  become: true
  vars:
    random_var: "This is colon: test"
    rule:
      - http
      - https

  tasks:
    - block:
        - name: Install web server packages
          ansible.builtin.dnf:
            name: "{{ item }}"
            state: latest
          notify:
            - Restart services
          loop:
            - httpd
            - mod_ssl

        - name: Install httpd config files
          ansible.builtin.copy:
            src: vhosts.conf
            dest: /etc/httpd/conf.d/vhosts.conf
            backup: true
            owner: root
            group: root
            mode: 0644
          register: vhosts_config
          notify:
            - Restart services

        - name: Create ssl certificate
          ansible.builtin.command: openssl req -new -nodes -x509 -subj "/C=US/ST=North Carolina/L=Raleigh/O=Example Inc/CN=serverb.lab.example.com" -days 120 -keyout /etc/pki/tls/private/serverb.lab.example.com.key -out /etc/pki/tls/certs/serverb.lab.example.com.crt -extensions v3_ca
          args:
            creates: /etc/pki/tls/certs/serverb.lab.example.com.crt

        - name: Start and enable web services
          ansible.builtin.service:
            name: httpd
            state: started
            enabled: true

        - name: Open ports for http and https
          ansible.posix.firewalld:
            service: "{{ item }}"
            immediate: true
            permanent: true
            state: enabled
          loop: "{{ rule }}"

        - name: Deliver content
          ansible.builtin.copy:
            dest: /var/www/vhosts/serverb-secure/
            src: index.html

        - name: Check httpd syntax
          ansible.builtin.command: /sbin/httpd -t
          register: httpd_conf_syntax
          failed_when: "'Syntax OK' not in httpd_conf_syntax.stderr"

        - name: Httpd_conf_syntax variable
          ansible.builtin.debug:
            msg: "The httpd_conf_syntax variable value is {{ httpd_conf_syntax }}"

        - name: Check httpd status
          ansible.builtin.command: systemctl is-active httpd
          register: httpd_status
          changed_when: httpd_status.rc != 0
          notify:
            - Restart services

      rescue:
        - name: Recover original httpd config
          ansible.builtin.file:
            path: /etc/httpd/conf.d/vhosts.conf
            state: absent
          notify:
            - Restart services

  handlers:
    - name: Restart services
      ansible.builtin.service:
        name: httpd
        state: restarted

# end of secure web play
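
Once the corrected playbook runs cleanly, the result can be verified with the ansible.builtin.uri module described earlier. This is a sketch under several assumptions: the play runs from a control node that can reach serverb.lab.example.com over HTTPS, the virtual host answers on that name, and certificate validation is disabled because the playbook creates a self-signed certificate. The play and the variable name page are hypothetical.

```yaml
---
- name: Verify the secure web service
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Request the test page over HTTPS
      ansible.builtin.uri:
        url: https://serverb.lab.example.com
        validate_certs: false    # self-signed certificate created by the playbook
        return_content: true
        status_code: 200
      register: page

    # The expected text comes from the index.html file shown above.
    - name: Confirm that the expected content is served
      ansible.builtin.assert:
        that:
          - "'This is a test page.' in page['content']"
        fail_msg: "The test page content was not returned"
```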

TO BE CONTINUED...
