Ansible in Practice for Automated Operations: From Getting Started to Production-Grade Use

"能自动化的工作,就不要手动做。"这是我从事运维工作多年最深刻的体会之一。而提到自动化运维工具,Ansible无疑是我的首选。简洁的YAML语法、无需代理的架构、丰富的模块支持,让它成为运维工程师的得力助手。今天想分享一些Ansible的实战经验,希望能帮助大家提升运维效率。

1. Ansible Core Concepts

1.1 Inventory File

The inventory defines the hosts to manage and how they are grouped:

# inventory/hosts.ini
[webservers]
web01.example.com ansible_host=192.168.1.10
web02.example.com ansible_host=192.168.1.11
web03.example.com ansible_host=192.168.1.12

[dbservers]
db01.example.com ansible_host=192.168.2.10
db02.example.com ansible_host=192.168.2.11

[production:children]
webservers
dbservers

[production:vars]
ansible_user=admin
ansible_port=22
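Ansible also accepts inventories written in YAML. As a sketch, the same hosts, groups, and group variables could be expressed as follows (the file name `inventory/hosts.yml` is only an assumption):

```yaml
# inventory/hosts.yml: the INI inventory above, rewritten in YAML
all:
  children:
    production:
      children:
        webservers:
          hosts:
            web01.example.com:
              ansible_host: 192.168.1.10
            web02.example.com:
              ansible_host: 192.168.1.11
            web03.example.com:
              ansible_host: 192.168.1.12
        dbservers:
          hosts:
            db01.example.com:
              ansible_host: 192.168.2.10
            db02.example.com:
              ansible_host: 192.168.2.11
      vars:
        # equivalent of the [production:vars] section
        ansible_user: admin
        ansible_port: 22
```

Either format can be checked with `ansible-inventory -i inventory/hosts.yml --graph`, which prints the group tree as Ansible parses it.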

1.2 Playbooks

A playbook is Ansible's task orchestration file:

# site.yml
---
- name: Deploy Web Application
  hosts: webservers
  become: yes
  vars:
    app_version: "2.0.0"
  tasks:
    - name: Ensure nginx is installed
      apt:
        name: nginx
        state: present
        update_cache: yes

    - name: Deploy application
      copy:
        src: "app/{{ app_version }}/"
        dest: /var/www/myapp
        owner: www-data
        group: www-data
        mode: '0755'
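One playbook file can hold several plays, each aimed at a different inventory group. A minimal sketch that reuses the groups from 1.1 and assumes Debian/Ubuntu hosts (the PostgreSQL package choice is illustrative):

```yaml
# site.yml (sketch): two plays in one file, each targeting its own group
---
- name: Deploy Web Application
  hosts: webservers
  become: yes
  tasks:
    - name: Ensure nginx is installed
      apt:
        name: nginx
        state: present

- name: Prepare database servers
  hosts: dbservers
  become: yes
  tasks:
    - name: Ensure PostgreSQL is installed
      apt:
        name: postgresql
        state: present
```

Plays run in order, top to bottom, so the web tier here is handled before the database tier.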

1.3 Roles

Roles organize reusable playbook components:

roles/
├── common/
│   ├── defaults/
│   │   └── main.yml
│   ├── tasks/
│   │   └── main.yml
│   ├── handlers/
│   │   └── main.yml
│   └── templates/
│       └── motd.j2
├── nginx/
│   ├── defaults/
│   ├── tasks/
│   ├── handlers/
│   ├── templates/
│   └── vars/
└── mysql/
    ├── defaults/
    ├── tasks/
    ├── handlers/
    └── vars/
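To show how the pieces of a role fit together, here is a sketch of a few files for the `common` role above, plus a playbook that applies roles. The variable `motd_message` is hypothetical; Ansible resolves `src: motd.j2` from the role's own `templates/` directory.

```yaml
# roles/common/defaults/main.yml: lowest-precedence defaults, easy to override
---
motd_message: "Managed by Ansible"   # hypothetical variable used in motd.j2

# roles/common/tasks/main.yml
---
- name: Render the message of the day
  template:
    src: motd.j2          # found in roles/common/templates/
    dest: /etc/motd
    mode: '0644'

# site.yml: apply roles to hosts
---
- name: Apply base and web roles
  hosts: all
  become: yes
  roles:
    - common
    - role: nginx
      when: "'webservers' in group_names"   # only web servers get nginx
```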

2. Hands-On: Basic Tasks

2.1 Package Management

# Install packages
- name: Install required packages
  apt:
    name:
      - nginx
      - git
      - vim
      - curl
    state: present
    update_cache: yes

# Remove packages
- name: Remove unwanted packages
  apt:
    name: ufw
    state: absent

# Install Python packages with pip
- name: Install Python packages
  pip:
    name:
      - docker
      - docker-compose
    state: present
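Two refinements that are often useful: `cache_valid_time` avoids refreshing the apt cache on every run, and the generic `package` module picks the host's native package manager, so one task can serve Debian- and RHEL-family hosts alike. A sketch:

```yaml
# Refresh the apt cache only if it is older than one hour
- name: Install packages with a cached apt index
  apt:
    name:
      - nginx
      - git
    state: present
    update_cache: yes
    cache_valid_time: 3600

# Distribution-neutral install (apt, dnf, yum, ... chosen automatically)
- name: Install git on any supported distribution
  package:
    name: git
    state: present
```

Note that `package` passes the name through unchanged, so it only helps when the package has the same name on every distribution.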

2.2 User and Group Management

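# Create a user
- name: Create deployment user
  user:
    name: deploy
    comment: "Deployment User"
    shell: /bin/bash
    groups: sudo
    append: yes                     # add to sudo without dropping existing groups
    # Keep the real password in Ansible Vault (see section 5), not in the playbook;
    # deploy_password is an assumed vault variable name.
    # password_hash() picks a random salt on every run, so set the password only
    # when the user is created; otherwise the task reports "changed" every time.
    password: "{{ deploy_password | password_hash('sha512') }}"
    update_password: on_create
    ssh_key_file: .ssh/id_rsa
    generate_ssh_key: yes
    ssh_key_comment: "deploy@ansible"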

# Create a system account (no login shell)
- name: Create service account
  user:
    name: myapp
    system: yes
    create_home: no
    shell: /usr/sbin/nologin
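The section title also covers groups, and a deploy user usually needs an authorized SSH key. A sketch using the `group`, `user`, and `authorized_key` modules; the group name `appadmins` and the public key path are hypothetical:

```yaml
# Create a group
- name: Create an application admin group
  group:
    name: appadmins            # hypothetical group name
    state: present

# Add an existing user to it; append keeps the user's other groups
- name: Add deploy to appadmins
  user:
    name: deploy
    groups: appadmins
    append: yes

# Install a public key from the control node into ~deploy/.ssh/authorized_keys
- name: Authorize a public key for deploy
  authorized_key:
    user: deploy
    key: "{{ lookup('file', 'files/deploy_id_rsa.pub') }}"   # hypothetical path
    state: present
```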

2.3 File and Directory Management

# Create directories
- name: Create application directories
  file:
    path: "{{ item }}"
    state: directory
    owner: www-data
    group: www-data
    mode: '0755'
  loop:
    - /var/www/myapp
    - /var/log/myapp
    - /var/run/myapp

# Copy a file
- name: Copy configuration file
  copy:
    src: nginx.conf
    dest: /etc/nginx/nginx.conf
    owner: root
    group: root
    mode: '0644'
    backup: yes
    validate: nginx -t -c %s

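# Change file permissions: the private key readable only by root, the public key world-readable
- name: Set permissions on the SSH key pair
  file:
    path: "{{ item.path }}"
    mode: "{{ item.mode }}"
    owner: root
    group: root
  loop:
    - { path: /root/.ssh/id_rsa, mode: '0600' }
    - { path: /root/.ssh/id_rsa.pub, mode: '0644' }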
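When only one setting in a file needs to change, `lineinfile` is often simpler than managing the whole file. A sketch that disables root SSH login; the `validate` command checks the edited copy before it replaces the live file:

```yaml
# Edit a single line in place, validating the result before it is installed
- name: Disable root SSH login
  lineinfile:
    path: /etc/ssh/sshd_config
    regexp: '^#?PermitRootLogin'
    line: 'PermitRootLogin no'
    validate: /usr/sbin/sshd -t -f %s
    backup: yes
```

sshd only picks up the change after a reload, which you would normally trigger through a handler (see 3.1).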

2.4 Service Management

# Start a service and enable it at boot
- name: Ensure nginx is running
  service:
    name: nginx
    state: started
    enabled: yes

# Restart a service unconditionally (this runs on every play; for config changes, notify a handler instead, see 3.1)
- name: Restart nginx after config change
  service:
    name: nginx
    state: restarted

# Use the systemd module
- name: Reload systemd daemon
  systemd:
    daemon_reload: yes
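A common reason for `daemon_reload` is installing your own unit file. A sketch that runs the `myapp` service as the system account created in 2.2; the `ExecStart` path is hypothetical:

```yaml
# Install a unit file for the application
- name: Install systemd unit for myapp
  copy:
    dest: /etc/systemd/system/myapp.service
    mode: '0644'
    content: |
      [Unit]
      Description=My application
      After=network.target

      [Service]
      User=myapp
      ExecStart=/var/www/myapp/bin/myapp
      Restart=on-failure

      [Install]
      WantedBy=multi-user.target

# daemon_reload runs before the start/enable actions in the same task
- name: Reload systemd, then start and enable myapp
  systemd:
    name: myapp
    state: started
    enabled: yes
    daemon_reload: yes
```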

3. Advanced Playbook Features

3.1 How Handlers Are Triggered

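Handlers perform a follow-up action, such as restarting a service. A handler runs only when a task that notifies it reports a change, runs once even if it is notified several times, and by default waits until the current section of tasks has finished: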

# handlers/main.yml
- name: Restart nginx
  service:
    name: nginx
    state: restarted

- name: Reload nginx
  service:
    name: nginx
    state: reloaded

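# tasks/main.yml
- name: Copy nginx config
  template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
    validate: nginx -t -c %s     # refuse to install a config that fails nginx's own check
  notify:
    - Restart nginx

# command always reports "changed", so a notify here would reload nginx on
# every run; treat the check as read-only instead
- name: Check nginx config
  command: nginx -t
  changed_when: false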
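Two further handler features are worth knowing. `listen` lets several handlers subscribe to one topic, and `meta: flush_handlers` runs pending handlers immediately instead of at the end of the section. A sketch; the template name `site.conf.j2` is hypothetical:

```yaml
# handlers/main.yml: subscribe to a shared topic
- name: Restart nginx
  service:
    name: nginx
    state: restarted
  listen: "web config changed"

# tasks/main.yml
- name: Update nginx site config
  template:
    src: site.conf.j2                   # hypothetical template
    dest: /etc/nginx/conf.d/site.conf
  notify: "web config changed"          # wakes every handler listening on this topic

- name: Run notified handlers now
  meta: flush_handlers
```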

3.2 Conditional Execution

# Run only under a specific condition
- name: Install PostgreSQL on database servers
  apt:
    name: postgresql
    state: present
  when: "'dbservers' in group_names"

# Multiple conditions (a list is combined with AND)
- name: Install monitoring tools
  apt:
    name: prometheus
    state: present
  when:
    - ansible_distribution == "Ubuntu"
    - ansible_distribution_version == "22.04"

# Condition based on a variable's value
- name: Enable maintenance mode
  command: /usr/local/bin/enable-maintenance.sh
  when: maintenance_mode | default(false) | bool
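Conditions are frequently based on the registered result of an earlier task. A sketch using `stat` to act only when a file is missing; the path `/etc/myapp/config.yml` is hypothetical:

```yaml
- name: Check whether the app config exists
  stat:
    path: /etc/myapp/config.yml        # hypothetical path
  register: app_config

- name: Write a default config only if none exists yet
  copy:
    content: "log_level: info\n"
    dest: /etc/myapp/config.yml
    mode: '0644'
  when: not app_config.stat.exists

- name: Run only on Debian-family hosts (Ubuntu also reports "Debian" here)
  debug:
    msg: "Debian family detected"
  when: ansible_os_family == "Debian"
```

For this particular case, `copy` with `force: no` achieves the same result in one task; the `stat` pattern is more general.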

3.3 Loops

# Create several users with loop
- name: Create application users
  user:
    name: "{{ item.username }}"
    comment: "{{ item.fullname }}"
    shell: /bin/bash
    groups: "{{ item.groups }}"
  loop:
    - { username: 'alice', fullname: 'Alice Wang', groups: ['www-data', 'sudo'] }
    - { username: 'bob', fullname: 'Bob Li', groups: ['www-data'] }
    - { username: 'charlie', fullname: 'Charlie Chen', groups: ['www-data'] }

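# Save task output with register
- name: Get container list
  command: docker ps -a --format "{{ '{{' }}.Names{{ '}}' }}"
  register: docker_containers
  changed_when: false            # listing containers changes nothing

# The service module manages system services, not containers; use the Docker module
# (needs the community.docker collection and the docker Python SDK from 2.1)
- name: Stop all containers
  community.docker.docker_container:
    name: "{{ item }}"
    state: stopped
  loop: "{{ docker_containers.stdout_lines }}"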
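Loops are not limited to lists: the `dict2items` filter turns a dictionary into a list of `key`/`value` pairs, and `loop_control.label` keeps the output readable. A sketch with hypothetical data:

```yaml
- name: Show the port of each service
  debug:
    msg: "{{ item.key }} listens on {{ item.value }}"
  loop: "{{ service_ports | dict2items }}"
  loop_control:
    label: "{{ item.key }}"     # print only the key for each iteration
  vars:
    service_ports:              # hypothetical data
      nginx: 80
      myapp: 8080
```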

3.4 Error Handling

# Ignore a task failure
- name: Ignore failure
  command: /opt/scripts/deprecated.sh
  ignore_errors: yes

# Catch a failure with block/rescue so the play keeps going
- name: Always report success
  block:
    - name: Run diagnostic script
      command: /opt/scripts/diagnostics.sh
  rescue:
    - name: Report failure
      debug:
        msg: "Task failed but continuing"

# Fail on purpose
- name: Fail if required variable not set
  fail:
    msg: "app_version must be defined"
  when: app_version is not defined
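A fuller sketch combines `rescue` with `always`, and uses `failed_when` to define failure precisely. The two scripts are hypothetical; `ansible_failed_task` and `ansible_failed_result` are set by Ansible inside `rescue`:

```yaml
- name: Deploy with automatic rollback
  block:
    - name: Run migration script
      command: /opt/scripts/migrate.sh        # hypothetical script
  rescue:
    - name: Report which task failed
      debug:
        msg: "{{ ansible_failed_task.name }} failed: {{ ansible_failed_result.msg | default('') }}"
    - name: Roll back
      command: /opt/scripts/rollback.sh       # hypothetical script
  always:
    - name: Runs whether the block succeeded or not
      debug:
        msg: "Deployment attempt finished"

# grep exits 1 when nothing matches and 2 on a real error
- name: Look for errors in the log without failing on "no match"
  command: grep -q "ERROR" /var/log/myapp/app.log
  register: grep_result
  failed_when: grep_result.rc > 1
  changed_when: false
```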

4. Production Practices

4.1 Configuration Management

# roles/common/tasks/main.yml
- name: Update sysctl parameters
  sysctl:
    name: "{{ item.name }}"
    value: "{{ item.value }}"
    state: present
    reload: yes
    sysctl_file: /etc/sysctl.d/99-custom.conf
  loop:
    - { name: 'net.ipv4.tcp_fin_timeout', value: '30' }
    - { name: 'net.core.somaxconn', value: '65535' }
    - { name: 'vm.swappiness', value: '10' }
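Kernel tuning such as a high `somaxconn` is often paired with raising per-user resource limits. A sketch using the `pam_limits` module (from the community.general collection), applied to the `myapp` account from 2.2:

```yaml
# Writes an entry to /etc/security/limits.conf
- name: Raise the open file limit for the myapp user
  pam_limits:
    domain: myapp
    limit_type: '-'       # '-' sets both the soft and the hard limit
    limit_item: nofile
    value: '65535'
```

Services started by systemd do not read these PAM limits; for them the equivalent is `LimitNOFILE=` in the unit file.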

4.2 Deployment Playbook

# deploy.yml
---
- name: Deploy Application to Production
  hosts: webservers
  become: yes
  vars_files:
    - vars/secrets.yml        # Vault-encrypted (see section 5); no special extension needed
  environment:
    PATH: /usr/local/bin:{{ ansible_env.PATH }}

  pre_tasks:
    - name: Verify deployment prerequisites
      assert:
        that:
          - app_version is defined
          - deploy_key is defined

    - name: Create backup of current version
      archive:
        path: /var/www/myapp
        dest: /tmp/myapp-backup-{{ ansible_date_time.epoch }}.tar.gz
        format: gz

  tasks:
    - name: Pull new application version
      git:
        repo: "{{ git_repo }}"
        version: "{{ app_version }}"
        dest: /var/www/myapp
        force: yes
        accept_hostkey: yes
      notify: Restart application

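    # Note: `creates` skips this task whenever the marker file exists, and `force: yes`
    # on the git task does not delete untracked files, so migrate.sh must manage this
    # marker per release, or later versions will never be migrated.
    - name: Run database migrations
      command: /var/www/myapp/scripts/migrate.sh
      args:
        creates: /var/www/myapp/.migrated
      environment:
        DATABASE_URL: "{{ db_url }}"

    # Handlers normally run only after every task in this section has finished;
    # flush them here so the health check below hits the restarted application.
    - meta: flush_handlers
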
    - name: Verify deployment
      uri:
        url: "http://{{ ansible_host }}/health"
        status_code: 200
      register: health_check
      retries: 5
      delay: 10
      until: health_check.status == 200

  handlers:
    - name: Restart application
      systemd:
        name: myapp
        state: restarted

  post_tasks:
    - name: Cleanup old backups
      find:
        paths: /tmp
        patterns: myapp-backup-*.tar.gz
        age: 7d
      register: old_backups

    - name: Remove old backups
      file:
        path: "{{ item.path }}"
        state: absent
      loop: "{{ old_backups.files }}"
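Two keywords are especially useful in deployment playbooks like this one. `run_once` executes a task on a single host (for example, migrations against a shared database), and `delegate_to` runs a task on another machine on behalf of the current host. A sketch; `lb01.example.com` and the `lb-drain` script are hypothetical:

```yaml
# Shared database: migrate from one host only, not once per web server
- name: Run database migrations once
  command: /var/www/myapp/scripts/migrate.sh
  run_once: true
  environment:
    DATABASE_URL: "{{ db_url }}"

# Run on the load balancer, parameterized by the host being deployed
- name: Drain this host from the load balancer
  command: "/usr/local/bin/lb-drain {{ inventory_hostname }}"   # hypothetical script
  delegate_to: lb01.example.com                                 # hypothetical host
```

When the play also uses `serial` (see 7.2), `run_once` applies once per batch rather than once per play.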

4.3 Cluster Initialization

# cluster-setup.yml
---
- name: Initialize Kubernetes Cluster Nodes
  hosts: k8s_masters
  become: yes
  vars:
    k8s_version: "1.28.0"
    pod_network_cidr: "10.244.0.0/16"

  tasks:
    - name: Disable swap
      shell: |
        swapoff -a && sed -i '/ swap / s/^\(.*\)$/#\1/' /etc/fstab
      args:
        executable: /bin/bash
      when: ansible_swaptotal_mb > 0   # skip when swap is already off, so reruns don't re-comment fstab

    - name: Load required kernel modules
      modprobe:
        name: "{{ item }}"
        state: present
      loop:
        - overlay
        - br_netfilter

    - name: Configure kernel parameters
      sysctl:
        name: "{{ item.name }}"
        value: "{{ item.value }}"
        state: present
        reload: yes
      loop:
        - { name: 'net.bridge.bridge-nf-call-iptables', value: '1' }
        - { name: 'net.bridge.bridge-nf-call-ip6tables', value: '1' }
        - { name: 'net.ipv4.ip_forward', value: '1' }

    - name: Install container runtime
      include_tasks: tasks/install-containerd.yml

    - name: Install kubeadm
      include_tasks: tasks/install-kubeadm.yml
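`modprobe` loads the modules only until the next reboot. A sketch of two follow-up tasks: one persists the modules through systemd's `modules-load.d`, and one initializes the control plane from the play's own `k8s_version` and `pod_network_cidr` variables. The `creates` guard keeps `kubeadm init` from running again once `admin.conf` exists. Joining additional masters is out of scope here.

```yaml
- name: Load overlay and br_netfilter on every boot
  copy:
    dest: /etc/modules-load.d/k8s.conf
    mode: '0644'
    content: |
      overlay
      br_netfilter

- name: Initialize the control plane (first master only)
  command: >-
    kubeadm init
    --kubernetes-version v{{ k8s_version }}
    --pod-network-cidr {{ pod_network_cidr }}
  args:
    creates: /etc/kubernetes/admin.conf   # skip if the cluster is already initialized
  run_once: true
```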

5. Securing Secrets with Ansible Vault

5.1 Encrypting Sensitive Data

# Create an encrypted file
ansible-vault create vars/secrets.yml

# Edit an encrypted file
ansible-vault edit vars/secrets.yml

# Encrypt an existing file
ansible-vault encrypt vars/secrets.yml

# Decrypt a file (writes the plaintext back to disk)
ansible-vault decrypt vars/secrets.yml

# View an encrypted file
ansible-vault view vars/secrets.yml
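A widely used convention keeps variable names searchable while their values stay encrypted. A plain-text vars file points at `vault_`-prefixed variables that live in a separate, encrypted file. A sketch following the layout from 6.1; the values are placeholders:

```yaml
# inventory/production/group_vars/all/vars.yml (plain text: grep still finds db_password)
db_user: myapp
db_password: "{{ vault_db_password }}"

# inventory/production/group_vars/all/vault.yml
# (encrypt with: ansible-vault encrypt inventory/production/group_vars/all/vault.yml)
vault_db_password: "change-me"     # placeholder value
```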

5.2 Using Encrypted Variables

# playbook.yml
- name: Deploy application
  hosts: webservers
  become: yes                     # writing under /etc requires root
  vars_files:
    - vars/secrets.yml
  tasks:
    - name: Create database credentials
      copy:
        content: |
          DATABASE_USER={{ db_user }}
          DATABASE_PASSWORD={{ db_password }}
        dest: /etc/myapp/.env
        mode: '0600'
      no_log: true                # keep secrets out of logs and --diff output

5.3 Vault ID

# Use multiple vault IDs
ansible-vault create --vault-id prod@prompt vars/production.yml
ansible-vault create --vault-id dev@prompt vars/development.yml

# Specify the vault ID when running the playbook
ansible-playbook site.yml --vault-id prod@prompt

6. Best Practices

6.1 Recommended Directory Layout

ansible/
├── inventory/
│   ├── production/
│   │   ├── hosts.ini
│   │   └── group_vars/
│   │       └── all.yml
│   └── staging/
│       └── hosts.ini
├── library/
├── lookup_plugins/
├── filter_plugins/
├── roles/
│   ├── common/
│   ├── nginx/
│   ├── mysql/
│   └── myapp/
├── playbooks/
│   ├── site.yml
│   ├── database.yml
│   └── monitoring.yml
├── vars/
│   └── secrets.yml
└── ansible.cfg

6.2 ansible.cfg Configuration

[defaults]
inventory = inventory/production/hosts.ini
roles_path = roles
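; Skipping host key verification is convenient but weakens SSH security; consider True in production
host_key_checking = False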
timeout = 30
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts
fact_caching_timeout = 86400
callbacks_enabled = profile_tasks, timer
display_skipped_hosts = False
interpreter_python = auto_silent

[privilege_escalation]
become = True
become_method = sudo
become_user = root
become_ask_pass = False

[ssh_connection]
pipelining = True
ssh_args = -o ControlMaster=auto -o ControlPersist=60s

6.3 Testing and Verification

# Syntax check
ansible-playbook site.yml --syntax-check

# List tasks
ansible-playbook site.yml --list-tasks

# List hosts
ansible-playbook site.yml --list-hosts

# Dry run (check mode)
ansible-playbook site.yml --check

# Dry run and show what would change
ansible-playbook site.yml --check --diff

# Run against a single host only (--limit restricts which hosts are targeted; it does not throttle speed)
ansible-playbook site.yml --diff --limit web01.example.com
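Beyond dry runs, a small read-only playbook can verify the result after a real run. A sketch; the file name `tests/smoke.yml` is hypothetical, and `check_mode: yes` on a single task renders the template without writing it, so `changed` reveals configuration drift:

```yaml
# tests/smoke.yml (hypothetical): read-only checks to run after a deployment
---
- name: Smoke-test web servers
  hosts: webservers
  gather_facts: no
  tasks:
    - name: Collect service states
      service_facts:

    - name: nginx must be running
      assert:
        that:
          - ansible_facts.services['nginx.service'].state == 'running'
        fail_msg: "nginx is not running on {{ inventory_hostname }}"

    - name: Compare the live nginx.conf with the template (never writes)
      template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
      check_mode: yes
      register: conf_drift

    - name: Fail if the config has drifted
      assert:
        that:
          - not conf_drift.changed
```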

7. Performance Optimization

7.1 Asynchronous Execution

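# Run long tasks asynchronously
- name: Run long task
  command: /opt/scripts/long-running-task.sh
  async: 3600          # allow the job up to one hour
  poll: 0              # don't wait; continue with the next task
  register: long_task

- name: Check async status
  async_status:
    jid: "{{ long_task.ansible_job_id }}"
  register: job_result
  until: job_result.finished
  retries: 120         # 120 x 30 s = 3600 s, matching the async limit above
  delay: 30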
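The real payoff of `poll: 0` comes from starting several jobs at once and waiting for all of them afterwards. A sketch; the `part1`...`part3` script arguments are hypothetical. Inside `until`, the registered variable refers to the current loop item's result:

```yaml
- name: Start several long jobs in parallel
  command: "/opt/scripts/long-running-task.sh {{ item }}"   # hypothetical arguments
  loop:
    - part1
    - part2
    - part3
  async: 3600
  poll: 0
  register: async_jobs

- name: Wait for every job to finish
  async_status:
    jid: "{{ item.ansible_job_id }}"
  loop: "{{ async_jobs.results }}"
  loop_control:
    label: "{{ item.item }}"     # show the original argument, not the whole result
  register: job_results
  until: job_results.finished
  retries: 120
  delay: 30
```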

7.2 Batch Execution Strategy

# Update in batches so that not too many services restart at once
- name: Rolling update
  hosts: webservers
  serial: 2
  tasks:
    - name: Update application
      include_tasks: tasks/update-app.yml

    - name: Verify health
      include_tasks: tasks/verify-health.yml
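`serial` also accepts a list, which makes a canary rollout straightforward, and `max_fail_percentage` stops the rollout before a bad release reaches every host. A sketch reusing the task files above:

```yaml
- name: Rolling update with a canary batch
  hosts: webservers
  serial:
    - 1          # first batch: a single canary host
    - "50%"      # then batches of half the group
    - "100%"     # then everything that remains
  max_fail_percentage: 0   # abort the remaining batches if any host fails
  tasks:
    - name: Update application
      include_tasks: tasks/update-app.yml

    - name: Verify health
      include_tasks: tasks/verify-health.yml
```

Percentages are computed from the total number of hosts in the play, and the last list entry repeats until all hosts are done.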

Conclusion

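Ansible is a powerful tool for operations automation, but a tool is only a tool; what matters most is the operational thinking and process design behind it. Working with Ansible taught me what "Infrastructure as Code" really means: making infrastructure configuration and management codified, version-controlled, and testable.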

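I hope this article helps you use Ansible more effectively for automated operations. If you have any questions, feel free to discuss them in the comments.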

Author: 侯万里 (万里侯), a veteran committed to advancing operations automation
