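Ansible Automation in Practice: From Getting Started to Production-Grade Use
"If a job can be automated, don't do it by hand." That is one of the deepest lessons from my years in operations. When it comes to automation tools, Ansible is without question my first choice: its concise YAML syntax, agentless architecture and rich module library make it a dependable ally for operations engineers. Today I want to share some hands-on Ansible experience that I hope will help you run operations more efficiently.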
1. Core Ansible Concepts
1.1 The Inventory File
The inventory defines the hosts Ansible manages and how they are grouped:
```ini
# inventory/hosts.ini
[webservers]
web01.example.com ansible_host=192.168.1.10
web02.example.com ansible_host=192.168.1.11
web03.example.com ansible_host=192.168.1.12

[dbservers]
db01.example.com ansible_host=192.168.2.10
db02.example.com ansible_host=192.168.2.11

[production:children]
webservers
dbservers

[production:vars]
ansible_user=admin
ansible_port=22
```
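The same inventory can also be expressed in YAML, which sits naturally next to YAML group_vars files. A minimal sketch of an equivalent file; the name inventory/hosts.yml is illustrative, and the groups nested under production are the same global groups as in the INI version:

```yaml
# inventory/hosts.yml: equivalent to the INI inventory above
all:
  children:
    production:
      children:
        webservers:
          hosts:
            web01.example.com:
              ansible_host: 192.168.1.10
            web02.example.com:
              ansible_host: 192.168.1.11
            web03.example.com:
              ansible_host: 192.168.1.12
        dbservers:
          hosts:
            db01.example.com:
              ansible_host: 192.168.2.10
            db02.example.com:
              ansible_host: 192.168.2.11
      # variables for every host in production (the [production:vars] section)
      vars:
        ansible_user: admin
        ansible_port: 22
```

Either format can be checked with ansible-inventory -i inventory/hosts.yml --graph, which prints the resulting group tree.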
1.2 Playbooks
A playbook is the file in which Ansible orchestrates tasks:
```yaml
# site.yml
---
- name: Deploy Web Application
  hosts: webservers
  become: yes
  vars:
    app_version: "2.0.0"
  tasks:
    - name: Ensure nginx is installed
      apt:
        name: nginx
        state: present
        update_cache: yes

    - name: Deploy application
      copy:
        src: "app/{{ app_version }}/"
        dest: /var/www/myapp
        owner: www-data
        group: www-data
        mode: '0755'
```
1.3 Roles
Roles organize playbook content into reusable components:
roles/
├── common/
│ ├── defaults/
│ │ └── main.yml
│ ├── tasks/
│ │ └── main.yml
│ ├── handlers/
│ │ └── main.yml
│ └── templates/
│ └── motd.j2
├── nginx/
│ ├── defaults/
│ ├── tasks/
│ ├── handlers/
│ ├── templates/
│ └── vars/
└── mysql/
├── defaults/
├── tasks/
├── handlers/
└── vars/
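Once roles exist, a playbook only has to map them onto host groups. A minimal sketch, assuming the common, nginx and mysql roles from the tree above; nginx_worker_processes is a hypothetical variable that the nginx role would define in its defaults/main.yml:

```yaml
# site.yml: apply roles to host groups
---
- name: Baseline configuration for every host
  hosts: all
  become: yes
  roles:
    - common

- name: Web tier
  hosts: webservers
  become: yes
  roles:
    - role: nginx
      vars:
        nginx_worker_processes: 4   # hypothetical; overrides the role default

- name: Database tier
  hosts: dbservers
  become: yes
  roles:
    - mysql
```

Values in a role's defaults/ have the lowest variable precedence, so they are meant to be overridden like this; values in vars/ take precedence over inventory and play variables and are better reserved for constants.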
2. Basic Tasks in Practice
2.1 Package Management
```yaml
# Install packages
- name: Install required packages
  apt:
    name:
      - nginx
      - git
      - vim
      - curl
    state: present
    update_cache: yes

# Remove packages
- name: Remove unwanted packages
  apt:
    name: ufw
    state: absent

# Install Python packages with pip
- name: Install Python packages
  pip:
    name:
      - docker
      - docker-compose
    state: present
```
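The apt module only works on Debian-family systems. For roles that must also run on RHEL-family hosts, the generic package module hands the work to whatever package manager the host uses. A minimal sketch; vim-enhanced is the usual RHEL-family package name, but verify package names for your own distributions:

```yaml
# package delegates to the host's package manager (apt, dnf, yum, ...)
- name: Install common tools on any distribution
  package:
    name:
      - git
      - curl
    state: present

# Package names can differ between distribution families, so branch on a fact
- name: Install vim
  package:
    name: "{{ 'vim-enhanced' if ansible_os_family == 'RedHat' else 'vim' }}"
    state: present
```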
2.2 User and Group Management
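```yaml
# Create a user
- name: Create deployment user
  user:
    name: deploy
    comment: "Deployment User"
    shell: /bin/bash
    groups: sudo
    append: yes                 # add to sudo without dropping other groups
    # deploy_password is assumed to be a Vault-encrypted variable (see section 5);
    # never hard-code a plaintext password in a playbook
    password: "{{ deploy_password | password_hash('sha512') }}"
    update_password: on_create  # password_hash uses a random salt; set it only once
    ssh_key_file: .ssh/id_rsa
    generate_ssh_key: yes
    ssh_key_comment: "deploy@ansible"

# Create a system account (no login shell)
- name: Create service account
  user:
    name: myapp
    system: yes
    create_home: no
    shell: /usr/sbin/nologin
```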
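Groups have their own module, and key-based login is usually configured right after the user is created. A minimal sketch; the group name appadmins and the key file files/deploy.pub are hypothetical:

```yaml
# Create a group (hypothetical name) before adding users to it
- name: Create application admin group
  group:
    name: appadmins
    state: present

# Add deploy to the group without removing its existing memberships
- name: Add deploy to appadmins
  user:
    name: deploy
    groups: appadmins
    append: yes

# Allow key-based login; files/deploy.pub is a hypothetical public key file
- name: Install authorized key for deploy
  authorized_key:
    user: deploy
    key: "{{ lookup('file', 'files/deploy.pub') }}"
    state: present
```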
2.3 File and Directory Management
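```yaml
# Create directories
- name: Create application directories
  file:
    path: "{{ item }}"
    state: directory
    owner: www-data
    group: www-data
    mode: '0755'
  loop:
    - /var/www/myapp
    - /var/log/myapp
    - /var/run/myapp   # /var/run points to the tmpfs /run on systemd hosts; emptied on reboot

# Copy a file
- name: Copy configuration file
  copy:
    src: nginx.conf
    dest: /etc/nginx/nginx.conf
    owner: root
    group: root
    mode: '0644'
    backup: yes                 # keep a timestamped copy of the previous file
    validate: nginx -t -c %s    # refuse to install a config nginx cannot parse

# Set key file permissions
- name: Set permissions on SSH key files
  file:
    path: "{{ item.path }}"
    mode: "{{ item.mode }}"
    owner: root
    group: root
  loop:
    - { path: /root/.ssh/id_rsa, mode: '0600' }      # private key: owner only
    - { path: /root/.ssh/id_rsa.pub, mode: '0644' }  # public key may be world-readable
```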
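When only one setting in a file needs to change, lineinfile is lighter than shipping the whole file. A minimal sketch that disables SSH password logins; like the nginx example it uses validate, here sshd -t -f %s, so a file sshd cannot parse is never installed. Test it on a host you can still reach by key before rolling it out:

```yaml
# Change one directive in place instead of templating the whole file
- name: Disable SSH password authentication
  lineinfile:
    path: /etc/ssh/sshd_config
    regexp: '^#?PasswordAuthentication\s'
    line: 'PasswordAuthentication no'
    state: present
    validate: /usr/sbin/sshd -t -f %s   # reject the edit if sshd cannot parse the result
    backup: yes
  # sshd still has to be reloaded to pick up the change, e.g. via a handler
```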
2.4 Service Management
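```yaml
# Start and enable a service
- name: Ensure nginx is running
  service:
    name: nginx
    state: started
    enabled: yes

# Restart a service unconditionally: this runs on every play; to restart
# only when the configuration changed, use a handler (section 3.1)
- name: Restart nginx after config change
  service:
    name: nginx
    state: restarted

# Use the systemd module
- name: Reload systemd daemon
  systemd:
    daemon_reload: yes
```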
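daemon_reload matters most right after a unit file is installed or edited. A minimal sketch that installs a unit for the myapp service account created in section 2.2 and starts it; the template myapp.service.j2 is hypothetical. With daemon_reload: yes, the systemd module re-reads unit files before starting the service:

```yaml
# Install a unit file from a hypothetical template
- name: Install myapp systemd unit
  template:
    src: myapp.service.j2
    dest: /etc/systemd/system/myapp.service
    owner: root
    group: root
    mode: '0644'

# daemon_reload runs before the start/enable operations
- name: Start and enable myapp
  systemd:
    name: myapp
    state: started
    enabled: yes
    daemon_reload: yes
```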
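3. Advanced Playbook Features
3.1 How Handlers Are Triggered
A handler performs an action, such as restarting a service, only when a task that notifies it reports a change. Notified handlers run once, after the current section of the play (pre_tasks, tasks or post_tasks) finishes, even if several tasks notified them: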
```yaml
# handlers/main.yml
- name: Restart nginx
  service:
    name: nginx
    state: restarted

- name: Reload nginx
  service:
    name: nginx
    state: reloaded

# tasks/main.yml
- name: Copy nginx config
  template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
  notify:
    - Restart nginx

# command always reports "changed", so notifying a handler from it would
# reload nginx on every run; mark it unchanged and use it only as a check
- name: Check nginx config
  command: nginx -t
  changed_when: false
```
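When a later task depends on the restart having already happened, meta: flush_handlers runs all pending handlers at that point instead of at the end of the section. A minimal sketch reusing the Restart nginx handler above; the port check is illustrative:

```yaml
- name: Copy nginx config
  template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
  notify:
    - Restart nginx

# Run any pending handlers (here: Restart nginx) right now
- name: Flush handlers
  meta: flush_handlers

# This check now sees the restarted nginx, not the old process
- name: Wait for nginx to accept connections
  wait_for:
    port: 80
    timeout: 30
```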
3.2 Conditional Execution
```yaml
# Run only under a specific condition
- name: Install PostgreSQL on database servers
  apt:
    name: postgresql
    state: present
  when: "'dbservers' in group_names"

# Multiple conditions (all must be true)
- name: Install monitoring tools
  apt:
    name: prometheus
    state: present
  when:
    - ansible_distribution == "Ubuntu"
    - ansible_distribution_version == "22.04"

# Decide based on a variable's value
- name: Enable maintenance mode
  command: /usr/local/bin/enable-maintenance.sh
  when: maintenance_mode | default(false) | bool
```
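Conditions can also test the result of an earlier task saved with register, either through its fields or through task-result tests such as "is changed". A minimal sketch; the marker file and init script are hypothetical:

```yaml
# stat only gathers information and never changes the host
- name: Check whether the app has been initialized
  stat:
    path: /var/www/myapp/.initialized      # hypothetical marker file
  register: init_marker

- name: Run first-time initialization
  command: /var/www/myapp/scripts/init.sh  # hypothetical script
  when: not init_marker.stat.exists

# Result tests work on any registered result
- name: Deploy nginx config
  template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
  register: nginx_conf

- name: Test nginx config only when it actually changed
  command: nginx -t
  when: nginx_conf is changed
```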
3.3 Loops
```yaml
# Create several users with loop
- name: Create application users
  user:
    name: "{{ item.username }}"
    comment: "{{ item.fullname }}"
    shell: /bin/bash
    groups: "{{ item.groups }}"
  loop:
    - { username: 'alice', fullname: 'Alice Wang', groups: ['www-data', 'sudo'] }
    - { username: 'bob', fullname: 'Bob Li', groups: ['www-data'] }
    - { username: 'charlie', fullname: 'Charlie Chen', groups: ['www-data'] }

# Save a task's output with register (running containers only)
- name: Get container list
  command: docker ps --format "{{ '{{' }}.Names{{ '}}' }}"
  register: docker_containers
  changed_when: false   # listing containers changes nothing

# Containers are not system services, so the service module cannot stop
# them; call the docker CLI once per container name instead
- name: Stop all containers
  command: "docker stop {{ item }}"
  loop: "{{ docker_containers.stdout_lines }}"
```
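loop expects a list, so a dictionary must first be converted with the dict2items filter, which yields items with key and value fields; loop_control.label keeps the per-item console output short. A minimal sketch with a hypothetical app_dirs variable mapping paths to modes:

```yaml
- name: Create directories from a dictionary
  file:
    path: "{{ item.key }}"
    state: directory
    mode: "{{ item.value }}"
  loop: "{{ app_dirs | dict2items }}"
  loop_control:
    label: "{{ item.key }}"   # print only the path for each iteration
  vars:
    app_dirs:                 # hypothetical variable: path -> mode
      /var/www/myapp: '0755'
      /var/log/myapp: '0750'
```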
3.4 Error Handling
```yaml
# Ignore a task failure
- name: Ignore failure
  command: /opt/scripts/deprecated.sh
  ignore_errors: yes

# Use block/rescue so the play continues after a failure
- name: Always report success
  block:
    - name: Run diagnostic script
      command: /opt/scripts/diagnostics.sh
  rescue:
    - name: Report failure
      debug:
        msg: "Task failed but continuing"

# Fail deliberately
- name: Fail if required variable not set
  fail:
    msg: "app_version must be defined"
  when: app_version is not defined
```
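A block can also carry an always section, which runs whether the block succeeded or failed, and failed_when lets you define failure yourself, for example when a script exits 0 but prints an error. A minimal sketch; the script and lock file paths are hypothetical:

```yaml
- name: Maintenance with guaranteed cleanup
  block:
    - name: Run maintenance script
      command: /opt/scripts/maintenance.sh   # hypothetical script
      register: maint
      # treat "ERROR" in the output as a failure even if the exit code is 0
      failed_when: maint.rc != 0 or 'ERROR' in maint.stdout
  rescue:
    - name: Report the failure
      debug:
        msg: "Maintenance failed: {{ ansible_failed_result.msg | default('see output above') }}"
  always:
    - name: Remove the lock file whatever happened
      file:
        path: /tmp/maintenance.lock          # hypothetical lock file
        state: absent
```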
4. Production Practices
4.1 Configuration Management
```yaml
# roles/common/tasks/main.yml
- name: Update sysctl parameters
  sysctl:
    name: "{{ item.name }}"
    value: "{{ item.value }}"
    state: present
    reload: yes
    sysctl_file: /etc/sysctl.d/99-custom.conf
  loop:
    - { name: 'net.ipv4.tcp_fin_timeout', value: '30' }
    - { name: 'net.core.somaxconn', value: '65535' }
    - { name: 'vm.swappiness', value: '10' }
```
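Hard-coding the values in the task makes per-environment tuning awkward. One option is to move the list into the role's defaults and loop over it; sysctl_params is an assumed variable name:

```yaml
# roles/common/defaults/main.yml: baseline values (lowest precedence)
sysctl_params:
  - { name: 'net.ipv4.tcp_fin_timeout', value: '30' }
  - { name: 'net.core.somaxconn', value: '65535' }
  - { name: 'vm.swappiness', value: '10' }
```

The task then uses loop: "{{ sysctl_params }}", and an environment that needs different values can redefine the list in, for example, inventory/production/group_vars/all.yml.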
4.2 A Deployment Playbook
```yaml
# deploy.yml
---
- name: Deploy Application to Production
  hosts: webservers
  become: yes
  vars_files:
    - vars/secrets.yml          # Vault-encrypted; no special extension needed
  environment:
    PATH: "/usr/local/bin:{{ ansible_env.PATH }}"

  pre_tasks:
    - name: Verify deployment prerequisites
      assert:
        that:
          - app_version is defined
          - deploy_key is defined

    - name: Create backup of current version
      archive:
        path: /var/www/myapp
        dest: "/tmp/myapp-backup-{{ ansible_date_time.epoch }}.tar.gz"
        format: gz

  tasks:
    - name: Pull new application version
      git:
        repo: "{{ git_repo }}"
        version: "{{ app_version }}"
        dest: /var/www/myapp
        force: yes
        accept_hostkey: yes
      register: app_checkout
      notify: Restart application

    # Run migrations whenever the checked-out code changed; a fixed
    # "creates" marker would skip them on every deploy after the first
    - name: Run database migrations
      command: /var/www/myapp/scripts/migrate.sh
      environment:
        DATABASE_URL: "{{ db_url }}"
      when: app_checkout is changed

    # Handlers normally run after the whole tasks section; flush them here
    # so the health check below hits the restarted (new) version
    - name: Restart application before verifying
      meta: flush_handlers

    - name: Verify deployment
      uri:
        url: "http://{{ ansible_host }}/health"
        status_code: 200
      register: health_check
      retries: 5
      delay: 10
      until: health_check.status == 200

  handlers:
    - name: Restart application
      systemd:
        name: myapp
        state: restarted

  post_tasks:
    - name: Cleanup old backups
      find:
        paths: /tmp
        patterns: myapp-backup-*.tar.gz
        age: 7d
      register: old_backups

    - name: Remove old backups
      file:
        path: "{{ item.path }}"
        state: absent
      loop: "{{ old_backups.files }}"
```
4.3 Cluster Node Initialization
```yaml
# cluster-setup.yml
---
- name: Initialize Kubernetes Cluster Nodes
  hosts: k8s_masters
  become: yes
  vars:
    k8s_version: "1.28.0"
    pod_network_cidr: "10.244.0.0/16"
  tasks:
    # kubelet refuses to start with swap enabled by default
    - name: Disable swap for the running system
      command: swapoff -a
      when: ansible_swaptotal_mb > 0

    # Comment out swap entries so swap stays off after reboot; the regexp
    # skips lines that are already commented, so reruns change nothing
    - name: Disable swap in /etc/fstab
      replace:
        path: /etc/fstab
        regexp: '^([^#\n].*[ \t]swap[ \t].*)$'
        replace: '# \1'

    - name: Load required kernel modules
      modprobe:
        name: "{{ item }}"
        state: present
      loop:
        - overlay
        - br_netfilter

    - name: Configure kernel parameters
      sysctl:
        name: "{{ item.name }}"
        value: "{{ item.value }}"
        state: present
        reload: yes
      loop:
        - { name: 'net.bridge.bridge-nf-call-iptables', value: '1' }
        - { name: 'net.bridge.bridge-nf-call-ip6tables', value: '1' }
        - { name: 'net.ipv4.ip_forward', value: '1' }

    - name: Install container runtime
      include_tasks: tasks/install-containerd.yml

    - name: Install kubeadm
      include_tasks: tasks/install-kubeadm.yml
```
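modprobe only loads the modules into the running kernel, so they are gone after a reboot. On systemd hosts, files in /etc/modules-load.d/ are read at boot, so a small file makes the change persistent. A minimal sketch; the file name k8s.conf is a common convention, not a requirement:

```yaml
# Load overlay and br_netfilter again at every boot (systemd-modules-load)
- name: Persist required kernel modules
  copy:
    dest: /etc/modules-load.d/k8s.conf
    content: |
      overlay
      br_netfilter
    owner: root
    group: root
    mode: '0644'
```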
5. Securing Secrets with Ansible Vault
5.1 Encrypting Sensitive Data
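```bash
# Create an encrypted file
ansible-vault create vars/secrets.yml
# Edit an encrypted file
ansible-vault edit vars/secrets.yml
# Encrypt an existing file
ansible-vault encrypt vars/secrets.yml
# Decrypt a file (writes plaintext back to disk; prefer view or edit)
ansible-vault decrypt vars/secrets.yml
# View an encrypted file
ansible-vault view vars/secrets.yml
```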
5.2 Using Encrypted Variables
```yaml
# playbook.yml
- name: Deploy application
  hosts: webservers
  become: yes
  vars_files:
    - vars/secrets.yml
  tasks:
    - name: Create database credentials
      copy:
        content: |
          DATABASE_USER={{ db_user }}
          DATABASE_PASSWORD={{ db_password }}
        dest: /etc/myapp/.env
        mode: '0600'
      no_log: true   # keep the secrets out of task output and logs
      diff: false    # and out of --diff output
```
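A convention from Ansible's own best-practice guidance is to store secret values under a vault_ prefix in the encrypted file and reference them from a plain vars file, so you can still grep for where a variable is defined without decrypting anything. A minimal sketch, assuming a group_vars/all/ directory holding vars.yml and vault.yml:

```yaml
# group_vars/all/vars.yml (plain text, safe to read and grep)
db_user: "{{ vault_db_user }}"
db_password: "{{ vault_db_password }}"

# group_vars/all/vault.yml is encrypted with ansible-vault and contains:
#   vault_db_user: myapp
#   vault_db_password: <secret>
```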
5.3 Vault ID
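```bash
# Create files under different vault IDs
ansible-vault create --vault-id prod@prompt vars/production.yml
ansible-vault create --vault-id dev@prompt vars/development.yml
# Pass the vault ID when running a playbook
ansible-playbook site.yml --vault-id prod@prompt
# Several vault IDs can be supplied in the same run
ansible-playbook site.yml --vault-id dev@prompt --vault-id prod@prompt
```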
6. Best Practices
6.1 Directory Layout
ansible/
├── inventory/
│ ├── production/
│ │ ├── hosts.ini
│ │ └── group_vars/
│ │ └── all.yml
│ └── staging/
│ └── hosts.ini
├── library/
├── lookup_plugins/
├── filter_plugins/
├── roles/
│ ├── common/
│ ├── nginx/
│ ├── mysql/
│ └── myapp/
├── playbooks/
│ ├── site.yml
│ ├── database.yml
│ └── monitoring.yml
├── vars/
│ └── secrets.yml
└── ansible.cfg
6.2 ansible.cfg Settings
```ini
[defaults]
inventory = inventory/production/hosts.ini
roles_path = roles
# Convenient for newly built hosts, but it disables SSH host key
# verification; in production consider managing known_hosts instead
host_key_checking = False
timeout = 30
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts
fact_caching_timeout = 86400
callbacks_enabled = profile_tasks, timer
display_skipped_hosts = False
interpreter_python = auto_silent

[privilege_escalation]
become = True
become_method = sudo
become_user = root
become_ask_pass = False

[ssh_connection]
# pipelining cuts SSH round trips; it requires that sudoers does not set requiretty
pipelining = True
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
```
6.3 Testing Playbooks Before Running Them
```bash
# Syntax check
ansible-playbook site.yml --syntax-check
# List tasks
ansible-playbook site.yml --list-tasks
# List hosts
ansible-playbook site.yml --list-hosts
# Dry run (check mode)
ansible-playbook site.yml --check
# Dry run and show the changes that would be made
ansible-playbook site.yml --check --diff
# Restrict the run to a single host
ansible-playbook site.yml --diff --limit web01.example.com
```
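Syntax checks and check-mode runs do not prove that services actually work after a real run. A small read-only verification playbook helps close that gap. A minimal sketch, assuming nginx should be running and listening on port 80 on every web server; the file name verify.yml is hypothetical:

```yaml
# verify.yml: read-only checks to run after a deployment
---
- name: Verify web servers
  hosts: webservers
  become: yes
  tasks:
    - name: Collect service states
      service_facts:

    - name: nginx must be running
      assert:
        that:
          - ansible_facts.services['nginx.service'].state == 'running'
        fail_msg: "nginx is not running"

    - name: nginx must accept connections on port 80
      wait_for:
        port: 80
        timeout: 10
```

Because it only reads state, a playbook like this can be run after every deployment or on a schedule without risk.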
7. Performance Tuning
7.1 Asynchronous Execution
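```yaml
# Run long-running tasks asynchronously
- name: Run long task
  command: /opt/scripts/long-running-task.sh
  async: 3600      # allow the job up to one hour
  poll: 0          # do not wait; continue with the next task immediately
  register: long_task

- name: Check async status
  async_status:
    jid: "{{ long_task.ansible_job_id }}"
  register: job_result
  until: job_result.finished
  retries: 120     # 120 x 30 s = 3600 s, matching the async limit
  delay: 30
```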
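async with poll: 0 also lets one host run several jobs at the same time: start them in a loop, then collect every job with async_status. A minimal sketch; the backup script and database names are hypothetical:

```yaml
# Start one job per item without waiting for any of them
- name: Start backups in parallel
  command: "/opt/scripts/backup.sh {{ item }}"   # hypothetical script
  async: 1800
  poll: 0
  loop:
    - orders_db      # hypothetical database names
    - users_db
  register: backup_jobs

# Wait for every job; inside until, backup_status is the current item's result
- name: Wait for all backups to finish
  async_status:
    jid: "{{ item.ansible_job_id }}"
  loop: "{{ backup_jobs.results }}"
  loop_control:
    label: "{{ item.item }}"
  register: backup_status
  until: backup_status.finished
  retries: 60        # 60 x 30 s = 1800 s, matching the async limit
  delay: 30
```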
7.2 Batched Rollout Strategy
```yaml
# Update in batches so that not every service restarts at once
- name: Rolling update
  hosts: webservers
  serial: 2        # two hosts per batch
  tasks:
    - name: Update application
      include_tasks: tasks/update-app.yml

    - name: Verify health
      include_tasks: tasks/verify-health.yml
```
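serial also accepts a list of batch sizes, so a rollout can start with a single canary host and then widen, and max_fail_percentage stops the rollout when too many hosts in a batch fail. A minimal sketch reusing the same task files; the batch sizes are illustrative:

```yaml
- name: Canary-first rolling update
  hosts: webservers
  serial:
    - 1            # first batch: a single canary host
    - "50%"        # remaining batches: half of the group at a time
  max_fail_percentage: 0   # any failure in a batch stops the rollout
  tasks:
    - name: Update application
      include_tasks: tasks/update-app.yml

    - name: Verify health
      include_tasks: tasks/verify-health.yml
```

The last value in the serial list is reused for all remaining batches, and percentages are computed against the total number of hosts in the play.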
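Conclusion
Ansible is a powerful tool for operations automation, but a tool is only a tool; what matters most is the operational thinking and process design behind it. Working with Ansible has given me a deep appreciation of what "Infrastructure as Code" means: expressing infrastructure configuration and management as code that can be versioned and tested.
I hope this article helps you get more out of Ansible in your day-to-day operations. If you have any questions, feel free to discuss them in the comments.
Author: 侯万里 (万里侯), an operations veteran dedicated to advancing automation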