A Hands-On Guide: Shipping Nginx Access Logs to Elasticsearch 8 with Filebeat 8.15.5
- 1. Environment
- 2. Deployment Steps in Depth
  - Step 1: Create the Filebeat working directory
  - Step 2: Download and extract the Filebeat package
  - Step 3: The filebeat.yml core configuration
  - Step 4: Validate the configuration syntax and the ES connection
  - Step 5: Move Filebeat to a standard system directory
  - Step 6: Create a systemd service
  - Step 7: Start the service and enable it at boot
- 3. Common Issues and Solutions
  - Config validation error: type mismatch accessing 'filebeat.inputs.0.file_identity'
  - ES write failure: no matching index template found
  - CPU spike on first startup
  - Logs are collected but no data appears in ES
- 4. Summary
In log-analysis scenarios, the Nginx access log (access.log) is packed with valuable data on user behaviour and request performance, and collecting it efficiently into Elasticsearch (ES) is the foundation for log search, analysis, and visualization. Based on a Filebeat 8.15.5 and ES 8 environment and on hands-on deployment experience, this article walks through the complete flow from environment preparation to a running service, and covers the common pitfalls met along the way together with their fixes.
1. Environment
- OS: Rocky Linux (the steps also apply to CentOS, RHEL, and other mainstream Linux distributions)
- Core component versions: Filebeat 8.15.5, Elasticsearch 8.x
- Log sources: Nginx/OpenResty access logs at two paths: /mnt/data/openresty2/nginx/logs/access.log and /mnt/data/openresty/nginx/logs/access.log
- ES settings: endpoint http://1.2.3.4:5678, index name format nginx_%{+yyyyMMdd}, ingest pipeline pipeline-nginx-plaintext
2. Deployment Steps in Depth
Step 1: Create the Filebeat working directory
```bash
cd /mnt/data
mkdir filebeat-es
cd filebeat-es/
```
Analysis:
Why /mnt/data: /mnt is usually where data disks are mounted, so there is plenty of space for the Filebeat package, configuration files, and temporary data, and a full system disk cannot take the service down. Filebeat's own logs are also kept here, which makes troubleshooting easier.
A dedicated filebeat-es directory keeps all Filebeat-related files together, which simplifies later migration and upgrades, separates them from other applications, and keeps the filesystem tidy.
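Before settling on /mnt/data, it is worth confirming that the mount actually exists and has room; a quick check, using the paths from this article's layout:

```bash
# Confirm /mnt/data is mounted and has free space for the package, registry, and Filebeat's own logs
df -h /mnt/data
# Confirm the working directory landed where expected
ls -ld /mnt/data/filebeat-es
```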
Step 2: Download and extract the Filebeat package
```bash
# Download Filebeat into /mnt/data/filebeat-es
# Download page: https://www.elastic.co/downloads/past-releases/filebeat-8-15-5
tar -zxvf filebeat-8.15.5-linux-x86_64.tar.gz -C ./
mv filebeat-8.15.5-linux-x86_64 filebeat-es
cd filebeat-es/
```
Analysis:
tar flags: in tar -zxvf, z decompresses the gzip archive, x extracts the files, v prints progress, and f names the archive to operate on; -C ./ extracts into the current directory so nothing ends up scattered elsewhere.
Renaming the directory: the default filebeat-8.15.5-linux-x86_64 becomes filebeat-es, which is shorter and makes its purpose (shipping to ES) obvious in later commands.
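If the host has direct internet access, the tarball can also be fetched straight from Elastic's artifact server and verified before extraction. The URLs below follow Elastic's usual artifact naming for this release; double-check them against the download page referenced above:

```bash
cd /mnt/data/filebeat-es
# Artifact and checksum URLs assumed from Elastic's standard naming scheme
wget https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-8.15.5-linux-x86_64.tar.gz
wget https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-8.15.5-linux-x86_64.tar.gz.sha512
# Verify archive integrity before extracting
sha512sum -c filebeat-8.15.5-linux-x86_64.tar.gz.sha512
```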
Step 3: The filebeat.yml core configuration
```bash
vi filebeat.yml
```
```yaml
###################### Filebeat Configuration Example #########################

# This file is an example configuration file highlighting only the most common
# options. The filebeat.reference.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/filebeat/index.html

# For more available modules and options, please see the filebeat.reference.yml sample
# configuration file.

#=========================== Filebeat inputs =============================

filebeat.inputs:

# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.

# The input type here is log
- type: log

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  # Actual path of the nginx access log; change it to match your environment
  paths:
    - /mnt/data/openresty/nginx/logs/access.log
    #- c:\programdata\elasticsearch\logs\*

  # Read from the end of the file and only ship newly written lines to ES; nginx logs are
  # often very large and re-reading history wastes CPU, so historical lines are not processed
  tail_files: true
  offset:
    initial: end

  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  # exclude_lines: ['^127.0.0.1']

  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  #include_lines: ['^ERR', '^WARN']

  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #exclude_files: ['.gz$']

  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  # Adds a logtype field to every event as a custom tag that can be filtered on in ES queries;
  # useful when one host runs two openresty instances with different purposes, or several
  # instances behind a load balancer that need to be told apart.
  fields:
    logtype: "mynginx01_openresty"
    # level: debug
    # review: 1
  fields_under_root: true
  json.enabled: true
  json.keys_under_root: true

  ### Multiline options

  # Multiline can be used for log messages spanning multiple lines. This is common
  # for Java Stack Traces or C-Line Continuation

  # The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
  # multiline.pattern: ^\[
  # multiline.pattern: '^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\s-'

  # Defines if the pattern set under pattern should be negated or not. Default is false.
  # multiline.negate: true

  # Match can be set to "after" or "before". It is used to define if lines should be append to a pattern
  # that was (not) matched before or after or as long as a pattern is not matched based on negate.
  # Note: After is the equivalent to previous and before is the equivalent to to next in Logstash
  # multiline.match: after

#------------------------------------------------------------------------------------------
- type: log

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  # Actual path of the second nginx access log; change or remove as needed
  paths:
    - /mnt/data/openresty2/nginx/logs/access.log
    #- c:\programdata\elasticsearch\logs\*

  # Same options as in the first input
  tail_files: true
  offset:
    initial: end

  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  # exclude_lines: ['^127.0.0.1']

  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  #include_lines: ['^ERR', '^WARN']

  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #exclude_files: ['.gz$']

  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  fields:
    logtype: "mynginx01_openresty2"
    # level: debug
    # review: 1
  fields_under_root: true
  json.enabled: true
  json.keys_under_root: true

  ### Multiline options

  # Multiline can be used for log messages spanning multiple lines. This is common
  # for Java Stack Traces or C-Line Continuation

  # The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
  # multiline.pattern: ^\[
  # multiline.pattern: '^[0-9]{4}/[0-9]{2}/[0-9]{2}\s[0-9]{2}:[0-9]{2}:[0-9]{2}\s'

  # Defines if the pattern set under pattern should be negated or not. Default is false.
  # multiline.negate: true

  # Match can be set to "after" or "before". It is used to define if lines should be append to a pattern
  # that was (not) matched before or after or as long as a pattern is not matched based on negate.
  # Note: After is the equivalent to previous and before is the equivalent to to next in Logstash
  # multiline.match: after

#============================= Filebeat modules ===============================

filebeat.shutdown_timeout: 5s

filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml

  # Set to true to enable config reloading
  reload.enabled: false

  # Period on which files under path should be checked for changes
  #reload.period: 10s

#==================== Elasticsearch template setting ==========================

setup.ilm.enabled: false
setup.data_streams.enabled: false
setup.template.name: "nginx_template"
setup.template.pattern: "nginx*"
setup.template.priority: 200
setup.template.settings:
  index.number_of_shards: 5
  #index.codec: best_compression
  #_source.enabled: false

#================================ General =====================================

# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
#name:

# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]

# Optional fields that you can specify to add additional information to the
# output.
#fields:
#  env: staging

#============================== Dashboards =====================================
# These settings control loading the sample dashboards to the Kibana index. Loading
# the dashboards is disabled by default and can be enabled either by setting the
# options here or by using the `setup` command.
#setup.dashboards.enabled: false

# The URL from where to download the dashboards archive. By default this URL
# has a value which is computed based on the Beat name and version. For released
# versions, this URL points to the dashboard archive on the artifacts.elastic.co
# website.
#setup.dashboards.url:

#============================== Kibana =====================================

# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:

  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify and additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  #host: "localhost:5601"

  # Kibana Space ID
  # ID of the Kibana Space into which the dashboards should be loaded. By default,
  # the Default Space will be used.
  #space.id:

#============================= Elastic Cloud ==================================

# These settings simplify using filebeat with the Elastic Cloud (https://cloud.elastic.co/).

# The cloud.id setting overwrites the `output.elasticsearch.hosts` and
# `setup.kibana.host` options.
# You can find the `cloud.id` in the Elastic Cloud web UI.
#cloud.id:

# The cloud.auth setting overwrites the `output.elasticsearch.username` and
# `output.elasticsearch.password` settings. The format is `<user>:<pass>`.
#cloud.auth:

#================================ Outputs =====================================

# Configure what output to use when sending the data collected by the beat.

#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["1.2.3.4:5678"]
  data_stream: false
  pipeline: "pipeline-nginx-plaintext"
  index: "nginx_%{+yyyyMMdd}"
  preset: balanced

  # Optional protocol and basic auth credentials.
  #protocol: "https"
  #username: "elastic"
  #password: "changeme"

#----------------------------- Logstash output --------------------------------
#output.logstash:
  # The Logstash hosts
  #hosts: ["localhost:5044"]

  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"

  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"

#================================ Processors =====================================

#================================ Logging =====================================

logging.level: info
logging.to_files: true
logging.files:
  path: /mnt/data/filebeat-es
  name: filebeat.log
  keepfiles: 3
  permissions: 0644

# Sets log level. The default log level is info.
# Available log levels are: error, warning, info, debug
#logging.level: debug

# At debug level, you can selectively enable logging only for some components.
# To enable all selectors use ["*"]. Examples of other selectors are "beat",
# "publish", "service".
#logging.selectors: ["*"]

#============================== Xpack Monitoring ===============================
# filebeat can export internal metrics to a central Elasticsearch monitoring
# cluster. This requires xpack monitoring to be enabled in Elasticsearch. The
# reporting is disabled by default.

# Set to true to enable the monitoring reporter.
#xpack.monitoring.enabled: false

# Uncomment to send the metrics to Elasticsearch. Most settings from the
# Elasticsearch output are accepted here as well. Any setting that is not set is
# automatically inherited from the Elasticsearch output configuration, so if you
# have the Elasticsearch output configured, you can simply uncomment the
# following line.
#xpack.monitoring.elasticsearch:

#================================= Migration ==================================

# This allows to enable 6.7 migration aliases
#migration.6_to_7.enabled: true
```

A section-by-section walkthrough of the key parts:
Input configuration (filebeat.inputs)
This part defines the data sources. Two independent input blocks are configured, one per Nginx instance (openresty and openresty2).
```yaml
- type: log
  enabled: true
  paths:
    - /mnt/data/openresty/nginx/logs/access.log

  # --------------------------------------------------------
  # 🚀 Performance tuning: the fix for the CPU spike
  # --------------------------------------------------------
  tail_files: true    # Ignore data already in the file at startup; only watch newly produced lines
  offset:
    initial: end      # Together with tail_files, explicitly start the read offset at the end of the file

  # --------------------------------------------------------
  # 🏷️ Tags and classification: distinguish different business sources
  # --------------------------------------------------------
  fields:
    logtype: "mynginx01_openresty"   # Tag this log source with a specific label, usable later (e.g. to build index names)
  fields_under_root: true            # Put the logtype field at the document root instead of under fields.logtype

  # --------------------------------------------------------
  # 🧩 JSON parsing: decode directly on the agent side
  # --------------------------------------------------------
  json.enabled: true           # Enable JSON parsing
  json.keys_under_root: true   # Parsed fields (e.g. status, uri) go directly at the document root
```
- Dual-input design: two paths are monitored, and events are distinguished by the fields.logtype value.
- tail_files & offset: the key settings behind the fix for "huge file drives CPU up". They tell Filebeat: ignore what is already in the file, watch only the tail, and ship a line only when a new one appears.
- json.keys_under_root: true: very important. It means the documents Filebeat sends to ES are already structured (e.g. {"status": 200, "clientip": "..."}) rather than a single message field holding the raw JSON string (a quick on-host check follows this list).
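Note that json.enabled only pays off if the access log is already written as one JSON object per line (typically via an nginx/OpenResty log_format with escape=json). A quick on-host sanity check, assuming python3 is available:

```bash
# Grab the newest access-log line and make sure it parses as a single JSON object
tail -n 1 /mnt/data/openresty/nginx/logs/access.log | python3 -m json.tool
```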
Template and compatibility settings (setup.template & setup.ilm)
This part keeps Filebeat compatible with Elasticsearch 8.x and allows the custom index name.
```yaml
# ==================== Elasticsearch template setting ==========================
setup.ilm.enabled: false           # Disable index lifecycle management (ILM)
setup.data_streams.enabled: false  # Disable data streams

# Template definition
setup.template.name: "nginx_template"   # Template name
setup.template.pattern: "nginx*"        # Index pattern the template applies to
setup.template.priority: 200            # Priority 200 (default is 150), so this template wins over the built-in default
setup.template.settings:
  index.number_of_shards: 5             # Force 5 primary shards per index
```
- Why disable ILM and data streams? In ES 8, Filebeat defaults to writing into data streams (index names like .ds-logs-nginx-...). To use the custom index name nginx_%{+yyyyMMdd}, both features must be disabled explicitly.
- Priority 200: critical. It stops Filebeat's own default template from overriding these settings and makes sure the 5-shard setting takes effect (a verification command follows this list).
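Whether the template actually landed in ES with the intended settings can be checked after the first setup/start; in 8.x, setup.template installs a composable index template, so it should be visible through the _index_template API:

```bash
# Should show the nginx_template definition, including priority 200 and 5 primary shards
curl -s "http://1.2.3.4:5678/_index_template/nginx_template?pretty"
```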
Output configuration (output.elasticsearch)
Defines where the data is sent and how the indices are named.
```yaml
output.elasticsearch:
  hosts: ["1.2.3.4:5678"]
  data_stream: false              # Reconfirm that data-stream output is disabled
  # ⚠️ Key point: the ingest pipeline
  pipeline: "pipeline-nginx-plaintext"
  # Index naming rule
  index: "nginx_%{+yyyyMMdd}"     # One index per day, e.g. nginx_20251217
  preset: balanced                # Performance preset balancing throughput and resource usage
```
- index: set to nginx_%{+yyyyMMdd}.
- Note: an earlier revision of this configuration used the %{[logtype]} variable in the index name (producing names like nginx_public_...), but it has been dropped here, so both Nginx instances write into the same daily index, e.g. nginx_20251217. If that is intentional (to keep the shard count down), it is fine; to separate them again, change the index back to nginx_%{[logtype]}_%{+yyyyMMdd}.
- pipeline: set to pipeline-nginx-plaintext.
- Potential conflict: the inputs already parse JSON, yet the pipeline name says "plaintext". A plaintext pipeline normally relies on Grok to parse raw text, so if Filebeat ships already-parsed JSON fields while the pipeline still tries to parse a message string, it may fail or simply do nothing (a quick existence check follows this list).
Logging configuration (logging)
Controls Filebeat's own runtime log (used to troubleshoot Filebeat itself).
```yaml
logging.level: info
logging.to_files: true
logging.files:
  path: /mnt/data/filebeat-es   # Where Filebeat writes its own log
  name: filebeat.log
  keepfiles: 3                  # Keep only the 3 most recent files so the disk does not fill up
  permissions: 0644
```
Taken together, this configuration is production-ready: it copes with very large files, keeps multiple instances apart, and stays compatible with ES 8.
Step 4: Validate the configuration syntax and the ES connection
```bash
./filebeat test config -c filebeat.yml
./filebeat test output -c filebeat.yml
```
Analysis:
- Syntax check (test config): detects indentation mistakes, misspelled option names, and other syntax problems in filebeat.yml. Config OK means the syntax is clean; otherwise fix the file according to the message (type mismatch, for example, usually means an option has the wrong format).
- Output check (test output): verifies that Filebeat can reach ES. On success it prints the ES version and connection details; connection refused means the ES address or port is wrong, or the firewall is not letting the ES port (5678 in this example) through.
Expected output:
✅ A healthy connection prints something like the following (ES version and a successful handshake):

```
elasticsearch: http://1.2.3.4:5678...
  parse url... OK
  connection...
    parse host... OK
    dns lookup... OK
    addresses: 1.2.3.4
    dial up... OK
  TLS... WARN secure connection disabled
  talk to server... OK
  version: 8.15.5
```
Step 5: Move Filebeat to a standard system directory
```bash
cd ..
mv filebeat-es /usr/local/
```
Analysis:
Why /usr/local/: on Linux, /usr/local is the conventional install location for third-party software, so the move follows the standard directory layout and simplifies later management (coordinated upgrades, permission control).
Note: after the move, make sure the directory permissions are correct so the service does not later fail to start for lack of access.
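A minimal ownership/permission check, assuming the service will run as root as in the unit file below; Filebeat also refuses to load a configuration file that is writable by anyone other than its owner:

```bash
# Ensure root owns the tree and the config is not group/world writable,
# otherwise Filebeat may reject filebeat.yml at startup
chown -R root:root /usr/local/filebeat-es
chmod go-w /usr/local/filebeat-es/filebeat.yml
ls -l /usr/local/filebeat-es/filebeat.yml
```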
Step 6: Create a systemd service
```bash
vi /usr/lib/systemd/system/filebeat-es.service
```

Unit file contents:
```ini
[Unit]
Description=Filebeat sends log data to Elasticsearch
Documentation=https://www.elastic.co/docs/beats/filebeat
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=root
Group=root
ExecStart=/usr/local/filebeat-es/filebeat -c /usr/local/filebeat-es/filebeat.yml
ExecReload=/bin/kill -HUP $MAINPID
Restart=always
RestartSec=5
LimitNOFILE=65535

[Install]
WantedBy=multi-user.target
```
Analysis:
Why systemd: compared with legacy init.d scripts, systemd provides start-at-boot, status management, and log aggregation out of the box, which suits production service management better.
Key settings: After=network-online.target makes the service start only once the network is fully up, so it does not fail just because ES is not reachable yet; Restart=always improves availability by restarting automatically after an abnormal exit; LimitNOFILE=65535 lifts the open-file-handle limit for when Filebeat tracks a large number of log files.
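Before the first start, the unit file can be sanity-checked with systemd's built-in verifier, which flags misspelled directives and missing binaries:

```bash
# Static check of the unit file; prints warnings for unknown directives or bad paths
systemd-analyze verify /usr/lib/systemd/system/filebeat-es.service
```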
Step 7: Start the service and enable it at boot
```bash
systemctl daemon-reload
systemctl start filebeat-es
systemctl status filebeat-es
systemctl enable filebeat-es
```
Analysis:
daemon-reload: reloads systemd's configuration so the newly created filebeat-es.service is picked up; it must be run every time the unit file changes.
start and status: after starting, check the state with status. active (running) means the start succeeded; on failure, inspect the detailed error log with journalctl -u filebeat-es -f.
enable: turns on start-at-boot, so collection does not stop after a server reboot and no manual restart is needed.
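Once the service is up, the quickest confirmation that events are reaching ES is to look for the daily indices (host and port taken from this article's example):

```bash
# List nginx_* indices with their document counts
curl -s "http://1.2.3.4:5678/_cat/indices/nginx_*?v"
# Or count documents across all nginx_* indices
curl -s "http://1.2.3.4:5678/nginx_*/_count?pretty"
```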
3. Common Issues and Solutions
Config validation error: type mismatch accessing 'filebeat.inputs.0.file_identity'
Cause: file_identity is an object-valued setting, but it was written as a plain string (e.g. file_identity: path).
Fix: remove the offending setting (Filebeat then falls back to its default file-identity behaviour), or write it in the correct object form:
```yaml
file_identity:
  path: ~
```
ES write failure: no matching index template found
Cause: the template referenced in the Filebeat configuration has not been created in ES.
Fix: let Filebeat push the template itself:
```bash
/usr/local/filebeat-es/filebeat setup --index-management -c /usr/local/filebeat-es/filebeat.yml
```

Or create it manually through the ES API:
```bash
curl -X PUT "http://1.2.3.4:5678/_template/nginx_template" -H "Content-Type: application/json" -d '{
  "index_patterns": ["nginx_*"],
  "settings": {"index.number_of_shards": 1, "index.number_of_replicas": 0},
  "mappings": {"dynamic": true}
}'
```
CPU spike on first startup
Cause: Filebeat scanning a huge existing log file from the start (e.g. an access.log of 8 GB or more).
Fix: make sure the configuration contains offset.initial: end, then clear the registry and restart:
```bash
rm -rf /usr/local/filebeat-es/data/registry/filebeat/filestream/*
systemctl restart filebeat-es
```
Logs are collected but no data appears in ES
Cause: the ingest pipeline pipeline-nginx-plaintext does not exist in ES or is misconfigured.
Fix: create a basic pipeline to test with:
```bash
curl -X PUT "http://1.2.3.4:5678/_ingest/pipeline/pipeline-nginx-plaintext" -H "Content-Type: application/json" -d '{
  "description": "Nginx log parsing pipeline",
  "processors": [
    {"grok": {"field": "message", "patterns": ["%{HTTPDATE:time_local_str}"]}},
    {"date": {"field": "time_local_str", "target_field": "time_local", "formats": ["dd/MMM/yyyy:HH:mm:ss Z"]}}
  ]
}'
```
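Before trusting the pipeline, its behaviour can be checked with the _simulate API; the sample line below is purely illustrative and only needs to contain something %{HTTPDATE} can match:

```bash
curl -X POST "http://1.2.3.4:5678/_ingest/pipeline/pipeline-nginx-plaintext/_simulate?pretty" \
  -H "Content-Type: application/json" -d '{
  "docs": [
    {"_source": {"message": "17/Dec/2025:10:00:00 +0800 GET /index.html 200"}}
  ]
}'
```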
4. Summary
Built on Filebeat 8.15.5, this article implemented the full collection flow from Nginx access logs into ES 8. The highlights:
- Large-file collection tuned: offset.initial: end avoids the startup CPU spike;
- Clean configuration and service management: a systemd unit gives start-at-boot and keeps the service stable;
- Pitfall guide: solutions for the common template-matching, pipeline, and permission problems.
With this flow in place, Nginx logs are collected efficiently and reliably, laying the groundwork for later log analysis and visualization (e.g. Kibana dashboards). To go further, you can add log filtering, field enrichment, and ES index lifecycle management.