Hands-On Guide: Shipping Nginx Access Logs to Elasticsearch 8 with Filebeat 8.15.5

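In log analysis, the Nginx access log (access.log) carries rich data on user behavior and request performance, and collecting it efficiently into Elasticsearch (ES) is the foundation for log search, analysis, and visualization. Based on Filebeat 8.15.5 and ES 8 and drawing on real deployment experience, this article walks through the whole process from environment preparation to putting the service online, and covers common pitfalls and their fixes along the way.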

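I. Environment

  • Operating system: Rocky Linux (the steps also apply to CentOS, RHEL, and other mainstream Linux distributions)
  • Core component versions: Filebeat 8.15.5, Elasticsearch 8.x
  • Log sources: Nginx/OpenResty access logs at two paths: /mnt/data/openresty2/nginx/logs/access.log and /mnt/data/openresty/nginx/logs/access.log
  • ES settings: address http://1.2.3.4:5678, index format nginx_%{+yyyyMMdd}, ingest pipeline pipeline-nginx-plaintext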

II. Deployment Steps and Detailed Walkthrough

Step 1: Create the Filebeat working directory

bash
cd /mnt/data
mkdir filebeat-es
cd filebeat-es/

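Explanation:

Why /mnt/data as the working location: /mnt usually holds the mounted data disk, which has plenty of space for the Filebeat package, configuration files, and temporary data, so a full system disk cannot take the service down. Filebeat's own logs are also written here later, which makes troubleshooting easier.

Why a separate filebeat-es directory: keeping all Filebeat files in one place makes later migration and upgrades simpler and keeps them apart from other applications' directories.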
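
Since the reason for using the data disk is free space, a quick check before unpacking can be sketched as below. has_free_kb is a hypothetical helper written for this example, and the 1 GiB threshold in the usage line is an arbitrary assumption, not a Filebeat requirement.

```shell
# has_free_kb DIR NEEDED_KB: succeed if DIR's filesystem has at least NEEDED_KB free.
# Hypothetical helper for illustration; not part of Filebeat.
has_free_kb() {
  local dir="$1" needed="$2" avail
  # df -P prints one POSIX-format line per filesystem; -k reports sizes in KiB.
  avail="$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')"
  [ -n "$avail" ] && [ "$avail" -ge "$needed" ]
}

# Usage (the 1 GiB threshold is an assumption):
#   has_free_kb /mnt/data 1048576 || echo "not enough space on /mnt/data" >&2
```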

Step 2: Download and unpack the Filebeat package

bash
# Download Filebeat into /mnt/data/filebeat-es
# Download page: https://www.elastic.co/downloads/past-releases/filebeat-8-15-5
tar -zxvf filebeat-8.15.5-linux-x86_64.tar.gz -C ./
mv filebeat-8.15.5-linux-x86_64 filebeat-es
cd filebeat-es/


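Explanation:

tar flags: in tar -zxvf, z decompresses the gzip archive, x extracts files, v prints each file as it is extracted, and f names the archive to read; -C ./ extracts into the current directory.

Renaming the directory: the default name filebeat-8.15.5-linux-x86_64 becomes filebeat-es, which is shorter and states the purpose (shipping to ES), so later commands are easier to read.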
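
Before unpacking, it can be worth confirming that the archive downloaded intact. The following is a minimal sketch, assuming a matching .sha512 file was downloaded next to the archive and that sha512sum (GNU coreutils) is available; verify_sha512 is a hypothetical helper, not a Filebeat command.

```shell
# verify_sha512 ARCHIVE SUMFILE: succeed if ARCHIVE's SHA-512 matches the digest in SUMFILE.
# Hypothetical helper for illustration.
verify_sha512() {
  local archive="$1" sumfile="$2" expected actual
  # A .sha512 file holds "<hex digest>  <file name>"; keep only the digest.
  expected="$(awk '{ print $1; exit }' "$sumfile")"
  actual="$(sha512sum "$archive" | awk '{ print $1 }')"
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Usage (both file names are assumed to be in the current directory):
#   verify_sha512 filebeat-8.15.5-linux-x86_64.tar.gz \
#                 filebeat-8.15.5-linux-x86_64.tar.gz.sha512 && echo "checksum OK"
```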

Step 3: The core configuration file, filebeat.yml

bash
vi filebeat.yml
yaml
###################### Filebeat Configuration Example #########################

# This file is an example configuration file highlighting only the most common
# options. The filebeat.reference.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/filebeat/index.html

# For more available modules and options, please see the filebeat.reference.yml sample
# configuration file.

#=========================== Filebeat inputs =============================

filebeat.inputs:

# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.
# The input type here is log
- type: log

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  # The actual path of the nginx log; change it to match your environment
  paths:
    - /mnt/data/openresty/nginx/logs/access.log
    #- c:\programdata\elasticsearch\logs\*
  # Start reading from the end of the file. nginx log files are usually large and expensive to scan, so existing history is skipped
  tail_files: true
  offset:
    initial: end

  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  # exclude_lines: ['^127.0.0.1']

  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  #include_lines: ['^ERR', '^WARN']

  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #exclude_files: ['.gz$']

  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
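  # Adds a logtype field to every collected event, a custom tag that can be filtered on in ES queries; change as needed
  # For example, it tells apart two OpenResty instances with different purposes on one host, or several instances behind a load balancer.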
  fields:
    logtype: "mynginx01_openresty"
  #  level: debug
  #  review: 1
  fields_under_root: true
  json.enabled: true
  json.keys_under_root: true
  ### Multiline options

  # Multiline can be used for log messages spanning multiple lines. This is common
  # for Java Stack Traces or C-Line Continuation

  # The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
  # multiline.pattern: ^\[
  # multiline.pattern: '^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\s-'

  # Defines if the pattern set under pattern should be negated or not. Default is false.
  # multiline.negate: true

  # Match can be set to "after" or "before". It is used to define if lines should be append to a pattern
  # that was (not) matched before or after or as long as a pattern is not matched based on negate.
  # Note: After is the equivalent to previous and before is the equivalent to to next in Logstash
  # multiline.match: after

#------------------------------------------------------------------------------------------
- type: log

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  # The actual path of the second nginx log; change or remove it as needed
  paths:
    - /mnt/data/openresty2/nginx/logs/access.log
    #- c:\programdata\elasticsearch\logs\*
  # See the first input
  tail_files: true
  offset:
    initial: end

  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  # exclude_lines: ['^127.0.0.1']

  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  #include_lines: ['^ERR', '^WARN']

  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #exclude_files: ['.gz$']

  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  fields:
    logtype: "mynginx01_openresty2"
  #  level: debug
  #  review: 1
  fields_under_root: true
  json.enabled: true
  json.keys_under_root: true
  ### Multiline options

  # Multiline can be used for log messages spanning multiple lines. This is common
  # for Java Stack Traces or C-Line Continuation

  # The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
  # multiline.pattern: ^\[
  # multiline.pattern: '^[0-9]{4}/[0-9]{2}/[0-9]{2}\s[0-9]{2}:[0-9]{2}:[0-9]{2}\s'

  # Defines if the pattern set under pattern should be negated or not. Default is false.
  # multiline.negate: true

  # Match can be set to "after" or "before". It is used to define if lines should be append to a pattern
  # that was (not) matched before or after or as long as a pattern is not matched based on negate.
  # Note: After is the equivalent to previous and before is the equivalent to to next in Logstash
  # multiline.match: after

#============================= Filebeat modules ===============================
filebeat.shutdown_timeout: 5s
filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml

  # Set to true to enable config reloading
  reload.enabled: false

  # Period on which files under path should be checked for changes
  #reload.period: 10s

#==================== Elasticsearch template setting ==========================
setup.ilm.enabled: false
setup.data_streams.enabled: false
setup.template.name: "nginx_template"
setup.template.pattern: "nginx*"
setup.template.priority: 200
setup.template.settings:
  index.number_of_shards: 5
  #index.codec: best_compression
  #_source.enabled: false

#================================ General =====================================

# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
#name:

# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]

# Optional fields that you can specify to add additional information to the
# output.
#fields:
#  env: staging


#============================== Dashboards =====================================
# These settings control loading the sample dashboards to the Kibana index. Loading
# the dashboards is disabled by default and can be enabled either by setting the
# options here or by using the `setup` command.
#setup.dashboards.enabled: false

# The URL from where to download the dashboards archive. By default this URL
# has a value which is computed based on the Beat name and version. For released
# versions, this URL points to the dashboard archive on the artifacts.elastic.co
# website.
#setup.dashboards.url:
#============================== Kibana =====================================

# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:

  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify and additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  #host: "localhost:5601"

  # Kibana Space ID
  # ID of the Kibana Space into which the dashboards should be loaded. By default,
  # the Default Space will be used.
  #space.id:

#============================= Elastic Cloud ==================================

# These settings simplify using filebeat with the Elastic Cloud (https://cloud.elastic.co/).

# The cloud.id setting overwrites the `output.elasticsearch.hosts` and
# `setup.kibana.host` options.
# You can find the `cloud.id` in the Elastic Cloud web UI.
#cloud.id:

# The cloud.auth setting overwrites the `output.elasticsearch.username` and
# `output.elasticsearch.password` settings. The format is `<user>:<pass>`.
#cloud.auth:

#================================ Outputs =====================================

# Configure what output to use when sending the data collected by the beat.

#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["1.2.3.4:5678"]
  data_stream: false
  pipeline: "pipeline-nginx-plaintext"
  index: "nginx_%{+yyyyMMdd}"
  preset: balanced
  # Optional protocol and basic auth credentials.
  #protocol: "https"
  #username: "elastic"
  #password: "changeme"

#----------------------------- Logstash output --------------------------------
#output.logstash:
  # The Logstash hosts
  #hosts: ["localhost:5044"]

  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"

  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"

#================================ Processors =====================================

#================================ Logging =====================================
logging.level: info
logging.to_files: true
logging.files:
  path: /mnt/data/filebeat-es
  name: filebeat.log
  keepfiles: 3
  permissions: 0644
# Sets log level. The default log level is info.
# Available log levels are: error, warning, info, debug
#logging.level: debug

# At debug level, you can selectively enable logging only for some components.
# To enable all selectors use ["*"]. Examples of other selectors are "beat",
# "publish", "service".
#logging.selectors: ["*"]

#============================== Xpack Monitoring ===============================
# filebeat can export internal metrics to a central Elasticsearch monitoring
# cluster.  This requires xpack monitoring to be enabled in Elasticsearch.  The
# reporting is disabled by default.

# Set to true to enable the monitoring reporter.
#xpack.monitoring.enabled: false

# Uncomment to send the metrics to Elasticsearch. Most settings from the
# Elasticsearch output are accepted here as well. Any setting that is not set is
# automatically inherited from the Elasticsearch output configuration, so if you
# have the Elasticsearch output configured, you can simply uncomment the
# following line.
#xpack.monitoring.elasticsearch:

#================================= Migration ==================================

# This allows to enable 6.7 migration aliases
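#migration.6_to_7.enabled: true

A section-by-section explanation follows:

Input configuration (filebeat.inputs)

This part defines the data sources. There are two independent input blocks, one for each Nginx instance (openresty and openresty2).

yaml
- type: log
  enabled: true
  paths:
    - /mnt/data/openresty/nginx/logs/access.log
  # --------------------------------------------------------
  # 🚀 Core performance setting: avoids the CPU spike
  # --------------------------------------------------------
  tail_files: true       # On first start, skip data already in the file and read only new lines
  offset:
    initial: end         # Paired with tail_files; states explicitly that reading starts at the end of the file

  # --------------------------------------------------------
  # 🏷️ Tagging: tell different sources apart
  # --------------------------------------------------------
  fields:
    logtype: "mynginx01_openresty" # Tags events from this log so the source can be filtered on in ES
  fields_under_root: true           # Put logtype at the root of the event instead of under fields.logtype

  # --------------------------------------------------------
  # 🧩 JSON parsing: done on the Filebeat side
  # --------------------------------------------------------
  json.enabled: true         # Turn on JSON parsing
  json.keys_under_root: true # Parsed fields (such as status, uri) go directly at the root of the event

  • Two inputs: two paths are monitored, and the logtype field (placed at the event root by fields_under_root) tells them apart.
  • tail_files & offset: this is the key setting for the "large file drives CPU up" problem. It tells Filebeat to ignore what is already in the file, watch only the tail, and ship a line only when a new one appears. tail_files affects only files that have no saved state in Filebeat's registry.
  • json.keys_under_root: true: this matters a great deal. It assumes Nginx writes each access-log line as a JSON object; Filebeat then sends ES structured data (for example {"status": 200, "clientip": "..."}) rather than a message field holding the whole JSON line as a string.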

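Template and compatibility settings (setup.template & setup.ilm)

This part lets Filebeat work with Elasticsearch 8.x while writing to a custom index name.

yaml
# ==================== Elasticsearch template setting ==========================
setup.ilm.enabled: false          # Disable index lifecycle management (ILM)
setup.data_streams.enabled: false # Disable data streams

# Template definition
setup.template.name: "nginx_template"   # Template name
setup.template.pattern: "nginx*"        # Index pattern the template applies to
setup.template.priority: 200            # Priority 200 (the default is 150), so this template wins over lower-priority templates matching the same indices
setup.template.settings:
  index.number_of_shards: 5             # Every index gets 5 primary shards

  • Why disable ILM and data streams?

By default, Filebeat 8 writes to a data stream, whose backing indices carry names like .ds-filebeat-8.15.5-....

To write to the custom index name nginx_%{+yyyyMMdd} instead, this configuration explicitly turns both features off and sets setup.template.name and setup.template.pattern to match the new name.

  • Priority 200: this is what makes the 5-shard setting take effect. The higher priority keeps any other template that matches nginx*, such as one created at the default priority of 150, from taking precedence.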

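Output configuration (output.elasticsearch)

Defines where the data is sent and how the index is named.

yaml
output.elasticsearch:
  hosts: ["1.2.3.4:5678"]
  data_stream: false               # Again make sure data stream output is off

  # ⚠️ Key point: ingest pipeline
  pipeline: "pipeline-nginx-plaintext"

  # Index naming rule
  index: "nginx_%{+yyyyMMdd}"      # One index per day, e.g. nginx_20251217

  preset: balanced                 # Performance preset that balances throughput and resource use

  • index: set to nginx_%{+yyyyMMdd}.
  • Note: the index name does not include logtype, so logs from both Nginx instances land in the same daily index (for example nginx_20251217). That is fine if it is intentional (it keeps the shard count down); to separate them, use nginx_%{[logtype]}_%{+yyyyMMdd} instead.
  • pipeline: set to pipeline-nginx-plaintext.
  • Possible conflict: the inputs already parse JSON, yet the pipeline is named "plaintext". A plaintext pipeline usually runs Grok over the raw text. If Filebeat sends fields that are already parsed while the pipeline still tries to parse the message text, the pipeline may fail or have no effect.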
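
To find the daily index a given day's events land in, for instance when checking document counts, the naming rule can be reproduced in the shell. This is a minimal sketch, assuming GNU date and assuming that Filebeat formats the date part of the index name from the event timestamp in UTC; index_for_date is a hypothetical helper.

```shell
# index_for_date DAY: print the index name nginx_%{+yyyyMMdd} would produce for DAY.
# Hypothetical helper; DAY is anything GNU date -d accepts (2025-12-17, today, yesterday).
index_for_date() {
  # -u formats in UTC (assumed to match Filebeat), -d parses the given day.
  date -u -d "$1" +"nginx_%Y%m%d"
}

# Usage: count the documents indexed for today (address taken from this guide):
#   curl -s "http://1.2.3.4:5678/$(index_for_date today)/_count"
```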

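Logging configuration (logging)

Defines Filebeat's own runtime log, which is where to look when Filebeat itself reports errors.

yaml
logging.level: info
logging.to_files: true
logging.files:
  path: /mnt/data/filebeat-es  # Where the log files are stored
  name: filebeat.log
  keepfiles: 3                 # Keep only the 3 most recent files so the disk does not fill up
  permissions: 0644

Taken together, this configuration handles large-file reads, tells multiple instances apart, and addresses ES 8 compatibility.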

Step 4: Validate the configuration syntax and the ES connection

bash
./filebeat test config -c filebeat.yml
./filebeat test output -c filebeat.yml


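Explanation:

  • Syntax check (test config): checks filebeat.yml for indentation errors, misspelled fields, and similar problems. Config OK means the file parsed cleanly; otherwise fix what the message points to (type mismatch, for instance, usually means a field has the wrong format).
  • Output connection test (test output): verifies that Filebeat can reach ES. On success it prints the connection steps and the ES version; on connection refused, check that the ES address and port are correct and that the firewall allows port 5678.

Output notes:
✅ Connection OK: the output looks like the following (ES version shown, connection successful):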

elasticsearch: http://1.2.3.4:5678...
  parse url... OK
  connection...
    parse host... OK
    dns lookup... OK
    addresses: 1.2.3.4
    dial up... OK
  TLS... WARN secure connection disabled
  talk to server... OK
  version: 8.15.5 
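
The two checks can be chained so that a deployment script stops at the first failure. The sketch below only wraps the two commands shown above; preflight is a hypothetical helper, and the return codes 1 and 2 are a convention chosen for this example.

```shell
# preflight FILEBEAT_BIN CONFIG: run both checks, stop at the first failure.
# Hypothetical helper; returns 1 if the config check fails, 2 if the output check fails.
preflight() {
  local fb="$1" cfg="$2"
  "$fb" test config -c "$cfg" || { echo "config check failed" >&2; return 1; }
  "$fb" test output -c "$cfg" || { echo "output check failed" >&2; return 2; }
  echo "preflight passed"
}

# Usage, from the directory that holds the filebeat binary:
#   preflight ./filebeat filebeat.yml
```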

Step 5: Move Filebeat to a standard system directory

bash
cd ..
mv filebeat-es /usr/local/

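Explanation:

Why /usr/local/: on Linux, /usr/local is the conventional place for locally installed third-party software, so putting Filebeat there follows the filesystem convention and simplifies later management such as upgrades and access control.

Note: after the move, make sure ownership and permissions on the directory are correct, so the service does not fail at startup for lack of permissions.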
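
One permission issue worth checking concerns the configuration file itself: as the Beats documentation describes it, Filebeat refuses a config file that group or others can write to, and expects it to be owned by the user running Filebeat (root in this guide). The sketch below checks only the write bits, assuming GNU stat; check_cfg_perms is a hypothetical helper.

```shell
# check_cfg_perms FILE: fail if FILE is writable by group or others.
# Hypothetical helper; ownership is not checked here.
check_cfg_perms() {
  local cfg="$1" mode
  mode="$(stat -c '%a' "$cfg")" || return 2
  # Octal 022 selects the group-write and other-write bits.
  if [ $(( 0$mode & 022 )) -ne 0 ]; then
    echo "too permissive: $cfg has mode $mode (try: chmod go-w $cfg)" >&2
    return 1
  fi
  echo "ok: $cfg has mode $mode"
}

# Usage:
#   check_cfg_perms /usr/local/filebeat-es/filebeat.yml
```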

Step 6: Create the systemd service

bash
vi /usr/lib/systemd/system/filebeat-es.service

Service file contents:

ini
[Unit]
Description=Filebeat sends log data to Elasticsearch
Documentation=https://www.elastic.co/docs/beats/filebeat
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=root
Group=root
ExecStart=/usr/local/filebeat-es/filebeat -c /usr/local/filebeat-es/filebeat.yml
ExecReload=/bin/kill -HUP $MAINPID
Restart=always
RestartSec=5
LimitNOFILE=65535

[Install]
WantedBy=multi-user.target


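Explanation:

Advantages of systemd: compared with traditional init.d scripts, systemd provides start-on-boot, state management, and log aggregation, which suits production service management better.

Key settings: After=network-online.target starts the service only once the network is fully up, avoiding startup failures because ES is unreachable; Restart=always improves availability by restarting the service after an abnormal exit; LimitNOFILE=65535 raises the file-descriptor limit for when Filebeat handles many log files.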

Step 7: Start the service and enable it at boot

bash
systemctl daemon-reload 
systemctl start filebeat-es
systemctl status filebeat-es
systemctl enable filebeat-es


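Explanation:

daemon-reload: reloads the systemd configuration so the new filebeat-es.service is picked up; run it every time the service file changes.

start and status: after starting, check the state with status. active (running) means the start succeeded; on failure, journalctl -u filebeat-es -f shows the detailed error log.

enable: starts the service at boot, so log collection resumes after a server reboot without manual action.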
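
In a deployment script, reading systemctl status by eye is not practical. A minimal sketch of polling the unit state instead, using systemctl is-active; wait_active is a hypothetical helper, and the default of 10 checks one second apart is an arbitrary assumption.

```shell
# wait_active UNIT [TRIES]: poll until UNIT is active, checking once per second.
# Hypothetical helper; TRIES defaults to 10.
wait_active() {
  local unit="$1" tries="${2:-10}" i=1
  while [ "$i" -le "$tries" ]; do
    # is-active --quiet prints nothing and exits 0 only when the unit is active.
    if systemctl is-active --quiet "$unit"; then
      echo "$unit is active (check $i)"
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  echo "$unit did not become active after $tries checks" >&2
  return 1
}

# Usage:
#   systemctl start filebeat-es && wait_active filebeat-es || journalctl -u filebeat-es -n 50
```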

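III. Common Problems and Solutions

Config check fails with type mismatch accessing 'filebeat.inputs.0.file_identity'

Cause: file_identity is an object, but it was configured as a string (for example file_identity: path).

Solution: if path-based identity is not needed, delete the setting (the default identity, native, uses the inode and device ID and needs no explicit declaration); otherwise write it in the correct object form:

yaml
file_identity:
  path: ~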

ES write fails with no matching index template found

Cause: the template configured in Filebeat has not been created in ES.

Solution: let Filebeat push the template:

bash
/usr/local/filebeat-es/filebeat setup --index-management -c /usr/local/filebeat-es/filebeat.yml

Or create a composable index template by hand through the ES API:

bash
curl -X PUT "http://1.2.3.4:5678/_index_template/nginx_template" -H "Content-Type: application/json" -d '{
  "index_patterns": ["nginx_*"],
  "priority": 200,
  "template": {
    "settings": {"index.number_of_shards":1,"index.number_of_replicas":0},
    "mappings": {"dynamic": true}
  }
}'

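CPU spike on first start

Cause: Filebeat scans a large log file from the beginning (for example an access.log of 8 GB or more).

Solution: make sure the inputs set tail_files: true and offset.initial: end, then stop the service and clear the registry so the setting also applies to files Filebeat has already seen. Clearing the registry discards all saved read positions:

bash
systemctl stop filebeat-es
rm -rf /usr/local/filebeat-es/data/registry
systemctl start filebeat-es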

Logs are collected but no data appears in ES

Cause: the ingest pipeline pipeline-nginx-plaintext does not exist in ES or is misconfigured.

Solution: create a basic pipeline to test with:

bash
curl -X PUT "http://1.2.3.4:5678/_ingest/pipeline/pipeline-nginx-plaintext" -H "Content-Type: application/json" -d '{
  "description": "Nginx log parsing pipeline",
  "processors": [
    {"grok": {"field": "message", "patterns": ["%{HTTPDATE:time_local_str}"]}},
    {"date": {"field": "time_local_str", "target_field": "time_local", "formats": ["dd/MMM/yyyy:HH:mm:ss Z"]}}
  ]
}'

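IV. Summary

Using Filebeat 8.15.5, this article walked through collecting Nginx access logs into ES 8 end to end. The main points:

  1. Large log files: tail_files: true together with offset.initial: end avoids the CPU spike;
  2. Consistent configuration and service management: systemd provides start-on-boot and automatic restarts for stability;
  3. Pitfalls: solutions for common template-matching, pipeline, and permission problems.

This process gives efficient, stable collection of Nginx logs and lays the groundwork for later analysis and visualization (such as Kibana dashboards). To extend it, consider log filtering, field processing, and ES index lifecycle management.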