A Hands-On Guide: Shipping Nginx Access Logs to Elasticsearch 8 with Filebeat 8.15.5
- 1. Environment
- 2. Deployment Steps in Depth
  - Step 1: Create the Filebeat working directory
  - Step 2: Download and extract the Filebeat package
  - Step 3: The filebeat.yml core configuration
  - Step 4: Validate the configuration syntax and the ES connection
  - Step 5: Move Filebeat to a standard system directory
  - Step 6: Create a systemd service
  - Step 7: Start the service and enable it at boot
- 3. Common Issues and Solutions
  - Config validation error: type mismatch accessing 'filebeat.inputs.0.file_identity'
  - ES write failure: no matching index template found
  - CPU spike on first startup
  - Logs are collected but no data appears in ES
- 4. Summary
In log-analysis scenarios, the Nginx access log (access.log) is packed with valuable data on user behaviour and request performance, and collecting it efficiently into Elasticsearch (ES) is the foundation for log search, analysis, and visualization. Based on a Filebeat 8.15.5 and ES 8 environment and on hands-on deployment experience, this article walks through the complete flow from environment preparation to a running service, and covers the common pitfalls met along the way together with their fixes.
1. Environment
- OS: Rocky Linux (the steps also apply to CentOS, RHEL, and other mainstream Linux distributions)
- Core component versions: Filebeat 8.15.5, Elasticsearch 8.x
- Log sources: Nginx/OpenResty access logs at two paths: /mnt/data/openresty2/nginx/logs/access.log and /mnt/data/openresty/nginx/logs/access.log
- ES settings: endpoint http://1.2.3.4:5678, index name format nginx_%{+yyyyMMdd}, ingest pipeline pipeline-nginx-plaintext
2. Deployment Steps in Depth
Step 1: Create the Filebeat working directory
```bash
cd /mnt/data
mkdir filebeat-es
cd filebeat-es/
```
Analysis:
Why /mnt/data: /mnt is usually where data disks are mounted, so there is plenty of space for the Filebeat package, configuration files, and temporary data, and a full system disk cannot take the service down. Filebeat's own logs are also kept here, which makes troubleshooting easier.
A dedicated filebeat-es directory keeps all Filebeat-related files together, which simplifies later migration and upgrades, separates them from other applications, and keeps the filesystem tidy.
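Before settling on /mnt/data, it is worth confirming that the mount actually exists and has room; a quick check, using the paths from this article's layout:

```bash
# Confirm /mnt/data is mounted and has free space for the package, registry, and Filebeat's own logs
df -h /mnt/data
# Confirm the working directory landed where expected
ls -ld /mnt/data/filebeat-es
```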
Step 2: Download and extract the Filebeat package
```bash
# Download Filebeat into /mnt/data/filebeat-es
# Download page: https://www.elastic.co/downloads/past-releases/filebeat-8-15-5
tar -zxvf filebeat-8.15.5-linux-x86_64.tar.gz -C ./
mv filebeat-8.15.5-linux-x86_64 filebeat-es
cd filebeat-es/
```
Analysis:
tar flags: in tar -zxvf, z decompresses the gzip archive, x extracts the files, v prints progress, and f names the archive to operate on; -C ./ extracts into the current directory so nothing ends up scattered elsewhere.
Renaming the directory: the default filebeat-8.15.5-linux-x86_64 becomes filebeat-es, which is shorter and makes its purpose (shipping to ES) obvious in later commands.
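If the host has direct internet access, the tarball can also be fetched straight from Elastic's artifact server and verified before extraction. The URLs below follow Elastic's usual artifact naming for this release; double-check them against the download page referenced above:

```bash
cd /mnt/data/filebeat-es
# Artifact and checksum URLs assumed from Elastic's standard naming scheme
wget https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-8.15.5-linux-x86_64.tar.gz
wget https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-8.15.5-linux-x86_64.tar.gz.sha512
# Verify archive integrity before extracting
sha512sum -c filebeat-8.15.5-linux-x86_64.tar.gz.sha512
```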
Step 3: The filebeat.yml core configuration
```bash
vi filebeat.yml
```
```yaml
###################### Filebeat Configuration Example #########################

# This file is an example configuration file highlighting only the most common
# options. The filebeat.reference.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/filebeat/index.html

# For more available modules and options, please see the filebeat.reference.yml sample
# configuration file.

#=========================== Filebeat inputs =============================

filebeat.inputs:

# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.

# The input type here is log
- type: log

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  # Actual path of the nginx access log; change it to match your environment
  paths:
    - /mnt/data/openresty/nginx/logs/access.log
    #- c:\programdata\elasticsearch\logs\*

  # Read from the end of the file and only ship newly written lines to ES; nginx logs are
  # often very large and re-reading history wastes CPU, so historical lines are not processed
  tail_files: true
  offset:
    initial: end

  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  # exclude_lines: ['^127.0.0.1']

  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  #include_lines: ['^ERR', '^WARN']

  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #exclude_files: ['.gz$']

  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  # Adds a logtype field to every event as a custom tag that can be filtered on in ES queries;
  # useful when one host runs two openresty instances with different purposes, or several
  # instances behind a load balancer that need to be told apart.
  fields:
    logtype: "mynginx01_openresty"
    # level: debug
    # review: 1
  fields_under_root: true
  json.enabled: true
  json.keys_under_root: true

  ### Multiline options

  # Multiline can be used for log messages spanning multiple lines. This is common
  # for Java Stack Traces or C-Line Continuation

  # The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
  # multiline.pattern: ^\[
  # multiline.pattern: '^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\s-'

  # Defines if the pattern set under pattern should be negated or not. Default is false.
  # multiline.negate: true

  # Match can be set to "after" or "before". It is used to define if lines should be append to a pattern
  # that was (not) matched before or after or as long as a pattern is not matched based on negate.
  # Note: After is the equivalent to previous and before is the equivalent to to next in Logstash
  # multiline.match: after

#------------------------------------------------------------------------------------------
- type: log

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  # Actual path of the second nginx access log; change or remove as needed
  paths:
    - /mnt/data/openresty2/nginx/logs/access.log
    #- c:\programdata\elasticsearch\logs\*

  # Same options as in the first input
  tail_files: true
  offset:
    initial: end

  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  # exclude_lines: ['^127.0.0.1']

  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  #include_lines: ['^ERR', '^WARN']

  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #exclude_files: ['.gz$']

  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  fields:
    logtype: "mynginx01_openresty2"
    # level: debug
    # review: 1
  fields_under_root: true
  json.enabled: true
  json.keys_under_root: true

  ### Multiline options

  # Multiline can be used for log messages spanning multiple lines. This is common
  # for Java Stack Traces or C-Line Continuation

  # The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
  # multiline.pattern: ^\[
  # multiline.pattern: '^[0-9]{4}/[0-9]{2}/[0-9]{2}\s[0-9]{2}:[0-9]{2}:[0-9]{2}\s'

  # Defines if the pattern set under pattern should be negated or not. Default is false.
  # multiline.negate: true

  # Match can be set to "after" or "before". It is used to define if lines should be append to a pattern
  # that was (not) matched before or after or as long as a pattern is not matched based on negate.
  # Note: After is the equivalent to previous and before is the equivalent to to next in Logstash
  # multiline.match: after

#============================= Filebeat modules ===============================

filebeat.shutdown_timeout: 5s

filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml

  # Set to true to enable config reloading
  reload.enabled: false

  # Period on which files under path should be checked for changes
  #reload.period: 10s

#==================== Elasticsearch template setting ==========================

setup.ilm.enabled: false
setup.data_streams.enabled: false
setup.template.name: "nginx_template"
setup.template.pattern: "nginx*"
setup.template.priority: 200
setup.template.settings:
  index.number_of_shards: 5
  #index.codec: best_compression
  #_source.enabled: false

#================================ General =====================================

# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
#name:

# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]

# Optional fields that you can specify to add additional information to the
# output.
#fields:
#  env: staging

#============================== Dashboards =====================================
# These settings control loading the sample dashboards to the Kibana index. Loading
# the dashboards is disabled by default and can be enabled either by setting the
# options here or by using the `setup` command.
#setup.dashboards.enabled: false

# The URL from where to download the dashboards archive. By default this URL
# has a value which is computed based on the Beat name and version. For released
# versions, this URL points to the dashboard archive on the artifacts.elastic.co
# website.
#setup.dashboards.url:

#============================== Kibana =====================================

# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:

  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify and additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  #host: "localhost:5601"

  # Kibana Space ID
  # ID of the Kibana Space into which the dashboards should be loaded. By default,
  # the Default Space will be used.
  #space.id:

#============================= Elastic Cloud ==================================

# These settings simplify using filebeat with the Elastic Cloud (https://cloud.elastic.co/).

# The cloud.id setting overwrites the `output.elasticsearch.hosts` and
# `setup.kibana.host` options.
# You can find the `cloud.id` in the Elastic Cloud web UI.
#cloud.id:

# The cloud.auth setting overwrites the `output.elasticsearch.username` and
# `output.elasticsearch.password` settings. The format is `<user>:<pass>`.
#cloud.auth:

#================================ Outputs =====================================

# Configure what output to use when sending the data collected by the beat.

#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["1.2.3.4:5678"]
  data_stream: false
  pipeline: "pipeline-nginx-plaintext"
  index: "nginx_%{+yyyyMMdd}"
  preset: balanced

  # Optional protocol and basic auth credentials.
  #protocol: "https"
  #username: "elastic"
  #password: "changeme"

#----------------------------- Logstash output --------------------------------
#output.logstash:
  # The Logstash hosts
  #hosts: ["localhost:5044"]

  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"

  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"

#================================ Processors =====================================

#================================ Logging =====================================

logging.level: info
logging.to_files: true
logging.files:
  path: /mnt/data/filebeat-es
  name: filebeat.log
  keepfiles: 3
  permissions: 0644

# Sets log level. The default log level is info.
# Available log levels are: error, warning, info, debug
#logging.level: debug

# At debug level, you can selectively enable logging only for some components.
# To enable all selectors use ["*"]. Examples of other selectors are "beat",
# "publish", "service".
#logging.selectors: ["*"]

#============================== Xpack Monitoring ===============================
# filebeat can export internal metrics to a central Elasticsearch monitoring
# cluster. This requires xpack monitoring to be enabled in Elasticsearch. The
# reporting is disabled by default.

# Set to true to enable the monitoring reporter.
#xpack.monitoring.enabled: false

# Uncomment to send the metrics to Elasticsearch. Most settings from the
# Elasticsearch output are accepted here as well. Any setting that is not set is
# automatically inherited from the Elasticsearch output configuration, so if you
# have the Elasticsearch output configured, you can simply uncomment the
# following line.
#xpack.monitoring.elasticsearch:

#================================= Migration ==================================

# This allows to enable 6.7 migration aliases
#migration.6_to_7.enabled: true
```

A section-by-section walkthrough of the key parts:
Input configuration (filebeat.inputs)
This part defines the data sources. Two independent input blocks are configured, one per Nginx instance (openresty and openresty2).
```yaml
- type: log
  enabled: true
  paths:
    - /mnt/data/openresty/nginx/logs/access.log

  # --------------------------------------------------------
  # 🚀 Performance tuning: the fix for the CPU spike
  # --------------------------------------------------------
  tail_files: true    # Ignore data already in the file at startup; only watch newly produced lines
  offset:
    initial: end      # Together with tail_files, explicitly start the read offset at the end of the file

  # --------------------------------------------------------
  # 🏷️ Tags and classification: distinguish different business sources
  # --------------------------------------------------------
  fields:
    logtype: "mynginx01_openresty"   # Tag this log source with a specific label, usable later (e.g. to build index names)
  fields_under_root: true            # Put the logtype field at the document root instead of under fields.logtype

  # --------------------------------------------------------
  # 🧩 JSON parsing: decode directly on the agent side
  # --------------------------------------------------------
  json.enabled: true           # Enable JSON parsing
  json.keys_under_root: true   # Parsed fields (e.g. status, uri) go directly at the document root
```
- Dual-input design: two paths are monitored, and events are distinguished by the fields.logtype value.
- tail_files & offset: the key settings behind the fix for "huge file drives CPU up". They tell Filebeat: ignore what is already in the file, watch only the tail, and ship a line only when a new one appears.
- json.keys_under_root: true: very important. It means the documents Filebeat sends to ES are already structured (e.g. {"status": 200, "clientip": "..."}) rather than a single message field holding the raw JSON string (a quick on-host check follows this list).
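Note that json.enabled only pays off if the access log is already written as one JSON object per line (typically via an nginx/OpenResty log_format with escape=json). A quick on-host sanity check, assuming python3 is available:

```bash
# Grab the newest access-log line and make sure it parses as a single JSON object
tail -n 1 /mnt/data/openresty/nginx/logs/access.log | python3 -m json.tool
```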
Template and compatibility settings (setup.template & setup.ilm)
This part keeps Filebeat compatible with Elasticsearch 8.x and allows the custom index name.
```yaml
# ==================== Elasticsearch template setting ==========================
setup.ilm.enabled: false           # Disable index lifecycle management (ILM)
setup.data_streams.enabled: false  # Disable data streams

# Template definition
setup.template.name: "nginx_template"   # Template name
setup.template.pattern: "nginx*"        # Index pattern the template applies to
setup.template.priority: 200            # Priority 200 (default is 150), so this template wins over the built-in default
setup.template.settings:
  index.number_of_shards: 5             # Force 5 primary shards per index
```
- Why disable ILM and data streams? In ES 8, Filebeat defaults to writing into data streams (index names like .ds-logs-nginx-...). To use the custom index name nginx_%{+yyyyMMdd}, both features must be disabled explicitly.
- Priority 200: critical. It stops Filebeat's own default template from overriding these settings and makes sure the 5-shard setting takes effect (a verification command follows this list).
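Whether the template actually landed in ES with the intended settings can be checked after the first setup/start; in 8.x, setup.template installs a composable index template, so it should be visible through the _index_template API:

```bash
# Should show the nginx_template definition, including priority 200 and 5 primary shards
curl -s "http://1.2.3.4:5678/_index_template/nginx_template?pretty"
```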
Output configuration (output.elasticsearch)
Defines where the data is sent and how the indices are named.
```yaml
output.elasticsearch:
  hosts: ["1.2.3.4:5678"]
  data_stream: false              # Reconfirm that data-stream output is disabled
  # ⚠️ Key point: the ingest pipeline
  pipeline: "pipeline-nginx-plaintext"
  # Index naming rule
  index: "nginx_%{+yyyyMMdd}"     # One index per day, e.g. nginx_20251217
  preset: balanced                # Performance preset balancing throughput and resource usage
```
- index: set to nginx_%{+yyyyMMdd}.
- Note: an earlier revision of this configuration used the %{[logtype]} variable in the index name (producing names like nginx_public_...), but it has been dropped here, so both Nginx instances write into the same daily index, e.g. nginx_20251217. If that is intentional (to keep the shard count down), it is fine; to separate them again, change the index back to nginx_%{[logtype]}_%{+yyyyMMdd}.
- pipeline: set to pipeline-nginx-plaintext.
- Potential conflict: the inputs already parse JSON, yet the pipeline name says "plaintext". A plaintext pipeline normally relies on Grok to parse raw text, so if Filebeat ships already-parsed JSON fields while the pipeline still tries to parse a message string, it may fail or simply do nothing (a quick existence check follows this list).
Logging configuration (logging)
Controls Filebeat's own runtime log (used to troubleshoot Filebeat itself).
```yaml
logging.level: info
logging.to_files: true
logging.files:
  path: /mnt/data/filebeat-es   # Where Filebeat writes its own log
  name: filebeat.log
  keepfiles: 3                  # Keep only the 3 most recent files so the disk does not fill up
  permissions: 0644
```
Taken together, this configuration is production-ready: it copes with very large files, keeps multiple instances apart, and stays compatible with ES 8.
Step 4: Validate the configuration syntax and the ES connection
```bash
./filebeat test config -c filebeat.yml
./filebeat test output -c filebeat.yml
```
Analysis:
- Syntax check (test config): detects indentation mistakes, misspelled option names, and other syntax problems in filebeat.yml. Config OK means the syntax is clean; otherwise fix the file according to the message (type mismatch, for example, usually means an option has the wrong format).
- Output check (test output): verifies that Filebeat can reach ES. On success it prints the ES version and connection details; connection refused means the ES address or port is wrong, or the firewall is not letting the ES port (5678 in this example) through.
Expected output:
✅ A healthy connection prints something like the following (ES version and a successful handshake):

```
elasticsearch: http://1.2.3.4:5678...
  parse url... OK
  connection...
    parse host... OK
    dns lookup... OK
    addresses: 1.2.3.4
    dial up... OK
  TLS... WARN secure connection disabled
  talk to server... OK
  version: 8.15.5
```
Step 5: Move Filebeat to a standard system directory
```bash
cd ..
mv filebeat-es /usr/local/
```
Analysis:
Why /usr/local/: on Linux, /usr/local is the conventional install location for third-party software, so the move follows the standard directory layout and simplifies later management (coordinated upgrades, permission control).
Note: after the move, make sure the directory permissions are correct so the service does not later fail to start for lack of access.
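A minimal ownership/permission check, assuming the service will run as root as in the unit file below; Filebeat also refuses to load a configuration file that is writable by anyone other than its owner:

```bash
# Ensure root owns the tree and the config is not group/world writable,
# otherwise Filebeat may reject filebeat.yml at startup
chown -R root:root /usr/local/filebeat-es
chmod go-w /usr/local/filebeat-es/filebeat.yml
ls -l /usr/local/filebeat-es/filebeat.yml
```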
Step 6: Create a systemd service
```bash
vi /usr/lib/systemd/system/filebeat-es.service
```

Unit file contents:
```ini
[Unit]
Description=Filebeat sends log data to Elasticsearch
Documentation=https://www.elastic.co/docs/beats/filebeat
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=root
Group=root
ExecStart=/usr/local/filebeat-es/filebeat -c /usr/local/filebeat-es/filebeat.yml
ExecReload=/bin/kill -HUP $MAINPID
Restart=always
RestartSec=5
LimitNOFILE=65535

[Install]
WantedBy=multi-user.target
```
Analysis:
Why systemd: compared with legacy init.d scripts, systemd provides start-at-boot, status management, and log aggregation out of the box, which suits production service management better.
Key settings: After=network-online.target makes the service start only once the network is fully up, so it does not fail just because ES is not reachable yet; Restart=always improves availability by restarting automatically after an abnormal exit; LimitNOFILE=65535 lifts the open-file-handle limit for when Filebeat tracks a large number of log files.
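Before the first start, the unit file can be sanity-checked with systemd's built-in verifier, which flags misspelled directives and missing binaries:

```bash
# Static check of the unit file; prints warnings for unknown directives or bad paths
systemd-analyze verify /usr/lib/systemd/system/filebeat-es.service
```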
Step 7: Start the service and enable it at boot
```bash
systemctl daemon-reload
systemctl start filebeat-es
systemctl status filebeat-es
systemctl enable filebeat-es
```
Analysis:
daemon-reload: reloads systemd's configuration so the newly created filebeat-es.service is picked up; it must be run every time the unit file changes.
start and status: after starting, check the state with status. active (running) means the start succeeded; on failure, inspect the detailed error log with journalctl -u filebeat-es -f.
enable: turns on start-at-boot, so collection does not stop after a server reboot and no manual restart is needed.
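Once the service is up, the quickest confirmation that events are reaching ES is to look for the daily indices (host and port taken from this article's example):

```bash
# List nginx_* indices with their document counts
curl -s "http://1.2.3.4:5678/_cat/indices/nginx_*?v"
# Or count documents across all nginx_* indices
curl -s "http://1.2.3.4:5678/nginx_*/_count?pretty"
```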
3. Common Issues and Solutions
Config validation error: type mismatch accessing 'filebeat.inputs.0.file_identity'
Cause: file_identity is an object-valued setting, but it was written as a plain string (e.g. file_identity: path).
Fix: remove the offending setting (Filebeat then falls back to its default file-identity behaviour), or write it in the correct object form:
```yaml
file_identity:
  path: ~
```
ES write failure: no matching index template found
Cause: the template referenced in the Filebeat configuration has not been created in ES.
Fix: let Filebeat push the template itself:
```bash
/usr/local/filebeat-es/filebeat setup --index-management -c /usr/local/filebeat-es/filebeat.yml
```

Or create it manually through the ES API:
```bash
curl -X PUT "http://1.2.3.4:5678/_template/nginx_template" -H "Content-Type: application/json" -d '{
  "index_patterns": ["nginx_*"],
  "settings": {"index.number_of_shards": 1, "index.number_of_replicas": 0},
  "mappings": {"dynamic": true}
}'
```
CPU spike on first startup
Cause: Filebeat scanning a huge existing log file from the start (e.g. an access.log of 8 GB or more).
Fix: make sure the configuration contains offset.initial: end, then clear the registry and restart:
```bash
rm -rf /usr/local/filebeat-es/data/registry/filebeat/filestream/*
systemctl restart filebeat-es
```
Logs are collected but no data appears in ES
Cause: the ingest pipeline pipeline-nginx-plaintext does not exist in ES or is misconfigured.
Fix: create a basic pipeline to test with:
```bash
curl -X PUT "http://1.2.3.4:5678/_ingest/pipeline/pipeline-nginx-plaintext" -H "Content-Type: application/json" -d '{
  "description": "Nginx log parsing pipeline",
  "processors": [
    {"grok": {"field": "message", "patterns": ["%{HTTPDATE:time_local_str}"]}},
    {"date": {"field": "time_local_str", "target_field": "time_local", "formats": ["dd/MMM/yyyy:HH:mm:ss Z"]}}
  ]
}'
```
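Before trusting the pipeline, its behaviour can be checked with the _simulate API; the sample line below is purely illustrative and only needs to contain something %{HTTPDATE} can match:

```bash
curl -X POST "http://1.2.3.4:5678/_ingest/pipeline/pipeline-nginx-plaintext/_simulate?pretty" \
  -H "Content-Type: application/json" -d '{
  "docs": [
    {"_source": {"message": "17/Dec/2025:10:00:00 +0800 GET /index.html 200"}}
  ]
}'
```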
4. Summary
Built on Filebeat 8.15.5, this article implemented the full collection flow from Nginx access logs into ES 8. The highlights:
- Large-file collection tuned: offset.initial: end avoids the startup CPU spike;
- Clean configuration and service management: a systemd unit gives start-at-boot and keeps the service stable;
- Pitfall guide: solutions for the common template-matching, pipeline, and permission problems.
With this flow in place, Nginx logs are collected efficiently and reliably, laying the groundwork for later log analysis and visualization (e.g. Kibana dashboards). To go further, you can add log filtering, field enrichment, and ES index lifecycle management.