Shipping Nginx Access Logs to Elasticsearch 8 with Filebeat 8.15.5: A Hands-On Guide


In log-analysis scenarios, the Nginx access log (access.log) carries a wealth of key data about user behavior and request performance, and collecting it efficiently into Elasticsearch (ES) is the foundation for log search, analysis, and visualization. Based on Filebeat 8.15.5 and an ES 8 environment, and drawing on real deployment experience, this article walks through the complete workflow from environment preparation to putting the service live, and also covers the common pitfalls hit during deployment and how to resolve them.

Part 1: Environment

  • Operating system: Rocky Linux (also applies to CentOS, RHEL, and other mainstream Linux distributions)
  • Core component versions: Filebeat 8.15.5, Elasticsearch 8.x
  • Log sources: Nginx/OpenResty access logs (two paths: /mnt/data/openresty2/nginx/logs/access.log and /mnt/data/openresty/nginx/logs/access.log)
  • ES settings: endpoint http://1.2.3.4:5678, index name pattern nginx_%{+yyyyMMdd}, ingest pipeline pipeline-nginx-plaintext (a quick reachability check follows below)
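
Before starting, it helps to confirm that the ES endpoint above is reachable from the host that will run Filebeat. A minimal check, assuming the example address used throughout this article and a cluster without authentication (add -u <user>:<password> if security is enabled):

bash
# Quick reachability check against the ES endpoint; replace 1.2.3.4:5678 with your real host:port
curl -s http://1.2.3.4:5678
# Expected: a JSON banner containing the cluster name and a version "number" starting with 8.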

Part 2: Deployment Steps and In-Depth Walkthrough

Step 1: Create the Filebeat working directory

bash
cd /mnt/data
mkdir filebeat-es
cd filebeat-es/

Explanation:

Choosing /mnt/data as the working directory: the /mnt directory is usually where data disks are mounted, so it has plenty of space and is well suited for the Filebeat package, configuration files, and temporary data. This avoids service problems caused by a full system disk. Filebeat's own logs are also stored here later, which makes troubleshooting easier.

A dedicated filebeat-es directory: keeps all Filebeat-related files in one place, makes later migration and upgrades easier, and separates them from other application directories, keeping the system tidy.
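
Since the rationale is disk space, a quick way to confirm the data disk behind /mnt/data actually has room:

bash
# Check free space on the mount backing /mnt/data (the package, registry, and Filebeat's own logs will live here)
df -h /mnt/data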

Step 2: Download and extract the Filebeat package

bash
# Download Filebeat into /mnt/data/filebeat-es
# Download page: https://www.elastic.co/downloads/past-releases/filebeat-8-15-5
tar -zxvf filebeat-8.15.5-linux-x86_64.tar.gz -C ./
mv filebeat-8.15.5-linux-x86_64 filebeat-es
cd filebeat-es/


Explanation:

Extraction flags: in tar -zxvf, z decompresses a gzip archive, x extracts files, v prints progress, and f specifies the archive file; -C ./ extracts into the current directory so files don't end up scattered elsewhere.

Renaming the directory: the default filebeat-8.15.5-linux-x86_64 is renamed to filebeat-es, which is shorter and makes the purpose (shipping to ES) obvious, so later commands are easier to read.
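
Optionally, verify the integrity of the downloaded archive before extracting it. This is a minimal sketch assuming you fetch the matching .sha512 file Elastic publishes alongside the tarball (adjust the URL if you downloaded from a mirror):

bash
# Download the published SHA-512 checksum and verify the tarball against it (run in /mnt/data/filebeat-es)
curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-8.15.5-linux-x86_64.tar.gz.sha512
sha512sum -c filebeat-8.15.5-linux-x86_64.tar.gz.sha512
# Expected output: filebeat-8.15.5-linux-x86_64.tar.gz: OK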

Step 3: The core filebeat.yml configuration

bash
vi filebeat.yml
yaml
###################### Filebeat Configuration Example #########################

# This file is an example configuration file highlighting only the most common
# options. The filebeat.reference.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/filebeat/index.html

# For more available modules and options, please see the filebeat.reference.yml sample
# configuration file.

#=========================== Filebeat inputs =============================

filebeat.inputs:

# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.
# The input type here is log
- type: log

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  # This is the actual path of the nginx access log; adjust it to match your environment
  paths:
    - /mnt/data/openresty/nginx/logs/access.log
    #- c:\programdata\elasticsearch\logs\*
  # Start reading from the end of the file; nginx logs are usually huge, and reprocessing historical lines would waste CPU
  tail_files: true
  offset:
    initial: end

  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  # exclude_lines: ['^127.0.0.1']

  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  #include_lines: ['^ERR', '^WARN']

  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #exclude_files: ['.gz$']

  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  # Adds a logtype field to the collected logs so they can be told apart in ES queries, like a custom tag; change as needed.
  # Useful, for example, when a host runs two OpenResty instances with different purposes, or several instances behind a load balancer.
  fields:
    logtype: "mynginx01_openresty"
  #  level: debug
  #  review: 1
  fields_under_root: true
  json.enabled: true
  json.keys_under_root: true
  ### Multiline options

  # Multiline can be used for log messages spanning multiple lines. This is common
  # for Java Stack Traces or C-Line Continuation

  # The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
  # multiline.pattern: ^\[
  # multiline.pattern: '^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\s-'

  # Defines if the pattern set under pattern should be negated or not. Default is false.
  # multiline.negate: true

  # Match can be set to "after" or "before". It is used to define if lines should be append to a pattern
  # that was (not) matched before or after or as long as a pattern is not matched based on negate.
  # Note: After is the equivalent to previous and before is the equivalent to to next in Logstash
  # multiline.match: after

#------------------------------------------------------------------------------------------
- type: log

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  # This is the actual path of my second nginx instance's access log; change or remove it as needed
  paths:
    - /mnt/data/openresty2/nginx/logs/access.log
    #- c:\programdata\elasticsearch\logs\*
  # Same options as the first input
  tail_files: true
  offset:
    initial: end

  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  # exclude_lines: ['^127.0.0.1']

  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  #include_lines: ['^ERR', '^WARN']

  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #exclude_files: ['.gz$']

  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  fields:
     logtype: "mynginx01_openresty2"
  #  level: debug
  #  review: 1
  fields_under_root: true
  json.enabled: true
  json.keys_under_root: true
  ### Multiline options

  # Multiline can be used for log messages spanning multiple lines. This is common
  # for Java Stack Traces or C-Line Continuation

  # The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
  # multiline.pattern: ^\[
  # multiline.pattern: '^[0-9]{4}/[0-9]{2}/[0-9]{2}\s[0-9]{2}:[0-9]{2}:[0-9]{2}\s'

  # Defines if the pattern set under pattern should be negated or not. Default is false.
  # multiline.negate: true

  # Match can be set to "after" or "before". It is used to define if lines should be append to a pattern
  # that was (not) matched before or after or as long as a pattern is not matched based on negate.
  # Note: After is the equivalent to previous and before is the equivalent to to next in Logstash
  # multiline.match: after

#============================= Filebeat modules ===============================
filebeat.shutdown_timeout: 5s
filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml

  # Set to true to enable config reloading
  reload.enabled: false

  # Period on which files under path should be checked for changes
  #reload.period: 10s

#==================== Elasticsearch template setting ==========================
setup.ilm.enabled: false
setup.data_streams.enabled: false
setup.template.name: "nginx_template"
setup.template.pattern: "nginx*"
setup.template.priority: 200
setup.template.settings:
  index.number_of_shards: 5
  #index.codec: best_compression
  #_source.enabled: false

#================================ General =====================================

# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
#name:

# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]

# Optional fields that you can specify to add additional information to the
# output.
#fields:
#  env: staging


#============================== Dashboards =====================================
# These settings control loading the sample dashboards to the Kibana index. Loading
# the dashboards is disabled by default and can be enabled either by setting the
# options here or by using the `setup` command.
#setup.dashboards.enabled: false

# The URL from where to download the dashboards archive. By default this URL
# has a value which is computed based on the Beat name and version. For released
# versions, this URL points to the dashboard archive on the artifacts.elastic.co
# website.
#setup.dashboards.url:
#============================== Kibana =====================================

# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:

  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify and additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  #host: "localhost:5601"

  # Kibana Space ID
  # ID of the Kibana Space into which the dashboards should be loaded. By default,
  # the Default Space will be used.
  #space.id:

#============================= Elastic Cloud ==================================

# These settings simplify using filebeat with the Elastic Cloud (https://cloud.elastic.co/).

# The cloud.id setting overwrites the `output.elasticsearch.hosts` and
# `setup.kibana.host` options.
# You can find the `cloud.id` in the Elastic Cloud web UI.
#cloud.id:

# The cloud.auth setting overwrites the `output.elasticsearch.username` and
# `output.elasticsearch.password` settings. The format is `<user>:<pass>`.
#cloud.auth:

#================================ Outputs =====================================

# Configure what output to use when sending the data collected by the beat.

#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["1.2.3.4:5678"]
  data_stream: false
  pipeline: "pipeline-nginx-plaintext"
  index: "nginx_%{+yyyyMMdd}"
  preset: balanced
  # Optional protocol and basic auth credentials.
  #protocol: "https"
  #username: "elastic"
  #password: "changeme"

#----------------------------- Logstash output --------------------------------
#output.logstash:
  # The Logstash hosts
  #hosts: ["localhost:5044"]

  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"

  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"

#================================ Processors =====================================

#================================ Logging =====================================
logging.level: info
logging.to_files: true
logging.files:
  path: /mnt/data/filebeat-es
  name: filebeat.log
  keepfiles: 3
  permissions: 0644
# Sets log level. The default log level is info.
# Available log levels are: error, warning, info, debug
#logging.level: debug

# At debug level, you can selectively enable logging only for some components.
# To enable all selectors use ["*"]. Examples of other selectors are "beat",
# "publish", "service".
#logging.selectors: ["*"]

#============================== Xpack Monitoring ===============================
# filebeat can export internal metrics to a central Elasticsearch monitoring
# cluster.  This requires xpack monitoring to be enabled in Elasticsearch.  The
# reporting is disabled by default.

# Set to true to enable the monitoring reporter.
#xpack.monitoring.enabled: false

# Uncomment to send the metrics to Elasticsearch. Most settings from the
# Elasticsearch output are accepted here as well. Any setting that is not set is
# automatically inherited from the Elasticsearch output configuration, so if you
# have the Elasticsearch output configured, you can simply uncomment the
# following line.
#xpack.monitoring.elasticsearch:

#================================= Migration ==================================

# This allows to enable 6.7 migration aliases
#migration.6_to_7.enabled: true

The following is a section-by-section explanation:

Input configuration (filebeat.inputs)

This part defines the data sources. Two independent input blocks are configured, one for each of the two Nginx instances (openresty and openresty2).

yaml
- type: log
  enabled: true
  paths:
    - /mnt/data/openresty/nginx/logs/access.log
  # --------------------------------------------------------
  # 🚀 Core performance tuning: fixes the CPU spike problem
  # --------------------------------------------------------
  tail_files: true       # On startup, ignore old data already in the file and only watch newly produced lines
  offset:
    initial: end         # Together with tail_files, explicitly start the read offset at the end of the file
  
  # --------------------------------------------------------
  # 🏷️ Tags and classification: distinguish different business sources
  # --------------------------------------------------------
  fields:
    logtype: "mynginx01_openresty" # 给该日志打上特定标签,用于后续生成索引名
  fields_under_root: true           # 将 logtype 字段放在根节点,而不是 fields.logtype

  # --------------------------------------------------------
  # 🧩 JSON parsing: parse directly on the agent side
  # --------------------------------------------------------
  json.enabled: true         # Enable JSON parsing
  json.keys_under_root: true # Parsed fields (e.g. status, uri) are placed directly at the document root
  • Dual-input design: two paths are monitored and distinguished via the fields.logtype field.
  • tail_files & offset: this is the key setting that solves the "huge file causes high CPU" problem. It tells Filebeat: "ignore the old data, watch only the tail of the file, and ship lines only as they appear".
  • json.keys_under_root: true: this matters a lot. It means the data Filebeat sends to ES is already structured (for example {"status": 200, "clientip": "..."}) rather than a message field containing one long JSON string. It also assumes Nginx is writing JSON lines in the first place; see the sketch below.
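
json.enabled / json.keys_under_root only make sense if Nginx/OpenResty actually writes the access log as one JSON object per line. The article does not show the Nginx side, so the following is only a hypothetical sketch of such a log_format (field names are illustrative; escape=json requires nginx 1.11.8 or later):

bash
# Hypothetical example: print an nginx log_format that emits one JSON object per line.
# The directives belong inside the http{} block of your nginx/OpenResty configuration.
cat <<'EOF'
log_format json_access escape=json
  '{"time_local":"$time_local","clientip":"$remote_addr","request":"$request",'
  '"status":"$status","body_bytes_sent":"$body_bytes_sent","request_time":"$request_time",'
  '"http_referer":"$http_referer","http_user_agent":"$http_user_agent"}';
access_log /mnt/data/openresty/nginx/logs/access.log json_access;
EOF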

Template and compatibility settings (setup.template & setup.ilm)

This part makes Filebeat compatible with Elasticsearch 8.x and lets you use a custom index name.

yaml
# ==================== Elasticsearch template setting ==========================
setup.ilm.enabled: false          # Disable index lifecycle management (ILM)
setup.data_streams.enabled: false # Disable data streams

# Template definition
setup.template.name: "nginx_template"   # Template name
setup.template.pattern: "nginx*"        # Index pattern matched by the template
setup.template.priority: 200            # Priority 200 (default is 150), so this template overrides the built-in default
setup.template.settings:
  index.number_of_shards: 5             # Force 5 primary shards per index
  • Why disable ILM and data streams?

In ES 8, data streams are the enforced default (index names look like .ds-logs-nginx-...).

To use the custom index name nginx_%{+yyyyMMdd}, both features must be explicitly disabled.

  • Priority 200: very important. It prevents Filebeat's own default template from overriding your settings and makes sure the 5-shard setting takes effect. You can verify the template actually landed in ES with the check below.
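
After Filebeat has started once (or after running filebeat setup --index-management), you can check whether the template actually made it into ES. A minimal check, assuming the endpoint from this article; Filebeat 8 normally installs it as a composable index template, so the legacy API is queried only as a fallback:

bash
# Look for the template under the composable index template API first
curl -s "http://1.2.3.4:5678/_index_template/nginx_template?pretty"
# Fallback: the legacy template API (also used by the manual curl later in this article)
curl -s "http://1.2.3.4:5678/_template/nginx_template?pretty"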

Output configuration (output.elasticsearch)

Defines where the data is sent and how the indices are named.

yaml
output.elasticsearch:
  hosts: ["1.2.3.4:5678"]
  data_stream: false               # Re-confirm that data-stream output is disabled

  # ⚠️ Key point: ingest pipeline
  pipeline: "pipeline-nginx-plaintext"

  # Index naming rule
  index: "nginx_%{+yyyyMMdd}"      # One index per day, e.g. nginx_20251217

  preset: balanced                 # Performance preset balancing throughput and resource usage
  • index: here the index is set to nginx_%{+yyyyMMdd}.
  • Note: an earlier version of this configuration used the %{[logtype]} variable (e.g. nginx_public_...), but it has been dropped here. That means logs from both Nginx instances are written into the same index, e.g. nginx_20251217. If this is intentional (to reduce the number of shards), that's fine; if you want them separated, change it back to nginx_%{[logtype]}_%{+yyyyMMdd}.
  • pipeline: the ingest pipeline pipeline-nginx-plaintext is specified here.
  • Potential conflict warning: the inputs already enable JSON parsing, yet the pipeline is named "plaintext". A plaintext pipeline usually parses raw text with Grok. If Filebeat is already sending parsed JSON fields while the pipeline still tries to parse a message field, it may fail or simply do nothing; the _simulate sketch below is one way to check.
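
One way to verify that concern before going live is to feed sample documents through the pipeline with the _simulate API. The documents below are hypothetical (your pipeline's processors and field names may differ): the first mimics a pre-parsed JSON event without a message field, the second a raw plaintext line. If the pipeline's grok insists on a message field, the first document will show a failure:

bash
curl -s -X POST "http://1.2.3.4:5678/_ingest/pipeline/pipeline-nginx-plaintext/_simulate?pretty" \
  -H "Content-Type: application/json" -d '{
  "docs": [
    {"_source": {"status": "200", "clientip": "192.0.2.10", "logtype": "mynginx01_openresty"}},
    {"_source": {"message": "192.0.2.10 - - [17/Dec/2025:10:00:00 +0800] \"GET / HTTP/1.1\" 200 612"}}
  ]
}'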

Logging configuration (logging)

Defines Filebeat's own runtime log (used to troubleshoot Filebeat itself).

yaml
logging.level: info
logging.to_files: true
logging.files:
  path: /mnt/data/filebeat-es  # Where Filebeat writes its own log
  name: filebeat.log
  keepfiles: 3                 # Keep only the 3 most recent files so the disk doesn't fill up
  permissions: 0644

This configuration is quite mature and production-ready. It addresses reading large files, distinguishing multiple instances, and ES 8 compatibility.

Step 4: Validate the configuration syntax and the ES connection

bash
./filebeat test config -c filebeat.yml
./filebeat test output -c filebeat.yml


Explanation:

  • Syntax check (test config): checks filebeat.yml for indentation mistakes, misspelled fields, and other syntax problems. Config OK means the syntax is fine; if it errors, fix it according to the message (e.g. type mismatch usually means a field has the wrong format).
  • Output connection test (test output): verifies that Filebeat can reach ES. On success it prints the ES version, connection status, and so on; a connection refused error means you should check the ES address/port and whether the firewall allows port 5678.
Expected output:
✅ Connection OK: output similar to the following, showing the ES version and a successful connection:

elasticsearch: http://1.2.3.4:5678...
  parse url... OK
  connection...
    parse host... OK
    dns lookup... OK
    addresses: 1.2.3.4
    dial up... OK
  TLS... WARN secure connection disabled
  talk to server... OK
  version: 8.15.5 

Step 5: Move Filebeat to a standard system directory

bash
cd ..
mv filebeat-es /usr/local/

Explanation:

Moving to /usr/local/: on Linux, /usr/local is the conventional place for third-party applications; it follows the filesystem layout and makes later management (upgrades, permission control) easier.

Note: after the move, make sure the directory permissions are correct so the service doesn't later fail to start due to insufficient permissions; a minimal example follows.
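
A minimal sketch of tightening ownership and permissions after the move, assuming the service will run as root as in the unit file below. Filebeat applies a strict permission check to its config file by default, so the file should be owned by the running user and not writable by group/other (the check can be bypassed with --strict.perms=false, which is not recommended):

bash
# Make the Filebeat tree owned by root, the user configured in the systemd unit
chown -R root:root /usr/local/filebeat-es
# Owner-writable only; readable by everyone else
chmod 0644 /usr/local/filebeat-es/filebeat.yml
ls -l /usr/local/filebeat-es/filebeat.yml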

Step 6: Create a systemd service

bash
vi /usr/lib/systemd/system/filebeat-es.service

Service file contents:

ini
[Unit]
Description=Filebeat sends log data to Elasticsearch
Documentation=https://www.elastic.co/docs/beats/filebeat
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=root
Group=root
ExecStart=/usr/local/filebeat-es/filebeat -c /usr/local/filebeat-es/filebeat.yml
ExecReload=/bin/kill -HUP $MAINPID
Restart=always
RestartSec=5
LimitNOFILE=65535

[Install]
WantedBy=multi-user.target


Explanation:

Why systemd: compared with traditional init.d scripts, systemd provides start-on-boot, state management, log aggregation, and more, which makes it a better fit for managing services in production.

Key settings: After=network-online.target ensures the service starts only after the network is fully up, so it doesn't fail because ES is unreachable; Restart=always improves availability by restarting the service automatically after abnormal exits; LimitNOFILE=65535 avoids running out of file handles when Filebeat processes a large volume of logs. You can also statically check the unit file before the first start, as shown below.
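
Optionally, a quick sanity check of the unit file before starting it (catches typos in directive names):

bash
# Static verification of the unit file; warnings are printed for unknown or misspelled directives
systemd-analyze verify /usr/lib/systemd/system/filebeat-es.service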

Step 7: Start the service and enable it at boot

bash
systemctl daemon-reload 
systemctl start filebeat-es
systemctl status filebeat-es
systemctl enable filebeat-es


Explanation:

daemon-reload: reloads the systemd configuration so the newly created filebeat-es.service takes effect; this must be run whenever the service file changes.

start and status: after starting the service, check its state with status. active (running) means it started successfully; if it failed, inspect the detailed error log with journalctl -u filebeat-es -f.

enable: enables start-on-boot so the service doesn't need to be started manually after a reboot and log collection isn't interrupted. Once it's running, you can confirm data is reaching ES with the checks below.
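
Once the service is running, a minimal way to confirm documents are actually arriving in ES (endpoint and index pattern as configured in this article; the second command assumes today's daily index):

bash
# List the daily nginx indices and their document counts
curl -s "http://1.2.3.4:5678/_cat/indices/nginx_*?v"
# Fetch one document and confirm fields such as logtype and the parsed JSON keys are present
curl -s "http://1.2.3.4:5678/nginx_$(date +%Y%m%d)/_search?size=1&pretty"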

Part 3: Common Problems and Solutions

Config validation error: type mismatch accessing 'filebeat.inputs.0.file_identity'

Cause: file_identity is an object, but it was configured as a string (e.g. file_identity: path).

Fix: remove the incorrect setting (files are identified by path by default, so it doesn't need to be declared), or use the correct object form:

yaml
file_identity:
  path: ~

ES write failure: no matching index template found

Cause: the template referenced in the Filebeat configuration has not been created in ES.

Fix: have Filebeat push the template automatically:

bash
/usr/local/filebeat-es/filebeat setup --index-management -c /usr/local/filebeat-es/filebeat.yml

Or create it manually via the ES API:

bash
curl -X PUT "http://1.2.3.4:5678/_template/nginx_template" -H "Content-Type: application/json" -d '{
  "index_patterns": ["nginx_*"],
  "settings": {"index.number_of_shards":1,"index.number_of_replicas":0},
  "mappings": {"dynamic": true}
}'

CPU spike on first start

Cause: Filebeat scans a huge log file in full (e.g. an access.log of 8 GB or more).

Fix: make sure the configuration contains offset.initial: end, then clear the registry and restart:

bash
rm -rf /usr/local/filebeat-es/data/registry/filebeat/filestream/*
systemctl restart filebeat-es

Logs are collected but no data shows up in ES

Cause: the ingest pipeline pipeline-nginx-plaintext does not exist in ES or is misconfigured.

Fix: create a basic pipeline to test with:

bash
curl -X PUT "http://1.2.3.4:5678/_ingest/pipeline/pipeline-nginx-plaintext" -H "Content-Type: application/json" -d '{
  "description": "Nginx log parsing pipeline",
  "processors": [
    {"grok": {"field": "message", "patterns": ["%{HTTPDATE:time_local_str}"]}},
    {"date": {"field": "time_local_str", "target_field": "time_local", "formats": ["dd/MMM/yyyy:HH:mm:ss Z"]}}
  ]
}'
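
After creating it, you can confirm the pipeline is registered and review its processors:

bash
# Fetch the ingest pipeline definition by name
curl -s "http://1.2.3.4:5678/_ingest/pipeline/pipeline-nginx-plaintext?pretty"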

Part 4: Summary

Based on Filebeat 8.15.5, this article walked through the full pipeline for collecting Nginx access logs into ES 8. The key points:

  1. Optimized collection of large log files: offset.initial: end avoids CPU spikes;
  2. Clean configuration and service management: systemd handles start-on-boot and keeps the service stable;
  3. Pitfall guide: solutions for common problems around template matching, pipeline configuration, and permissions.

With this workflow, Nginx logs can be collected efficiently and reliably, laying the groundwork for later log analysis and visualization (e.g. Kibana dashboards). To go further, you can add log filtering, field processing, ES index lifecycle management, and more.