EFK/ELK 9.0.3 setup on Windows (still a work in progress; the new version has a lot of issues)

Background

A recent feature needed ELK (Elasticsearch, Logstash, Kibana) to collect logs and analyze the data. A quick search online suggested that Filebeat is now generally recommended in place of Logstash, i.e. EFK.

Download links

Elasticsearch

Kibana

Filebeat

Logstash

analysis-ik

Prerequisites

A Java environment must be installed first; that is not covered here, so look it up if you have not set it up before.

Download example

Elasticsearch installation

After downloading, unzip the package. The configuration files are under the config directory and the startup script is bin/elasticsearch.bat; no configuration changes are needed, just double-click the bat file to start.

Account, password, and certificate

While the bat file starts up, the account (the default is elastic), its password, and the Kibana enrollment token/certificate are printed. Save them, since they are needed to log in later; be careful to strip any leading or trailing spaces from the password and token. The password can be changed with bin/elasticsearch-reset-password -u elastic.
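
For reference, run the tool from the Elasticsearch bin directory; with -i it lets you type the new password yourself instead of having one generated:

bash
# Auto-generate a new password for the elastic user
elasticsearch-reset-password -u elastic

# Or choose the new password interactively
elasticsearch-reset-password -u elastic -i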

Configuration notes

Version 9.0.3 uses SSL for login by default; after a successful start, security protection is enabled out of the box.

xpack.security.enabled: false: this disables Elasticsearch's security features, meaning Elasticsearch will not perform user authentication or access control. Make sure Elasticsearch is protected by some other means.

xpack.security.enrollment.enabled: true: this enables Elasticsearch's enrollment/certificate feature, which lets nodes authenticate each other using certificates.

xpack.security.http.ssl.enabled: false: this disables HTTPS encryption for connections from Kibana, Logstash, and Agents to Elasticsearch, so those connections transmit data in plain text.

xpack.security.transport.ssl.enabled: true: this enables transport-layer encryption and mutual authentication between nodes, protecting communication inside the Elasticsearch cluster.

cluster.initial_master_nodes: ["PC-20230824PCHD"]: this names the initial master node; only a node with this name can become the cluster's initial master.

http.host: 0.0.0.0: this allows HTTP API connections from anywhere. (The stock comment says such connections are encrypted and require authentication, which only holds while the security settings above are left enabled.)
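
Pulled together, the relevant part of config/elasticsearch.yml used here looks roughly like the sketch below (settings copied from the notes above; the node name is specific to this machine, so adjust it to your own):

yaml
# Security features themselves are disabled (see the notes above)
xpack.security.enabled: false
xpack.security.enrollment.enabled: true

# No HTTPS for client (Kibana/Logstash/Agents) connections
xpack.security.http.ssl.enabled: false

# Transport-layer encryption between cluster nodes stays on
xpack.security.transport.ssl.enabled: true

# Initial master node of the cluster
cluster.initial_master_nodes: ["PC-20230824PCHD"]

# Accept HTTP API connections from any address
http.host: 0.0.0.0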

Fixing cross-origin (CORS) issues

yaml
http.cors.enabled: true
http.cors.allow-origin: "*"

Successful startup

Open the Elasticsearch login URL https://localhost:9200 (note that it is https) and enter the account and password obtained above; if the page loads, the startup succeeded.
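
A quick way to verify from the command line (assuming curl is available) is to call the HTTPS endpoint with the elastic account and the CA certificate Elasticsearch generated under config/certs:

bash
# Adjust the certificate path to your own install; curl will prompt for the elastic password
curl --cacert "D:/Program Files/elasticsearch-9.0.3/config/certs/http_ca.crt" -u elastic https://localhost:9200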

Configuring the IK analyzer (the loading error is not yet resolved; do not follow this section)

The IK version must match the Elasticsearch version. Unzip the downloaded elasticsearch-analysis-ik-9.0.3.zip and copy the extracted directory into the Elasticsearch plugins directory; that completes the IK plugin installation. Be sure to delete the zip file after extracting it, otherwise Elasticsearch will fail to start.

Restart Elasticsearch; if the startup log shows loaded plugin [analysis-ik], the IK analyzer has been loaded.
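
If copying the folder keeps failing, an alternative I have not verified here is to let the plugin CLI install the zip directly (the zip path below is only an example):

bash
# From the Elasticsearch bin directory: install the IK zip via the plugin tool
elasticsearch-plugin install file:///D:/elasticsearch-analysis-ik-9.0.3.zip

# Confirm the plugin is registered
elasticsearch-plugin list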

Kibana installation

After downloading, unzip the package. The configuration file is under the config directory: uncomment the i18n.locale line and change it to zh-CN so Kibana starts with a Chinese UI. No other configuration changes are needed; just double-click bin/kibana.bat to start.
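
For reference, the only line that needs to change in config/kibana.yml is the locale:

yaml
# Uncomment and set to zh-CN for a Chinese UI (the default is en)
i18n.locale: "zh-CN"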

Successful startup

During startup a login link is printed; hold Ctrl and click it in the cmd window to open it in the browser. The Kibana login URL is http://localhost:5601.

Enter the enrollment token

During login you are asked for a token; paste the enrollment token obtained from the Elasticsearch startup output above.

Enter the verification code

The verification code is shown in the Kibana startup window.

If you forgot to save the token, you can regenerate it by running the command below in the Elasticsearch bin directory: elasticsearch-create-enrollment-token -s kibana
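
For reference, both values can be regenerated from the command line (kibana-verification-code is Kibana's own helper in its bin directory; if it is missing from your build, the code is still printed in the Kibana startup window as noted above):

bash
# From the Elasticsearch bin directory: regenerate the Kibana enrollment token
elasticsearch-create-enrollment-token -s kibana

# From the Kibana bin directory: re-print the current verification code
kibana-verification-code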

Once that succeeds, the login page appears; sign in with the Elasticsearch account and password.

After logging in, the pages are displayed in Chinese.

Startup error

After startup completes, one final error line appears (although nothing seems affected): Error: Unable to create alerts client because the Encrypted Saved Objects plugin is missing encryption key. Please set xpack.encryptedSavedObjects.encryptionKey in the kibana.yml or use the bin/kibana-encryption-keys command. This is supposedly caused by Elasticsearch lacking an encrypted-saved-objects plugin, and the suggested fix was: in the Elasticsearch bin directory run elasticsearch-plugin list to check the installed plugins, and if it is missing, install it with elasticsearch-plugin install com.floragunn:encrypted-saved-objects:x.y.z, where x.y.z is the plugin version and must be replaced with one matching your Elasticsearch version.

Run elasticsearch-plugin list
Run elasticsearch-plugin install com.floragunn:encrypted-saved-objects:9.0.3

However, that install command failed. DeepSeek's answer was that version 9 no longer supports installing plugins this way and suggested rolling back to version 6 or 7; since the error does not affect anything, I skipped it.
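
As the error message itself suggests, the other route (not tried here) is to generate the encryption keys with Kibana's own tool and add them to kibana.yml:

bash
# From the Kibana bin directory: prints the xpack.*.encryptionKey settings,
# including xpack.encryptedSavedObjects.encryptionKey; copy them into config/kibana.yml
kibana-encryption-keys generate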

Configuration notes

Because Elasticsearch starts in SSL mode, Kibana's configuration follows suit: after a successful start, the enrollment process appends the corresponding connection settings to kibana.yml.
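
As a rough sketch (every value below is a placeholder, not the real one from this install), that auto-generated block typically contains something like:

yaml
# This section is appended automatically during enrollment (placeholder values)
elasticsearch.hosts: ['https://localhost:9200']
elasticsearch.serviceAccountToken: <generated-service-account-token>
elasticsearch.ssl.certificateAuthorities: ['<kibana-dir>/data/ca_xxxxxxxxxx.crt']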

Filebeat installation

After downloading, unzip the package, open a command prompt in that folder (type cmd in the address bar), and run filebeat.exe setup followed by filebeat.exe -e -c filebeat.yml. Strictly speaking only the second command is needed to start Filebeat; the first one performs the setup described below and takes a while on the first run.
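
Spelled out, the two commands are (-e logs to the console, -c points at the config file):

bash
# One-time setup: loads the index template, ILM policy and Kibana assets (slow on first run)
filebeat.exe setup

# Start Filebeat in the foreground with the local configuration
filebeat.exe -e -c filebeat.yml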

Index setup finished

In this step Filebeat creates an index template and the corresponding index pattern. In Kibana you can see a new index pattern has been created, and every file Filebeat ingests will automatically be accessible through that index pattern.

Configuration

In v9 the filebeat.inputs type setting seems to have changed: you have to use filestream instead of log. Maybe I just have not figured it out, but with log Filebeat would not start and complained that the file could not be found.

type: filestream: sets the input type to filestream, which watches files for content changes (it reads appended lines in real time, suited to logs and other continuously updated files).

id: a unique identifier that distinguishes this input from others; logs and monitoring use it to trace where data came from.

enabled: whether this input is active; true enables it, false disables it (as in my initial configuration).

paths: the file paths to monitor, with glob support (e.g. *.csv). Note: wrap paths in quotes, especially if they contain Chinese characters or spaces; absolute paths are recommended, and forward slashes (/) work on every platform, Windows included.

Configuration file

Note that only one output may be enabled. If you start with the Elasticsearch output first and then switch to Logstash, Filebeat will not start and reports: Exiting: index management requested but the Elasticsearch output is not configured/enabled. I tried adding setup.ilm.enabled: false and setup.template.enabled: false, which did not help, and deleting the corresponding index did not help either; the only thing that worked was wiping Elasticsearch and Kibana and initializing them again. If anyone knows the proper fix, please share.
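
Since it is the setup phase that insists on the Elasticsearch output, one thing worth trying (untested here, so treat it as a guess rather than a fix) is to run only that phase with the outputs swapped, then switch back:

bash
# Untested idea: temporarily comment out output.logstash and re-enable output.elasticsearch
# (hosts, ssl, credentials) in filebeat.yml -- only one output may be enabled at a time --
# then run just the index/template setup against Elasticsearch:
filebeat.exe setup --index-management -c filebeat.yml

# Switch the outputs back (Logstash on, Elasticsearch off) and start normally:
filebeat.exe -e -c filebeat.yml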

yaml
###################### Filebeat Configuration Example #########################

# This file is an example configuration file highlighting only the most common
# options. The filebeat.reference.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/filebeat/index.html

# For more available modules and options, please see the filebeat.reference.yml sample
# configuration file.

# ============================== Filebeat inputs ===============================

filebeat.inputs:

# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input-specific configurations.

# filestream is an input for collecting log messages from files.
- type: filestream

  # Unique ID among all inputs, an ID is required.
  id: csv

  # Change to true to enable this input configuration.
  enabled: true
  
  encoding: utf-8

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    #- /var/log/*.log
    #- c:\programdata\elasticsearch\logs\*
    - D:/Maruko/AI智能体/民航代理/*.csv
  fields:
    data_source: "hu_csys"    # data source tag
    index_prefix: "hu-csys"   # prefix for the ES index name
  fields_under_root: true     # promote these fields to the root of the event
  close_eof: true             # close the file once EOF is reached
  parsers:
    - multiline:                            # handle multi-line records (if any)
        pattern: '^[^,]+(,[^,]+){30,}'      # match lines containing 30+ comma-separated fields
        negate: false
        match: after
# - type: filestream

  # # Unique ID among all inputs, an ID is required.
  # id: xlsx

  # # Change to true to enable this input configuration.
  # enabled: true
  
  # encoding: utf-8

  # # Paths that should be crawled and fetched. Glob based paths.
  # paths:
    # #- /var/log/*.log
    # #- c:\programdata\elasticsearch\logs\*
    # - D:/Maruko/AI智能体/民航代理/*.xlsx
  # fields:
    # file_type: "xlsx"
  # fields_under_root: true
  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  # Line filtering happens after the parsers pipeline. If you would like to filter lines
  # before parsers, use include_message parser.
  #exclude_lines: ['^DBG']

  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  # Line filtering happens after the parsers pipeline. If you would like to filter lines
  # before parsers, use include_message parser.
  #include_lines: ['^ERR', '^WARN']

  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #prospector.scanner.exclude_files: ['.gz$']

  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  #fields:
  #  level: debug
  #  review: 1

# journald is an input for collecting logs from Journald
#- type: journald

  # Unique ID among all inputs, if the ID changes, all entries
  # will be re-ingested
  #id: my-journald-id

  # The position to start reading from the journal, valid options are:
  #  - head: Starts reading at the beginning of the journal.
  #  - tail: Starts reading at the end of the journal.
  #    This means that no events will be sent until a new message is written.
  #  - since: Use also the `since` option to determine when to start reading from.
  #seek: head

  # A time offset from the current time to start reading from.
  # To use since, seek option must be set to since.
  #since: -24h

  # Collect events from the service and messages about the service,
  # including coredumps.
  #units:
    #- docker.service

# ============================== Filebeat modules ==============================

filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml

  # Set to true to enable config reloading
  reload.enabled: false

  # Period on which files under path should be checked for changes
  #reload.period: 10s

# ======================= Elasticsearch template setting =======================

setup.template.settings:
  index.number_of_shards: 1
  #index.codec: best_compression
  #_source.enabled: false


# ================================== General ===================================

# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
#name:

# The tags of the shipper are included in their field with each
# transaction published.
#tags: ["service-X", "web-tier"]

# Optional fields that you can specify to add additional information to the
# output.
#fields:
#  env: staging

# ================================= Dashboards =================================
# These settings control loading the sample dashboards to the Kibana index. Loading
# the dashboards is disabled by default and can be enabled either by setting the
# options here or by using the `setup` command.
#setup.dashboards.enabled: false

# The URL from where to download the dashboard archive. By default, this URL
# has a value that is computed based on the Beat name and version. For released
# versions, this URL points to the dashboard archive on the artifacts.elastic.co
# website.
#setup.dashboards.url:

# =================================== Kibana ===================================

# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:

  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify and additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  #host: "localhost:5601"

  # Kibana Space ID
  # ID of the Kibana Space into which the dashboards should be loaded. By default,
  # the Default Space will be used.
  #space.id:

# =============================== Elastic Cloud ================================

# These settings simplify using Filebeat with the Elastic Cloud (https://cloud.elastic.co/).

# The cloud.id setting overwrites the `output.elasticsearch.hosts` and
# `setup.kibana.host` options.
# You can find the `cloud.id` in the Elastic Cloud web UI.
#cloud.id:

# The cloud.auth setting overwrites the `output.elasticsearch.username` and
# `output.elasticsearch.password` settings. The format is `<user>:<pass>`.
#cloud.auth:

# ================================== Outputs ===================================

# Configure what output to use when sending the data collected by the beat.

# ---------------------------- Elasticsearch Output ----------------------------
# output.elasticsearch:
  # enabled: false
  # # Array of hosts to connect to.
  # hosts: ["https://localhost:9200"]
  # ssl:
    # # Replace with your own CA certificate path; use forward slashes (/), not backslashes (\)
    # certificate_authorities: ["D:/Program Files/elasticsearch-9.0.3/config/certs/http_ca.crt"]
    # # Strict certificate verification
    # verification_mode: "full"

  # # Performance preset - one of "balanced", "throughput", "scale",
  # # "latency", or "custom".
  # preset: balanced

  # # Protocol - either `http` (default) or `https`.
  # #protocol: "https"

  # # Authentication credentials - either API key or username/password.
  # #api_key: "id:api_key"
  # username: "elastic"
  # # Replace with your own password
  # password: "8fy0k7b_mAu-m+aCD+rX"
  # index: "critical-%{[fields.log_type]}"

# ------------------------------ Logstash Output -------------------------------
output.logstash:
  hosts: ["logstash-server:5044"]     # Logstash服务器地址
  loadbalance: true                   # 负载均衡
  worker: 4                           # 并发线程数
  bulk_max_size: 512                  # 每批发送事件数

# ====== 索引管理配置 ======
#setup.ilm.enabled: false       # 关键配置
#setup.template.enabled: false
  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"

  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"
# ================================= Processors =================================
processors:
  - add_host_metadata:
      when.not.contains.tags: forwarded
  - add_cloud_metadata: ~
  - add_docker_metadata: ~
  - add_kubernetes_metadata: ~

# ================================== Logging ===================================

# Sets log level. The default log level is info.
# Available log levels are: error, warning, info, debug
#logging.level: debug

# At debug level, you can selectively enable logging only for some components.
# To enable all selectors, use ["*"]. Examples of other selectors are "beat",
# "publisher", "service".
#logging.selectors: ["*"]

# ============================= X-Pack Monitoring ==============================
# Filebeat can export internal metrics to a central Elasticsearch monitoring
# cluster.  This requires xpack monitoring to be enabled in Elasticsearch.  The
# reporting is disabled by default.

# Set to true to enable the monitoring reporter.
#monitoring.enabled: false

# Sets the UUID of the Elasticsearch cluster under which monitoring data for this
# Filebeat instance will appear in the Stack Monitoring UI. If output.elasticsearch
# is enabled, the UUID is derived from the Elasticsearch cluster referenced by output.elasticsearch.
#monitoring.cluster_uuid:

# Uncomment to send the metrics to Elasticsearch. Most settings from the
# Elasticsearch outputs are accepted here as well.
# Note that the settings should point to your Elasticsearch *monitoring* cluster.
# Any setting that is not set is automatically inherited from the Elasticsearch
# output configuration, so if you have the Elasticsearch output configured such
# that it is pointing to your Elasticsearch monitoring cluster, you can simply
# uncomment the following line.
#monitoring.elasticsearch:

# ============================== Instrumentation ===============================

# Instrumentation support for the filebeat.
#instrumentation:
    # Set to true to enable instrumentation of filebeat.
    #enabled: false

    # Environment in which filebeat is running on (eg: staging, production, etc.)
    #environment: ""

    # APM Server hosts to report instrumentation results to.
    #hosts:
    #  - http://localhost:8200

    # API Key for the APM Server(s).
    # If api_key is set then secret_token will be ignored.
    #api_key:

    # Secret token for the APM Server(s).
    #secret_token:


# ================================= Migration ==================================

# This allows to enable 6.7 migration aliases
#migration.6_to_7.enabled: true

Logstash installation

Because the CSV and XLSX files need to be converted to JSON, and doing that in Filebeat looked rather awkward, I brought Logstash in after all. Unzip the zip package.

Configuration notes

Create a conf file under the Logstash config directory with the processing logic for the file type you need. For example, to convert CSV files to JSON I added a csv_to_json.conf file with the following content:

conf
input {
  beats {
    port => 5044
    codec => "plain"
  }
}

filter {
  # Parse the CSV (assuming the first row is a header)
  csv {
    separator => ","
    skip_header => true
    columns => [
      "t_date", "host_name", "t_csn", "collect_date", "block_num",
      "block_time", "step_seq_num", "step_num", "pid", "orig_run_id",
      "gen_run_id", "major_code", "minor_code", "host_num", "agent",
      "office", "in_pid", "app_level", "usr_level", "usr_group",
      "func_num", "func_code", "text_log", "t_date_orig", "protime",
      "cust_num", "six_pid_orgin", "six_pid_indicator", "six_pid",
      "sys", "t_topic", "t_partition", "t_offset", "year", "month",
      "day", "hour"
    ]
    convert => {
      "block_num" => "integer"
      "step_seq_num" => "integer"
      "host_num" => "integer"
      "year" => "integer"
      "month" => "integer"
      "day" => "integer"
      "hour" => "integer"
    }
  }

  # Normalize the date field
  date {
    match => ["t_date", "ISO8601"]          # adjust to the actual date format
    target => "@timestamp"                  # overwrite the default timestamp
  }

  # Clean up fields
  mutate {
    remove_field => ["message", "host", "log"]
    rename => {
      "t_date" => "[@metadata][event_date]"
      "host_name" => "[host][name]"
    }
    gsub => [
      "text_log", "\n", " ",                # 替换换行符
      "text_log", "\t", " "                 # 替换制表符
    ]
  }

  # Add metadata
  fingerprint {
    source => ["host_name", "t_csn", "collect_date"]
    target => "[@metadata][_id]"
    method => "SHA1"
  }
}

output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
    index => "%{[fields][index_prefix]}-%{+YYYY.MM.dd}"  # 动态索引名
    document_id => "%{[@metadata][_id]}"    # 使用指纹ID避免重复
    document_type => "_doc"
    pipeline => "hu-csys-pipeline"          # 可选:ES预处理管道
  }

  # Debug output (optional)
  stdout {
    codec => json_lines
  }
}

Startup
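
With the pipeline file in place, Logstash can be started from its root directory and pointed at it (a sketch; adjust the paths to your own layout):

bash
# From the Logstash root directory on Windows
bin\logstash.bat -f config\csv_to_json.conf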

Error 1

text
Exiting: couldn't connect to any of the configured Elasticsearch hosts. Errors: [error connecting to Elasticsearch at http://localhost:9200: Get "http://localhost:9200": EOF]

This error occurs because Filebeat defaults to a non-SSL connection, so filebeat.yml has to be changed: switch the host to https and point ssl at the CA certificate under the Elasticsearch config/certs directory.

yaml
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["https://localhost:9200"]
  ssl:
    # Replace with your own CA certificate path; use forward slashes (/), not backslashes (\)
    certificate_authorities: ["D:/Program Files/elasticsearch-9.0.3/config/certs/http_ca.crt"]
    # Strict certificate verification
    verification_mode: "full"

  # Performance preset - one of "balanced", "throughput", "scale",
  # "latency", or "custom".
  preset: balanced

  # Protocol - either `http` (default) or `https`.
  #protocol: "https"

  # Authentication credentials - either API key or username/password.
  #api_key: "id:api_key"
  username: "elastic"
  # Replace with your own password
  password: "8fy0k7b_mAu-m+aCD+rX"

Error 2

Exiting: error loading config file: yaml: line 166: found unknown escape character

YAML is sensitive to backslashes (\): if a path or string contains \ without proper escaping, this error is raised. Change the certificate_authorities path to use forward slashes (/) instead.
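
A minimal illustration of the difference, using the same certificate path:

yaml
# Fails: inside a double-quoted YAML string, \P, \e, \c ... are parsed as (unknown) escape sequences
#certificate_authorities: ["D:\Program Files\elasticsearch-9.0.3\config\certs\http_ca.crt"]

# Works: forward slashes avoid escaping entirely (single quotes would also work)
certificate_authorities: ["D:/Program Files/elasticsearch-9.0.3/config/certs/http_ca.crt"]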