Deploying EFK Log Collection in a K8S Cluster

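Introduction

  • OS version: CentOS 7.9
  • Kernel version: 6.3.5-1.el7
  • K8S version: v1.26.14
  • ES official website
  • This walkthrough steers around the pitfalls I hit; following the official instructions verbatim ran into some problems.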

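Environment preparation

  • Prepare Ceph or NFS storage
  • How to install NFS storage
  • This installation deploys EFK with the official ECK operator (an older release, 7.17.3, which is what most existing production environments still run).
  • RBAC permissions and log template settings are added so that Kubernetes metadata can be attached to the shipped logs.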

Install the custom resources

bash
kubectl create -f https://download.elastic.co/downloads/eck/1.7.1/crds.yaml
kubectl apply -f https://download.elastic.co/downloads/eck/1.7.1/operator.yaml
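
Before creating any Elastic resources it helps to confirm that the operator is actually running. The following is a minimal sketch, assuming operator.yaml created the `elastic-operator` StatefulSet in the `elastic-system` namespace (the ECK defaults); `wait_for_operator` is a hypothetical helper name, not part of ECK.

```shell
# Block until the ECK operator StatefulSet has rolled out, or time out.
# Assumes the default names from operator.yaml: namespace elastic-system,
# StatefulSet elastic-operator. The timeout argument is optional.
wait_for_operator() {
  local timeout="${1:-180s}"
  kubectl -n elastic-system rollout status statefulset/elastic-operator \
    --timeout="$timeout"
}
```

Call it as `wait_for_operator 300s`. If it times out, `kubectl -n elastic-system logs statefulset/elastic-operator` is usually the first place to look.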

Deploy Elasticsearch

yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
  namespace: elastic-system
spec:
  version: 7.17.3
  nodeSets:
  - name: masters
    count: 1
    config:
      node.roles: ["master"]
      xpack.ml.enabled: true
    podTemplate:
      spec:
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
            runAsUser: 0
          command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        storageClassName: nfs-dynamic
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
  - name: data
    count: 1
    config:
      node.roles: ["data", "ingest", "ml", "transform"]
    podTemplate:
      spec:
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
            runAsUser: 0
          command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        storageClassName: nfs-dynamic
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 50Gi
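
After applying the manifest, the operator needs some time to create the Pods and volumes. One possible way to wait for that is to poll the health that ECK reports in the resource status. This is a sketch under the assumption that the resource lives in `elastic-system` as above; `wait_for_green` is a hypothetical helper and the 5-second interval is arbitrary.

```shell
# Poll the health reported by an ECK-managed resource until it is green.
# Arguments: kind (elasticsearch or kibana), resource name, max attempts.
# Assumes the elastic-system namespace used throughout this article.
wait_for_green() {
  local kind="$1" name="$2" attempts="${3:-60}" health=""
  while [ "$attempts" -gt 0 ]; do
    health=$(kubectl get "$kind" "$name" -n elastic-system \
      -o jsonpath='{.status.health}' 2>/dev/null) || health=""
    if [ "$health" = "green" ]; then
      echo "$kind/$name is green"
      return 0
    fi
    attempts=$((attempts - 1))
    sleep 5
  done
  echo "$kind/$name did not become green (last health: ${health:-unknown})" >&2
  return 1
}
```

For example, `wait_for_green elasticsearch quickstart`. Note that with a single data node, any index that requests replicas keeps the cluster yellow, so green is only reachable when replicas are set to 0, as in this test setup.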

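For production, configure the cluster as recommended below. The manifest above is a test setup that favors convenience.

Differences between master nodes and data nodes

| Aspect | Master node | Data node |
| --- | --- | --- |
| Core responsibility | Manages cluster metadata (indices, shard allocation, node state) | Stores data (primary and replica shards) and serves reads and writes (search, aggregations) |
| Role definition in the config | node.roles: ["master"] | node.roles: ["data", "ingest", "ml", "transform"] |
| Resource needs | Low CPU and memory (lightweight metadata management) | High CPU, memory, and disk (data processing and computation) |
| High-availability requirement | Must be redundant (at least 3 in production, to avoid split brain) | Scales horizontally (add or remove nodes as data volume and load change) |
| Typical work | Cluster coordination, shard allocation, state maintenance | Document indexing, search request handling, machine learning jobs |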
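
The "at least 3" rule for master nodes follows from majority voting: a master can only be elected by more than half of the master-eligible nodes. The arithmetic can be sketched as follows; `quorum` and `tolerated_failures` are hypothetical helper names used only for illustration.

```shell
# Majority (quorum) needed among n master-eligible nodes: floor(n/2) + 1.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of master-eligible nodes that can fail while a quorum survives.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 5; do
  echo "masters=$n quorum=$(quorum "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

With 1 or 2 master-eligible nodes no failure can be tolerated; 3 is the smallest count that survives the loss of one node.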

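Production tuning recommendations

Master node configuration

yaml
nodeSets:
- name: masters
  count: 3  # deploy at least 3 in production
  config:
    node.roles: ["master"]
    # Disable features this node does not need (saves resources). Only do this when the cluster
    # does not use machine learning; otherwise leave xpack.ml.enabled at its default on every node.
    xpack.ml.enabled: false

Separating data node roles

yaml
- name: data-only
  count: 2
  config:
    node.roles: ["data"]  # dedicated to data storage

- name: ingest
  count: 2
  config:
    node.roles: ["ingest"]  # dedicated ingest nodes (run ingest pipelines before indexing)

- name: ml
  count: 1
  config:
    node.roles: ["ml", "transform"]  # separate compute nodes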

Verify that ES works after installation

bash
## Use two terminals, or run the port-forward in the background.
kubectl port-forward -n elastic-system services/quickstart-es-http 9200

## Fetch the password
PASSWORD=$(kubectl get secret -n elastic-system quickstart-es-elastic-user -o go-template='{{.data.elastic | base64decode}}')

## Send a test request
curl -u "elastic:$PASSWORD" -k "https://localhost:9200"
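
If a second terminal is inconvenient, the same three steps can be wrapped so that the port-forward runs in the background and is stopped afterwards. This is only a sketch: `check_es` is a hypothetical helper, and the fixed 3-second wait is an assumption about how long the tunnel needs to come up.

```shell
# Start the port-forward in the background, query the Elasticsearch root
# endpoint once, then stop the port-forward again.
# The 3-second sleep is a crude assumption, not a readiness check.
check_es() {
  local ns="elastic-system" pf_pid password status=0
  kubectl port-forward -n "$ns" services/quickstart-es-http 9200 >/dev/null 2>&1 &
  pf_pid=$!
  sleep 3
  password=$(kubectl get secret -n "$ns" quickstart-es-elastic-user \
    -o go-template='{{.data.elastic | base64decode}}')
  curl -sS -u "elastic:$password" -k "https://localhost:9200" || status=$?
  # Clean up the background port-forward whether or not curl succeeded.
  kill "$pf_pid" 2>/dev/null || true
  wait "$pf_pid" 2>/dev/null || true
  return "$status"
}
```

Running `check_es` should print the usual Elasticsearch banner JSON (cluster name, version, tagline) when the cluster is reachable.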

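Deploy Filebeat

  • Example manifest that runs Filebeat as a DaemonSet. The log shipper in this stack is Filebeat (an Elastic Beat managed by ECK), not Fluentd.

yaml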
apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: filebeat
  namespace: elastic-system
spec:
  type: filebeat
  version: 7.17.3
  elasticsearchRef:
    name: quickstart             # name of the Elasticsearch resource to connect to
    namespace: elastic-system    # namespace of that Elasticsearch resource
  config:
    filebeat.inputs:
      - type: container
        paths:
          - /var/log/containers/*.log
    
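    processors:
      - add_kubernetes_metadata: # attach Kubernetes labels and related metadata
          host: ${NODE_NAME}
          matchers:
          - logs_path:
              logs_path: "/var/log/containers/"
      - drop_fields: # extend or trim this list to suit your needs
          # kubernetes.labels.app is deliberately not dropped: processors run before the output,
          # and the index/pipeline routing in output.elasticsearch matches on that field.
          fields: ["agent", "ecs", "container", "host", "host.name", "input", "log", "offset", "stream", "kubernetes.namespace", "kubernetes.node", "kubernetes.pod", "kubernetes.replicaset", "kubernetes.namespace_uid", "kubernetes.labels.pod-template-hash"]
          ignore_missing: true # do not raise an error when a field is absent
      - decode_json_fields:
          fields: ["message"]   # source field to parse
          target: ""            # merge decoded keys into the event root (flat fields)
          overwrite_keys: false # keep existing values on key conflicts
          process_array: false  # do not decode arrays
          max_depth: 1          # decode only one level of JSON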

    output.elasticsearch:
      username: "elastic"  # built-in Elastic superuser (not recommended in production)
      password: "5ypyQpuC6BB191Si9w1209MM"    # replace with your actual password (in production, inject it from a Secret)
      index: "filebeat-other-log-%{+yyyy.MM.dd}"  # default index, rolled daily, for events that match no rule below
      indices: # index routing rules (split events by condition)
        - index: "filebeat-containers-log-%{+yyyy.MM.dd}"
          when.or:
            - contains:
                kubernetes.labels.app: "etcd"
        - index: "filebeat-services-log-%{+yyyy.MM.dd}"
          when.contains:
            kubernetes.labels.type: "service"
      pipelines: # route events through an ingest pipeline; the pipelines named here must already exist in Elasticsearch
        - pipeline: "filebeat-containers-log-pipeline"
          when.or:
            - contains:
                kubernetes.labels.app: "etcd"
        - pipeline: "filebeat-services-log-pipeline"
          when.contains:
            kubernetes.labels.type: "service"
    setup.template.settings:
      index:
        number_of_shards: 1    # one primary shard
        number_of_replicas: 0  # no replicas; use at least 1 in production

    setup.template.enabled: true      # the template feature must be enabled
    setup.template.overwrite: true    # overwrite any existing template
    setup.template.name: "filebeat-log-template"  # custom template name
    setup.template.pattern: "filebeat-*-log-*"    # matches all of the log indices above
    setup.ilm.enabled: false           # disable ILM so that the custom index names and template take effect
  daemonSet:
    podTemplate:
      spec:
        serviceAccountName: elastic-beat-filebeat-quickstart
        automountServiceAccountToken: true
        dnsPolicy: ClusterFirstWithHostNet
        hostNetwork: true
        securityContext:
          runAsUser: 0
        containers:
        - name: filebeat
          env: 
          - name: NODE_NAME
            valueFrom:
              fieldRef:
                fieldPath: spec.nodeName
          volumeMounts:
          - name: varlogcontainers
            mountPath: /var/log/containers
          - name: varlogpods
            mountPath: /var/log/pods
          - name: varlibdockercontainers
            mountPath: /var/lib/containers
        volumes:
        - name: varlogcontainers
          hostPath:
            path: /var/log/containers
        - name: varlogpods
          hostPath:
            path: /var/log/pods
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/containers
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: elastic-beat-filebeat-quickstart
  namespace: elastic-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: elastic-beat-autodiscover-binding
  namespace: elastic-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: elastic-beat-autodiscover
subjects:
- kind: ServiceAccount
  name: elastic-beat-filebeat-quickstart
  namespace: elastic-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: elastic-beat-autodiscover
  namespace: elastic-system
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - namespaces
  - events
  - pods
  verbs:
  - get
  - list
  - watch
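
The Beat manifest above hard-codes the `elastic` password, and its own comment recommends Secret injection for production. One possible way to do that is to read the password from the `quickstart-es-elastic-user` Secret that ECK creates (the same Secret the test commands in this article read). The helper below only prints the two fragments to merge into the manifest by hand; `print_password_from_secret` and the variable name `ELASTIC_PASSWORD` are assumptions made for this sketch.

```shell
# Print the two manifest fragments that would replace the hard-coded password.
# ELASTIC_PASSWORD is an assumed environment variable name; the Secret
# quickstart-es-elastic-user and its key "elastic" are created by ECK.
# The quoted heredoc keeps ${ELASTIC_PASSWORD} literal for Filebeat to expand.
print_password_from_secret() {
  cat <<'EOF'
# 1) under spec.config, in output.elasticsearch:
output.elasticsearch:
  username: "elastic"
  password: "${ELASTIC_PASSWORD}"   # expanded by Filebeat at startup

# 2) under spec.daemonSet.podTemplate.spec.containers (name: filebeat), in env:
- name: ELASTIC_PASSWORD
  valueFrom:
    secretKeyRef:
      name: quickstart-es-elastic-user
      key: elastic
EOF
}

print_password_from_secret
```

This keeps the credential out of the manifest, so the file can be committed to version control without leaking the password.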

Verify that Filebeat is shipping logs

bash
PASSWORD=$(kubectl get secret -n elastic-system quickstart-es-elastic-user -o go-template='{{.data.elastic | base64decode}}')
curl -u "elastic:$PASSWORD" -k "https://localhost:9200/filebeat-*/_search"
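
A full `_search` response is verbose. To see only whether documents are arriving, the `_count` API can be queried instead. This is a sketch that assumes the port-forward is still running and `PASSWORD` is set as above; `filebeat_doc_count` is a hypothetical helper, and `sed` is used so that no JSON tool such as jq is required.

```shell
# Count the documents in all filebeat-* indices via the _count API.
# Assumes the port-forward from the earlier step is still running and that
# PASSWORD has been set as shown above.
filebeat_doc_count() {
  curl -sS -u "elastic:$PASSWORD" -k "https://localhost:9200/filebeat-*/_count" |
    sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}
```

Calling `filebeat_doc_count` twice, a minute apart, and seeing the number grow is a quick sign that logs are flowing.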

Deploy Kibana

yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kibana-data-pvc
  namespace: elastic-system
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: nfs-dynamic
---
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: quickstart
  namespace: elastic-system
spec:
  version: 7.17.3
  count: 1
  elasticsearchRef:
    name: quickstart
    namespace: elastic-system
  http:
    tls:
      selfSignedCertificate:
        disabled: true
  config:
    i18n.locale: "zh-CN" # use the Chinese UI locale
  podTemplate:
    spec:
      containers:
      - name: kibana
        env:
          - name: NODE_OPTIONS
            value: "--max-old-space-size=2048"
        volumeMounts:
          - mountPath: /usr/share/kibana/data
            name: kibana-data 
      volumes:
        - name: kibana-data
          persistentVolumeClaim:
            claimName: kibana-data-pvc 
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: kibana-ingress
  namespace: elastic-system
spec:
  ingressClassName: nginx
  rules:
  - host: kibana.deployers.cn
    http:
      paths:
      - backend:
          service:
            name: quickstart-kb-http
            port:
              name: http
        path: /
        pathType: Prefix
  tls:
  - hosts:
    - kibana.deployers.cn
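
Once the Ingress exists, it can be checked from the command line before any DNS record is in place. This sketch assumes an ingress-nginx controller reachable on port 443 at an address you supply (the article does not say how the controller is exposed); `kibana_ingress_status` is a hypothetical helper.

```shell
# Request the Kibana login page through the Ingress and print the HTTP status.
# $1 is the address of the ingress controller (an assumption, see above).
# --resolve lets curl use the host name from the Ingress rule without DNS,
# and -k accepts the controller's default certificate.
kibana_ingress_status() {
  local ingress_ip="$1"
  curl -sk -o /dev/null -w '%{http_code}\n' \
    --resolve "kibana.deployers.cn:443:${ingress_ip}" \
    "https://kibana.deployers.cn/login"
}
```

A 2xx status suggests the Ingress reaches the `quickstart-kb-http` Service; a 502 or 503 usually points at the backend rather than the Ingress rule.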

Get the login credentials; the user name is elastic

bash
## Fetch the password
kubectl get secret -n elastic-system quickstart-es-elastic-user -o=jsonpath='{.data.elastic}' | base64 --decode; echo

Cluster tests

Check cluster health

bash
## Run this command in a new terminal window.
kubectl port-forward -n elastic-system services/quickstart-es-http 9200

## Fetch the password
PASSWORD=$(kubectl get secret -n elastic-system quickstart-es-elastic-user -o go-template='{{.data.elastic | base64decode}}')
## Check the status
curl -u "elastic:$PASSWORD" -k "https://localhost:9200/_cluster/health?pretty"

Output from a healthy cluster

json
{
  "cluster_name" : "quickstart",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 14,
  "active_shards" : 14,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
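
For scripted checks, only the `status` field matters. A minimal sketch that extracts it from the health JSON without extra tools; `health_status` and `assert_green` are hypothetical helper names.

```shell
# Read the JSON from _cluster/health on stdin and print the "status" value.
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}

# Print the status and succeed only when the cluster is green.
assert_green() {
  local status
  status=$(health_status)
  echo "cluster status: ${status:-unknown}"
  [ "$status" = "green" ]
}
```

Usage: `curl -s -u "elastic:$PASSWORD" -k "https://localhost:9200/_cluster/health" | assert_green`, which returns a non-zero exit code for yellow or red.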

Inspect unassigned shards

  • prirep: p marks a primary shard, r marks a replica shard
  • unassigned.reason: why the shard is unassigned (for example NODE_LEFT or INDEX_CREATED)

bash
curl -u "elastic:$PASSWORD" -k "https://localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason"

Output

text
index                                                         shard prirep state   unassigned.reason
.async-search                                                 0     p      STARTED 
.apm-agent-configuration                                      0     p      STARTED 
.apm-custom-link                                              0     p      STARTED 
.kibana-event-log-7.17.3-000001                               0     p      STARTED 
.geoip_databases                                              0     p      STARTED 
.kibana_security_session_1                                    0     p      STARTED 
.ds-ilm-history-5-2025.05.09-000001                           0     p      STARTED 
.kibana_task_manager_7.17.3_001                               0     p      STARTED 
.security-7                                                   0     p      STARTED 
.ds-.logs-deprecation.elasticsearch-default-2025.05.09-000001 0     p      STARTED 
product-other-log-2025.05.12                                  0     p      STARTED 
.tasks                                                        0     p      STARTED 
.kibana_7.17.3_001                                            0     p      STARTED 
product-other-log-2025.05.09                                  0     p      STARTED
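
On a larger cluster the listing gets long, so it can help to filter it down to the problem shards. A small sketch, assuming the column order requested above; `unassigned_shards` is a hypothetical helper.

```shell
# Read `_cat/shards?v&h=index,shard,prirep,state,unassigned.reason` output on
# stdin and print only the shards whose state is UNASSIGNED (header skipped).
# Output columns: index, shard number, prirep, unassigned reason.
unassigned_shards() {
  awk 'NR > 1 && $4 == "UNASSIGNED" { print $1, $2, $3, $5 }'
}
```

Pipe the curl command above into `unassigned_shards`; no output means every shard is assigned. For a detailed explanation of a single shard, Elasticsearch also offers the `_cluster/allocation/explain` API.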

Check node resource usage

bash
curl -u "elastic:$PASSWORD" -k "https://localhost:9200/_cat/nodes?v&h=name,disk.used_percent,ram.percent,cpu"

Output

text
name                    disk.used_percent ram.percent cpu
quickstart-es-masters-0              1.73          54   16
quickstart-es-data-0                 1.73          55   16

  • Reference thresholds:
    -- Disk usage ≤ 85%
    -- Memory usage ≤ 80%
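
The thresholds can be checked mechanically against the `_cat/nodes` listing. A sketch, assuming the column order requested above (name, disk.used_percent, ram.percent, cpu); `nodes_over_threshold` is a hypothetical helper.

```shell
# Read `_cat/nodes?v&h=name,disk.used_percent,ram.percent,cpu` output on stdin
# and print every node above the reference thresholds (disk 85%, memory 80%).
# Both thresholds can be overridden with the optional arguments.
nodes_over_threshold() {
  awk -v disk_max="${1:-85}" -v ram_max="${2:-80}" '
    NR > 1 && ($2 + 0 > disk_max + 0 || $3 + 0 > ram_max + 0) {
      printf "%s disk=%s%% ram=%s%%\n", $1, $2, $3
    }'
}
```

Pipe the curl command above into `nodes_over_threshold`; an empty result means all nodes are within the limits.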

Check ES node status

bash
curl -u "elastic:$PASSWORD" -k "https://localhost:9200/_cat/nodes?v"
 
ip             heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
172.20.129.129            8          55   16    0.96    0.76     0.57 m         *      quickstart-es-masters-0
172.20.129.130           70          56   16    0.96    0.76     0.57 dilt      -      quickstart-es-data-0
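
The `node.role` column packs each role into one letter, which is how `dilt` corresponds to the `["data", "ingest", "ml", "transform"]` roles configured earlier. A sketch that expands the letters; `decode_roles` is a hypothetical helper, and the mapping covers only the common Elasticsearch 7.x codes.

```shell
# Expand the single-letter codes from the node.role column of _cat/nodes.
# Only the common 7.x codes are mapped; anything else is reported as other(x).
decode_roles() {
  local roles="$1" out="" letter name
  while [ -n "$roles" ]; do
    letter="${roles%"${roles#?}"}"   # first character
    roles="${roles#?}"               # remaining characters
    case "$letter" in
      m) name="master" ;;
      d) name="data" ;;
      i) name="ingest" ;;
      l) name="ml" ;;
      t) name="transform" ;;
      r) name="remote_cluster_client" ;;
      v) name="voting_only" ;;
      -) name="coordinating_only" ;;
      *) name="other($letter)" ;;
    esac
    out="${out:+$out,}$name"
  done
  echo "$out"
}
```

For the listing above, `decode_roles dilt` prints `data,ingest,ml,transform` and `decode_roles m` prints `master`. The `*` in the master column marks the currently elected master.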