【ELK】Routing Filebeat data to specific Kafka topics

Background

A new requirement came in today: in production, the Filebeat output needs to be refined so that different kinds of content go to different Kafka topics. If you haven't read the previous installment in this series, go catch up on it first.

Filebeat main configuration

The main Filebeat configuration again follows the official Filebeat documentation; it looks like this:

    filebeat.inputs:
    - type: log    # note: the log input is deprecated in Filebeat 8.x; filestream is its successor
      paths:
        - /data/filebeat-data/*.log
      processors:
      - add_fields:
          target: ""
          fields:
            log_type: "bizlog"


    # Hints-based autodiscover for container logs (used here alongside filebeat.inputs):
    filebeat.autodiscover:
      providers:
        - type: kubernetes
          hints.enabled: true
          hints.default_config:
            type: container
            paths:
              - /var/log/containers/*.log
            processors:
            - add_fields:
                target: ""
                fields:
                  log_type: "k8slog"    # tag container logs so they match the k8slog topic condition below
   

    #output.elasticsearch:
    #  hosts: ['${ELASTICSEARCH_HOST:elasticsearch}:${ELASTICSEARCH_PORT:9200}']
    #  username: ${ELASTICSEARCH_USERNAME}
    #  password: ${ELASTICSEARCH_PASSWORD}
    output.kafka:
      hosts: ['${KAFKA_HOST:kafka}:${KAFKA_PORT:9092}']
      topic: log_topic_all
      topics:
        - topic: "bizlog-%{[agent.version]}"
          when.contains:
            log_type: "bizlog"
        - topic: "k8slog-%{[agent.version]}"
          when.contains:
            log_type: "k8slog"
---

The filebeat.inputs section

We added a processors block to the input; it injects a custom field into every output document.

      processors:
      - add_fields:
          target: ""      # an empty target puts the fields at the document root
          fields:
            log_type: "bizlog"      # the custom field we will route on

With this in place, every event carries the custom field at its root. A trimmed sketch of an output document (field values are illustrative, not a real capture):
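
    {
      "@timestamp": "2024-05-20T08:00:00.000Z",
      "message": "order created, id=10086",
      "log_type": "bizlog",
      "agent": {
        "type": "filebeat",
        "version": "8.9.2"
      },
      "log": {
        "file": {
          "path": "/data/filebeat-data/app.log"
        }
      }
    }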

The filebeat.output section

The output points straight at the Kafka endpoint.

This example resolves the connection details from environment variables; the ${KAFKA_HOST:kafka} syntax falls back to the default (kafka) when the variable is unset.

Routing to different topics is configured directly in the topics block.

    output.kafka:
      hosts: ['${KAFKA_HOST:kafka}:${KAFKA_PORT:9092}']
      topic: log_topic_all    # events that match none of the conditions below go to this default topic
      topics:              
        - topic: "bizlog-%{[agent.version]}"      # 指定输出topic
          when.contains:
            log_type: "bizlog"                    # input中我们注入的标签这里可以作为判断条件使用
        - topic: "k8slog-%{[agent.version]}"
          when.contains:
            log_type: "k8slog"

Events now land in the per-type topics. A quick spot-check from the broker host (assuming the stock Kafka CLI scripts are available; the topic suffix comes from agent.version, 8.9.2 for this image):
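
    # list topics; bizlog-8.9.2 and k8slog-8.9.2 should appear once events flow
    kafka-topics.sh --bootstrap-server 100.99.17.19:9092 --list

    # consume a few messages from the bizlog topic to confirm routing
    kafka-console-consumer.sh --bootstrap-server 100.99.17.19:9092 \
      --topic bizlog-8.9.2 --from-beginning --max-messages 5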

This matches the requirement exactly.

Complete Filebeat configuration

Filebeat is deployed here as a DaemonSet; the complete YAML manifest follows:

apiVersion: v1
data:
  .dockerconfigjson: ewogICJhdXRocyI6IHsKICAgICJmdmhiLmZqZWNsb3VkLmNvbSI6IHsKICAgICAgInVzZXJuYW1lIjogImFkbWluIiwKICAgICAgInBhc3N3b3JkIjogIkF0eG41WW5MWG5KS3JsVFciCiAgICB9CiAgfQp9Cg==
kind: Secret
metadata:
  name: harbor-login
  namespace: kube-system
type: kubernetes.io/dockerconfigjson
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: filebeat
  namespace: kube-system
  labels:
    k8s-app: filebeat
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: filebeat
  labels:
    k8s-app: filebeat
rules:
- apiGroups: [""] # "" indicates the core API group
  resources:
  - namespaces
  - pods
  - nodes
  verbs:
  - get
  - watch
  - list
- apiGroups: ["apps"]
  resources:
    - replicasets
  verbs: ["get", "list", "watch"]
- apiGroups: ["batch"]
  resources:
    - jobs
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: filebeat
  # should be the namespace where filebeat is running
  namespace: kube-system
  labels:
    k8s-app: filebeat
rules:
  - apiGroups:
      - coordination.k8s.io
    resources:
      - leases
    verbs: ["get", "create", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: filebeat-kubeadm-config
  namespace: kube-system
  labels:
    k8s-app: filebeat
rules:
  - apiGroups: [""]
    resources:
      - configmaps
    resourceNames:
      - kubeadm-config
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: filebeat
subjects:
- kind: ServiceAccount
  name: filebeat
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: filebeat
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: filebeat
  namespace: kube-system
subjects:
  - kind: ServiceAccount
    name: filebeat
    namespace: kube-system
roleRef:
  kind: Role
  name: filebeat
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: filebeat-kubeadm-config
  namespace: kube-system
subjects:
  - kind: ServiceAccount
    name: filebeat
    namespace: kube-system
roleRef:
  kind: Role
  name: filebeat-kubeadm-config
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat-config
  namespace: kube-system
  labels:
    k8s-app: filebeat
data:
  filebeat.yml: |-
    filebeat.inputs:
    - type: log    # note: the log input is deprecated in Filebeat 8.x; filestream is its successor
      paths:
        - /data/filebeat-data/*.log
      processors:
      - add_fields:
          target: ""
          fields:
            log_type: "bizlog"


    # Hints-based autodiscover for container logs (used here alongside filebeat.inputs):
    filebeat.autodiscover:
      providers:
        - type: kubernetes
          hints.enabled: true
          hints.default_config:
            type: container
            paths:
              - /var/log/containers/*.log
            processors:
            - add_fields:
                target: ""
                fields:
                  log_type: "k8slog"    # tag container logs so they match the k8slog topic condition below
   

    #output.elasticsearch:
    #  hosts: ['${ELASTICSEARCH_HOST:elasticsearch}:${ELASTICSEARCH_PORT:9200}']
    #  username: ${ELASTICSEARCH_USERNAME}
    #  password: ${ELASTICSEARCH_PASSWORD}
    output.kafka:
      hosts: ['${KAFKA_HOST:kafka}:${KAFKA_PORT:9092}']
      topic: log_topic_all
      topics:
        - topic: "bizlog-%{[agent.version]}"
          when.contains:
            log_type: "bizlog"
        - topic: "k8slog-%{[agent.version]}"
          when.contains:
            log_type: "k8slog"
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: filebeat
  namespace: kube-system
  labels:
    k8s-app: filebeat
spec:
  selector:
    matchLabels:
      k8s-app: filebeat
  template:
    metadata:
      labels:
        k8s-app: filebeat
    spec:
      serviceAccountName: filebeat
      terminationGracePeriodSeconds: 30
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
      - name: filebeat
        image: fvhb.fjecloud.com/beats/filebeat:8.9.2
        securityContext:
          runAsUser: 0
        args: [
          "-c", "/etc/filebeat.yml",
          "-e",
        ]
        env:
        - name: KAFKA_HOST
          value: "100.99.17.19"
        - name: KAFKA_PORT
          value: "9092" 
        - name: ELASTICSEARCH_HOST
          value: "100.99.17.19"
        - name: ELASTICSEARCH_PORT
          value: "19200"
        - name: ELASTICSEARCH_USERNAME
          value: elastic
        - name: ELASTICSEARCH_PASSWORD
          value: qianyue@2024
        - name: ELASTIC_CLOUD_ID
          value:
        - name: ELASTIC_CLOUD_AUTH
          value:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        securityContext:
          runAsUser: 0
          # If using Red Hat OpenShift uncomment this:
          #privileged: true
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 100Mi
        volumeMounts:
        - name: config
          mountPath: /etc/filebeat.yml
          readOnly: true
          subPath: filebeat.yml
        - name: data
          mountPath: /usr/share/filebeat/data
        - name: varlibdockercontainers
          mountPath: /data/docker/containers
          readOnly: true
        - name: filebeat-data
          mountPath: /data/filebeat-data
        - name: varlog
          mountPath: /var/log
          readOnly: true
      imagePullSecrets:
      - name: harbor-login
      volumes:
      - name: config
        configMap:
          defaultMode: 0640
          name: filebeat-config
      - name: varlibdockercontainers
        hostPath:
          path: /data/docker/containers
      - name: filebeat-data
        hostPath:
          path: /data/filebeat-data
          type: DirectoryOrCreate
      - name: varlog
        hostPath:
          path: /var/log
      # data folder stores a registry of read status for all files, so we don't send everything again on a Filebeat pod restart
      - name: data
        hostPath:
          # When filebeat runs as non-root user, this directory needs to be writable by group (g+w).
          path: /var/lib/filebeat-data
          type: DirectoryOrCreate
---
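To roll it out, apply the manifest and confirm that one Filebeat pod reaches Running on each node (a minimal sketch; filebeat-kafka.yaml is a placeholder for whatever file name you saved the manifest under):

    # create / update all objects in the manifest
    kubectl apply -f filebeat-kafka.yaml

    # the DaemonSet should schedule one pod per node
    kubectl -n kube-system get pods -l k8s-app=filebeat -o wide

    # tail the pods' logs to confirm the Kafka connection is established
    kubectl -n kube-system logs -l k8s-app=filebeat --tail=50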