Fluent Bit: Data Collection Issue When Using Kafka as the Input Source

Author: 程宏斌


Business requirement

When collecting from a file, the input is pass-through JSON and needs no extra processing. But with Kafka as the input, the logs I send out gain an extra payload layer, and the downstream endpoint rejects them with a 400 format error.

Original data format:

{"hostname":"uos20","output":"10:16:32.056070324: Critical High-risk command executed outside maintenance window:\nrm -i extract_payload.lua\nbash\n/usr/sbin/sshd\n/usr/sbin/sshd\n/usr/sbin/sshd\n/usr/lib/systemd/systemd\ncgroups=cpuset=/ cpu=/user.slice cpuacct=/user.slice blkio=/user.slice memory=/user.slice/user-0.slice/session-4.scope\nproc_exe_ino_ctime=1686728950587813749\nprocess=rm\npid=2320\nprocexe=rm\nfile=<NA>\naction=execve\nparent_process=bash\nparent_exepath=/usr/bin/rm\nuser=root user_uid=0 user_loginuid=0\nterminal=34817\ncontainer_info=container_id=host container_name=host","output_fields":{"container.id":"host","container.name":"host","evt.time":1734401792056070324,"evt.type":"execve","fd.name":null,"proc.acmdline[0]":"rm -i extract_payload.lua","proc.acmdline[1]":"bash","proc.aexepath[2]":"/usr/sbin/sshd","proc.aexepath[3]":"/usr/sbin/sshd","proc.aexepath[4]":"/usr/sbin/sshd","proc.aexepath[5]":"/usr/lib/systemd/systemd","proc.exe":"rm","proc.exe_ino.ctime":1686728950587813749,"proc.exepath":"/usr/bin/rm","proc.name":"rm","proc.pid":2320,"proc.pname":"bash","proc.tty":34817,"thread.cgroups":"cpuset=/ cpu=/user.slice cpuacct=/user.slice blkio=/user.slice memory=/user.slice/user-0.slice/session-4.scope","user.loginuid":0,"user.name":"root","user.uid":0},"priority":"Critical","rule":"High-Risk Command Executed Outside Maintenance Window","source":"syscall","tags":["attack_detection","host","process","security"],"time":"2024-12-17T02:16:32.056070324Z"}

After the data is routed through Kafka, the output format looks like this:

[0]kafka: [[1734422739.976720818, {}], {"topic"=>"Fluentbit", "partition"=>0, "offset"=>3, "error"=>nil, "key"=>nil, "payload"=>"{"@timestamp":1734422739.232958,"hostname":"uos20","output":"10:16:32.056070324: Critical High-risk command executed outside maintenance window:\nrm -i extract_payload.lua\nbash\n/usr/sbin/sshd\n/usr/sbin/sshd\n/usr/sbin/sshd\n/usr/lib/systemd/systemd\ncgroups=cpuset=/ cpu=/user.slice cpuacct=/user.slice blkio=/user.slice memory=/user.slice/user-0.slice/session-4.scope\nproc_exe_ino_ctime=1686728950587813749\nprocess=rm\npid=2320\nprocexe=rm\nfile=<NA>\naction=execve\nparent_process=bash\nparent_exepath=/usr/bin/rm\nuser=root user_uid=0 user_loginuid=0\nterminal=34817\ncontainer_info=container_id=host container_name=host","output_fields":{"container.id":"host","container.name":"host","evt.time":1734401792056070324,"evt.type":"execve","fd.name":null,"proc.acmdline[0]":"rm -i extract_payload.lua","proc.acmdline[1]":"bash","proc.aexepath[2]":"/usr/sbin/sshd","proc.aexepath[3]":"/usr/sbin/sshd","proc.aexepath[4]":"/usr/sbin/sshd","proc.aexepath[5]":"/usr/lib/systemd/systemd","proc.exe":"rm","proc.exe_ino.ctime":1686728950587813749,"proc.exepath":"/usr/bin/rm","proc.name":"rm","proc.pid":2320,"proc.pname":"bash","proc.tty":34817,"thread.cgroups":"cpuset=/ cpu=/user.slice cpuacct=/user.slice blkio=/user.slice memory=/user.slice/user-0.slice/session-4.scope","user.loginuid":0,"user.name":"root","user.uid":0},"priority":"Critical","rule":"High-Risk Command Executed Outside Maintenance Window","source":"syscall","tags":["attack_detection","host","process","security"],"time":"2024-12-17T02:16:32.056070324Z"}"}]

Kafka metadata such as topic, partition, and so on has been prepended, and the record is wrapped in an extra layer. The payload field holds the content we actually want, so the goal is to lift payload to the top level, keeping only its value without the wrapper key.

Implementation

Configure Fluent Bit's filter rules as follows.
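For context, these filters assume a Kafka input section like the minimal sketch below. The broker address is an assumption; the topic name matches the sample output above, and Format json is the setting referenced in the configuration notes later on.

    [INPUT]
        Name    kafka
        # assumption: adjust Brokers to your actual broker address
        Brokers 127.0.0.1:9092
        Topics  Fluentbit
        Format  json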

Add a Parsers configuration

  1. Parse the payload field as standalone JSON content by adding a parser to the parsers.conf file:

    [PARSER]
        Name        json_payload
        Format      json
        Time_Key    @timestamp
        Time_Format %s
  2. Use a Filter to extract the payload field
    Fluent Bit's modify filter can rename and replace fields in a record; combined with the parser filter, it lets us extract the payload field from the Kafka input and parse it into standalone JSON.

Continue adding to the example fluent-bit.conf:

[FILTER]
    Name      modify
    Match     *
    Rename    payload message_raw

[FILTER]
    Name      parser
    Match     *
    Key_Name  message_raw
    Parser    json_payload
  3. Output configuration

    Send the processed data to stdout (the terminal) for verification:

    [OUTPUT]
        Name   stdout
        Match  *
        Format json_lines

  4. Configuration notes

    Kafka Input

    Use Format json when reading Kafka messages, so that Fluent Bit can recognize and read the payload field inside each message (see the assembled sketch after these notes).

    Parsers

    The json_payload parser is dedicated to parsing the content of the payload field into JSON.

    Filters

    The modify filter renames the original payload to message_raw to avoid clobbering other fields.

    The parser filter parses message_raw as JSON, yielding the log content we need. Note that the parser filter's Reserve_Data option defaults to Off, so the remaining Kafka metadata fields (topic, partition, offset, key) are dropped at this step, which is exactly the behavior we want here.

    Output

    Output to stdout for debugging; once the data is confirmed to be extracted correctly, replace it with the real output.

    json_lines: emits each log record as a separate JSON object, separated by a newline (\n).
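Putting the pieces together, an assembled fluent-bit.conf sketch under the assumptions above (broker address and flush interval are placeholders; Parsers_File is required so that the json_payload parser defined in parsers.conf is actually loaded):

    [SERVICE]
        Flush        1
        Parsers_File parsers.conf

    [INPUT]
        Name    kafka
        # assumption: placeholder broker address
        Brokers 127.0.0.1:9092
        Topics  Fluentbit
        Format  json

    [FILTER]
        Name      modify
        Match     *
        Rename    payload message_raw

    [FILTER]
        Name      parser
        Match     *
        Key_Name  message_raw
        Parser    json_payload

    [OUTPUT]
        Name   stdout
        Match  *
        Format json_lines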

Result

The format below arrives at the client's HTTP endpoint without errors. As for the @timestamp field: upstream has added a parameter to remove it, but it has not been released yet, so the standard version cannot drop it.

{"date":1734428094.976708,"@timestamp":1734428094.549618,"hostname":"uos20","output":"10:16:32.056070324: Critical High-risk command executed outside maintenance window:\nrm -i extract_payload.lua\nbash\n/usr/sbin/sshd\n/usr/sbin/sshd\n/usr/sbin/sshd\n/usr/lib/systemd/systemd\ncgroups=cpuset=/ cpu=/user.slice cpuacct=/user.slice blkio=/user.slice memory=/user.slice/user-0.slice/session-4.scope\nproc_exe_ino_ctime=1686728950587813749\nprocess=rm\npid=2320\nprocexe=rm\nfile=<NA>\naction=execve\nparent_process=bash\nparent_exepath=/usr/bin/rm\nuser=root user_uid=0 user_loginuid=0\nterminal=34817\ncontainer_info=container_id=host container_name=host","output_fields":{"container.id":"host","container.name":"host","evt.time":1734401792056070324,"evt.type":"execve","fd.name":null,"proc.acmdline[0]":"rm -i extract_payload.lua","proc.acmdline[1]":"bash","proc.aexepath[2]":"/usr/sbin/sshd","proc.aexepath[3]":"/usr/sbin/sshd","proc.aexepath[4]":"/usr/sbin/sshd","proc.aexepath[5]":"/usr/lib/systemd/systemd","proc.exe":"rm","proc.exe_ino.ctime":1686728950587813749,"proc.exepath":"/usr/bin/rm","proc.name":"rm","proc.pid":2320,"proc.pname":"bash","proc.tty":34817,"thread.cgroups":"cpuset=/ cpu=/user.slice cpuacct=/user.slice blkio=/user.slice memory=/user.slice/user-0.slice/session-4.scope","user.loginuid":0,"user.name":"root","user.uid":0},"priority":"Critical","rule":"High-Risk Command Executed Outside Maintenance Window","source":"syscall","tags":["attack_detection","host","process","security"],"time":"2024-12-17T02:16:32.056070324Z"}
{"date":1734428094.978698,"@timestamp":1734428094.549646,"log":""}