Problems when Fluent Bit collects data with Kafka as the source

#Author: 程宏斌


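Business requirement

When collecting from files, the input is JSON that is passed through unchanged, with no processing. With Kafka as the input, however, the logs I send out carry an extra payload layer, and the receiver rejects them with a 400 format error.

Original data format: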

{"hostname":"uos20","output":"10:16:32.056070324: Critical High-risk command executed outside maintenance window:\nrm -i extract_payload.lua\nbash\n/usr/sbin/sshd\n/usr/sbin/sshd\n/usr/sbin/sshd\n/usr/lib/systemd/systemd\ncgroups=cpuset=/ cpu=/user.slice cpuacct=/user.slice blkio=/user.slice memory=/user.slice/user-0.slice/session-4.scope\nproc_exe_ino_ctime=1686728950587813749\nprocess=rm\npid=2320\nprocexe=rm\nfile=<NA>\naction=execve\nparent_process=bash\nparent_exepath=/usr/bin/rm\nuser=root user_uid=0 user_loginuid=0\nterminal=34817\ncontainer_info=container_id=host container_name=host","output_fields":{"container.id":"host","container.name":"host","evt.time":1734401792056070324,"evt.type":"execve","fd.name":null,"proc.acmdline[0]":"rm -i extract_payload.lua","proc.acmdline[1]":"bash","proc.aexepath[2]":"/usr/sbin/sshd","proc.aexepath[3]":"/usr/sbin/sshd","proc.aexepath[4]":"/usr/sbin/sshd","proc.aexepath[5]":"/usr/lib/systemd/systemd","proc.exe":"rm","proc.exe_ino.ctime":1686728950587813749,"proc.exepath":"/usr/bin/rm","proc.name":"rm","proc.pid":2320,"proc.pname":"bash","proc.tty":34817,"thread.cgroups":"cpuset=/ cpu=/user.slice cpuacct=/user.slice blkio=/user.slice memory=/user.slice/user-0.slice/session-4.scope","user.loginuid":0,"user.name":"root","user.uid":0},"priority":"Critical","rule":"High-Risk Command Executed Outside Maintenance Window","source":"syscall","tags":["attack_detection","host","process","security"],"time":"2024-12-17T02:16:32.056070324Z"}

After being relayed through Kafka, the output looks like this:

[0]kafka: [[1734422739.976720818, {}], {"topic"=>"Fluentbit", "partition"=>0, "offset"=>3, "error"=>nil, "key"=>nil, "payload"=>"{"@timestamp":1734422739.232958,"hostname":"uos20","output":"10:16:32.056070324: Critical High-risk command executed outside maintenance window:\nrm -i extract_payload.lua\nbash\n/usr/sbin/sshd\n/usr/sbin/sshd\n/usr/sbin/sshd\n/usr/lib/systemd/systemd\ncgroups=cpuset=/ cpu=/user.slice cpuacct=/user.slice blkio=/user.slice memory=/user.slice/user-0.slice/session-4.scope\nproc_exe_ino_ctime=1686728950587813749\nprocess=rm\npid=2320\nprocexe=rm\nfile=<NA>\naction=execve\nparent_process=bash\nparent_exepath=/usr/bin/rm\nuser=root user_uid=0 user_loginuid=0\nterminal=34817\ncontainer_info=container_id=host container_name=host","output_fields":{"container.id":"host","container.name":"host","evt.time":1734401792056070324,"evt.type":"execve","fd.name":null,"proc.acmdline[0]":"rm -i extract_payload.lua","proc.acmdline[1]":"bash","proc.aexepath[2]":"/usr/sbin/sshd","proc.aexepath[3]":"/usr/sbin/sshd","proc.aexepath[4]":"/usr/sbin/sshd","proc.aexepath[5]":"/usr/lib/systemd/systemd","proc.exe":"rm","proc.exe_ino.ctime":1686728950587813749,"proc.exepath":"/usr/bin/rm","proc.name":"rm","proc.pid":2320,"proc.pname":"bash","proc.tty":34817,"thread.cgroups":"cpuset=/ cpu=/user.slice cpuacct=/user.slice blkio=/user.slice memory=/user.slice/user-0.slice/session-4.scope","user.loginuid":0,"user.name":"root","user.uid":0},"priority":"Critical","rule":"High-Risk Command Executed Outside Maintenance Window","source":"syscall","tags":["attack_detection","host","process","security"],"time":"2024-12-17T02:16:32.056070324Z"}"}]

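Kafka metadata such as topic and partition is added in front, so the data is wrapped in one more layer. The payload field holds the content that is actually wanted: it has to be lifted to the top level, keeping only its value and dropping the payload key itself.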
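To make the wrapping concrete, here is a minimal Python sketch (not Fluent Bit code) of the record shape and of the desired result. The shortened event and the helper name `unwrap_payload` are assumptions for illustration; the key names `topic`, `partition`, `offset`, `error`, `key`, and `payload` are taken from the output above.

```python
import json

# Shortened stand-in for the original event; only a few fields are kept.
original_event = {"hostname": "uos20", "priority": "Critical", "source": "syscall"}

# What arrives from the Kafka input: Kafka metadata plus the original
# event serialized as a JSON *string* under "payload".
kafka_record = {
    "topic": "Fluentbit",
    "partition": 0,
    "offset": 3,
    "error": None,
    "key": None,
    "payload": json.dumps(original_event),
}


def unwrap_payload(record: dict) -> dict:
    """Decode the payload string and return it as the new top-level record.

    Hypothetical helper: it only illustrates the target shape.
    """
    return json.loads(record["payload"])


flat = unwrap_payload(kafka_record)
print(json.dumps(flat))
```

The point of the sketch is that `payload` is a string, so the event needs a second JSON decode before a receiver that expects the flat event will accept it.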

Implementation

Configure the Fluent Bit filter rules as follows.

Add the Parsers configuration

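  1. Parse the payload field into standalone JSON content by configuring the parsers.conf file.

    [PARSER]
        Name        json_payload
        Format      json
        Time_Key    @timestamp
        Time_Format %s

  2. Use filters to extract the payload field
    Fluent Bit's modify plugin can rename and replace fields in a message. Here it renames the payload field of the Kafka input, and the parser plugin then parses that field into standalone JSON.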

Continue by adding the following to the example fluent-bit.conf:

[FILTER]
    Name          modify
    Match          *
    Rename        payload message_raw

[FILTER]
    Name          parser
    Match          *
    Key_Name      message_raw
    Parser        json_payload

  3. Output configuration

    Send the processed data to stdout (the terminal) for verification:

    [OUTPUT]
        Name   stdout
        Match  *
        Format json_lines

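  4. Configuration notes

    Kafka Input

    Use Format json to read the Kafka messages, so that Fluent Bit can recognize and read the payload field in each message.

    Parsers

    The json_payload parser is dedicated to parsing the content of the payload field as JSON.

    Filters

    The modify plugin renames the original payload to message_raw, so that it does not overwrite other fields.

    The parser plugin parses message_raw as JSON, which yields the log content you need.

    Output

    Output to stdout for debugging; once the data is confirmed to be extracted correctly, replace it with another output.

    json_lines: each log record is written as a separate JSON object, with records separated by the newline character \n.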
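The filter chain described above can be emulated outside Fluent Bit to see what each stage does to a record. The following Python sketch is a simplification under stated assumptions, not the plugins' actual implementation: the function names are made up, time handling (Time_Key, Time_Format) is ignored, and `reserve_data` stands in for the parser filter's Reserve_Data option, which is off by default and explains why the Kafka metadata disappears.

```python
import json


def modify_rename(record: dict, old: str, new: str) -> dict:
    """Emulate `Rename old new`: move the value if `old` exists and `new` does not."""
    out = dict(record)
    if old in out and new not in out:
        out[new] = out.pop(old)
    return out


def parser_filter(record: dict, key_name: str, reserve_data: bool = False) -> dict:
    """Emulate the parser filter with a JSON parser applied to `key_name`.

    With reserve_data=False only the parsed fields survive, so the
    Kafka metadata keys are dropped from the result.
    """
    try:
        parsed = json.loads(record.get(key_name))
    except (TypeError, ValueError):
        return record  # missing key or invalid JSON: leave the record unchanged
    if not isinstance(parsed, dict):
        return record
    if reserve_data:
        kept = {k: v for k, v in record.items() if k != key_name}
        return {**kept, **parsed}
    return parsed


def to_json_lines(records: list) -> str:
    """Emulate `Format json_lines`: one JSON object per line."""
    return "\n".join(json.dumps(r) for r in records)


# Shortened example record (assumed values, same key names as above).
record = {
    "topic": "Fluentbit",
    "partition": 0,
    "offset": 3,
    "payload": '{"hostname": "uos20", "priority": "Critical"}',
}
step1 = modify_rename(record, "payload", "message_raw")
step2 = parser_filter(step1, "message_raw")
print(to_json_lines([step2]))
```

Calling `parser_filter(step1, "message_raw", reserve_data=True)` instead keeps topic, partition, and offset next to the parsed fields, which is the behavior to avoid when the receiver expects only the original event.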

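Result

Output in the following format is accepted by the customer's HTTP endpoint. As for the @timestamp field: upstream has a parameter for removing it, but it has not been released yet, so it cannot be removed in the standard version.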

{"date":1734428094.976708,"@timestamp":1734428094.549618,"hostname":"uos20","output":"10:16:32.056070324: Critical High-risk command executed outside maintenance window:\nrm -i extract_payload.lua\nbash\n/usr/sbin/sshd\n/usr/sbin/sshd\n/usr/sbin/sshd\n/usr/lib/systemd/systemd\ncgroups=cpuset=/ cpu=/user.slice cpuacct=/user.slice blkio=/user.slice memory=/user.slice/user-0.slice/session-4.scope\nproc_exe_ino_ctime=1686728950587813749\nprocess=rm\npid=2320\nprocexe=rm\nfile=<NA>\naction=execve\nparent_process=bash\nparent_exepath=/usr/bin/rm\nuser=root user_uid=0 user_loginuid=0\nterminal=34817\ncontainer_info=container_id=host container_name=host","output_fields":{"container.id":"host","container.name":"host","evt.time":1734401792056070324,"evt.type":"execve","fd.name":null,"proc.acmdline[0]":"rm -i extract_payload.lua","proc.acmdline[1]":"bash","proc.aexepath[2]":"/usr/sbin/sshd","proc.aexepath[3]":"/usr/sbin/sshd","proc.aexepath[4]":"/usr/sbin/sshd","proc.aexepath[5]":"/usr/lib/systemd/systemd","proc.exe":"rm","proc.exe_ino.ctime":1686728950587813749,"proc.exepath":"/usr/bin/rm","proc.name":"rm","proc.pid":2320,"proc.pname":"bash","proc.tty":34817,"thread.cgroups":"cpuset=/ cpu=/user.slice cpuacct=/user.slice blkio=/user.slice memory=/user.slice/user-0.slice/session-4.scope","user.loginuid":0,"user.name":"root","user.uid":0},"priority":"Critical","rule":"High-Risk Command Executed Outside Maintenance Window","source":"syscall","tags":["attack_detection","host","process","security"],"time":"2024-12-17T02:16:32.056070324Z"}
{"date":1734428094.978698,"@timestamp":1734428094.549646,"log":""}
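When reviewing the stdout output, a small script can confirm that no line still carries the Kafka wrapper. This is a sketch in Python; the function name, the set of keys checked, and the sample lines are assumptions for illustration.

```python
import json

# Keys that indicate a record was not unwrapped (assumed list).
KAFKA_WRAPPER_KEYS = {"topic", "partition", "offset", "payload", "message_raw"}


def find_wrapped_lines(text: str) -> list:
    """Return the 1-based numbers of lines that still contain wrapper keys."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip empty lines
        record = json.loads(line)
        if isinstance(record, dict) and record.keys() & KAFKA_WRAPPER_KEYS:
            bad.append(number)
    return bad


# Two shortened sample lines: the first is flat, the second is still wrapped.
sample = "\n".join([
    '{"date": 1734428094.976708, "hostname": "uos20", "priority": "Critical"}',
    '{"date": 1734428094.978698, "topic": "Fluentbit", "payload": "{}"}',
])
print(find_wrapped_lines(sample))
```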