Consuming Kafka Data into Doris with Routine Load

Assume the Kafka topic already holds nested JSON messages in the following format:

{
    "appId": "10000",
    "platform": "YY",
    "userId": "007",
    "userAgent": "6",
    "event": "login",
    "package": "org.apache.doris",
    "properties": {
        "phoneNumber": "13814516235",
        "actionTime": "1694928000",
        "deviceID": "device123",
        "deviceType": "smartphone",
        "appVersion": "1.0.0",
        "networkType": "WiFi",
        "os": "Android",
        "userUID": "user123",
        "nickname": "小王",
        "clientIp": "192.168.1.1"
    },
    "clientIp": "10.225.36.85",
    "timestamp": "1694928000",
    "source": "mobileApp",
    "sessionId": "12554122524422"
}

1. Create the table

CREATE TABLE test_app_dwh.rt_ods_log_app_loginout (
    appId VARCHAR(20) NOT NULL COMMENT "Application ID",
    userId VARCHAR(20) NOT NULL COMMENT "User ID",
    timestamp BIGINT COMMENT "Timestamp",
    platform VARCHAR(20) NOT NULL COMMENT "Platform",
    userAgent VARCHAR(20) NOT NULL COMMENT "User agent",
    event VARCHAR(20) NOT NULL COMMENT "Event type",
    package VARCHAR(100) NOT NULL COMMENT "Package name",
    phoneNumber VARCHAR(20) COMMENT "Phone number",
    actionTime BIGINT COMMENT "Action timestamp",
    deviceID VARCHAR(50) COMMENT "Device ID",
    deviceType VARCHAR(20) COMMENT "Device type",
    appVersion VARCHAR(20) COMMENT "Application version",
    networkType VARCHAR(20) COMMENT "Network type",
    os VARCHAR(20) COMMENT "Operating system",
    userUID VARCHAR(50) COMMENT "Unique user identifier",
    nickname VARCHAR(50) COMMENT "Nickname",
    clientIp VARCHAR(20) COMMENT "Client IP",
    source VARCHAR(20) COMMENT "Source",
    sessionId VARCHAR(50) COMMENT "Session ID"
)
DUPLICATE KEY(appId, userId, timestamp)
DISTRIBUTED BY HASH(appId) BUCKETS 1;

2. Create the Routine Load job

CREATE ROUTINE LOAD test_app_dwh.kafkajob_rt_ods_log_app_loginout ON rt_ods_log_app_loginout
COLUMNS(appId, userId, timestamp, platform, userAgent, event, package, phoneNumber, actionTime, deviceID, deviceType, appVersion, networkType, os, userUID, nickname, clientIp, source, sessionId)
PROPERTIES
(
    "desired_concurrent_number" = "1",
    "format" = "json",
    "strict_mode" = "false",
    "jsonpaths" = "[\"$.appId\",\"$.userId\",\"$.timestamp\",\"$.platform\",\"$.userAgent\",\"$.event\",\"$.package\",\"$.properties.phoneNumber\",\"$.properties.actionTime\",\"$.properties.deviceID\",\"$.properties.deviceType\",\"$.properties.appVersion\",\"$.properties.networkType\",\"$.properties.os\",\"$.properties.userUID\",\"$.properties.nickname\",\"$.properties.clientIp\",\"$.source\",\"$.sessionId\"]"
)
FROM KAFKA
(
    "kafka_broker_list" = "ip1:9092,ip2:9092,ip3:9092",
    "kafka_topic" = "loginout",
    "property.group.id" = "kafka_job",
    "property.kafka_default_offsets" = "OFFSET_BEGINNING"
);
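
After submitting the job, it is worth confirming that it was actually scheduled. A minimal sketch, assuming the statement is run from the same database in which the job was created:

```sql
-- Show the current status of the job created above.
-- In the result, State should read RUNNING once the job has been scheduled;
-- if it is PAUSED, ReasonOfStateChanged and ErrorLogUrls usually explain why.
SHOW ROUTINE LOAD FOR kafkajob_rt_ods_log_app_loginout;
```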

Once the job is running, Doris continuously consumes messages from the Kafka topic and writes them into the table.
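
To check that rows are arriving, one option is to query the target table directly. The sketch below also shows how the job could be paused and resumed; the job name is left unqualified on the assumption that the current database is the one holding the job.

```sql
-- Spot-check a few loaded rows; nickname and deviceID come from the nested "properties" object.
SELECT appId, userId, event, nickname, deviceID, sessionId
FROM test_app_dwh.rt_ods_log_app_loginout
LIMIT 10;

-- Temporarily stop consuming, for example during maintenance.
PAUSE ROUTINE LOAD FOR kafkajob_rt_ods_log_app_loginout;

-- Continue consuming from the offsets recorded when the job was paused.
RESUME ROUTINE LOAD FOR kafkajob_rt_ods_log_app_loginout;
```

If the nested columns come back as NULL, the usual suspect is a mismatch between the order of the entries in "jsonpaths" and the names in COLUMNS, since the two lists are matched by position.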
