Consuming Kafka Data into Doris with Routine Load

Suppose Kafka already holds nested JSON messages in the following format:

{
    "appId": "10000",
    "platform": "YY",
    "userId": "007",
    "userAgent": "6",
    "event": "login",
    "package": "org.apache.doris",
    "properties": {
        "phoneNumber": "13814516235",
        "actionTime": "1694928000",
        "deviceID": "device123",
        "deviceType": "smartphone",
        "appVersion": "1.0.0",
        "networkType": "WiFi",
        "os": "Android",
        "userUID": "user123",
        "nickname": "小王",
        "clientIp": "192.168.1.1"
    },
    "clientIp": "10.225.36.85",
    "timestamp": "1694928000",
    "source": "mobileApp",
    "sessionId": "12554122524422"
}

1. Create the target table

CREATE TABLE test_app_dwh.rt_ods_log_app_loginout (
    appId VARCHAR(20) NOT NULL COMMENT "Application ID",
    userId VARCHAR(20) NOT NULL COMMENT "User ID",
    timestamp BIGINT COMMENT "Timestamp",
    platform VARCHAR(20) NOT NULL COMMENT "Platform",
    userAgent VARCHAR(20) NOT NULL COMMENT "User agent",
    event VARCHAR(20) NOT NULL COMMENT "Event type",
    package VARCHAR(100) NOT NULL COMMENT "Package name",
    phoneNumber VARCHAR(20) COMMENT "Phone number",
    actionTime BIGINT COMMENT "Action timestamp",
    deviceID VARCHAR(50) COMMENT "Device ID",
    deviceType VARCHAR(20) COMMENT "Device type",
    appVersion VARCHAR(20) COMMENT "Application version",
    networkType VARCHAR(20) COMMENT "Network type",
    os VARCHAR(20) COMMENT "Operating system",
    userUID VARCHAR(50) COMMENT "Unique user identifier",
    nickname VARCHAR(50) COMMENT "Nickname",
    clientIp VARCHAR(20) COMMENT "Client IP",
    source VARCHAR(20) COMMENT "Source",
    sessionId VARCHAR(50) COMMENT "Session ID"
)
DUPLICATE KEY(appId, userId, timestamp)
DISTRIBUTED BY HASH(appId) BUCKETS 1;

2. Create the Routine Load job

CREATE ROUTINE LOAD test_app_dwh.kafkajob_rt_ods_log_app_loginout ON rt_ods_log_app_loginout
COLUMNS(appId, userId, timestamp, platform, userAgent, event, package, phoneNumber, actionTime, deviceID, deviceType, appVersion, networkType, os, userUID, nickname, clientIp, source, sessionId)
PROPERTIES
(
    "desired_concurrent_number" = "1",
    "format" = "json",
    "strict_mode" = "false",
    "jsonpaths" = "[\"$.appId\",\"$.userId\",\"$.timestamp\",\"$.platform\",\"$.userAgent\",\"$.event\",\"$.package\",\"$.properties.phoneNumber\",\"$.properties.actionTime\",\"$.properties.deviceID\",\"$.properties.deviceType\",\"$.properties.appVersion\",\"$.properties.networkType\",\"$.properties.os\",\"$.properties.userUID\",\"$.properties.nickname\",\"$.properties.clientIp\",\"$.source\",\"$.sessionId\"]"
)
FROM KAFKA
(
    "kafka_broker_list" = "ip1:9092,ip2:9092,ip3:9092",
    "kafka_topic" = "loginout",
    "property.group.id" = "kafka_job",
    "property.kafka_default_offsets" = "OFFSET_BEGINNING"
);
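
Once the statement above has been submitted, it helps to confirm that the job actually started. The following is a minimal sketch using Doris's SHOW ROUTINE LOAD statements. It assumes you are connected to the database in which the job was created, and the output fields named in the comments may vary between Doris versions.

```sql
-- Run in the database where the Routine Load job was created.
-- The State field should read RUNNING. If it reads PAUSED, the
-- ReasonOfStateChanged and ErrorLogUrls fields usually explain why.
SHOW ROUTINE LOAD FOR kafkajob_rt_ods_log_app_loginout;

-- List the subtasks currently scheduled for this job.
SHOW ROUTINE LOAD TASK WHERE JobName = "kafkajob_rt_ods_log_app_loginout";
```

Because "jsonpaths" is matched to COLUMNS by position, a job that runs but loads NULLs into the wrong columns usually points to the two lists being out of order.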

Once the job is running, data from Kafka is loaded continuously into the Doris table.
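
To verify that rows are arriving, you can query the target table directly. This is only a sketch: it assumes actionTime holds epoch seconds, as in the sample message, and the column aliases loaded_rows and action_time are illustrative names, not part of the original schema.

```sql
-- The row count should grow as new messages are consumed.
SELECT COUNT(*) AS loaded_rows
FROM test_app_dwh.rt_ods_log_app_loginout;

-- Inspect the most recent events, including fields flattened
-- from the nested "properties" object.
SELECT appId, userId, event, phoneNumber, deviceType, clientIp,
       FROM_UNIXTIME(actionTime) AS action_time
FROM test_app_dwh.rt_ods_log_app_loginout
ORDER BY actionTime DESC
LIMIT 10;
```

Note that the clientIp column here comes from $.properties.clientIp, not from the top-level clientIp field of the message, because that is the path listed in "jsonpaths".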
