Doris的Routine Load方式消费Kafka数据进入Doris

假设kafka已有嵌套JSON数据格式为

复制代码
{
    "appId": "10000",
    "platform": "YY",
    "userId": "007",
    "userAgent": "6",
    "event": "login",
    "package": "org.apache.doris",
    "properties": {
        "phoneNumber": "13814516235",
        "actionTime": "1694928000",
        "deviceID": "device123",
        "deviceType": "smartphone",
        "appVersion": "1.0.0",
        "networkType": "WiFi",
        "os": "Android",
        "userUID": "user123",
        "nickname": "小王",
        "clientIp": "192.168.1.1"
    },
    "clientIp": "10.225.36.85",
    "timestamp": "1694928000",
    "source": "mobileApp",
    "sessionId": "12554122524422"
}

1、创建建表语句

复制代码
CREATE TABLE test_app_dwh.rt_ods_log_app_loginout (
    appId VARCHAR(20) NOT NULL COMMENT "应用ID",
    userId VARCHAR(20) NOT NULL COMMENT "用户ID",
    timestamp BIGINT COMMENT "时间戳",
    platform VARCHAR(20) NOT NULL COMMENT "平台",
    userAgent VARCHAR(20) NOT NULL COMMENT "用户代理",
    event VARCHAR(20) NOT NULL COMMENT "事件类型",
    package VARCHAR(100) NOT NULL COMMENT "包名",
    phoneNumber VARCHAR(20) COMMENT "电话号码",
    actionTime BIGINT COMMENT "动作时间戳",
    deviceID VARCHAR(50) COMMENT "设备ID",
    deviceType VARCHAR(20) COMMENT "设备类型",
    appVersion VARCHAR(20) COMMENT "应用版本",
    networkType VARCHAR(20) COMMENT "网络类型",
    os VARCHAR(20) COMMENT "操作系统",
    userUID VARCHAR(50) COMMENT "用户唯一标识",
    nickname VARCHAR(50) COMMENT "昵称",
    clientIp VARCHAR(20) COMMENT "客户端IP",
    source VARCHAR(20) COMMENT "来源",
    sessionId VARCHAR(50) COMMENT "会话ID"
)
DUPLICATE KEY(appId, userId, timestamp)
DISTRIBUTED BY HASH(appId) BUCKETS 1;

2、导入命令

复制代码
CREATE ROUTINE LOAD test_game_dwh.kafkajob_rt_ods_log_app_loginout ON rt_ods_log_app_loginout
COLUMNS(appId, userId, timestamp, platform, userAgent, event, package, phoneNumber, actionTime, deviceID, deviceType, appVersion, networkType, os, userUID, nickname, clientIp, source, sessionId)
PROPERTIES
(
    "desired_concurrent_number" = "1",
    "format" = "json",
    "strict_mode" = "false",
    "jsonpaths" = "[\"$.appId\",\"$.userId\",\"$.timestamp\",\"$.platform\",\"$.userAgent\",\"$.event\",\"$.package\",\"$.properties.phoneNumber\",\"$.properties.actionTime\",\"$.properties.deviceID\",\"$.properties.deviceType\",\"$.properties.appVersion\",\"$.properties.networkType\",\"$.properties.os\",\"$.properties.userUID\",\"$.properties.nickname\",\"$.properties.clientIp\",\"$.source\",\"$.sessionId\"]"
)
FROM KAFKA
(
    "kafka_broker_list" = "ip1:9092,ip2:9092,ip3:9092",
    "kafka_topic" = "loginout",
    "property.group.id" = "kafka_job",
    "property.kafka_default_offsets" = "OFFSET_BEGINNING"
);

最后kafka的数据就可以源源不断的存储到doris表里面了

相关推荐
明明跟你说过7 小时前
Kafka 与 Elasticsearch 的集成应用案例深度解析
大数据·elk·elasticsearch·kafka·big data·bigdata
lifewange9 小时前
Nginx + Kafka 可编程精细控制 完整版(可直接落地运行)
运维·nginx·kafka
qq_4523962311 小时前
第十三篇:《分布式压测:JMeter Master-Slave集群》
分布式·jmeter
小英雄大肚腩丶12 小时前
RabbitMQ消息队列
java·数据结构·spring boot·分布式·rabbitmq·java-rabbitmq
MXsoft61813 小时前
**一套平台管全域****IT****:分布式一体化监控的实战演进**
分布式
古怪今人13 小时前
etcd分布式键值存储系统 Windows下搭建etcd集群
数据库·分布式·etcd
LT101579744414 小时前
2026年微服务性能测试平台选型指南:分布式架构适配与服务联动测试
分布式·微服务·架构
颯沓如流星14 小时前
ZKube:优雅易用的 ZooKeeper 可视化管理工具
分布式·zookeeper·云原生
数据库小学妹15 小时前
CDC实时数据同步:让数据库变更秒级流向大数据平台!
大数据·数据库·mysql·kafka·dba
码农的神经元15 小时前
考虑通信时延的直流微电网分布式电-氢混合储能协同控制仿真复现与改进
分布式·wpf