Building a Small Real-Time Data Platform Demo

I recently went through the data-platform tech stack used by my business line. Taking a typical real-time pipeline as an example, the path from the initial tracking event to the result returned by the data service layer goes through the following steps:

1. Event tracking and reporting

2. Writing to a message queue

3. Flink reading from the queue

4. Flink writing to ClickHouse or HBase

5. A Spring project providing query APIs and returning results

Below I build a simplified version of this real-time data platform and run the flow end to end, as hands-on practice, to summarize the technologies I have used and organize the overall picture.

Complete project structure

my-java-project
│
├── src
│   ├── main
│   │   ├── java
│   │   │   ├── com
│   │   │   │   ├── example
│   │   │   │   │   ├── controller
│   │   │   │   │   │   ├── EventController.java
│   │   │   │   │   │   ├── QueryController.java
│   │   │   │   │   ├── flink
│   │   │   │   │   │   ├── FlinkKafkaConsumerJob.java
│   │   │   │   │   │   ├── ClickHouseSinkFunction.java
│   │   ├── resources
│   │   │   ├── application.properties
│
├── pom.xml

1. Event Tracking and Reporting

1.1 Create the event reporting service
java
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/events")
public class EventController {

    @Autowired
    private KafkaTemplate<String, String> kafkaTemplate;

    @PostMapping("/report")
    public String reportEvent(@RequestBody String event) {
        kafkaTemplate.send("events_topic", event);
        return "Event reported successfully";
    }
}
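
A quick way to exercise this endpoint is a plain HTTP POST. Below is a minimal sketch, assuming the application runs on the default Spring Boot port 8080; the JSON fields in the payload are made up for illustration, since the controller treats the body as an opaque string:

java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ReportEventExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical event payload; any string body is accepted by the controller
        String event = "{\"userId\":\"u_1001\",\"action\":\"click\",\"ts\":1700000000000}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/events/report")) // assumed host/port
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(event))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}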
1.2 Configure Kafka

Add the Kafka configuration to application.properties:

properties
spring.kafka.bootstrap-servers=localhost:9092
spring.kafka.producer.key-serializer=org.apache.kafka.common.serialization.StringSerializer
spring.kafka.producer.value-serializer=org.apache.kafka.common.serialization.StringSerializer
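
The controller assumes that events_topic already exists on the broker. Here is a minimal sketch for declaring it so that Spring Boot's auto-configured KafkaAdmin creates it at startup (spring-kafka on the classpath is assumed; the partition count and replication factor are placeholder values for a single-broker dev setup):

java
import org.apache.kafka.clients.admin.NewTopic;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class KafkaTopicConfig {

    // Spring Boot's KafkaAdmin picks up NewTopic beans and creates missing topics.
    @Bean
    public NewTopic eventsTopic() {
        return new NewTopic("events_topic", 1, (short) 1);
    }
}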

2. Writing to the Message Queue

Kafka is used as the message queue, and the controller above already writes incoming events to it.

3. Flink Reading from the Queue

Add the Flink and Kafka dependencies to pom.xml:

xml
<dependencies>
    <!-- Flink dependencies -->
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-java</artifactId>
        <version>1.14.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-java_2.12</artifactId>
        <version>1.14.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-kafka_2.12</artifactId>
        <version>1.14.2</version>
    </dependency>
    <!-- Add dependencies for ClickHouse or HBase -->
</dependencies>

Create the Flink job that consumes events from the Kafka topic and forwards them to the sink:

java
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.api.datastream.DataStream;

import java.util.Properties;

public class FlinkKafkaConsumerJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", "localhost:9092");
        properties.setProperty("group.id", "flink_consumer");

        FlinkKafkaConsumer<String> kafkaConsumer = new FlinkKafkaConsumer<>(
                "events_topic",
                new SimpleStringSchema(),
                properties
        );

        DataStream<String> stream = env.addSource(kafkaConsumer);

        // Process stream and write to ClickHouse or HBase
        stream.addSink(new ClickHouseSinkFunction());

        env.execute("Flink Kafka Consumer Job");
    }
}
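
The job above forwards the raw strings straight to the sink. If the events needed cleaning or filtering first, a transformation step could sit between addSource and addSink. Here is a minimal sketch; the EventCleaning helper and its trivial filter/trim logic are made up as placeholders for real parsing:

java
import org.apache.flink.streaming.api.datastream.DataStream;

public class EventCleaning {

    // Drops empty messages and trims whitespace; stands in for real JSON parsing/enrichment.
    public static DataStream<String> clean(DataStream<String> stream) {
        return stream
                .filter(value -> value != null && !value.isEmpty())
                .map(String::trim);
    }
}

In the job, stream.addSink(new ClickHouseSinkFunction()) would then become EventCleaning.clean(stream).addSink(new ClickHouseSinkFunction()).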
4. Flink Writing to ClickHouse

Add the ClickHouse dependency to pom.xml:

xml
<dependency>
    <groupId>ru.yandex.clickhouse</groupId>
    <artifactId>clickhouse-jdbc</artifactId>
    <version>0.2.4</version>
</dependency>
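
The sink function below inserts rows into an events table, so that table has to exist in ClickHouse first. Here is a minimal sketch that creates it over JDBC; the single-column schema and the MergeTree engine are assumptions derived only from the INSERT statement used by the sink:

java
import ru.yandex.clickhouse.ClickHouseConnection;
import ru.yandex.clickhouse.ClickHouseDataSource;

import java.sql.Statement;

public class CreateEventsTable {
    public static void main(String[] args) throws Exception {
        ClickHouseDataSource dataSource =
                new ClickHouseDataSource("jdbc:clickhouse://localhost:8123/default");
        try (ClickHouseConnection connection = dataSource.getConnection();
             Statement statement = connection.createStatement()) {
            // Assumed schema: a single String column matching "INSERT INTO events (event) VALUES (?)"
            statement.execute(
                    "CREATE TABLE IF NOT EXISTS events (event String) ENGINE = MergeTree() ORDER BY tuple()");
        }
    }
}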

Implement the sink function that writes each event to ClickHouse:

java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import ru.yandex.clickhouse.ClickHouseConnection;
import ru.yandex.clickhouse.ClickHouseDataSource;

import java.sql.PreparedStatement;
import java.sql.SQLException;

// Extends RichSinkFunction (rather than implementing SinkFunction directly) so the
// open()/close() lifecycle methods are available for managing the JDBC connection.
public class ClickHouseSinkFunction extends RichSinkFunction<String> {

    private transient ClickHouseConnection connection;
    private transient PreparedStatement statement;

    @Override
    public void open(Configuration parameters) throws Exception {
        super.open(parameters);
        ClickHouseDataSource dataSource = new ClickHouseDataSource("jdbc:clickhouse://localhost:8123/default");
        connection = dataSource.getConnection();
        statement = connection.prepareStatement("INSERT INTO events (event) VALUES (?)");
    }

    @Override
    public void invoke(String value, Context context) throws SQLException {
        statement.setString(1, value);
        statement.executeUpdate();
    }

    @Override
    public void close() throws Exception {
        super.close();
        if (statement != null) {
            statement.close();
        }
        if (connection != null) {
            connection.close();
        }
    }
}
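
Note that this sink opens one connection per parallel subtask and issues one INSERT per event, which keeps the demo simple but is inefficient for ClickHouse; in a real pipeline the rows would normally be buffered and written in batches (for example via JDBC batching or Flink's JDBC connector).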

5. Spring Project Providing Query APIs and Responses

5.1 Create the query API
java
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import java.util.List;

@RestController
@RequestMapping("/api")
public class QueryController {

    @Autowired
    private JdbcTemplate jdbcTemplate;

    @GetMapping("/events")
    public List<String> getEvents() {
        return jdbcTemplate.queryForList("SELECT event FROM events", String.class);
    }
}
5.2 Configure the ClickHouse data source

Add the ClickHouse configuration to application.properties:

properties
spring.datasource.url=jdbc:clickhouse://localhost:8123/default
spring.datasource.username=default
spring.datasource.password=
spring.datasource.driver-class-name=ru.yandex.clickhouse.ClickHouseDriver
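
For the JdbcTemplate-based query above to work, the Spring project also needs a JDBC starter (such as spring-boot-starter-jdbc) and the same clickhouse-jdbc driver on its classpath; the empty password matches ClickHouse's out-of-the-box default user.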