实现一个实时数据平台的小型demo

近期自己梳理了一下自己所属业务线上的数据中台技术栈,以常见的实时链路为例,从最初的埋点到数据服务层查询到结果,依次经过:

1、埋点上报

2、写入消息队列

3、flink读取队列

4、flink写入clickhouse或hbase

5、spring项目提供查询和接口返回

搭建个简易版的实时数据平台流程跑通,手操实践一下,对自己使用过的技术做一个总结,对整体脉络做一个梳理。

完整工程结构

my-java-project
│
├── src
│   ├── main
│   │   ├── java
│   │   │   ├── com
│   │   │   │   ├── example
│   │   │   │   │   ├── controller
│   │   │   │   │   │   ├── EventController.java
│   │   │   │   │   │   ├── QueryController.java
│   │   │   │   │   ├── flink
│   │   │   │   │   │   ├── FlinkKafkaConsumerJob.java
│   │   │   │   │   │   ├── ClickHouseSinkFunction.java
│   │   ├── resources
│   │   │   ├── application.properties
│
├── pom.xml

1. 埋点上报

1.1 创建埋点上报服务
java 复制代码
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/events")
public class EventController {

    @Autowired
    private KafkaTemplate<String, String> kafkaTemplate;

    @PostMapping("/report")
    public String reportEvent(@RequestBody String event) {
        kafkaTemplate.send("events_topic", event);
        return "Event reported successfully";
    }
}
1.2 配置 Kafka

application.properties 中添加 Kafka 配置:

properties 复制代码
spring.kafka.bootstrap-servers=localhost:9092
spring.kafka.producer.key-serializer=org.apache.kafka.common.serialization.StringSerializer
spring.kafka.producer.value-serializer=org.apache.kafka.common.serialization.StringSerializer

2. 写入消息队列

使用 Kafka 作为消息队列,已经在上面的代码中实现了写入 Kafka 的功能。

pom.xml 中添加 Flink 和 Kafka 依赖:

xml 复制代码
<dependencies>
    <!-- Flink dependencies -->
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-java</artifactId>
        <version>1.14.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-java_2.12</artifactId>
        <version>1.14.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-kafka_2.12</artifactId>
        <version>1.14.2</version>
    </dependency>
    <!-- Add dependencies for ClickHouse or HBase -->
</dependencies>
java 复制代码
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.api.datastream.DataStream;

import java.util.Properties;

public class FlinkKafkaConsumerJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", "localhost:9092");
        properties.setProperty("group.id", "flink_consumer");

        FlinkKafkaConsumer<String> kafkaConsumer = new FlinkKafkaConsumer<>(
                "events_topic",
                new SimpleStringSchema(),
                properties
        );

        DataStream<String> stream = env.addSource(kafkaConsumer);

        // Process stream and write to ClickHouse or HBase
        stream.addSink(new ClickHouseSinkFunction());

        env.execute("Flink Kafka Consumer Job");
    }
}
4.1 写入 ClickHouse

添加 ClickHouse 依赖:

xml 复制代码
<dependency>
    <groupId>ru.yandex.clickhouse</groupId>
    <artifactId>clickhouse-jdbc</artifactId>
    <version>0.2.4</version>
</dependency>

实现写入 ClickHouse 的 Sink Function:

java 复制代码
import org.apache.flink.streaming.api.functions.sink.SinkFunction;
import ru.yandex.clickhouse.ClickHouseConnection;
import ru.yandex.clickhouse.ClickHouseDataSource;

import java.sql.PreparedStatement;
import java.sql.SQLException;

public class ClickHouseSinkFunction implements SinkFunction<String> {

    private transient ClickHouseConnection connection;
    private transient PreparedStatement statement;

    @Override
    public void open(Configuration parameters) throws Exception {
        super.open(parameters);
        ClickHouseDataSource dataSource = new ClickHouseDataSource("jdbc:clickhouse://localhost:8123/default");
        connection = dataSource.getConnection();
        statement = connection.prepareStatement("INSERT INTO events (event) VALUES (?)");
    }

    @Override
    public void invoke(String value, Context context) throws SQLException {
        statement.setString(1, value);
        statement.executeUpdate();
    }

    @Override
    public void close() throws Exception {
        super.close();
        if (statement != null) {
            statement.close();
        }
        if (connection != null) {
            connection.close();
        }
    }
}

5. Spring 项目提供查询和接口返回

5.1 创建查询接口
java 复制代码
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import java.util.List;

@RestController
@RequestMapping("/api")
public class QueryController {

    @Autowired
    private JdbcTemplate jdbcTemplate;

    @GetMapping("/events")
    public List<String> getEvents() {
        return jdbcTemplate.queryForList("SELECT event FROM events", String.class);
    }
}
5.2 配置 ClickHouse 数据源

application.properties 中添加 ClickHouse 配置:

properties 复制代码
spring.datasource.url=jdbc:clickhouse://localhost:8123/default
spring.datasource.username=default
spring.datasource.password=
spring.datasource.driver-class-name=ru.yandex.clickhouse.ClickHouseDriver
相关推荐
Francek Chen43 分钟前
【大数据技术基础 | 实验十二】Hive实验:Hive分区
大数据·数据仓库·hive·hadoop·分布式
吾日三省吾码1 小时前
JVM 性能调优
java
弗拉唐2 小时前
springBoot,mp,ssm整合案例
java·spring boot·mybatis
oi773 小时前
使用itextpdf进行pdf模版填充中文文本时部分字不显示问题
java·服务器
少说多做3433 小时前
Android 不同情况下使用 runOnUiThread
android·java
知兀3 小时前
Java的方法、基本和引用数据类型
java·笔记·黑马程序员
蓝黑20203 小时前
IntelliJ IDEA常用快捷键
java·ide·intellij-idea
Ysjt | 深3 小时前
C++多线程编程入门教程(优质版)
java·开发语言·jvm·c++
shuangrenlong4 小时前
slice介绍slice查看器
java·ubuntu