Recently I went through the data middle-platform tech stack used by my business line. Taking a typical real-time pipeline as an example, the path from the initial tracking event to a result returned by the data service layer is:
1. Event tracking and reporting
2. Writing to a message queue
3. Flink reading from the queue
4. Flink writing to ClickHouse or HBase
5. A Spring project exposing query APIs and returning results
Below I build a minimal real-time data platform, run the whole flow end to end, and use the hands-on exercise to summarize the technologies I have used and tie the overall picture together.
Complete project structure
my-java-project
│
├── src
│ ├── main
│ │ ├── java
│ │ │ ├── com
│ │ │ │ ├── example
│ │ │ │ │ ├── controller
│ │ │ │ │ │ ├── EventController.java
│ │ │ │ │ │ ├── QueryController.java
│ │ │ │ │ ├── flink
│ │ │ │ │ │ ├── FlinkKafkaConsumerJob.java
│ │ │ │ │ │ ├── ClickHouseSinkFunction.java
│ │ ├── resources
│ │ │ ├── application.properties
│
├── pom.xml
1. Event Tracking and Reporting
1.1 Create the event reporting service
java
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
@RestController
@RequestMapping("/events")
public class EventController {

    @Autowired
    private KafkaTemplate<String, String> kafkaTemplate;

    // Receive the raw event payload and forward it to the Kafka topic
    @PostMapping("/report")
    public String reportEvent(@RequestBody String event) {
        kafkaTemplate.send("events_topic", event);
        return "Event reported successfully";
    }
}
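The controller above assumes that spring-boot-starter-web and spring-kafka are on the classpath. A minimal pom.xml fragment for the Spring side (an assumption for illustration, with versions managed by the Spring Boot parent/BOM rather than pinned here):
xml
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.kafka</groupId>
        <artifactId>spring-kafka</artifactId>
    </dependency>
</dependencies>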
1.2 Configure Kafka
Add the Kafka configuration to application.properties:
properties
spring.kafka.bootstrap-servers=localhost:9092
spring.kafka.producer.key-serializer=org.apache.kafka.common.serialization.StringSerializer
spring.kafka.producer.value-serializer=org.apache.kafka.common.serialization.StringSerializer
2. Writing to the Message Queue
Kafka serves as the message queue here; the controller in section 1.1 already writes each reported event to the events_topic topic.
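To verify end to end that an event actually lands in Kafka, I can post one to the endpoint. A minimal sketch of a test client (the class name EventReportClient, the port 8080, and the payload fields are assumptions for illustration, not part of the original project):
java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class EventReportClient {
    public static void main(String[] args) throws Exception {
        // Sample payload; the controller treats the request body as an opaque string
        String event = "{\"userId\":\"u_1001\",\"action\":\"page_view\",\"ts\":1700000000000}";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/events/report"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(event))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}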
3. Flink Reading from the Queue
3.1 Create the Flink project
Add the Flink and Kafka connector dependencies to pom.xml:
xml
<dependencies>
    <!-- Flink dependencies -->
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-java</artifactId>
        <version>1.14.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-java_2.12</artifactId>
        <version>1.14.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-kafka_2.12</artifactId>
        <version>1.14.2</version>
    </dependency>
    <!-- Add dependencies for ClickHouse or HBase -->
</dependencies>
3.2 Read Kafka data in Flink
java
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.api.datastream.DataStream;
import java.util.Properties;
public class FlinkKafkaConsumerJob {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Kafka consumer configuration
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", "localhost:9092");
        properties.setProperty("group.id", "flink_consumer");

        FlinkKafkaConsumer<String> kafkaConsumer = new FlinkKafkaConsumer<>(
                "events_topic",
                new SimpleStringSchema(),
                properties
        );

        DataStream<String> stream = env.addSource(kafkaConsumer);

        // Process stream and write to ClickHouse or HBase
        stream.addSink(new ClickHouseSinkFunction());

        env.execute("Flink Kafka Consumer Job");
    }
}
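The comment in main marks where stream processing would go. As a hedged sketch (not part of the original job), two things I might add at that point: enable checkpointing so the job can recover its Kafka offsets, and drop blank payloads before they reach the sink. This is a fragment meant to be placed inside main(), not a standalone class, and the 60-second interval is an arbitrary example value:
java
// Fragment for main(): checkpoint every 60s so Kafka offsets can be restored on restart
env.enableCheckpointing(60_000);

// Filter out null/blank events before writing them to ClickHouse
stream
        .filter(value -> value != null && !value.trim().isEmpty())
        .addSink(new ClickHouseSinkFunction());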
4. Flink Writing to ClickHouse or HBase
4.1 Write to ClickHouse
Add the ClickHouse JDBC dependency:
xml
<dependency>
    <groupId>ru.yandex.clickhouse</groupId>
    <artifactId>clickhouse-jdbc</artifactId>
    <version>0.2.4</version>
</dependency>
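The sink below inserts into an events table, so that table has to exist first. A sketch that creates it over JDBC (the class name, the table layout, and the MergeTree settings are assumptions for illustration; adjust them to the real event schema):
java
import java.sql.Connection;
import java.sql.Statement;
import ru.yandex.clickhouse.ClickHouseDataSource;

public class CreateEventsTable {
    public static void main(String[] args) throws Exception {
        // Assumed layout: the raw event string plus an insert-time column for ordering
        ClickHouseDataSource dataSource =
                new ClickHouseDataSource("jdbc:clickhouse://localhost:8123/default");
        try (Connection connection = dataSource.getConnection();
             Statement statement = connection.createStatement()) {
            statement.execute(
                "CREATE TABLE IF NOT EXISTS events (" +
                "  event String," +
                "  created_at DateTime DEFAULT now()" +
                ") ENGINE = MergeTree() ORDER BY created_at");
        }
    }
}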
Implement the sink function that writes to ClickHouse:
java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import ru.yandex.clickhouse.ClickHouseConnection;
import ru.yandex.clickhouse.ClickHouseDataSource;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Extends RichSinkFunction (rather than implementing the plain SinkFunction interface)
// so that the open()/close() lifecycle hooks are available for managing the JDBC connection.
public class ClickHouseSinkFunction extends RichSinkFunction<String> {

    private transient ClickHouseConnection connection;
    private transient PreparedStatement statement;

    @Override
    public void open(Configuration parameters) throws Exception {
        super.open(parameters);
        ClickHouseDataSource dataSource =
                new ClickHouseDataSource("jdbc:clickhouse://localhost:8123/default");
        connection = dataSource.getConnection();
        statement = connection.prepareStatement("INSERT INTO events (event) VALUES (?)");
    }

    @Override
    public void invoke(String value, Context context) throws SQLException {
        statement.setString(1, value);
        statement.executeUpdate();
    }

    @Override
    public void close() throws Exception {
        super.close();
        if (statement != null) {
            statement.close();
        }
        if (connection != null) {
            connection.close();
        }
    }
}
5. Spring Project Serving Queries
5.1 Create the query API
java
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import java.util.List;
@RestController
@RequestMapping("/api")
public class QueryController {

    @Autowired
    private JdbcTemplate jdbcTemplate;

    // Read back the events that the Flink sink wrote into ClickHouse
    @GetMapping("/events")
    public List<String> getEvents() {
        return jdbcTemplate.queryForList("SELECT event FROM events", String.class);
    }
}
5.2 Configure the ClickHouse data source
Add the ClickHouse configuration to application.properties:
properties
spring.datasource.url=jdbc:clickhouse://localhost:8123/default
spring.datasource.username=default
spring.datasource.password=
spring.datasource.driver-class-name=ru.yandex.clickhouse.ClickHouseDriver
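For the JdbcTemplate and the driver class referenced above to resolve, the Spring project also needs spring-boot-starter-jdbc plus the same clickhouse-jdbc artifact used in the Flink module. A pom.xml fragment, again assuming the Spring Boot parent manages the starter version:
xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-jdbc</artifactId>
</dependency>
<dependency>
    <groupId>ru.yandex.clickhouse</groupId>
    <artifactId>clickhouse-jdbc</artifactId>
    <version>0.2.4</version>
</dependency>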