Flink 1.20 Custom SQL Connector Tutorial - A Complete HTTP Connector Implementation
📚 A complete walkthrough of custom Table/SQL connector development for Apache Flink 1.20, including full HTTP connector source code and examples
Table of Contents
- [1. Overview](#1-overview)
- [2. Core Architecture](#2-core-architecture)
- [3. Maven Project Setup](#3-maven-project-setup)
- [4. Source Connector Implementation](#4-source-connector-implementation)
- [5. Sink Connector Implementation](#5-sink-connector-implementation)
- [6. SPI Service Registration](#6-spi-service-registration)
- [7. Complete Usage Example](#7-complete-usage-example)
- [8. Testing and Debugging](#8-testing-and-debugging)
- [9. Best Practices](#9-best-practices)
1. Overview
1.1 What Is a Custom Connector
A Flink connector is the bridge between Flink and external systems:
- Source: reads data from an external system
- Sink: writes data to an external system
1.2 Use Cases for an HTTP Connector
- ✅ Call RESTful APIs to fetch real-time data
- ✅ Poll HTTP endpoints for updates
- ✅ POST processing results to a webhook
- ✅ Integrate with third-party data services
2. Core Architecture
2.1 Component Relationships
SQL/Table API
↓
Factory (factory class)
↓
DynamicTableSource/Sink
↓
SourceFunction/SinkFunction
↓
External HTTP API
2.2 Core Interfaces
| Interface | Purpose |
|---|---|
| DynamicTableSourceFactory | Creates the Source |
| DynamicTableSinkFactory | Creates the Sink |
| ScanTableSource | Scan-based table source |
| RichSourceFunction&lt;RowData&gt; | Data-reading logic |
| RichSinkFunction&lt;RowData&gt; | Data-writing logic |
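The planner picks the factory via the connector identifier in a table's WITH clause: 'connector' = 'http' is matched against factoryIdentifier() of every factory registered through SPI (section 6). A minimal DDL just to illustrate the lookup; the full, runnable examples are in section 7:
sql
-- 'http' must match HttpDynamicTableSourceFactory.factoryIdentifier()
CREATE TABLE example_http_table (
    id BIGINT,
    name STRING
) WITH (
    'connector' = 'http',
    'url' = 'http://localhost:5000/api/orders'
);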
3. Maven Project Setup
3.1 pom.xml
xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0">
<modelVersion>4.0.0</modelVersion>
<groupId>com.example.flink</groupId>
<artifactId>flink-connector-http</artifactId>
<version>1.0.0</version>
<properties>
<flink.version>1.20.0</flink.version>
<java.version>11</java.version>
</properties>
<dependencies>
<!-- Flink Table API -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-table-common</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-java</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<!-- Table-to-DataStream bridge: provides SourceFunctionProvider / SinkFunctionProvider used below -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-table-api-java-bridge</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<!-- HTTP client -->
<dependency>
<groupId>org.apache.httpcomponents.client5</groupId>
<artifactId>httpclient5</artifactId>
<version>5.2.1</version>
</dependency>
<!-- JSON processing -->
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
<version>2.15.2</version>
</dependency>
</dependencies>
</project>
4. Source Connector Implementation
4.1 Configuration Options Class
java
package com.example.flink.http;
import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;
import java.time.Duration;
public class HttpOptions {
public static final String IDENTIFIER = "http";
public static final ConfigOption<String> URL =
ConfigOptions.key("url")
.stringType()
.noDefaultValue()
.withDescription("HTTP endpoint URL");
public static final ConfigOption<String> METHOD =
ConfigOptions.key("method")
.stringType()
.defaultValue("GET")
.withDescription("HTTP method");
public static final ConfigOption<String> AUTH_TOKEN =
ConfigOptions.key("auth.token")
.stringType()
.noDefaultValue()
.withDescription("Bearer token");
public static final ConfigOption<Duration> POLL_INTERVAL =
ConfigOptions.key("poll.interval")
.durationType()
.defaultValue(Duration.ofSeconds(5))
.withDescription("Polling interval");
public static final ConfigOption<String> JSON_PATH =
ConfigOptions.key("json.path")
.stringType()
.noDefaultValue()
.withDescription("JSON path to extract data");
public static final ConfigOption<Integer> MAX_RETRIES =
ConfigOptions.key("max.retries")
.intType()
.defaultValue(3)
.withDescription("Max retry attempts");
}
4.2 Source Factory
java
package com.example.flink.http;
import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ReadableConfig;
import org.apache.flink.table.connector.source.DynamicTableSource;
import org.apache.flink.table.factories.DynamicTableSourceFactory;
import org.apache.flink.table.factories.FactoryUtil;
import java.util.HashSet;
import java.util.Set;
public class HttpDynamicTableSourceFactory implements DynamicTableSourceFactory {
@Override
public String factoryIdentifier() {
return HttpOptions.IDENTIFIER;
}
@Override
public Set<ConfigOption<?>> requiredOptions() {
Set<ConfigOption<?>> options = new HashSet<>();
options.add(HttpOptions.URL);
return options;
}
@Override
public Set<ConfigOption<?>> optionalOptions() {
Set<ConfigOption<?>> options = new HashSet<>();
options.add(HttpOptions.METHOD);
options.add(HttpOptions.AUTH_TOKEN);
options.add(HttpOptions.POLL_INTERVAL);
options.add(HttpOptions.JSON_PATH);
options.add(HttpOptions.MAX_RETRIES);
return options;
}
@Override
public DynamicTableSource createDynamicTableSource(Context context) {
FactoryUtil.TableFactoryHelper helper =
FactoryUtil.createTableFactoryHelper(this, context);
helper.validate();
ReadableConfig config = helper.getOptions();
return new HttpDynamicTableSource(
context.getCatalogTable().getResolvedSchema(),
config
);
}
}
4.3 Dynamic Table Source
java
package com.example.flink.http;
import org.apache.flink.configuration.ReadableConfig;
import org.apache.flink.streaming.api.functions.source.SourceFunction;
import org.apache.flink.table.catalog.ResolvedSchema;
import org.apache.flink.table.connector.ChangelogMode;
import org.apache.flink.table.connector.source.DynamicTableSource;
import org.apache.flink.table.connector.source.ScanTableSource;
import org.apache.flink.table.connector.source.SourceFunctionProvider;
import org.apache.flink.table.data.RowData;
public class HttpDynamicTableSource implements ScanTableSource {
private final ResolvedSchema schema;
private final ReadableConfig config;
public HttpDynamicTableSource(ResolvedSchema schema, ReadableConfig config) {
this.schema = schema;
this.config = config;
}
@Override
public ChangelogMode getChangelogMode() {
return ChangelogMode.insertOnly();
}
@Override
public ScanRuntimeProvider getScanRuntimeProvider(ScanContext context) {
SourceFunction<RowData> sourceFunction =
new HttpSourceFunction(schema, config);
return SourceFunctionProvider.of(sourceFunction, false);
}
@Override
public DynamicTableSource copy() {
return new HttpDynamicTableSource(schema, config);
}
@Override
public String asSummaryString() {
return "HTTP Table Source";
}
}
4.4 Source Function (Core Logic)
java
package com.example.flink.http;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.configuration.ReadableConfig;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;
import org.apache.flink.table.catalog.ResolvedSchema;
import org.apache.flink.table.data.GenericRowData;
import org.apache.flink.table.data.RowData;
import org.apache.flink.table.data.StringData;
import org.apache.flink.table.types.logical.LogicalType;
import org.apache.flink.table.types.logical.LogicalTypeRoot;
import org.apache.hc.client5.http.classic.methods.HttpGet;
import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.CloseableHttpResponse;
import org.apache.hc.client5.http.impl.classic.HttpClients;
import org.apache.hc.core5.http.io.entity.EntityUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.time.Duration;
public class HttpSourceFunction extends RichSourceFunction<RowData> {
private static final Logger LOG = LoggerFactory.getLogger(HttpSourceFunction.class);
private static final ObjectMapper MAPPER = new ObjectMapper();
private final ResolvedSchema schema;
private final ReadableConfig config;
private volatile boolean running = true;
private CloseableHttpClient httpClient;
public HttpSourceFunction(ResolvedSchema schema, ReadableConfig config) {
this.schema = schema;
this.config = config;
}
@Override
public void open(org.apache.flink.configuration.Configuration parameters) {
httpClient = HttpClients.createDefault();
LOG.info("HTTP Source opened");
}
@Override
public void run(SourceContext<RowData> ctx) throws Exception {
String url = config.get(HttpOptions.URL);
Duration pollInterval = config.get(HttpOptions.POLL_INTERVAL);
String jsonPath = config.getOptional(HttpOptions.JSON_PATH).orElse(null);
LOG.info("Start polling from: {}", url);
while (running) {
try {
// Execute the HTTP GET request
String response = fetchData(url);
if (response != null) {
// Parse the JSON response
JsonNode root = MAPPER.readTree(response);
JsonNode dataNode = extractDataNode(root, jsonPath);
// Convert to RowData and emit
if (dataNode != null) {
if (dataNode.isArray()) {
for (JsonNode item : dataNode) {
RowData row = jsonToRowData(item);
if (row != null) {
ctx.collect(row);
}
}
} else {
RowData row = jsonToRowData(dataNode);
if (row != null) {
ctx.collect(row);
}
}
}
}
Thread.sleep(pollInterval.toMillis());
} catch (InterruptedException e) {
LOG.info("HTTP source interrupted");
break;
} catch (Exception e) {
LOG.error("Error fetching HTTP data", e);
Thread.sleep(pollInterval.toMillis());
}
}
}
@Override
public void cancel() {
running = false;
}
@Override
public void close() throws Exception {
if (httpClient != null) {
httpClient.close();
}
}
/**
* Fetch data from the HTTP endpoint
*/
private String fetchData(String url) {
try {
HttpGet request = new HttpGet(url);
// Set the authentication token if configured
config.getOptional(HttpOptions.AUTH_TOKEN).ifPresent(token ->
request.setHeader("Authorization", "Bearer " + token)
);
try (CloseableHttpResponse response = httpClient.execute(request)) {
if (response.getCode() >= 200 && response.getCode() < 300) {
return EntityUtils.toString(response.getEntity());
} else {
LOG.warn("HTTP request failed: {}", response.getCode());
}
}
} catch (Exception e) {
LOG.error("HTTP request error", e);
}
return null;
}
/**
* Extract the data node from the JSON response
*/
private JsonNode extractDataNode(JsonNode root, String jsonPath) {
if (jsonPath == null || jsonPath.isEmpty()) {
return root;
}
JsonNode current = root;
for (String part : jsonPath.split("\\.")) {
current = current.get(part);
if (current == null) {
LOG.warn("JSON path not found: {}", jsonPath);
return null;
}
}
return current;
}
/**
* Convert a JSON node into RowData
*/
private RowData jsonToRowData(JsonNode json) {
try {
int fieldCount = schema.getColumnCount();
GenericRowData row = new GenericRowData(fieldCount);
for (int i = 0; i < fieldCount; i++) {
String fieldName = schema.getColumnNames().get(i);
LogicalType type = schema.getColumnDataTypes().get(i).getLogicalType();
JsonNode fieldNode = json.get(fieldName);
if (fieldNode == null || fieldNode.isNull()) {
row.setField(i, null);
} else {
row.setField(i, convertValue(fieldNode, type));
}
}
return row;
} catch (Exception e) {
LOG.error("Error converting JSON to RowData", e);
return null;
}
}
/**
* Convert a JSON value to the target Flink internal type
*/
private Object convertValue(JsonNode node, LogicalType type) {
LogicalTypeRoot typeRoot = type.getTypeRoot();
switch (typeRoot) {
case VARCHAR:
case CHAR:
return StringData.fromString(node.asText());
case BOOLEAN:
return node.asBoolean();
case INTEGER:
return node.asInt();
case BIGINT:
return node.asLong();
case DOUBLE:
return node.asDouble();
case FLOAT:
return (float) node.asDouble();
default:
return StringData.fromString(node.asText());
}
}
}
5. Sink Connector Implementation
5.1 Sink Factory
java
package com.example.flink.http;
import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ReadableConfig;
import org.apache.flink.table.connector.sink.DynamicTableSink;
import org.apache.flink.table.factories.DynamicTableSinkFactory;
import org.apache.flink.table.factories.FactoryUtil;
import java.util.HashSet;
import java.util.Set;
public class HttpDynamicTableSinkFactory implements DynamicTableSinkFactory {
@Override
public String factoryIdentifier() {
return HttpOptions.IDENTIFIER;
}
@Override
public Set<ConfigOption<?>> requiredOptions() {
Set<ConfigOption<?>> options = new HashSet<>();
options.add(HttpOptions.URL);
return options;
}
@Override
public Set<ConfigOption<?>> optionalOptions() {
Set<ConfigOption<?>> options = new HashSet<>();
options.add(HttpOptions.METHOD);
options.add(HttpOptions.AUTH_TOKEN);
options.add(HttpOptions.MAX_RETRIES);
return options;
}
@Override
public DynamicTableSink createDynamicTableSink(Context context) {
FactoryUtil.TableFactoryHelper helper =
FactoryUtil.createTableFactoryHelper(this, context);
helper.validate();
return new HttpDynamicTableSink(
context.getCatalogTable().getResolvedSchema(),
helper.getOptions()
);
}
}
5.2 Dynamic Table Sink
java
package com.example.flink.http;
import org.apache.flink.configuration.ReadableConfig;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;
import org.apache.flink.table.catalog.ResolvedSchema;
import org.apache.flink.table.connector.ChangelogMode;
import org.apache.flink.table.connector.sink.DynamicTableSink;
import org.apache.flink.table.connector.sink.SinkFunctionProvider;
import org.apache.flink.table.data.RowData;
public class HttpDynamicTableSink implements DynamicTableSink {
private final ResolvedSchema schema;
private final ReadableConfig config;
public HttpDynamicTableSink(ResolvedSchema schema, ReadableConfig config) {
this.schema = schema;
this.config = config;
}
@Override
public ChangelogMode getChangelogMode(ChangelogMode requestedMode) {
// Accept whatever change types the query produces (inserts, updates, deletes),
// so the sink also works behind aggregations such as the GROUP BY queries in section 7;
// each change is simply POSTed as a JSON document.
return requestedMode;
}
@Override
public SinkRuntimeProvider getSinkRuntimeProvider(Context context) {
SinkFunction<RowData> sinkFunction =
new HttpSinkFunction(schema, config);
return SinkFunctionProvider.of(sinkFunction);
}
@Override
public DynamicTableSink copy() {
return new HttpDynamicTableSink(schema, config);
}
@Override
public String asSummaryString() {
return "HTTP Table Sink";
}
}
5.3 Sink Function
java
package com.example.flink.http;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;
import org.apache.flink.configuration.ReadableConfig;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import org.apache.flink.table.catalog.ResolvedSchema;
import org.apache.flink.table.data.RowData;
import org.apache.flink.table.data.StringData;
import org.apache.flink.table.types.logical.LogicalType;
import org.apache.hc.client5.http.classic.methods.HttpPost;
import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.CloseableHttpResponse;
import org.apache.hc.client5.http.impl.classic.HttpClients;
import org.apache.hc.core5.http.io.entity.StringEntity;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class HttpSinkFunction extends RichSinkFunction<RowData> {
private static final Logger LOG = LoggerFactory.getLogger(HttpSinkFunction.class);
private static final ObjectMapper MAPPER = new ObjectMapper();
private final ResolvedSchema schema;
private final ReadableConfig config;
private CloseableHttpClient httpClient;
public HttpSinkFunction(ResolvedSchema schema, ReadableConfig config) {
this.schema = schema;
this.config = config;
}
@Override
public void open(org.apache.flink.configuration.Configuration parameters) {
httpClient = HttpClients.createDefault();
LOG.info("HTTP Sink opened");
}
@Override
public void invoke(RowData value, Context context) throws Exception {
String url = config.get(HttpOptions.URL);
// Convert the RowData into JSON
ObjectNode json = rowDataToJson(value);
String jsonString = MAPPER.writeValueAsString(json);
// Send it via HTTP POST
sendData(url, jsonString);
}
@Override
public void close() throws Exception {
if (httpClient != null) {
httpClient.close();
}
}
/**
* Convert RowData into a JSON object
*/
private ObjectNode rowDataToJson(RowData row) {
ObjectNode json = MAPPER.createObjectNode();
for (int i = 0; i < schema.getColumnCount(); i++) {
String fieldName = schema.getColumnNames().get(i);
LogicalType type = schema.getColumnDataTypes().get(i).getLogicalType();
if (row.isNullAt(i)) {
json.putNull(fieldName);
continue;
}
switch (type.getTypeRoot()) {
case VARCHAR:
case CHAR:
json.put(fieldName, row.getString(i).toString());
break;
case BOOLEAN:
json.put(fieldName, row.getBoolean(i));
break;
case INTEGER:
json.put(fieldName, row.getInt(i));
break;
case BIGINT:
json.put(fieldName, row.getLong(i));
break;
case DOUBLE:
json.put(fieldName, row.getDouble(i));
break;
case FLOAT:
json.put(fieldName, row.getFloat(i));
break;
default:
json.put(fieldName, row.getString(i).toString());
}
}
return json;
}
/**
* Send data to the HTTP endpoint, with retries
*/
private void sendData(String url, String jsonData) {
int maxRetries = config.get(HttpOptions.MAX_RETRIES);
for (int retry = 0; retry <= maxRetries; retry++) {
try {
HttpPost request = new HttpPost(url);
request.setHeader("Content-Type", "application/json");
// Set the authentication header if configured
config.getOptional(HttpOptions.AUTH_TOKEN).ifPresent(token ->
request.setHeader("Authorization", "Bearer " + token)
);
request.setEntity(new StringEntity(jsonData));
try (CloseableHttpResponse response = httpClient.execute(request)) {
if (response.getCode() >= 200 && response.getCode() < 300) {
LOG.debug("Data sent successfully");
return;
} else {
LOG.warn("HTTP request failed: {}", response.getCode());
}
}
} catch (Exception e) {
LOG.error("Error sending data (attempt {}/{})", retry + 1, maxRetries + 1, e);
if (retry < maxRetries) {
try {
Thread.sleep(1000 * (retry + 1));
} catch (InterruptedException ie) {
Thread.currentThread().interrupt();
break;
}
}
}
}
}
}
6. SPI Service Registration
6.1 Create the Service Provider File
Create a file under src/main/resources/META-INF/services/:
File name: org.apache.flink.table.factories.Factory
File contents:
com.example.flink.http.HttpDynamicTableSourceFactory
com.example.flink.http.HttpDynamicTableSinkFactory
This relies on the Java SPI mechanism: Flink discovers and registers your connector automatically. If you package the connector into a fat JAR, make sure the service files are merged, as shown below.
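When shading multiple connectors into one JAR, their META-INF/services/org.apache.flink.table.factories.Factory files can overwrite each other and factories get lost. A minimal maven-shade-plugin sketch that merges them (assuming you build a shaded JAR at all; adapt to your own build):
xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <transformers>
          <!-- Merges all META-INF/services files instead of keeping only one -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>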
7. Complete Usage Example
7.1 Mock HTTP API Service
First, create a simple HTTP service for testing:
python
# http_api_server.py
from flask import Flask, request, jsonify
import random
from datetime import datetime
app = Flask(__name__)
# Simulated order data
orders = []
@app.route('/api/orders', methods=['GET'])
def get_orders():
"""Return a list of orders"""
# Generate some test data
new_orders = [
{
'order_id': f'ORD{i}',
'user_id': random.randint(1000, 1005),
'product_name': random.choice(['phone', 'laptop', 'headset']),
'amount': round(random.uniform(100, 5000), 2),
'order_time': datetime.now().strftime('%Y-%m-%d %H:%M:%S')
}
for i in range(random.randint(1, 5))
]
return jsonify({
'code': 200,
'data': new_orders,
'message': 'success'
})
@app.route('/api/webhook', methods=['POST'])
def webhook():
"""Receive data sent by Flink"""
data = request.json
print(f"收到数据: {data}")
orders.append(data)
return jsonify({'code': 200, 'message': 'received'})
if __name__ == '__main__':
app.run(host='0.0.0.0', port=5000, debug=True)
Start the service:
bash
pip install flask
python http_api_server.py
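Before wiring the API into Flink, it helps to confirm that both endpoints behave as expected; a quick check against the mock service above:
bash
# Check the source endpoint returns orders wrapped in a "data" field
curl -s http://localhost:5000/api/orders
# Check the webhook endpoint accepts JSON
curl -s -X POST -H "Content-Type: application/json" \
  -d '{"user_id": 1001, "total_amount": 99.5, "order_count": 1}' \
  http://localhost:5000/api/webhook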
7.2 Flink SQL Example (Java)
java
package com.example.flink.demo;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
public class HttpConnectorDemo {
public static void main(String[] args) throws Exception {
// 1. Create the execution environments
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);
// 2. Create the HTTP source table
tableEnv.executeSql(
"CREATE TABLE http_orders (" +
" order_id STRING," +
" user_id BIGINT," +
" product_name STRING," +
" amount DOUBLE," +
" order_time STRING" +
") WITH (" +
" 'connector' = 'http'," +
" 'url' = 'http://localhost:5000/api/orders'," +
" 'method' = 'GET'," +
" 'poll.interval' = '5s'," +
" 'json.path' = 'data'" + // extract the array from the response's "data" field
")"
);
// 3. Create the HTTP sink table (webhook)
tableEnv.executeSql(
"CREATE TABLE http_webhook (" +
" user_id BIGINT," +
" total_amount DOUBLE," +
" order_count BIGINT" +
") WITH (" +
" 'connector' = 'http'," +
" 'url' = 'http://localhost:5000/api/webhook'," +
" 'method' = 'POST'," +
" 'max.retries' = '3'" +
")"
);
// 4. (Optional) Preview the source; print() blocks on an unbounded source,
// so keep it commented out when running the INSERT below
// tableEnv.executeSql("SELECT * FROM http_orders").print();
// 5. Aggregate in real time and send to the webhook
tableEnv.executeSql(
"INSERT INTO http_webhook " +
"SELECT " +
" user_id," +
" SUM(amount) AS total_amount," +
" COUNT(*) AS order_count " +
"FROM http_orders " +
"GROUP BY user_id"
).await(); // executeSql() submits its own job; await() keeps the client attached
}
}
7.3 Pure SQL Usage
sql
-- Create the HTTP source table
CREATE TABLE http_api_source (
order_id STRING,
user_id BIGINT,
product_name STRING,
amount DOUBLE,
-- kept as STRING: the sample convertValue() above does not handle TIMESTAMP types
order_time STRING
) WITH (
'connector' = 'http',
'url' = 'http://localhost:5000/api/orders',
'method' = 'GET',
'poll.interval' = '10s',
'json.path' = 'data',
'max.retries' = '3'
);
-- Create the HTTP sink table
CREATE TABLE http_result_sink (
user_id BIGINT,
total_amount DOUBLE,
order_count BIGINT,
update_time STRING
) WITH (
'connector' = 'http',
'url' = 'http://localhost:5000/api/webhook',
'method' = 'POST'
);
-- Aggregate in real time and write to the webhook
INSERT INTO http_result_sink
SELECT
user_id,
SUM(amount) AS total_amount,
COUNT(*) AS order_count,
CAST(CURRENT_TIMESTAMP AS STRING) AS update_time
FROM http_api_source
GROUP BY user_id;
7.4 Example with Authentication
sql
-- Authenticate with a Bearer token
CREATE TABLE secure_http_source (
id BIGINT,
name STRING,
value DOUBLE
) WITH (
'connector' = 'http',
'url' = 'https://api.example.com/data',
'method' = 'GET',
'auth.token' = 'your-bearer-token-here',
'poll.interval' = '30s'
);
8. Testing and Debugging
8.1 Unit Test
java
package com.example.flink.http;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.junit.jupiter.api.Test;
public class HttpConnectorTest {
@Test
public void testHttpSource() throws Exception {
StreamExecutionEnvironment env =
StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(1);
StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);
// Create the table
tableEnv.executeSql(
"CREATE TABLE test_http (" +
" id INT," +
" name STRING" +
") WITH (" +
" 'connector' = 'http'," +
" 'url' = 'http://localhost:5000/api/test'," +
" 'poll.interval' = '5s'" +
")"
);
// Run a test query
tableEnv.executeSql("SELECT * FROM test_http LIMIT 10").print();
}
}
8.2 Log-Based Debugging
Flink 1.20 ships with Log4j 2, so enable debug logging for the connector in conf/log4j.properties using Log4j 2 property syntax:
properties
logger.http.name = com.example.flink.http
logger.http.level = DEBUG
8.3 Troubleshooting Common Issues
Issue 1: Connector not found
Caused by: org.apache.flink.table.api.ValidationException:
Could not find any factory for identifier 'http'
Solution:
- Check that the SPI service file is correct (see the check below)
- Confirm the JAR has been added to Flink's lib directory
- Verify the return value of the factory class's factoryIdentifier()
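A quick way to confirm the service file actually made it into the packaged JAR (the JAR name below is an assumption based on the pom in section 3):
bash
jar tf flink-connector-http-1.0.0.jar | grep META-INF/services
# Print the factory classes registered in the JAR
unzip -p flink-connector-http-1.0.0.jar META-INF/services/org.apache.flink.table.factories.Factory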
Issue 2: Data cannot be parsed
Error converting JSON to RowData
Solution:
- Check that the JSON structure matches the table definition (see the inspection example below)
- Use json.path to extract the correct node
- Verify the field type mapping
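Inspecting the raw response is the fastest way to pick the right json.path and column types; for the mock service above:
bash
# Pretty-print the response to see where the record array lives
curl -s http://localhost:5000/api/orders | python -m json.tool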
9. Best Practices
9.1 Performance Optimization
✅ Batching
java
// Buffer rows in the sink and send them in batches (flushBuffer() is sketched below)
private List<RowData> buffer = new ArrayList<>();
private static final int BATCH_SIZE = 100;
@Override
public void invoke(RowData value, Context context) {
buffer.add(value);
if (buffer.size() >= BATCH_SIZE) {
flushBuffer();
}
}
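A minimal sketch of the missing pieces, assuming the buffer lives inside HttpSinkFunction from section 5.3 and the class additionally implements CheckpointedFunction: flushBuffer() posts all buffered rows as one JSON array, and snapshotState() flushes on every checkpoint. It reuses rowDataToJson() and sendData() from the sink above; everything else is illustrative, not part of the original connector.
java
// Inside HttpSinkFunction, additionally implementing
// org.apache.flink.streaming.api.checkpoint.CheckpointedFunction
private void flushBuffer() {
    if (buffer.isEmpty()) {
        return;
    }
    try {
        // Serialize all buffered rows as a single JSON array and POST it in one request
        com.fasterxml.jackson.databind.node.ArrayNode batch = MAPPER.createArrayNode();
        for (RowData row : buffer) {
            batch.add(rowDataToJson(row));
        }
        sendData(config.get(HttpOptions.URL), MAPPER.writeValueAsString(batch));
    } catch (Exception e) {
        LOG.error("Failed to flush buffered rows", e);
    } finally {
        buffer.clear();
    }
}

@Override
public void snapshotState(org.apache.flink.runtime.state.FunctionSnapshotContext context) {
    // Flush pending rows on every checkpoint so a failure cannot silently drop them
    flushBuffer();
}

@Override
public void initializeState(org.apache.flink.runtime.state.FunctionInitializationContext context) {
    // Nothing to restore in this simple sketch
}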
✅ Connection Pool Reuse
java
// Manage HTTP connections with a pooling connection manager
private static final PoolingHttpClientConnectionManager connManager =
new PoolingHttpClientConnectionManager();
static {
connManager.setMaxTotal(200);
connManager.setDefaultMaxPerRoute(20);
}
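The pooled manager then backs a single shared client; a sketch of how it plugs into the builder:
java
// Reuse one client instance built on top of the pooling connection manager
CloseableHttpClient pooledClient = HttpClients.custom()
        .setConnectionManager(connManager)
        .build();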
✅ Asynchronous Requests
java
// Use the async HTTP client to improve throughput
CloseableHttpAsyncClient asyncClient = HttpAsyncClients.createDefault();
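A sketch of a non-blocking POST using httpclient5's simple async API; url and jsonData stand in for the values used elsewhere in this tutorial:
java
// Build and fire an async request; the callback runs when the response arrives
SimpleHttpRequest request = SimpleRequestBuilder.post(url)
        .setBody(jsonData, ContentType.APPLICATION_JSON)
        .build();
asyncClient.start();
asyncClient.execute(request, new FutureCallback<SimpleHttpResponse>() {
    @Override
    public void completed(SimpleHttpResponse response) {
        LOG.debug("Async request completed: {}", response.getCode());
    }
    @Override
    public void failed(Exception ex) {
        LOG.error("Async request failed", ex);
    }
    @Override
    public void cancelled() {
        LOG.warn("Async request cancelled");
    }
});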
9.2 Fault Tolerance
✅ Retry Mechanism
java
// Retry with exponential backoff
for (int retry = 0; retry <= maxRetries; retry++) {
try {
// Execute the request
return executeRequest();
} catch (Exception e) {
if (retry < maxRetries) {
Thread.sleep((long) Math.pow(2, retry) * 1000);
} else {
throw e;
}
}
}
✅ Timeout Settings
java
RequestConfig requestConfig = RequestConfig.custom()
.setConnectTimeout(Timeout.ofSeconds(10))
.setResponseTimeout(Timeout.ofSeconds(30))
.build();
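The RequestConfig can then be registered as the client-wide default; a sketch:
java
// Apply the timeouts to every request issued by this client
CloseableHttpClient clientWithTimeouts = HttpClients.custom()
        .setDefaultRequestConfig(requestConfig)
        .build();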
9.3 Monitoring Metrics
java
// Register custom metrics for the source
public class HttpSourceFunction extends RichSourceFunction<RowData> {
private transient Counter requestCounter;
private transient Meter errorMeter;
@Override
public void open(Configuration parameters) {
requestCounter = getRuntimeContext()
.getMetricGroup()
.counter("http_requests");
errorMeter = getRuntimeContext()
.getMetricGroup()
.meter("http_errors", new MeterView(60));
}
private void executeRequest() throws Exception {
requestCounter.inc();
try {
// Perform the HTTP request
} catch (Exception e) {
errorMeter.markEvent();
throw e;
}
}
}
10. Summary
This tutorial covered custom SQL connector development for Flink 1.20 end to end:
✅ Core components: Factory, DynamicTableSource/Sink, SourceFunction/SinkFunction
✅ Complete implementation: Source and Sink for the HTTP connector
✅ Hands-on example: a full demo with a Flask API and Flink SQL
✅ Best practices: performance optimization, fault tolerance, monitoring metrics
Using this HTTP connector as a template, you can apply the same pattern to build other custom connectors, such as:
- A WebSocket connector
- A MongoDB connector
- A Redis connector
- Connectors for internal enterprise systems