Hands-On: Building a Custom SQL Connector for Flink 1.20

📚 A complete walkthrough of developing a custom Table/SQL connector for Apache Flink 1.20, including full HTTP connector source code and examples

Table of Contents

  • [1. Overview](#1-overview)
  • [2. Core Architecture](#2-core-architecture)
  • [3. Maven Project Setup](#3-maven-project-setup)
  • [4. Source Connector Implementation](#4-source-connector-implementation)
  • [5. Sink Connector Implementation](#5-sink-connector-implementation)
  • [6. SPI Service Registration](#6-spi-service-registration)
  • [7. End-to-End Usage Example](#7-end-to-end-usage-example)
  • [8. Testing and Debugging](#8-testing-and-debugging)
  • [9. Best Practices](#9-best-practices)

1. Overview

1.1 What Is a Custom Connector?

A Flink connector is the bridge between Flink and an external system:

  • Source: reads data from an external system
  • Sink: writes data to an external system

1.2 Use Cases for an HTTP Connector

  • ✅ Call RESTful APIs to fetch real-time data
  • ✅ Poll HTTP endpoints for updates
  • ✅ POST processing results to a webhook
  • ✅ Integrate with third-party data services

2. Core Architecture

2.1 Component Relationships

SQL/Table API
     ↓
Factory
     ↓
DynamicTableSource/Sink
     ↓
SourceFunction/SinkFunction
     ↓
External HTTP API

2.2 Core Interfaces

| Interface | Role |
| --- | --- |
| DynamicTableSourceFactory | Creates the Source |
| DynamicTableSinkFactory | Creates the Sink |
| ScanTableSource | Table source that scans external data |
| RichSourceFunction<RowData> | Data-reading logic |
| RichSinkFunction<RowData> | Data-writing logic |
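
Note: RichSourceFunction/RichSinkFunction still work in Flink 1.20 but are deprecated; new connectors are encouraged to implement the unified Source (FLIP-27) and Sink V2 APIs instead. This tutorial keeps the function-based APIs because they are much simpler to demonstrate.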

3. Maven Project Setup

3.1 pom.xml

xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
    
    <groupId>com.example.flink</groupId>
    <artifactId>flink-connector-http</artifactId>
    <version>1.0.0</version>
    
    <properties>
        <flink.version>1.20.0</flink.version>
        <java.version>11</java.version>
    </properties>
    
    <dependencies>
        <!-- Flink Table API -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-common</artifactId>
            <version>${flink.version}</version>
            <scope>provided</scope>
        </dependency>
        
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java</artifactId>
            <version>${flink.version}</version>
            <scope>provided</scope>
        </dependency>
        
        <!-- HTTP client -->
        <dependency>
            <groupId>org.apache.httpcomponents.client5</groupId>
            <artifactId>httpclient5</artifactId>
            <version>5.2.1</version>
        </dependency>
        
        <!-- JSON processing -->
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>2.15.2</version>
        </dependency>
    </dependencies>
</project>
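
Note: the POM above lists only the connector module's compile-time dependencies. To run the section 7 demo and the section 8 test from the same project you will typically also need flink-table-api-java-bridge, flink-table-planner-loader, flink-table-runtime, and flink-clients (all at ${flink.version}). And because httpclient5 and jackson-databind are not bundled with the Flink distribution, package them into the connector JAR (for example with the maven-shade-plugin) before copying it into Flink's lib/ directory.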

4. Source Connector Implementation

4.1 Configuration Options

java
package com.example.flink.http;

import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;
import java.time.Duration;

public class HttpOptions {
    public static final String IDENTIFIER = "http";
    
    public static final ConfigOption<String> URL =
        ConfigOptions.key("url")
            .stringType()
            .noDefaultValue()
            .withDescription("HTTP endpoint URL");
    
    public static final ConfigOption<String> METHOD =
        ConfigOptions.key("method")
            .stringType()
            .defaultValue("GET")
            .withDescription("HTTP method");
    
    public static final ConfigOption<String> AUTH_TOKEN =
        ConfigOptions.key("auth.token")
            .stringType()
            .noDefaultValue()
            .withDescription("Bearer token");
    
    public static final ConfigOption<Duration> POLL_INTERVAL =
        ConfigOptions.key("poll.interval")
            .durationType()
            .defaultValue(Duration.ofSeconds(5))
            .withDescription("Polling interval");
    
    public static final ConfigOption<String> JSON_PATH =
        ConfigOptions.key("json.path")
            .stringType()
            .noDefaultValue()
            .withDescription("JSON path to extract data");
    
    public static final ConfigOption<Integer> MAX_RETRIES =
        ConfigOptions.key("max.retries")
            .intType()
            .defaultValue(3)
            .withDescription("Max retry attempts");
}

4.2 Source Factory

java
package com.example.flink.http;

import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ReadableConfig;
import org.apache.flink.table.connector.source.DynamicTableSource;
import org.apache.flink.table.factories.DynamicTableSourceFactory;
import org.apache.flink.table.factories.FactoryUtil;
import java.util.HashSet;
import java.util.Set;

public class HttpDynamicTableSourceFactory implements DynamicTableSourceFactory {
    
    @Override
    public String factoryIdentifier() {
        return HttpOptions.IDENTIFIER;
    }
    
    @Override
    public Set<ConfigOption<?>> requiredOptions() {
        Set<ConfigOption<?>> options = new HashSet<>();
        options.add(HttpOptions.URL);
        return options;
    }
    
    @Override
    public Set<ConfigOption<?>> optionalOptions() {
        Set<ConfigOption<?>> options = new HashSet<>();
        options.add(HttpOptions.METHOD);
        options.add(HttpOptions.AUTH_TOKEN);
        options.add(HttpOptions.POLL_INTERVAL);
        options.add(HttpOptions.JSON_PATH);
        options.add(HttpOptions.MAX_RETRIES);
        return options;
    }
    
    @Override
    public DynamicTableSource createDynamicTableSource(Context context) {
        FactoryUtil.TableFactoryHelper helper = 
            FactoryUtil.createTableFactoryHelper(this, context);
        helper.validate();
        
        ReadableConfig config = helper.getOptions();
        
        return new HttpDynamicTableSource(
            context.getCatalogTable().getResolvedSchema(),
            config
        );
    }
}

4.3 Dynamic Table Source

java
package com.example.flink.http;

import org.apache.flink.configuration.ReadableConfig;
import org.apache.flink.streaming.api.functions.source.SourceFunction;
import org.apache.flink.table.catalog.ResolvedSchema;
import org.apache.flink.table.connector.ChangelogMode;
import org.apache.flink.table.connector.source.DynamicTableSource;
import org.apache.flink.table.connector.source.ScanTableSource;
import org.apache.flink.table.connector.source.SourceFunctionProvider;
import org.apache.flink.table.data.RowData;

public class HttpDynamicTableSource implements ScanTableSource {
    
    private final ResolvedSchema schema;
    private final ReadableConfig config;
    
    public HttpDynamicTableSource(ResolvedSchema schema, ReadableConfig config) {
        this.schema = schema;
        this.config = config;
    }
    
    @Override
    public ChangelogMode getChangelogMode() {
        return ChangelogMode.insertOnly();
    }
    
    @Override
    public ScanRuntimeProvider getScanRuntimeProvider(ScanContext context) {
        SourceFunction<RowData> sourceFunction = 
            new HttpSourceFunction(schema, config);
        return SourceFunctionProvider.of(sourceFunction, false);
    }
    
    @Override
    public DynamicTableSource copy() {
        return new HttpDynamicTableSource(schema, config);
    }
    
    @Override
    public String asSummaryString() {
        return "HTTP Table Source";
    }
}

4.4 Source Function (Core Logic)

java
package com.example.flink.http;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.configuration.ReadableConfig;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;
import org.apache.flink.table.catalog.ResolvedSchema;
import org.apache.flink.table.data.GenericRowData;
import org.apache.flink.table.data.RowData;
import org.apache.flink.table.data.StringData;
import org.apache.flink.table.types.logical.LogicalType;
import org.apache.flink.table.types.logical.LogicalTypeRoot;
import org.apache.hc.client5.http.classic.methods.HttpGet;
import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.CloseableHttpResponse;
import org.apache.hc.client5.http.impl.classic.HttpClients;
import org.apache.hc.core5.http.io.entity.EntityUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.time.Duration;

public class HttpSourceFunction extends RichSourceFunction<RowData> {
    
    private static final Logger LOG = LoggerFactory.getLogger(HttpSourceFunction.class);
    private static final ObjectMapper MAPPER = new ObjectMapper();
    
    private final ResolvedSchema schema;
    private final ReadableConfig config;
    private volatile boolean running = true;
    private CloseableHttpClient httpClient;
    
    public HttpSourceFunction(ResolvedSchema schema, ReadableConfig config) {
        this.schema = schema;
        this.config = config;
    }
    
    @Override
    public void open(org.apache.flink.configuration.Configuration parameters) {
        httpClient = HttpClients.createDefault();
        LOG.info("HTTP Source opened");
    }
    
    @Override
    public void run(SourceContext<RowData> ctx) throws Exception {
        String url = config.get(HttpOptions.URL);
        Duration pollInterval = config.get(HttpOptions.POLL_INTERVAL);
        String jsonPath = config.getOptional(HttpOptions.JSON_PATH).orElse(null);
        
        LOG.info("Start polling from: {}", url);
        
        while (running) {
            try {
                // Execute the HTTP GET request
                String response = fetchData(url);
                
                if (response != null) {
                    // Parse the JSON response
                    JsonNode root = MAPPER.readTree(response);
                    JsonNode dataNode = extractDataNode(root, jsonPath);
                    
                    // Convert to RowData and emit
                    if (dataNode != null) {
                        if (dataNode.isArray()) {
                            for (JsonNode item : dataNode) {
                                RowData row = jsonToRowData(item);
                                if (row != null) {
                                    ctx.collect(row);
                                }
                            }
                        } else {
                            RowData row = jsonToRowData(dataNode);
                            if (row != null) {
                                ctx.collect(row);
                            }
                        }
                    }
                }
                
                Thread.sleep(pollInterval.toMillis());
                
            } catch (InterruptedException e) {
                LOG.info("HTTP source interrupted");
                break;
            } catch (Exception e) {
                LOG.error("Error fetching HTTP data", e);
                Thread.sleep(pollInterval.toMillis());
            }
        }
    }
    
    @Override
    public void cancel() {
        running = false;
    }
    
    @Override
    public void close() throws Exception {
        if (httpClient != null) {
            httpClient.close();
        }
    }
    
    /**
     * Fetch data from the HTTP endpoint
     */
    private String fetchData(String url) {
        try {
            HttpGet request = new HttpGet(url);
            
            // Set the auth token if configured
            config.getOptional(HttpOptions.AUTH_TOKEN).ifPresent(token ->
                request.setHeader("Authorization", "Bearer " + token)
            );
            
            try (CloseableHttpResponse response = httpClient.execute(request)) {
                if (response.getCode() >= 200 && response.getCode() < 300) {
                    return EntityUtils.toString(response.getEntity());
                } else {
                    LOG.warn("HTTP request failed: {}", response.getCode());
                }
            }
        } catch (Exception e) {
            LOG.error("HTTP request error", e);
        }
        return null;
    }
    
    /**
     * Extract the data node from the JSON response
     */
    private JsonNode extractDataNode(JsonNode root, String jsonPath) {
        if (jsonPath == null || jsonPath.isEmpty()) {
            return root;
        }
        
        JsonNode current = root;
        for (String part : jsonPath.split("\\.")) {
            current = current.get(part);
            if (current == null) {
                LOG.warn("JSON path not found: {}", jsonPath);
                return null;
            }
        }
        return current;
    }
    
    /**
     * Convert JSON to RowData
     */
    private RowData jsonToRowData(JsonNode json) {
        try {
            int fieldCount = schema.getColumnCount();
            GenericRowData row = new GenericRowData(fieldCount);
            
            for (int i = 0; i < fieldCount; i++) {
                String fieldName = schema.getColumnNames().get(i);
                LogicalType type = schema.getColumnDataTypes().get(i).getLogicalType();
                JsonNode fieldNode = json.get(fieldName);
                
                if (fieldNode == null || fieldNode.isNull()) {
                    row.setField(i, null);
                } else {
                    row.setField(i, convertValue(fieldNode, type));
                }
            }
            
            return row;
        } catch (Exception e) {
            LOG.error("Error converting JSON to RowData", e);
            return null;
        }
    }
    
    /**
     * Type conversion
     */
    private Object convertValue(JsonNode node, LogicalType type) {
        LogicalTypeRoot typeRoot = type.getTypeRoot();
        
        switch (typeRoot) {
            case VARCHAR:
            case CHAR:
                return StringData.fromString(node.asText());
            case BOOLEAN:
                return node.asBoolean();
            case INTEGER:
                return node.asInt();
            case BIGINT:
                return node.asLong();
            case DOUBLE:
                return node.asDouble();
            case FLOAT:
                return (float) node.asDouble();
            default:
                return StringData.fromString(node.asText());
        }
    }
}
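
The run() loop above emits records without holding the checkpoint lock, which is acceptable for a simple at-least-once polling source. If emission must not interleave with checkpoint barriers, wrap each collect in the lock exposed by the SourceContext. A minimal sketch:

java
// Inside run(): emit under the checkpoint lock so a checkpoint cannot
// be taken between related records.
synchronized (ctx.getCheckpointLock()) {
    ctx.collect(row);
}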

5. Sink Connector Implementation

5.1 Sink Factory

java
package com.example.flink.http;

import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ReadableConfig;
import org.apache.flink.table.connector.sink.DynamicTableSink;
import org.apache.flink.table.factories.DynamicTableSinkFactory;
import org.apache.flink.table.factories.FactoryUtil;
import java.util.HashSet;
import java.util.Set;

public class HttpDynamicTableSinkFactory implements DynamicTableSinkFactory {
    
    @Override
    public String factoryIdentifier() {
        return HttpOptions.IDENTIFIER;
    }
    
    @Override
    public Set<ConfigOption<?>> requiredOptions() {
        Set<ConfigOption<?>> options = new HashSet<>();
        options.add(HttpOptions.URL);
        return options;
    }
    
    @Override
    public Set<ConfigOption<?>> optionalOptions() {
        Set<ConfigOption<?>> options = new HashSet<>();
        options.add(HttpOptions.METHOD);
        options.add(HttpOptions.AUTH_TOKEN);
        options.add(HttpOptions.MAX_RETRIES);
        return options;
    }
    
    @Override
    public DynamicTableSink createDynamicTableSink(Context context) {
        FactoryUtil.TableFactoryHelper helper = 
            FactoryUtil.createTableFactoryHelper(this, context);
        helper.validate();
        
        return new HttpDynamicTableSink(
            context.getCatalogTable().getResolvedSchema(),
            helper.getOptions()
        );
    }
}

5.2 Dynamic Table Sink

java
package com.example.flink.http;

import org.apache.flink.configuration.ReadableConfig;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;
import org.apache.flink.table.catalog.ResolvedSchema;
import org.apache.flink.table.connector.ChangelogMode;
import org.apache.flink.table.connector.sink.DynamicTableSink;
import org.apache.flink.table.connector.sink.SinkFunctionProvider;
import org.apache.flink.table.data.RowData;

public class HttpDynamicTableSink implements DynamicTableSink {
    
    private final ResolvedSchema schema;
    private final ReadableConfig config;
    
    public HttpDynamicTableSink(ResolvedSchema schema, ReadableConfig config) {
        this.schema = schema;
        this.config = config;
    }
    
    @Override
    public ChangelogMode getChangelogMode(ChangelogMode requestedMode) {
        // Accept whatever the planner requests so that updating queries
        // (such as the non-windowed GROUP BY in section 7) can write to this
        // sink; every change row is simply POSTed downstream. An insert-only
        // mode here would make the planner reject such queries.
        return requestedMode;
    }
    
    @Override
    public SinkRuntimeProvider getSinkRuntimeProvider(Context context) {
        SinkFunction<RowData> sinkFunction = 
            new HttpSinkFunction(schema, config);
        return SinkFunctionProvider.of(sinkFunction);
    }
    
    @Override
    public DynamicTableSink copy() {
        return new HttpDynamicTableSink(schema, config);
    }
    
    @Override
    public String asSummaryString() {
        return "HTTP Table Sink";
    }
}

5.3 Sink Function

java
package com.example.flink.http;

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;
import org.apache.flink.configuration.ReadableConfig;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import org.apache.flink.table.catalog.ResolvedSchema;
import org.apache.flink.table.data.RowData;
import org.apache.flink.table.data.StringData;
import org.apache.flink.table.types.logical.LogicalType;
import org.apache.hc.client5.http.classic.methods.HttpPost;
import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.CloseableHttpResponse;
import org.apache.hc.client5.http.impl.classic.HttpClients;
import org.apache.hc.core5.http.ContentType;
import org.apache.hc.core5.http.io.entity.StringEntity;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class HttpSinkFunction extends RichSinkFunction<RowData> {
    
    private static final Logger LOG = LoggerFactory.getLogger(HttpSinkFunction.class);
    private static final ObjectMapper MAPPER = new ObjectMapper();
    
    private final ResolvedSchema schema;
    private final ReadableConfig config;
    private CloseableHttpClient httpClient;
    
    public HttpSinkFunction(ResolvedSchema schema, ReadableConfig config) {
        this.schema = schema;
        this.config = config;
    }
    
    @Override
    public void open(org.apache.flink.configuration.Configuration parameters) {
        httpClient = HttpClients.createDefault();
        LOG.info("HTTP Sink opened");
    }
    
    @Override
    public void invoke(RowData value, Context context) throws Exception {
        String url = config.get(HttpOptions.URL);
        
        // Convert the RowData to JSON
        ObjectNode json = rowDataToJson(value);
        String jsonString = MAPPER.writeValueAsString(json);
        
        // Send it via HTTP POST
        sendData(url, jsonString);
    }
    
    @Override
    public void close() throws Exception {
        if (httpClient != null) {
            httpClient.close();
        }
    }
    
    /**
     * Convert RowData to a JSON object
     */
    private ObjectNode rowDataToJson(RowData row) {
        ObjectNode json = MAPPER.createObjectNode();
        
        for (int i = 0; i < schema.getColumnCount(); i++) {
            String fieldName = schema.getColumnNames().get(i);
            LogicalType type = schema.getColumnDataTypes().get(i).getLogicalType();
            
            if (row.isNullAt(i)) {
                json.putNull(fieldName);
                continue;
            }
            
            switch (type.getTypeRoot()) {
                case VARCHAR:
                case CHAR:
                    json.put(fieldName, row.getString(i).toString());
                    break;
                case BOOLEAN:
                    json.put(fieldName, row.getBoolean(i));
                    break;
                case INTEGER:
                    json.put(fieldName, row.getInt(i));
                    break;
                case BIGINT:
                    json.put(fieldName, row.getLong(i));
                    break;
                case DOUBLE:
                    json.put(fieldName, row.getDouble(i));
                    break;
                case FLOAT:
                    json.put(fieldName, row.getFloat(i));
                    break;
                default:
                    json.put(fieldName, row.getString(i).toString());
            }
        }
        
        return json;
    }
    
    /**
     * Send data to the HTTP endpoint
     */
    private void sendData(String url, String jsonData) {
        int maxRetries = config.get(HttpOptions.MAX_RETRIES);
        
        for (int retry = 0; retry <= maxRetries; retry++) {
            try {
                HttpPost request = new HttpPost(url);
                request.setHeader("Content-Type", "application/json");
                
                // Set the auth token if configured
                config.getOptional(HttpOptions.AUTH_TOKEN).ifPresent(token ->
                    request.setHeader("Authorization", "Bearer " + token)
                );
                
                // Send as explicit UTF-8 JSON; the single-argument StringEntity
                // constructor defaults to ISO-8859-1 text/plain and can garble
                // non-ASCII payloads.
                request.setEntity(new StringEntity(jsonData, ContentType.APPLICATION_JSON));
                
                try (CloseableHttpResponse response = httpClient.execute(request)) {
                    if (response.getCode() >= 200 && response.getCode() < 300) {
                        LOG.debug("Data sent successfully");
                        return;
                    } else {
                        LOG.warn("HTTP request failed: {}", response.getCode());
                    }
                }
            } catch (Exception e) {
                LOG.error("Error sending data (attempt {}/{})", retry + 1, maxRetries + 1, e);
                if (retry < maxRetries) {
                    try {
                        Thread.sleep(1000 * (retry + 1));
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        break;
                    }
                }
            }
        }
    }
}

6. SPI Service Registration

6.1 Create the Service Descriptor File

Create a file under src/main/resources/META-INF/services/:

File name: org.apache.flink.table.factories.Factory

File contents:

com.example.flink.http.HttpDynamicTableSourceFactory
com.example.flink.http.HttpDynamicTableSinkFactory

This uses Java's SPI (Service Provider Interface) mechanism: Flink discovers and registers your connector automatically.
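
Before deploying, you can sanity-check the registration: the descriptor is read via java.util.ServiceLoader, so a small main method with the connector JAR on the classpath should list both factories. A quick sketch (SpiCheck is a hypothetical helper, not part of the connector):

java
import java.util.ServiceLoader;
import org.apache.flink.table.factories.Factory;

public class SpiCheck {
    public static void main(String[] args) {
        // Prints every Factory on the classpath; the two "http" entries
        // are the source and sink factories registered above.
        for (Factory f : ServiceLoader.load(Factory.class)) {
            System.out.println(f.factoryIdentifier() + " -> " + f.getClass().getName());
        }
    }
}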


7. End-to-End Usage Example

7.1 A Mock HTTP API Service

First, create a simple HTTP service for testing:

python
# http_api_server.py
from flask import Flask, request, jsonify
import random
from datetime import datetime

app = Flask(__name__)

# Simulated order data
orders = []

@app.route('/api/orders', methods=['GET'])
def get_orders():
    """返回订单列表"""
    # 生成一些测试数据
    new_orders = [
        {
            'order_id': f'ORD{i}',
            'user_id': random.randint(1000, 1005),
            'product_name': random.choice(['phone', 'laptop', 'headphones']),
            'amount': round(random.uniform(100, 5000), 2),
            'order_time': datetime.now().strftime('%Y-%m-%d %H:%M:%S')
        }
        for i in range(random.randint(1, 5))
    ]
    
    return jsonify({
        'code': 200,
        'data': new_orders,
        'message': 'success'
    })

@app.route('/api/webhook', methods=['POST'])
def webhook():
    """接收 Flink 发送的数据"""
    data = request.json
    print(f"收到数据: {data}")
    orders.append(data)
    return jsonify({'code': 200, 'message': 'received'})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000, debug=True)

Start the service:

bash
pip install flask
python http_api_server.py
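
Once the service is up, a quick sanity check such as curl http://localhost:5000/api/orders should return the {"code": 200, "data": [...], "message": "success"} envelope produced above.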

7.2 Flink Job (Table API)

java
package com.example.flink.demo;

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class HttpConnectorDemo {
    
    public static void main(String[] args) throws Exception {
        // 1. Create the environments
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);
        
        // 2. Create the HTTP source table
        tableEnv.executeSql(
            "CREATE TABLE http_orders (" +
            "  order_id STRING," +
            "  user_id BIGINT," +
            "  product_name STRING," +
            "  amount DOUBLE," +
            "  order_time STRING" +
            ") WITH (" +
            "  'connector' = 'http'," +
            "  'url' = 'http://localhost:5000/api/orders'," +
            "  'method' = 'GET'," +
            "  'poll.interval' = '5s'," +
            "  'json.path' = 'data'" +  // 从响应的 data 字段提取数组
            ")"
        );
        
        // 3. Create the HTTP sink table (webhook)
        tableEnv.executeSql(
            "CREATE TABLE http_webhook (" +
            "  user_id BIGINT," +
            "  total_amount DOUBLE," +
            "  order_count BIGINT" +
            ") WITH (" +
            "  'connector' = 'http'," +
            "  'url' = 'http://localhost:5000/api/webhook'," +
            "  'method' = 'POST'," +
            "  'max.retries' = '3'" +
            ")"
        );
        
        // 4. (Optional) peek at the data. Note that print() blocks forever on
        //    an unbounded source, so keep it commented out when running step 5.
        // tableEnv.executeSql("SELECT * FROM http_orders").print();
        
        // 5. Continuously aggregate and send the results to the webhook.
        //    executeSql() submits the INSERT job itself, so no env.execute()
        //    is needed; await() keeps the local JVM alive while the job runs.
        tableEnv.executeSql(
            "INSERT INTO http_webhook " +
            "SELECT " +
            "  user_id," +
            "  SUM(amount) AS total_amount," +
            "  COUNT(*) AS order_count " +
            "FROM http_orders " +
            "GROUP BY user_id"
        ).await();
    }
}

7.3 Pure SQL Usage

sql
-- Create the HTTP source table.
-- Note: this minimal connector only converts string/numeric/boolean types,
-- so order_time is declared as STRING here.
CREATE TABLE http_api_source (
    order_id STRING,
    user_id BIGINT,
    product_name STRING,
    amount DOUBLE,
    order_time STRING
) WITH (
    'connector' = 'http',
    'url' = 'http://localhost:5000/api/orders',
    'method' = 'GET',
    'poll.interval' = '10s',
    'json.path' = 'data',
    'max.retries' = '3'
);

-- Create the HTTP sink table
CREATE TABLE http_result_sink (
    user_id BIGINT,
    total_amount DOUBLE,
    order_count BIGINT,
    update_time STRING
) WITH (
    'connector' = 'http',
    'url' = 'http://localhost:5000/api/webhook',
    'method' = 'POST'
);

-- Continuously aggregate and write to the webhook
INSERT INTO http_result_sink
SELECT 
    user_id,
    SUM(amount) AS total_amount,
    COUNT(*) AS order_count,
    CAST(CURRENT_TIMESTAMP AS STRING) AS update_time
FROM http_api_source
GROUP BY user_id;

7.4 Example with Authentication

sql
-- Authenticate with a Bearer token
CREATE TABLE secure_http_source (
    id BIGINT,
    name STRING,
    value DOUBLE
) WITH (
    'connector' = 'http',
    'url' = 'https://api.example.com/data',
    'method' = 'GET',
    'auth.token' = 'your-bearer-token-here',
    'poll.interval' = '30s'
);

8. Testing and Debugging

8.1 Unit Test

java
package com.example.flink.http;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.junit.jupiter.api.Test;

public class HttpConnectorTest {
    
    @Test
    public void testHttpSource() throws Exception {
        StreamExecutionEnvironment env = 
            StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);
        
        // Create the table
        tableEnv.executeSql(
            "CREATE TABLE test_http (" +
            "  id INT," +
            "  name STRING" +
            ") WITH (" +
            "  'connector' = 'http'," +
            "  'url' = 'http://localhost:5000/api/test'," +
            "  'poll.interval' = '5s'" +
            ")"
        );
        
        // Query test (LIMIT bounds the otherwise unbounded stream)
        tableEnv.executeSql("SELECT * FROM test_http LIMIT 10").print();
    }
}
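
The test above assumes the Flask service from section 7.1 is running on port 5000. For a self-contained test you can stub the endpoint with the JDK's built-in com.sun.net.httpserver.HttpServer before creating the table. A minimal sketch (the port and payload are arbitrary):

java
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import com.sun.net.httpserver.HttpServer;

// Serve a fixed JSON array at /api/test so the connector has data to poll.
HttpServer server = HttpServer.create(new InetSocketAddress(5000), 0);
server.createContext("/api/test", exchange -> {
    byte[] body = "[{\"id\":1,\"name\":\"alice\"}]".getBytes(StandardCharsets.UTF_8);
    exchange.getResponseHeaders().add("Content-Type", "application/json");
    exchange.sendResponseHeaders(200, body.length);
    try (OutputStream os = exchange.getResponseBody()) {
        os.write(body);
    }
});
server.start();
// ... run the Flink test, then: server.stop(0);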

8.2 Debug Logging

Enable debug logging in log4j.properties (Flink 1.20 ships Log4j 2, so the file uses Log4j 2 properties syntax):

properties
logger.http.name = com.example.flink.http
logger.http.level = DEBUG

8.3 Troubleshooting Common Issues

Problem 1: Connector not found

Caused by: org.apache.flink.table.api.ValidationException: 
Could not find any factory for identifier 'http'

Solutions:

  • Check that the SPI descriptor file is present and correctly named
  • Confirm the connector JAR (with its dependencies) is in Flink's lib directory
  • Verify the return value of the factory's factoryIdentifier()

Problem 2: Data cannot be parsed
Error converting JSON to RowData

Solutions:

  • Check that the JSON structure matches the table definition
  • Use json.path to point at the right node (see the example after this list)
  • Verify the field type mappings
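
For example, the Flask service in section 7.1 wraps its payload as {"code": 200, "data": [...], "message": "success"}; with 'json.path' = 'data' the source iterates over the order array, while without it the connector would try (and fail) to map the envelope object itself.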

9. Best Practices

9.1 Performance Optimization

Batching

java
// Accumulate rows in the sink and send them in batches
private final List<RowData> buffer = new ArrayList<>();
private static final int BATCH_SIZE = 100;

@Override
public void invoke(RowData value, Context context) {
    buffer.add(value);
    if (buffer.size() >= BATCH_SIZE) {
        flushBuffer(); // POST the buffered rows as a single JSON array, then clear
    }
}

// Also flush in close() and on checkpoint (implement CheckpointedFunction);
// otherwise the tail of the stream may never be sent.

Connection Pool Reuse

java
// Manage HTTP connections through a shared pool
private static final PoolingHttpClientConnectionManager CONN_MANAGER =
    new PoolingHttpClientConnectionManager();

static {
    CONN_MANAGER.setMaxTotal(200);
    CONN_MANAGER.setDefaultMaxPerRoute(20);
}

// In open():
// httpClient = HttpClients.custom().setConnectionManager(CONN_MANAGER).build();

Asynchronous Requests

java
// An async HTTP client raises throughput; it must be started before use
// and closed when finished.
CloseableHttpAsyncClient asyncClient = HttpAsyncClients.createDefault();
asyncClient.start();

9.2 Fault Tolerance

Retry Mechanism

java
// Retry with exponential backoff
for (int retry = 0; retry <= maxRetries; retry++) {
    try {
        // Execute the request
        return executeRequest();
    } catch (Exception e) {
        if (retry < maxRetries) {
            Thread.sleep((long) Math.pow(2, retry) * 1000);
        } else {
            throw e;
        }
    }
}

Timeouts

java
RequestConfig requestConfig = RequestConfig.custom()
    .setConnectTimeout(Timeout.ofSeconds(10))
    .setResponseTimeout(Timeout.ofSeconds(30))
    .build();

// Apply it per request: request.setConfig(requestConfig);

9.3 Monitoring Metrics

java
// Add custom metrics
public class HttpSourceFunction extends RichSourceFunction<RowData> {
    
    private transient Counter requestCounter;
    private transient Meter errorMeter;
    
    @Override
    public void open(Configuration parameters) {
        requestCounter = getRuntimeContext()
            .getMetricGroup()
            .counter("http_requests");
            
        errorMeter = getRuntimeContext()
            .getMetricGroup()
            .meter("http_errors", new MeterView(60));
    }
    
    private void executeRequest() {
        requestCounter.inc();
        try {
            // ...execute the HTTP request...
        } catch (Exception e) {
            errorMeter.markEvent();
            throw e;
        }
    }
}

10. Summary

This tutorial walked through developing a custom SQL connector for Flink 1.20:

Core components: Factory, DynamicTableSource/Sink, SourceFunction/SinkFunction

Complete implementation: Source and Sink for an HTTP connector

Hands-on example: a full Flask API + Flink SQL demo

Best practices: performance optimization, fault tolerance, monitoring metrics

With this HTTP connector as a template, you can apply the same pattern to build other custom connectors, such as:

  • A WebSocket connector
  • A MongoDB connector
  • A Redis connector
  • Connectors for internal enterprise systems
