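Flink DataStream2Table Summary

Preface

At work I occasionally need to convert a DataStream to a Table. I have written this a few times, but without notes I forget the details and have to dig through old code, which is tedious. This post summarizes the approaches so they are easy to look up next time.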

DataStream<Tuple>

Converting DataStream<Tuple> to Table

DataStream<Tuple5<Integer, String, Double, Long, String>> dataStream = env.fromCollection(
        java.util.Arrays.asList(
                Tuple5.of(1, "手机", 5999.99, 1733400000000L, "2025-12-05"),
                Tuple5.of(2, "电脑", 8999.50, 1733400001000L, "2025-12-05"),
                Tuple5.of(3, "耳机", 399.00, 1733400002000L, "2025-12-05")
        )
);

// Field order must match Tuple5: id (1st), name (2nd), price (3rd), ts (4th), dt (5th)
Table table = tableEnv.fromDataStream(
        dataStream,
        "id, name, price, ts, dt" // field names given directly; types come from Tuple5's type parameters
);

table.execute().print();    
table.printSchema();

Output:

+-------------+--------------------------------+--------------------------------+----------------------+--------------------------------+
|          id |                           name |                          price |                   ts |                             dt |
+-------------+--------------------------------+--------------------------------+----------------------+--------------------------------+
|           1 |                           手机 |                        5999.99 |        1733400000000 |                     2025-12-05 |
|           2 |                           电脑 |                         8999.5 |        1733400001000 |                     2025-12-05 |
|           3 |                           耳机 |                          399.0 |        1733400002000 |                     2025-12-05 |
+-------------+--------------------------------+--------------------------------+----------------------+--------------------------------+
3 rows in set
(
  `id` INT,
  `name` STRING,
  `price` DOUBLE,
  `ts` BIGINT,
  `dt` STRING
)

If no field names are specified, the default field names are f0, f1, f2, f3, f4.

DataStream<Row>

Converting DataStream<Row> to Table

DataStream<Row> dataStream = env.fromCollection(
        java.util.Arrays.asList(
                Row.of(1, "手机", 5999.99, 1733400000000L, "2025-12-05"),
                Row.of(2, "电脑", 8999.50, 1733400001000L, "2025-12-05"),
                Row.of(3, "耳机", 399.00, 1733400002000L, "2025-12-05")
        )
);

Table table = tableEnv.fromDataStream(
        dataStream,
        "id, name, price, ts, dt"
);

table.execute().print();

The result is the same as for DataStream<Tuple>.

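DataStream<JSONArray>

DataStream<Tuple> and DataStream<Row> are both fairly simple. DataStream<JSONArray> is a bit more involved: it first has to be converted to a DataStream<Tuple> or DataStream<Row>, and the complexity lies mainly in that conversion. This post mainly records how to convert a DataStream<JSONArray> into a Table, which is probably the more common need in practice, for example turning a JSONArray returned by an API into a Table.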

Building test data

private static JSONArray getJsonArray() {
    JSONArray jsonArray = new JSONArray();
    // First record
    JSONObject json1 = new JSONObject();
    json1.put("id", 1);
    json1.put("name", "手机");
    json1.put("price", 5999.99);
    json1.put("ts", 1733400000000L);
    json1.put("dt", "2025-12-05");
    jsonArray.add(json1);

    // Second record
    JSONObject json2 = new JSONObject();
    json2.put("id", 2);
    json2.put("name", "电脑");
    json2.put("price", 8999.50);
    json2.put("ts", 1733400001000L);
    json2.put("dt", "2025-12-05");
    jsonArray.add(json2);

    // Third record
    JSONObject json3 = new JSONObject();
    json3.put("id", 3);
    json3.put("name", "耳机");
    json3.put("price", 399.00);
    json3.put("ts", 1733400002000L);
    json3.put("dt", "2025-12-05");
    jsonArray.add(json3);
    return jsonArray;
}

Constructing DataStream<JSONArray> and DataStream<JSONObject>

A JSONArray can be used directly to build either a DataStream<JSONArray> or a DataStream<JSONObject>:

JSONArray jsonArray = getJsonArray();
DataStream<JSONArray> jsonArrayDataStream = env.fromElements(jsonArray);
jsonArrayDataStream.print();
env.execute();

List<JSONObject> jsonList = new ArrayList<>();
for (int i = 0; i < jsonArray.size(); i++) {
    jsonList.add(jsonArray.getJSONObject(i));
}

DataStream<JSONObject> jsonObjectDataStream = env.fromCollection(jsonList);
jsonObjectDataStream.print();
env.execute();

Output:

[{"dt":"2025-12-05","price":5999.99,"name":"手机","id":1,"ts":1733400000000},{"dt":"2025-12-05","price":8999.5,"name":"电脑","id":2,"ts":1733400001000},{"dt":"2025-12-05","price":399.0,"name":"耳机","id":3,"ts":1733400002000}]
{"dt":"2025-12-05","price":5999.99,"name":"手机","id":1,"ts":1733400000000}
{"dt":"2025-12-05","price":8999.5,"name":"电脑","id":2,"ts":1733400001000}
{"dt":"2025-12-05","price":399.0,"name":"耳机","id":3,"ts":1733400002000}

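Next, the DataStream<JSONArray> is converted with flatMap into a DataStream<Tuple> or DataStream<Row> (for a DataStream<JSONObject>, use map). From there, converting to a Table works the same way as for DataStream<Tuple> and DataStream<Row> above. The conversion to DataStream<Tuple> or DataStream<Row> runs into a few problems, though, so we look at each in turn.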
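
The two routes differ only in the shape of the transformation: flatMap emits several records for one input array, while map emits exactly one record per input object. As a rough stand-alone sketch of that shape, with java.util.stream in place of Flink and plain maps in place of JSONObject (FlattenDemo and its helpers are hypothetical names, not part of the original code):

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Stand-alone sketch without Flink or a JSON library; all names are hypothetical.
class FlattenDemo {

    // Builds one record; a Map stands in for a JSONObject.
    static Map<String, Object> record(int id, String name) {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("id", id);
        fields.put("name", name);
        return fields;
    }

    // map shape: exactly one output row per input record.
    static Object[] toRow(Map<String, Object> rec) {
        return new Object[] {rec.get("id"), rec.get("name")};
    }

    // flatMap shape: each input array contributes one output row per element.
    static List<Object[]> flatten(List<List<Map<String, Object>>> arrays) {
        return arrays.stream()
                .flatMap(List::stream)
                .map(FlattenDemo::toRow)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, Object>> array = Arrays.asList(
                record(1, "phone"), record(2, "laptop"), record(3, "headphones"));
        List<Object[]> rows = flatten(Arrays.asList(array));
        System.out.println(rows.size() + " rows, first = " + Arrays.toString(rows.get(0)));
    }
}
```

A single input array of three records yields three rows here, which mirrors what the flatMap over a DataStream<JSONArray> is meant to do.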

Converting to DataStream<Tuple> first

JSONArray jsonArray = getJsonArray();
DataStream<JSONArray> jsonArrayDataStream = env.fromElements(jsonArray);

DataStream<Tuple5<Integer, String, Double, Long, String>> dataStream = jsonArrayDataStream
        .flatMap((FlatMapFunction<JSONArray, Tuple5<Integer, String, Double, Long, String>>) (jsonArray1, out) -> {
            // Iterate over each JSONObject in the JSONArray
            for (int i = 0; i < jsonArray1.size(); i++) {
                JSONObject object = jsonArray1.getJSONObject(i);
                // Extract the fields and build a Tuple5
                out.collect(Tuple5.of(
                        object.getInteger("id"),
                        object.getString("name"),
                        object.getDouble("price"),
                        object.getLong("ts"),
                        object.getString("dt")
                ));
            }
        });
Table table = tableEnv.fromDataStream(
        dataStream,
        "id, name, price, ts, dt" // field names given directly; types come from Tuple5's type parameters
);

table.execute().print();

But this fails with an error:

The return type of function 'dataStreamTupleLambda2Table(JsonArrayToTable.java:104)' could not be determined automatically, due to type erasure. You can give type information hints by using the returns(...) method on the result of the transformation call, or by letting your function implement the 'ResultTypeQueryable' interface.
    at org.apache.flink.api.dag.Transformation.getOutputType(Transformation.java:508)
    at org.apache.flink.streaming.api.datastream.DataStream.getType(DataStream.java:191)
    at org.apache.flink.table.api.bridge.internal.AbstractStreamTableEnvironmentImpl.asQueryOperation(AbstractStreamTableEnvironmentImpl.java:277)
    at org.apache.flink.table.api.bridge.java.internal.StreamTableEnvironmentImpl.fromDataStream(StreamTableEnvironmentImpl.java:295)
    at org.apache.flink.table.api.bridge.java.internal.StreamTableEnvironmentImpl.fromDataStream(StreamTableEnvironmentImpl.java:290)
    at com.dkl.flink.stream.JsonArrayToTable.dataStreamTupleLambda2Table(JsonArrayToTable.java:118)
    at com.dkl.flink.stream.JsonArrayToTable.main(JsonArrayToTable.java:25)
Caused by: org.apache.flink.api.common.functions.InvalidTypesException: The generic type parameters of 'Collector' are missing. In many cases lambda methods don't provide enough information for automatic type extraction when Java generics are involved. An easy workaround is to use an (anonymous) class instead that implements the 'org.apache.flink.api.common.functions.FlatMapFunction' interface. Otherwise the type has to be specified explicitly using type information.
    at org.apache.flink.api.java.typeutils.TypeExtractionUtils.validateLambdaType(TypeExtractionUtils.java:371)
    at org.apache.flink.api.java.typeutils.TypeExtractionUtils.extractTypeFromLambda(TypeExtractionUtils.java:188)
    at org.apache.flink.api.java.typeutils.TypeExtractor.getUnaryOperatorReturnType(TypeExtractor.java:557)
    at org.apache.flink.api.java.typeutils.TypeExtractor.getFlatMapReturnTypes(TypeExtractor.java:174)
    at org.apache.flink.streaming.api.datastream.DataStream.flatMap(DataStream.java:610)
    at com.dkl.flink.stream.JsonArrayToTable.dataStreamTupleLambda2Table(JsonArrayToTable.java:104)
    ... 1 more

The exception carries two hints:

  • The return type of function 'dataStreamTupleLambda2Table(JsonArrayToTable.java:104)' could not be determined automatically, due to type erasure. You can give type information hints by using the returns(...) method on the result of the transformation call, or by letting your function implement the 'ResultTypeQueryable' interface.
  • The generic type parameters of 'Collector' are missing. In many cases lambda methods don't provide enough information for automatic type extraction when Java generics are involved. An easy workaround is to use an (anonymous) class instead that implements the 'org.apache.flink.api.common.functions.FlatMapFunction' interface. Otherwise the type has to be specified explicitly using type information.

Next, we apply each of the two hints in turn and test.

Using returns on the result

DataStream<Tuple5<Integer, String, Double, Long, String>> dataStream = jsonArrayDataStream
        .flatMap((FlatMapFunction<JSONArray, Tuple5<Integer, String, Double, Long, String>>) (jsonArray1, out) -> {
            // Iterate over each JSONObject in the JSONArray
            for (int i = 0; i < jsonArray1.size(); i++) {
                JSONObject object = jsonArray1.getJSONObject(i);
                // Extract the fields and build a Tuple5
                out.collect(Tuple5.of(
                        object.getInteger("id"),
                        object.getString("name"),
                        object.getDouble("price"),
                        object.getLong("ts"),
                        object.getString("dt")
                ));
            }
        })
        .returns(new TypeHint<Tuple5<Integer, String, Double, Long, String>>() {});

Anonymous inner class

DataStream<Tuple5<Integer, String, Double, Long, String>> dataStream = jsonArrayDataStream
        .flatMap(new FlatMapFunction<JSONArray, Tuple5<Integer, String, Double, Long, String>>() {
            @Override
            public void flatMap(JSONArray jsonArray, Collector<Tuple5<Integer, String, Double, Long, String>> out) throws Exception {
                for (int i = 0; i < jsonArray.size(); i++) {
                    JSONObject object = jsonArray.getJSONObject(i); // read the i-th element as a JSONObject
                    out.collect(Tuple5.of(
                            object.getInteger("id"),
                            object.getString("name"),
                            object.getDouble("price"),
                            object.getLong("ts"),
                            object.getString("dt")
                    ));
                }
            }
        });

Both approaches work.
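
Both fixes succeed for the same underlying reason: they give Flink the element type that the lambda alone does not carry. The JVM behavior behind this can be seen in isolation. The following is a minimal sketch with no Flink dependency, using java.util.function.Function as a stand-in for FlatMapFunction; LambdaErasureDemo and its methods are hypothetical names, and Flink's own type extraction is more elaborate than this single check.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.function.Function;

// Stand-alone illustration (no Flink dependency); all names here are hypothetical.
class LambdaErasureDemo {

    // True if the object's class records generic type arguments for its first interface.
    static boolean keepsTypeArguments(Object function) {
        Type[] interfaces = function.getClass().getGenericInterfaces();
        return interfaces.length > 0 && interfaces[0] instanceof ParameterizedType;
    }

    // Lambda form: the runtime class implements the raw Function interface.
    static Function<String, Integer> asLambda() {
        return s -> s.length();
    }

    // Anonymous class form: the compiler stores Function<String, Integer> in the class file.
    static Function<String, Integer> asAnonymousClass() {
        return new Function<String, Integer>() {
            @Override
            public Integer apply(String s) {
                return s.length();
            }
        };
    }

    public static void main(String[] args) {
        System.out.println("lambda keeps type arguments: "
                + keepsTypeArguments(asLambda()));
        System.out.println("anonymous class keeps type arguments: "
                + keepsTypeArguments(asAnonymousClass()));
    }
}
```

On a standard JDK the lambda reports false and the anonymous class reports true, which matches the workaround suggested in the exception message; returns(...) sidesteps the question by stating the type explicitly.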

Converting to DataStream<Row> first

This runs into the same problem as converting to DataStream<Tuple>, so we try the same two fixes.

Using returns on the result

DataStream<Row> dataStream = jsonArrayDataStream
        .flatMap((FlatMapFunction<JSONArray, Row>) (jsonArray1, out) -> {
            for (int i = 0; i < jsonArray1.size(); i++) {
                JSONObject object = jsonArray1.getJSONObject(i);
                out.collect(Row.of(
                        object.getInteger("id"),
                        object.getString("name"),
                        object.getDouble("price"),
                        object.getLong("ts"),
                        object.getString("dt")
                ));
            }
        })
        .returns(new RowTypeInfo(
                Types.INT,
                Types.STRING,
                Types.DOUBLE,
                Types.LONG,
                Types.STRING
        ));

Anonymous inner class

DataStream<Row> dataStream = jsonArrayDataStream
        .flatMap(new FlatMapFunction<JSONArray, Row>() {
            @Override
            public void flatMap(JSONArray jsonArray, Collector<Row> out) throws Exception {
                for (int i = 0; i < jsonArray.size(); i++) {
                    JSONObject object = jsonArray.getJSONObject(i);
                    out.collect(Row.of(
                            object.getInteger("id"),
                            object.getString("name"),
                            object.getDouble("price"),
                            object.getLong("ts"),
                            object.getString("dt")
                    ));
                }
            }
        });

However, the anonymous inner class approach still fails for Row:

Exception in thread "main" org.apache.flink.table.api.ValidationException: An input of GenericTypeInfo<Row> cannot be converted to Table. Please specify the type of the input with a RowTypeInfo.
    at org.apache.flink.table.typeutils.FieldInfoUtils.extractFieldInformation(FieldInfoUtils.java:287)
    at org.apache.flink.table.typeutils.FieldInfoUtils.getFieldsInfo(FieldInfoUtils.java:259)
    at org.apache.flink.table.api.bridge.internal.AbstractStreamTableEnvironmentImpl.lambda$asQueryOperation$2(AbstractStreamTableEnvironmentImpl.java:284)
    at java.util.Optional.map(Optional.java:215)
    at org.apache.flink.table.api.bridge.internal.AbstractStreamTableEnvironmentImpl.asQueryOperation(AbstractStreamTableEnvironmentImpl.java:281)
    at org.apache.flink.table.api.bridge.java.internal.StreamTableEnvironmentImpl.fromDataStream(StreamTableEnvironmentImpl.java:295)
    at org.apache.flink.table.api.bridge.java.internal.StreamTableEnvironmentImpl.fromDataStream(StreamTableEnvironmentImpl.java:290)

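The message says that an input of GenericTypeInfo<Row> cannot be converted to a Table and that the input type should be specified with a RowTypeInfo. The root cause is that Flink cannot infer the structure of a Row (field names, types, and count) automatically, so a RowTypeInfo must be given explicitly to tell the framework the Row's metadata.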
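
One way to see why the anonymous inner class is enough for a Tuple but not for a Row: a Tuple's field types are part of its generic signature, while a Row keeps its fields in an untyped array, so nothing about them is visible in the declared type. The sketch below shows this with plain reflection. MiniTuple2 and MiniRow are hypothetical stand-ins for Flink's Tuple2 and Row, and the reflection only approximates what a type extractor can see.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.function.Function;

// Stand-alone illustration (no Flink dependency); all names here are hypothetical.
class RowSchemaDemo {

    // Field types are part of the generic signature.
    static final class MiniTuple2<A, B> {
        final A f0;
        final B f1;
        MiniTuple2(A f0, B f1) { this.f0 = f0; this.f1 = f1; }
    }

    // Field types exist only at runtime, inside an Object[].
    static final class MiniRow {
        final Object[] fields;
        MiniRow(Object... fields) { this.fields = fields; }
    }

    // Reads the declared output type (second type argument) of an anonymous Function.
    static Type declaredOutputType(Function<?, ?> function) {
        ParameterizedType iface =
                (ParameterizedType) function.getClass().getGenericInterfaces()[0];
        return iface.getActualTypeArguments()[1];
    }

    // Counts the field types that can be read from the declared output type alone.
    static int recoverableFieldTypes(Type outputType) {
        return outputType instanceof ParameterizedType
                ? ((ParameterizedType) outputType).getActualTypeArguments().length
                : 0;
    }

    static Function<String, MiniTuple2<Integer, String>> toTuple() {
        return new Function<String, MiniTuple2<Integer, String>>() {
            @Override
            public MiniTuple2<Integer, String> apply(String s) {
                return new MiniTuple2<>(s.length(), s);
            }
        };
    }

    static Function<String, MiniRow> toRow() {
        return new Function<String, MiniRow>() {
            @Override
            public MiniRow apply(String s) {
                return new MiniRow(s.length(), s);
            }
        };
    }

    public static void main(String[] args) {
        System.out.println("tuple field types: "
                + recoverableFieldTypes(declaredOutputType(toTuple())));
        System.out.println("row field types: "
                + recoverableFieldTypes(declaredOutputType(toRow())));
    }
}
```

The tuple-returning function exposes two field types and the row-returning function exposes none, which is the gap that an explicit RowTypeInfo fills.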

RowTypeInfo

Besides calling returns after flatMap / map to specify the RowTypeInfo, as shown earlier, it can also be passed directly as an argument to flatMap / map:

static RowTypeInfo rowTypeInfo = new RowTypeInfo(
        Types.INT,    // id type
        Types.STRING, // name type
        Types.DOUBLE, // price type
        Types.LONG,   // ts type
        Types.STRING  // dt type
);

DataStream<Row> dataStream = jsonArrayDataStream
        .flatMap((FlatMapFunction<JSONArray, Row>) (jsonArray1, out) -> {
            for (int i = 0; i < jsonArray1.size(); i++) {
                JSONObject object = jsonArray1.getJSONObject(i);
                out.collect(Row.of(
                        object.getInteger("id"),
                        object.getString("name"),
                        object.getDouble("price"),
                        object.getLong("ts"),
                        object.getString("dt")
                ));
            }
        }, rowTypeInfo);

RowTypeInfo also supports field names, so the names no longer have to be given when converting the DataStream to a Table.

static TypeInformation<?>[] types = new TypeInformation[]{
        Types.INT,    // id field type
        Types.STRING, // name field type
        Types.DOUBLE, // price field type
        Types.LONG,   // ts field type
        Types.STRING  // dt field type
};
static String[] fieldNames = new String[]{"id", "name", "price", "ts", "dt"};
static RowTypeInfo rowTypeInfoWithFields = new RowTypeInfo(types, fieldNames);

DataStream<Row> dataStream = jsonArrayDataStream
        .flatMap((FlatMapFunction<JSONArray, Row>) (jsonArray1, out) -> {
            for (int i = 0; i < jsonArray1.size(); i++) {
                JSONObject object = jsonArray1.getJSONObject(i);
                out.collect(Row.of(
                        object.getInteger("id"),
                        object.getString("name"),
                        object.getDouble("price"),
                        object.getLong("ts"),
                        object.getString("dt")
                ));
            }
        }, rowTypeInfoWithFields);
Table table = tableEnv.fromDataStream(dataStream);

This is the approach I recommend in the end.

DataStream<JSONObject>

All of the examples above convert a DataStream<JSONArray> with flatMap. Below is the map-based version for a DataStream<JSONObject>.

DataStream<Row> dataStream = jsonObjectDataStream
        .map((MapFunction<JSONObject, Row>) (jsonObject) -> Row.of(
                    jsonObject.getInteger("id"),
                    jsonObject.getString("name"),
                    jsonObject.getDouble("price"),
                    jsonObject.getLong("ts"),
                    jsonObject.getString("dt")
            ), rowTypeInfoWithFields);
Table table = tableEnv.fromDataStream(dataStream);