Flink - Sink Operators

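Water benefits all things and does not contend; it settles in places that people disdain, and so it is close to the Dao. 💦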

Table of Contents

1. Kafka_Sink
2. Kafka_Sink - Custom Serializer
3. Redis_Sink_String
4. Redis_Sink_list
5. Redis_Sink_set
6. Redis_Sink_hash
7. Writing Bounded Stream Data to ES
8. Writing Unbounded Stream Data to ES
9. Custom Sink - mysql_Sink
10. Jdbc_Sink

Official documentation: Flink 1.13


1. Kafka_Sink

addSink(new FlinkKafkaProducer<String>(kafka_address, topic, serializer))

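First add the Kafka connector dependency (the JSON class used below comes from fastjson, whose dependency is listed in section 3):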

xml
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka_2.12</artifactId>
    <version>1.13.6</version>
</dependency>
java
public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setInteger("rest.port",1000);
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(conf);
    env.setParallelism(1);

    ArrayList<WaterSensor> waterSensors = new ArrayList<>();
    waterSensors.add(new WaterSensor("sensor_1", 1607527992000L, 20));
    waterSensors.add(new WaterSensor("sensor_1", 1607527994000L, 50));
    waterSensors.add(new WaterSensor("sensor_1", 1607527996000L, 50));
    waterSensors.add(new WaterSensor("sensor_2", 1607527993000L, 10));
    waterSensors.add(new WaterSensor("sensor_2", 1607527995000L, 30));

    DataStreamSource<WaterSensor> stream = env.fromCollection(waterSensors);

    stream
            .keyBy(WaterSensor::getId)
            .sum("vc")
            .map(JSON::toJSONString)
            .addSink(new FlinkKafkaProducer<String>(
                    "hadoop101:9092",  // Kafka broker address
                    "flink_sink_kafka",  // Kafka topic to write to
                    new SimpleStringSchema()  // serializer
            ));

    try {
        env.execute();
    } catch (Exception e) {
        e.printStackTrace();
    }
}

Run result: (screenshot not preserved)
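The article never shows the WaterSensor class used in every example. The sketch below is an assumption: a POJO shaped to fit Flink's POJO rules (which sum("vc") relies on), plus a plain-Java model of what keyBy(WaterSensor::getId).sum("vc") emits. The model assumes Flink's rolling-aggregation behavior (one updated result per input record, with non-aggregated fields such as ts kept from the key's first record); it does not call Flink.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical POJO: the article never shows WaterSensor, so this is an assumption.
// In a real job put it in its own public WaterSensor.java: Flink's POJO rules need a
// public class, a public no-arg constructor and getters/setters for every field.
class WaterSensor {
    private String id;
    private Long ts;
    private Integer vc;

    public WaterSensor() { }

    public WaterSensor(String id, Long ts, Integer vc) {
        this.id = id;
        this.ts = ts;
        this.vc = vc;
    }

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
    public Long getTs() { return ts; }
    public void setTs(Long ts) { this.ts = ts; }
    public Integer getVc() { return vc; }
    public void setVc(Integer vc) { this.vc = vc; }

    @Override
    public String toString() {
        return "WaterSensor(" + id + ", " + ts + ", " + vc + ")";
    }
}

// Plain-Java model (no Flink) of keyBy(WaterSensor::getId).sum("vc"):
// one output per input, vc accumulated per key, other fields kept from the key's first record.
class RollingSumDemo {
    static List<WaterSensor> rollingSum(List<WaterSensor> input) {
        Map<String, WaterSensor> state = new HashMap<>();  // per-key running aggregate
        List<WaterSensor> out = new ArrayList<>();
        for (WaterSensor s : input) {
            WaterSensor acc = state.get(s.getId());
            WaterSensor next = (acc == null)
                    ? new WaterSensor(s.getId(), s.getTs(), s.getVc())
                    : new WaterSensor(acc.getId(), acc.getTs(), acc.getVc() + s.getVc());
            state.put(s.getId(), next);
            out.add(next);  // every input record produces an updated result
        }
        return out;
    }

    public static void main(String[] args) {
        List<WaterSensor> input = new ArrayList<>();
        input.add(new WaterSensor("sensor_1", 1607527992000L, 20));
        input.add(new WaterSensor("sensor_1", 1607527994000L, 50));
        input.add(new WaterSensor("sensor_1", 1607527996000L, 50));
        input.add(new WaterSensor("sensor_2", 1607527993000L, 10));
        input.add(new WaterSensor("sensor_2", 1607527995000L, 30));
        rollingSum(input).forEach(System.out::println);
    }
}
```

Running main prints sensor_1 with vc 20, 70 and 120, then sensor_2 with vc 10 and 40, all with the first record's ts. If the assumption about sum() holds, the topic flink_sink_kafka receives five JSON messages, not two final totals.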

2. Kafka_Sink - Custom Serializer

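To use a custom serializer, choose the four-argument FlinkKafkaProducer constructor, pass an anonymous KafkaSerializationSchema as the serializer, and override its serialize method, which builds the ProducerRecord (topic plus value bytes) for each element.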

java
public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setInteger("rest.port",1000);
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(conf);
    env.setParallelism(1);

    ArrayList<WaterSensor> waterSensors = new ArrayList<>();
    waterSensors.add(new WaterSensor("sensor_1", 1607527992000L, 20));
    waterSensors.add(new WaterSensor("sensor_1", 1607527994000L, 50));
    waterSensors.add(new WaterSensor("sensor_1", 1607527996000L, 50));
    waterSensors.add(new WaterSensor("sensor_2", 1607527993000L, 10));
    waterSensors.add(new WaterSensor("sensor_2", 1607527995000L, 30));

    DataStreamSource<WaterSensor> stream = env.fromCollection(waterSensors);

    Properties sinkConfig = new Properties();
    sinkConfig.setProperty("bootstrap.servers","hadoop101:9092");
    stream
            .keyBy(WaterSensor::getId)
            .sum("vc")
            .addSink(new FlinkKafkaProducer<WaterSensor>(
                    "defaultTopic",  // default topic; rarely used, because serialize() sets the topic of each record
                    new KafkaSerializationSchema<WaterSensor>() {  // 自定义的序列化器
                        @Override
                        public ProducerRecord<byte[], byte[]> serialize(
                                WaterSensor waterSensor,
                                @Nullable Long aLong
                        ) {
                            String s = JSON.toJSONString(waterSensor);
                            return new ProducerRecord<>("flink_sink_kafka",s.getBytes(StandardCharsets.UTF_8));
                        }
                    },
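                    sinkConfig,  // Kafka producer configuration
                    FlinkKafkaProducer.Semantic.AT_LEAST_ONCE  // delivery guarantee used here; EXACTLY_ONCE also exists, but it needs
                    // checkpointing and a transaction.timeout.ms no larger than the broker's transaction.max.timeout.ms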
            ));

    try {
        env.execute();
    } catch (Exception e) {
        e.printStackTrace();
    }
}

Run result: (screenshot not preserved)
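Because serialize() builds the whole ProducerRecord, each element can choose its own topic and key, which the three-argument constructor cannot do. The dependency-free sketch below models only those per-record decisions; the TopicRouter class and its one-topic-per-sensor naming rule are hypothetical illustrations, not part of the Kafka or Flink API.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not a Kafka or Flink API): the per-record decisions a
// KafkaSerializationSchema.serialize() implementation can make.
class TopicRouter {
    // Illustrative rule: one topic per sensor, e.g. "flink_sink_kafka_sensor_1".
    static String topicFor(String sensorId) {
        return "flink_sink_kafka_" + sensorId;
    }

    // A non-null key lets Kafka's default partitioner hash it,
    // so all readings of one sensor land in the same partition.
    static byte[] keyFor(String sensorId) {
        return sensorId.getBytes(StandardCharsets.UTF_8);
    }

    // The value is the JSON text encoded as UTF-8, as in the serializer above.
    static byte[] valueFor(String json) {
        return json.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String json = "{\"id\":\"sensor_1\",\"ts\":1607527992000,\"vc\":20}";
        byte[] key = keyFor("sensor_1");
        byte[] value = valueFor(json);
        // Inside serialize() these would become: new ProducerRecord<>(topic, key, value)
        System.out.println(topicFor("sensor_1") + " / " + new String(key, StandardCharsets.UTF_8)
                + " / " + value.length + " bytes");
    }
}
```

Inside serialize() the three values would be combined with the ProducerRecord(topic, key, value) constructor; note that the target topics must exist unless the broker auto-creates them.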

3. Redis_Sink_String

addSink(new RedisSink<>(config, new RedisMapper<WaterSensor>() {...}))

Writes to the Redis String type.

Add the dependencies (fastjson supplies the JSON.toJSONString calls used throughout this article):

xml
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>fastjson</artifactId>
    <version>1.2.83</version>
</dependency>

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-redis_2.11</artifactId>
    <version>1.1.5</version>
</dependency>
java
public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setInteger("rest.port",1000);
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(conf);
    env.setParallelism(1);

    ArrayList<WaterSensor> waterSensors = new ArrayList<>();
    waterSensors.add(new WaterSensor("sensor_1", 1607527992000L, 20));
    waterSensors.add(new WaterSensor("sensor_1", 1607527994000L, 50));
    waterSensors.add(new WaterSensor("sensor_1", 1607527996000L, 50));
    waterSensors.add(new WaterSensor("sensor_2", 1607527993000L, 10));
    waterSensors.add(new WaterSensor("sensor_2", 1607527995000L, 30));

    DataStreamSource<WaterSensor> stream = env.fromCollection(waterSensors);

    SingleOutputStreamOperator<WaterSensor> result = stream
            .keyBy(WaterSensor::getId)
            .sum("vc");

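    /*
     Write to a Redis String; the corresponding Redis command is SET.
     Assume the key is the sensor id and the value is the whole record as a JSON string:
     key         value
     sensor_1    JSON string
     */
    // Build the configuration for a standalone (single-node) Redis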
    FlinkJedisPoolConfig config = new FlinkJedisPoolConfig.Builder()
            .setHost("hadoop101")
            .setPort(6379)
            .setMaxTotal(100)  // max total connections in the pool
            .setMaxIdle(10)  // max idle connections in the pool
            .setMinIdle(2)   // min idle connections in the pool
            .setTimeout(10*1000)  // timeout in milliseconds
            .build();
    // Write to Redis
    result.addSink(new RedisSink<>(config, new RedisMapper<WaterSensor>() {
        // Return the command description: each Redis data type is written with a different command
        @Override
        public RedisCommandDescription getCommandDescription() {
            // Strings are written with SET
            return new RedisCommandDescription(RedisCommand.SET);
        }

        @Override
        public String getKeyFromData(WaterSensor waterSensor) {
            return waterSensor.getId();
        }

        @Override
        public String getValueFromData(WaterSensor waterSensor) {
            return JSON.toJSONString(waterSensor);
        }
    }));

    try {
        env.execute();
    } catch (Exception e) {
        e.printStackTrace();
    }
}

Run result: (screenshot not preserved)
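Since SET overwrites, each key should end up holding only the latest rolling result. The sketch below models that with a HashMap standing in for Redis (no Redis client is involved); the shortened JSON values are assumptions based on the rolling sums of the sample data.

```java
import java.util.HashMap;
import java.util.Map;

// A HashMap standing in for Redis String keys (no Redis client involved).
class RedisStringModel {
    private final Map<String, String> store = new HashMap<>();

    // SET key value: any previous value is overwritten, so the last write wins.
    void set(String key, String value) {
        store.put(key, value);
    }

    // GET key: null when the key does not exist (redis-cli shows (nil)).
    String get(String key) {
        return store.get(key);
    }

    public static void main(String[] args) {
        RedisStringModel redis = new RedisStringModel();
        // Assumed rolling results for sensor_1 in arrival order (JSON shortened, ts omitted).
        redis.set("sensor_1", "{\"id\":\"sensor_1\",\"vc\":20}");
        redis.set("sensor_1", "{\"id\":\"sensor_1\",\"vc\":70}");
        redis.set("sensor_1", "{\"id\":\"sensor_1\",\"vc\":120}");
        System.out.println(redis.get("sensor_1"));  // {"id":"sensor_1","vc":120}
    }
}
```

In redis-cli, `GET sensor_1` would then be expected to return the record with vc 120.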

4. Redis_Sink_list

addSink(new RedisSink<>(config, new RedisMapper<WaterSensor>() {...}))

Writes to the Redis List type.

java
public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setInteger("rest.port",1000);
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(conf);
    env.setParallelism(1);

    ArrayList<WaterSensor> waterSensors = new ArrayList<>();
    waterSensors.add(new WaterSensor("sensor_1", 1607527992000L, 20));
    waterSensors.add(new WaterSensor("sensor_1", 1607527994000L, 50));
    waterSensors.add(new WaterSensor("sensor_1", 1607527996000L, 50));
    waterSensors.add(new WaterSensor("sensor_2", 1607527993000L, 10));
    waterSensors.add(new WaterSensor("sensor_2", 1607527995000L, 30));

    DataStreamSource<WaterSensor> stream = env.fromCollection(waterSensors);

    SingleOutputStreamOperator<WaterSensor> result = stream
            .keyBy(WaterSensor::getId)
            .sum("vc");
            
    // key: the sensor id; value: the aggregated record as a JSON string
    FlinkJedisPoolConfig config = new FlinkJedisPoolConfig.Builder()
            .setHost("hadoop101")
            .setPort(6379)
            .setMaxTotal(100)  // max total connections in the pool
            .setMaxIdle(10)  // max idle connections in the pool
            .setMinIdle(2)   // min idle connections in the pool
            .setTimeout(10*1000)  // timeout in milliseconds
            .build();
    result.addSink(new RedisSink<>(config, new RedisMapper<WaterSensor>() {
        @Override
        public RedisCommandDescription getCommandDescription() {
            // RPUSH appends the value to the tail of the list stored at the key
            return new RedisCommandDescription(RedisCommand.RPUSH);
        }

        @Override
        public String getKeyFromData(WaterSensor waterSensor) {
            return waterSensor.getId();
        }

        @Override
        public String getValueFromData(WaterSensor waterSensor) {
            return JSON.toJSONString(waterSensor);
        }
    }));

    try {
        env.execute();
    } catch (Exception e) {
        e.printStackTrace();
    }
}

Run result: (screenshot not preserved)
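RPUSH never deduplicates, so every rolling result is appended to the list of its sensor, and rerunning the job appends the same values again. The sketch below models this with a HashMap of lists standing in for Redis (no Redis client; the shortened values are illustrative).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A HashMap of lists standing in for Redis List keys (no Redis client involved).
class RedisListModel {
    private final Map<String, List<String>> store = new HashMap<>();

    // RPUSH key value: appends to the tail; the list is created on the first push.
    void rpush(String key, String value) {
        store.computeIfAbsent(key, k -> new ArrayList<String>()).add(value);
    }

    // Like LRANGE key 0 -1: the whole list, or an empty list if the key does not exist.
    List<String> lrangeAll(String key) {
        List<String> list = store.get(key);
        return list == null ? new ArrayList<String>() : new ArrayList<String>(list);
    }

    public static void main(String[] args) {
        RedisListModel redis = new RedisListModel();
        List<String> rollingResults = Arrays.asList("vc=20", "vc=70", "vc=120");  // shortened values
        for (String v : rollingResults) redis.rpush("sensor_1", v);
        System.out.println(redis.lrangeAll("sensor_1"));              // [vc=20, vc=70, vc=120]
        for (String v : rollingResults) redis.rpush("sensor_1", v);  // running the job again
        System.out.println(redis.lrangeAll("sensor_1").size());       // 6
    }
}
```

`LRANGE sensor_1 0 -1` in redis-cli shows the real list.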

5. Redis_Sink_set

addSink(new RedisSink<>(config, new RedisMapper<WaterSensor>() {...}))

Writes to the Redis Set type.

java
public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setInteger("rest.port",1000);
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(conf);
    env.setParallelism(1);

    ArrayList<WaterSensor> waterSensors = new ArrayList<>();
    waterSensors.add(new WaterSensor("sensor_1", 1607527992000L, 20));
    waterSensors.add(new WaterSensor("sensor_1", 1607527994000L, 50));
    waterSensors.add(new WaterSensor("sensor_1", 1607527996000L, 50));
    waterSensors.add(new WaterSensor("sensor_2", 1607527993000L, 10));
    waterSensors.add(new WaterSensor("sensor_2", 1607527995000L, 30));

    DataStreamSource<WaterSensor> stream = env.fromCollection(waterSensors);

    SingleOutputStreamOperator<WaterSensor> result = stream
            .keyBy(WaterSensor::getId)
            .sum("vc");

    FlinkJedisPoolConfig config = new FlinkJedisPoolConfig.Builder()
            .setHost("hadoop101")
            .setPort(6379)
            .setMaxTotal(100)
            .setMaxIdle(10)
            .setMinIdle(2)
            .setTimeout(10*1000)
            .build();
    result.addSink(new RedisSink<>(config, new RedisMapper<WaterSensor>() {
        @Override
        public RedisCommandDescription getCommandDescription() {
            // SADD adds the value to the set stored at the key
            return new RedisCommandDescription(RedisCommand.SADD);
        }

        @Override
        public String getKeyFromData(WaterSensor waterSensor) {
            return waterSensor.getId();
        }

        @Override
        public String getValueFromData(WaterSensor waterSensor) {
            return JSON.toJSONString(waterSensor);
        }
    }));

    try {
        env.execute();
    } catch (Exception e) {
        e.printStackTrace();
    }
}

Run result: (screenshot not preserved)
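With SADD, identical strings are stored only once. The three rolling results for sensor_1 differ in vc, so all three become members, but rerunning the job adds nothing new, unlike RPUSH; members also have no order. The sketch below models this with a HashMap of sets standing in for Redis (no Redis client; the shortened values are illustrative).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// A HashMap of sets standing in for Redis Set keys (no Redis client involved).
class RedisSetModel {
    private final Map<String, Set<String>> store = new HashMap<>();

    // SADD key member: returns 1 if the member was added, 0 if it was already present.
    int sadd(String key, String member) {
        return store.computeIfAbsent(key, k -> new HashSet<String>()).add(member) ? 1 : 0;
    }

    // SCARD key: number of members, 0 if the key does not exist.
    int scard(String key) {
        Set<String> set = store.get(key);
        return set == null ? 0 : set.size();
    }

    public static void main(String[] args) {
        RedisSetModel redis = new RedisSetModel();
        String[] rollingResults = {"vc=20", "vc=70", "vc=120"};  // shortened values
        for (int run = 0; run < 2; run++) {                       // run the "job" twice
            for (String v : rollingResults) redis.sadd("sensor_1", v);
        }
        System.out.println(redis.scard("sensor_1"));  // 3: identical members are stored once
    }
}
```

`SMEMBERS sensor_1` and `SCARD sensor_1` in redis-cli inspect the real set.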

6. Redis_Sink_hash

addSink(new RedisSink<>(config, new RedisMapper<WaterSensor>() {...}))

Writes to the Redis Hash type.

java
public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setInteger("rest.port",1000);
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(conf);
    env.setParallelism(1);

    ArrayList<WaterSensor> waterSensors = new ArrayList<>();
    waterSensors.add(new WaterSensor("sensor_1", 1607527992000L, 20));
    waterSensors.add(new WaterSensor("sensor_1", 1607527994000L, 50));
    waterSensors.add(new WaterSensor("sensor_1", 1607527996000L, 50));
    waterSensors.add(new WaterSensor("sensor_2", 1607527993000L, 10));
    waterSensors.add(new WaterSensor("sensor_2", 1607527995000L, 30));

    DataStreamSource<WaterSensor> stream = env.fromCollection(waterSensors);

    SingleOutputStreamOperator<WaterSensor> result = stream
            .keyBy(WaterSensor::getId)
            .sum("vc");

    FlinkJedisPoolConfig config = new FlinkJedisPoolConfig.Builder()
            .setHost("hadoop101")
            .setPort(6379)
            .setMaxTotal(100)
            .setMaxIdle(10)
            .setMinIdle(2)
            .setTimeout(10*1000)
            .build();
    result.addSink(new RedisSink<>(config, new RedisMapper<WaterSensor>() {
        @Override
        public RedisCommandDescription getCommandDescription() {
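            // HSET: the additional key "a" is the Redis key of the hash;
            // getKeyFromData() supplies the field and getValueFromData() the value
            return new RedisCommandDescription(RedisCommand.HSET,"a");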
        }

        @Override
        public String getKeyFromData(WaterSensor waterSensor) {
            return waterSensor.getId();
        }

        @Override
        public String getValueFromData(WaterSensor waterSensor) {
            return JSON.toJSONString(waterSensor);
        }
    }));

    try {
        env.execute();
    } catch (Exception e) {
        e.printStackTrace();
    }
}

Run result: (screenshot not preserved)
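With HSET, all sensors share one hash whose key is the additional key "a"; the sensor id becomes the field and the JSON the value, and later writes overwrite a field. The sketch below models this with nested HashMaps standing in for Redis (no Redis client; the shortened values are illustrative).

```java
import java.util.HashMap;
import java.util.Map;

// Nested HashMaps standing in for Redis Hash keys (no Redis client involved).
class RedisHashModel {
    private final Map<String, Map<String, String>> store = new HashMap<>();

    // HSET key field value: sets one field of the hash, overwriting an existing value.
    void hset(String key, String field, String value) {
        store.computeIfAbsent(key, k -> new HashMap<String, String>()).put(field, value);
    }

    // HGET key field: null when the hash or the field does not exist.
    String hget(String key, String field) {
        Map<String, String> hash = store.get(key);
        return hash == null ? null : hash.get(field);
    }

    // HLEN key: number of fields in the hash, 0 if the key does not exist.
    int hlen(String key) {
        Map<String, String> hash = store.get(key);
        return hash == null ? 0 : hash.size();
    }

    public static void main(String[] args) {
        RedisHashModel redis = new RedisHashModel();
        // Hash key "a" (the additional key), field = sensor id, value = shortened JSON.
        redis.hset("a", "sensor_1", "vc=20");
        redis.hset("a", "sensor_1", "vc=70");
        redis.hset("a", "sensor_1", "vc=120");
        redis.hset("a", "sensor_2", "vc=10");
        redis.hset("a", "sensor_2", "vc=40");
        System.out.println(redis.hlen("a") + " " + redis.hget("a", "sensor_1"));  // 2 vc=120
    }
}
```

`HGETALL a` in redis-cli lists the real fields and values.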

7. Writing Bounded Stream Data to ES

addSink(new ElasticsearchSink.Builder<>(hosts, elasticsearchSinkFunction).build())

java
public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setInteger("rest.port",1000);
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(conf);
    env.setParallelism(1);

    ArrayList<WaterSensor> waterSensors = new ArrayList<>();
    waterSensors.add(new WaterSensor("sensor_1", 1607527992000L, 20));
    waterSensors.add(new WaterSensor("sensor_1", 1607527994000L, 50));
    waterSensors.add(new WaterSensor("sensor_1", 1607527996000L, 50));
    waterSensors.add(new WaterSensor("sensor_2", 1607527993000L, 10));
    waterSensors.add(new WaterSensor("sensor_2", 1607527995000L, 30));

    DataStreamSource<WaterSensor> stream = env.fromCollection(waterSensors);

    SingleOutputStreamOperator<WaterSensor> result = stream
            .keyBy(WaterSensor::getId)
            .sum("vc");

    List<HttpHost> hosts = Arrays.asList(
            new HttpHost("hadoop101", 9200),
            new HttpHost("hadoop102", 9200),
            new HttpHost("hadoop103", 9200)
    );

    ElasticsearchSink.Builder<WaterSensor> builder = new ElasticsearchSink.Builder<WaterSensor>(
            hosts,
            new ElasticsearchSinkFunction<WaterSensor>() {
                @Override
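                public void process(WaterSensor element,  // the element to write
                                    RuntimeContext runtimeContext, // runtime context (a RuntimeContext, not a per-record Context object)
                                    RequestIndexer requestIndexer) {  // add the requests to be written to this RequestIndexer
                    String msg = JSON.toJSONString(element);

                    IndexRequest ir = Requests
                            .indexRequest("sensor")
                            .type("_doc")  // custom type names must not start with an underscore; _doc is the only exception
                            .id(element.getId())  // document id; if omitted, ES generates a random one; re-using an id overwrites (updates) the document
                            .source(msg, XContentType.JSON);

                    requestIndexer.add(ir);  // once added to the indexer, the sink writes the request to ES automatically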
                }
            }
    );

    result.addSink(builder.build());

    try {
        env.execute();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
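Because every request sets .id(element.getId()), the five rolling results should collapse into two documents in the sensor index, each overwritten by later updates. The in-memory model below assumes Elasticsearch's default behavior of replacing a document that has the same id and incrementing its _version, and of generating an id when none is given; it does not use the Elasticsearch client.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// In-memory model of indexing into one ES index (not the Elasticsearch client).
class EsIndexModel {
    private final Map<String, String> sources = new HashMap<>();
    private final Map<String, Long> versions = new HashMap<>();

    // Index a document. A null id gets a random one, mimicking auto-generated ids;
    // an existing id is overwritten and its _version incremented.
    String index(String id, String source) {
        String docId = (id != null) ? id : UUID.randomUUID().toString();
        sources.put(docId, source);
        versions.merge(docId, 1L, Long::sum);
        return docId;
    }

    int count() { return sources.size(); }

    long version(String id) { return versions.getOrDefault(id, 0L); }

    public static void main(String[] args) {
        EsIndexModel sensor = new EsIndexModel();
        // The five rolling results, indexed with .id(element.getId()) (only vc shown).
        String[][] results = {{"sensor_1", "20"}, {"sensor_1", "70"}, {"sensor_1", "120"},
                              {"sensor_2", "10"}, {"sensor_2", "40"}};
        for (String[] r : results) sensor.index(r[0], "{\"vc\":" + r[1] + "}");
        System.out.println(sensor.count() + " docs, sensor_1 _version " + sensor.version("sensor_1"));
        // prints: 2 docs, sensor_1 _version 3
    }
}
```

Dropping the .id(...) call would instead leave one document per rolling result.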

8. Writing Unbounded Stream Data to ES

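This is almost the same as the bounded case, except that the source is a socket. For efficiency, the sink does not send each record the moment it arrives; it buffers requests and writes them in bulk. Because an unbounded job never ends, you need to configure a flush interval, size, or action count before the results show up in ES.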

java
public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setInteger("rest.port",1000);
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(conf);
    env.setParallelism(1);
    
    SingleOutputStreamOperator<WaterSensor> result = env.socketTextStream("hadoop101",9999)
            .map(line->{
                String[] data = line.split(",");
                return new WaterSensor(data[0],Long.valueOf(data[1]),Integer.valueOf(data[2]));
            })
            .keyBy(WaterSensor::getId)
            .sum("vc");

    List<HttpHost> hosts = Arrays.asList(
            new HttpHost("hadoop101", 9200),
            new HttpHost("hadoop102", 9200),
            new HttpHost("hadoop103", 9200)
    );

    ElasticsearchSink.Builder<WaterSensor> builder = new ElasticsearchSink.Builder<WaterSensor>(
            hosts,
            new ElasticsearchSinkFunction<WaterSensor>() {
                @Override
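                public void process(WaterSensor element,  // the element to write
                                    RuntimeContext runtimeContext, // runtime context (a RuntimeContext, not a per-record Context object)
                                    RequestIndexer requestIndexer) {  // add the requests to be written to this RequestIndexer
                    String msg = JSON.toJSONString(element);

                    IndexRequest ir = Requests
                            .indexRequest("sensor")
                            .type("_doc")  // custom type names must not start with an underscore; _doc is the only exception
                            .id(element.getId())  // document id; if omitted, ES generates a random one; re-using an id overwrites (updates) the document
                            .source(msg, XContentType.JSON);

                    requestIndexer.add(ir);  // once added to the indexer, the sink writes the request to ES automatically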
                }
            }
    );

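    // Bulk flush settings
    builder.setBulkFlushInterval(2000);  // flush every 2000 ms; by default there is no time-based flush
    builder.setBulkFlushMaxSizeMb(1024);  // flush once the buffered requests reach this size (MB)
    builder.setBulkFlushMaxActions(2);   // flush after every 2 buffered requests
    // The three conditions are OR-ed: whichever is met first triggers a flush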

    result.addSink(builder.build());

    try {
        env.execute();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
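The three settings above act as independent triggers. The simplified model below (not the Elasticsearch BulkProcessor itself) shows the OR logic with the article's values, and why, with none of them configured, a few socket lines can wait in the buffer. The 1000-action / 5 MB figures for the unconfigured case are assumed BulkProcessor defaults. In the bounded example the job ends, and the sink presumably flushes whatever is left when it closes, which would explain why no settings were needed there.

```java
// Simplified model (not the Elasticsearch BulkProcessor) of the three flush triggers.
// Any one condition is enough to send the buffered batch.
class BulkFlushModel {
    private final int maxActions;   // like setBulkFlushMaxActions
    private final long maxBytes;    // like setBulkFlushMaxSizeMb, in bytes
    private final long intervalMs;  // like setBulkFlushInterval; <= 0 means no timed flush
    private int pendingActions = 0;
    private long pendingBytes = 0;
    private int flushes = 0;

    BulkFlushModel(int maxActions, long maxBytes, long intervalMs) {
        this.maxActions = maxActions;
        this.maxBytes = maxBytes;
        this.intervalMs = intervalMs;
    }

    // Buffer one index request; flush if the count or size threshold is reached.
    void add(long requestBytes) {
        pendingActions++;
        pendingBytes += requestBytes;
        if (pendingActions >= maxActions || pendingBytes >= maxBytes) {
            flush();
        }
    }

    // Represents the timer firing once per interval; only active when an interval is set.
    void onTimer() {
        if (intervalMs > 0 && pendingActions > 0) {
            flush();
        }
    }

    private void flush() {
        flushes++;
        pendingActions = 0;
        pendingBytes = 0;
    }

    int flushes() { return flushes; }

    int pending() { return pendingActions; }

    public static void main(String[] args) {
        // The article's settings: 2 actions, 1024 MB, 2000 ms.
        BulkFlushModel configured = new BulkFlushModel(2, 1024L * 1024 * 1024, 2000);
        configured.add(100);
        configured.add(100);   // the 2nd request reaches maxActions -> flush
        configured.add(100);
        configured.onTimer();  // interval elapsed -> the single pending request is flushed
        System.out.println(configured.flushes() + " flushes, " + configured.pending() + " pending");  // 2 flushes, 0 pending

        // Assumed defaults of 1000 actions / 5 MB and no interval: one socket line just waits.
        BulkFlushModel defaults = new BulkFlushModel(1000, 5L * 1024 * 1024, 0);
        defaults.add(100);
        defaults.onTimer();
        System.out.println(defaults.flushes() + " flushes, " + defaults.pending() + " pending");  // 0 flushes, 1 pending
    }
}
```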

9. Custom Sink - mysql_Sink

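Write a class that extends RichSinkFunction and override its invoke method. Writing to MySQL needs a database connection, so use the Rich variant: its open() and close() methods run once per parallel sink instance, which makes them the right place to create and release the connection instead of doing so for every record.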

Remember to add the MySQL JDBC driver dependency (MySQL Connector/J, which provides com.mysql.cj.jdbc.Driver).
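Why the Rich variant matters can be sketched without Flink or MySQL. LifecycleSink and CountingSink below are hypothetical classes that only mirror the call order of RichSinkFunction (open() once per parallel instance, invoke() per record, close() at the end); the counters stand in for opening and closing the JDBC connection.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the RichSinkFunction lifecycle; this is NOT Flink's API.
abstract class LifecycleSink<T> {
    void open() throws Exception { }

    abstract void invoke(T value) throws Exception;

    void close() throws Exception { }

    // Drives one parallel instance the way the runtime would (simplified).
    static <T> void run(LifecycleSink<T> sink, List<T> records) {
        try {
            sink.open();
            try {
                for (T record : records) {
                    sink.invoke(record);
                }
            } finally {
                sink.close();
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}

// Counts "connections" the way MySqlSink opens and closes its JDBC Connection.
class CountingSink extends LifecycleSink<String> {
    int connectionsOpened = 0;  // stands in for DriverManager.getConnection(...) in open()
    int connectionsClosed = 0;  // stands in for connection.close() in close()
    final List<String> written = new ArrayList<>();

    @Override
    void open() { connectionsOpened++; }

    @Override
    void invoke(String value) { written.add(value); }

    @Override
    void close() { connectionsClosed++; }

    public static void main(String[] args) {
        CountingSink sink = new CountingSink();
        LifecycleSink.run(sink, Arrays.asList("r1", "r2", "r3", "r4", "r5"));
        System.out.println(sink.connectionsOpened + " connection(s) for " + sink.written.size() + " records");
        // prints: 1 connection(s) for 5 records
    }
}
```

With five records there is one connection, whereas opening the connection inside invoke() would mean five.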

java
public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setInteger("rest.port", 1000);
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(conf);
    env.setParallelism(1);

    ArrayList<WaterSensor> waterSensors = new ArrayList<>();
    waterSensors.add(new WaterSensor("sensor_1", 1607527992000L, 20));
    waterSensors.add(new WaterSensor("sensor_1", 1607527994000L, 50));
    waterSensors.add(new WaterSensor("sensor_1", 1607527996000L, 50));
    waterSensors.add(new WaterSensor("sensor_2", 1607527993000L, 10));
    waterSensors.add(new WaterSensor("sensor_2", 1607527995000L, 30));


    DataStreamSource<WaterSensor> stream = env.fromCollection(waterSensors);

    SingleOutputStreamOperator<WaterSensor> result = stream
            .keyBy(WaterSensor::getId)
            .sum("vc");

    result.addSink(new MySqlSink());


    try {
        env.execute();
    } catch (Exception e) {
        e.printStackTrace();
    }

}

public static class MySqlSink extends RichSinkFunction<WaterSensor> {

    private Connection connection;

    @Override
    public void open(Configuration parameters) throws Exception {
        Class.forName("com.mysql.cj.jdbc.Driver");
        connection = DriverManager.getConnection("jdbc:mysql://hadoop101:3306/test?useSSL=false", "root", "123456");
    }

    @Override
    public void close() throws Exception {
        if (connection!=null){
            connection.close();
        }
    }

    // Called once for every incoming element
    @Override
    public void invoke(WaterSensor value, Context context) throws Exception {
        // Write to MySQL via plain JDBC
        // String sql = "insert into sensor(id,ts,vc) values(?,?,?)";
        // Insert if the primary key is new, otherwise update vc (needs a 4th parameter):
        // String sql = "insert into sensor(id,ts,vc) values(?,?,?) on duplicate key update vc=?";
        String sql = "replace into sensor(id,ts,vc) values(?,?,?)";
        // 1. Create the prepared statement; try-with-resources closes it even if execute() fails
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
            // 2. Bind values to the placeholders
            ps.setString(1, value.getId());
            ps.setLong(2, value.getTs());
            ps.setInt(3, value.getVc());
            // ps.setInt(4, value.getVc());  // only for the "on duplicate key update" variant
            // 3. Execute; no connection.commit() is needed because MySQL connections auto-commit by default
            ps.execute();
        }
    }
}

Run result: (screenshot not preserved)
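The commented-out ON DUPLICATE KEY UPDATE variant and the REPLACE INTO statement both upsert, but not in the same way. The model below assumes id is the primary key of sensor (the table DDL is not shown) and adds a hypothetical location column that the sink never writes, to make the difference visible: REPLACE deletes the old row and inserts a new one, while ON DUPLICATE KEY UPDATE only changes the listed columns.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory model of the two MySQL upsert statements, assuming id is the primary key
// of the sensor table. "location" is a hypothetical extra column the sink never writes.
class UpsertModel {
    static final class Row {
        final long ts;
        final int vc;
        final String location;

        Row(long ts, int vc, String location) {
            this.ts = ts;
            this.vc = vc;
            this.location = location;
        }
    }

    final Map<String, Row> table = new HashMap<>();

    // REPLACE INTO sensor(id,ts,vc): a row with the same key is deleted, then a new row is
    // inserted, so unlisted columns fall back to their default (NULL here).
    void replaceInto(String id, long ts, int vc) {
        table.remove(id);
        table.put(id, new Row(ts, vc, null));
    }

    // INSERT INTO sensor(id,ts,vc) ... ON DUPLICATE KEY UPDATE vc=?: on a key conflict only
    // vc changes; ts and location keep their old values.
    void insertOnDuplicateUpdateVc(String id, long ts, int vc) {
        Row old = table.get(id);
        table.put(id, old == null ? new Row(ts, vc, null) : new Row(old.ts, vc, old.location));
    }

    public static void main(String[] args) {
        UpsertModel m = new UpsertModel();
        m.table.put("sensor_1", new Row(1607527992000L, 20, "room_a"));
        m.insertOnDuplicateUpdateVc("sensor_1", 1607527994000L, 70);
        System.out.println(m.table.get("sensor_1").location);  // room_a
        m.replaceInto("sensor_1", 1607527996000L, 120);
        System.out.println(m.table.get("sensor_1").location);  // null
    }
}
```

For this article's sink both statements should give the same rows, since sum() already keeps the first ts; the difference matters once the table has columns the sink does not write.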

10. Jdbc_Sink

addSink(JdbcSink.sink(sql, JdbcStatementBuilder, JdbcExecutionOptions, JdbcConnectionOptions))

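For JDBC databases there is no need to write a custom sink, because Flink provides an official JDBC sink (see the official JDBC Sink documentation). Add the connector dependency below; its Scala suffix (_2.11 here) should match the one used by your other Flink artifacts (the Kafka connector above uses _2.12), and the MySQL driver from section 9 is still required.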

xml
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-jdbc_2.11</artifactId>
    <version>1.13.6</version>
</dependency>
java
public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setInteger("rest.port",1000);
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(conf);
    env.setParallelism(1);

    ArrayList<WaterSensor> waterSensors = new ArrayList<>();
    waterSensors.add(new WaterSensor("sensor_1", 1607527992000L, 20));
    waterSensors.add(new WaterSensor("sensor_1", 1607527994000L, 50));
    waterSensors.add(new WaterSensor("sensor_1", 1607527996000L, 50));
    waterSensors.add(new WaterSensor("sensor_2", 1607527993000L, 10));
    waterSensors.add(new WaterSensor("sensor_2", 1607527995000L, 30));


    DataStreamSource<WaterSensor> stream = env.fromCollection(waterSensors);

    SingleOutputStreamOperator<WaterSensor> result = stream
            .keyBy(WaterSensor::getId)
            .sum("vc");

    result.addSink(JdbcSink.sink(
            "replace into sensor(id,ts,vc)values(?,?,?)",
            new JdbcStatementBuilder<WaterSensor>() {
                @Override
                public void accept(
                        PreparedStatement ps,
                        WaterSensor waterSensor) throws SQLException {
                    // Only one job here: bind values to the placeholders
                    ps.setString(1,waterSensor.getId());
                    ps.setLong(2,waterSensor.getTs());
                    ps.setInt(3,waterSensor.getVc());
                }
            },
            new JdbcExecutionOptions.Builder()  // execution options
                .withBatchSize(1024)   // flush once 1024 records are buffered
                .withBatchIntervalMs(2000) // flush at least every 2000 ms
                .withMaxRetries(3)  // retry a failed batch up to 3 times
                .build(),
            new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
                .withDriverName("com.mysql.cj.jdbc.Driver")
                .withUrl("jdbc:mysql://hadoop101:3306/test?useSSL=false")
                .withUsername("root")
                .withPassword("123456")
                .build()
    ));

    try {
        env.execute();
    } catch (Exception e) {
        e.printStackTrace();
    }
}

Run result: (screenshot not preserved)
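JdbcStatementBuilder has a single method, so the anonymous class can also be written as a lambda, as the official JDBC connector examples do. To show what that binding step does without a database, the sketch below uses a hypothetical StatementBinder interface of the same shape and a recording PreparedStatement built with java.lang.reflect.Proxy; the String[] element type is a simplification standing in for WaterSensor.

```java
import java.lang.reflect.Proxy;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface with the same shape as Flink's JdbcStatementBuilder<T>.
@FunctionalInterface
interface StatementBinder<T> {
    void accept(PreparedStatement ps, T value) throws SQLException;
}

class BinderDemo {
    // Binder for "replace into sensor(id,ts,vc)values(?,?,?)" written as a lambda.
    // The element is a String[] {id, ts, vc} only to keep the sketch self-contained.
    static final StatementBinder<String[]> SENSOR_BINDER = (ps, s) -> {
        ps.setString(1, s[0]);
        ps.setLong(2, Long.parseLong(s[1]));
        ps.setInt(3, Integer.parseInt(s[2]));
    };

    // A fake PreparedStatement that only records "index=value" for each two-argument setXxx call.
    static PreparedStatement recordingStatement(List<String> calls) {
        return (PreparedStatement) Proxy.newProxyInstance(
                BinderDemo.class.getClassLoader(),
                new Class<?>[] {PreparedStatement.class},
                (proxy, method, methodArgs) -> {
                    if (method.getName().startsWith("set") && methodArgs != null && methodArgs.length == 2) {
                        calls.add(methodArgs[0] + "=" + methodArgs[1]);
                    }
                    return null;  // fine for the void setXxx methods used here
                });
    }

    // Runs the binder against the recording statement and returns what it bound.
    static List<String> bindAndRecord(String[] row) {
        List<String> calls = new ArrayList<>();
        try {
            SENSOR_BINDER.accept(recordingStatement(calls), row);
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(bindAndRecord(new String[] {"sensor_1", "1607527992000", "120"}));
        // prints: [1=sensor_1, 2=1607527992000, 3=120]
    }
}
```

In the real job the lambda would be `(ps, s) -> { ps.setString(1, s.getId()); ps.setLong(2, s.getTs()); ps.setInt(3, s.getVc()); }`, passed where the anonymous JdbcStatementBuilder is now.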
