1. Flink自定义Source

一. Source 简介

DataStream是Flink的低级API,用于进行数据的实时处理,Flink编程模型分为Source、Transformation、Sink三个部分,如下图所示。

默认Flink提供了大量的内置Source,常见的Source如下:

  • 基于文件的Source
  • 基于Socket的Source
  • 基于集合的Source
  • 基于Kafka消息队列的Source

当以上内置Source不能满足业务需要时,可以实现自定义Source。

Flink中有关Source的接口类的继承关系如下:

  • SourceFunction:单并行度Source的基类
  • RichSourceFunction:单并行度增强型Source的基类
  • ParallelSourceFunction:多并行度Source的基类
  • RichParallelSourceFunction:多并行度增强型Source的基类
二. 自定义单并行度Source

自定义单并行度的source需要实现SourceFunction接口。

代码实现

MySource.java

java 复制代码
package flink.basic.source;

import org.apache.flink.streaming.api.functions.source.SourceFunction;
import java.util.Random;

public class MySource implements SourceFunction<String> {
    boolean running = true;
    
    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        Random random = new Random();
        while (running) {
        	// "Num"加上0~100的随机数生成一个字符串
            ctx.collect("Num: " + random.nextInt(100));
            Thread.sleep(1000);
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}

SourceDemo.java

java 复制代码
package flink.basic.source;

import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SourceDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env= StreamExecutionEnvironment.getExecutionEnvironment();

        DataStreamSource<String> source = env.addSource(new MySource());
        source.print();

        env.execute("source_demo");
    }
}

运行结果

bash 复制代码
5> Num: 62
6> Num: 91
7> Num: 13
8> Num: 53
三. 自定义多并行度Source

自定义多并行度的source需要实现ParallelSourceFunction接口。

代码实现

MyParallelSource.java

java 复制代码
package flink.basic.source;

import org.apache.flink.streaming.api.functions.source.ParallelSourceFunction;
import java.util.Random;

public class MyParallelSource implements ParallelSourceFunction<String> {

    boolean running = true;

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        Random random = new Random();
        while (running) {
            ctx.collect("Num: " + random.nextInt(100));
            Thread.sleep(1000);
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}

SourceDemo.java

java 复制代码
package flink.basic.source;

import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SourceDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env= StreamExecutionEnvironment.getExecutionEnvironment();

        DataStreamSource<String> source = env.addSource(new MyParallelSource());
        source.print();

        env.execute("source_demo");
    }
}

运行结果

bash 复制代码
7> Num: 43
8> Num: 30
1> Num: 92
2> Num: 50
5> Num: 39
6> Num: 6
4> Num: 20
3> Num: 2
四. 自定义单并行度增强型Source

增强型Source额外提供了open和close方法,可以用于自定义Source的初始化和清理工作。单并行度增强型Source需要实现RichSourceFunction接口。下面演示实现读取mysql表的单并行度Source。

在mysql中创建student表,并插入三条数据。

sql 复制代码
create table student (
	id int primary key,
	name varchar(50),
	age int
);

insert into student values(1, "name1", 20),(2, "name2", 30), (3, "name3", 15);

实现代码

Student.java

java 复制代码
package flink.basic.source;

public class Student {
    private int id;
    private String name;
    private int age;

    public Student(int id, String name, int age) {
        this.id = id;
        this.name = name;
        this.age = age;
    }

    public Student() {}


    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        this.age = age;
    }

    @Override
    public String toString() {
        return "Student{" +
                "id=" + id +
                ", name='" + name + '\'' +
                ", age=" + age +
                '}';
    }
}

MysqlSource.java

java 复制代码
package flink.basic.source;


import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class MysqlSource extends RichSourceFunction<Student> {

    Connection conn;
    Statement stmt;
    @Override
    public void open(Configuration parameters) throws Exception {
        Class.forName("com.mysql.cj.jdbc.Driver");
        String url = "jdbc:mysql://192.168.47.130:3306/test";
        String user = "root";
        String password = "root";

        conn = DriverManager.getConnection(url,user,password);
        stmt = conn.createStatement();
    }

    @Override
    public void run(SourceContext<Student> ctx) throws Exception {
        ResultSet rs = stmt.executeQuery("select * from student");
        while (rs.next()) {
            int id = rs.getInt("id");
            String name = rs.getString("name");
            int age = rs.getInt("age");
            ctx.collect(new Student(id, name, age));
        }

        rs.close();
    }

    @Override
    public void cancel() {

    }

    @Override
    public void close() throws Exception {
        stmt.close();
        conn.close();
    }
}

SourceDemo.java

java 复制代码
package flink.basic.source;

import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SourceDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env= StreamExecutionEnvironment.getExecutionEnvironment();
		// 添加mysql Source
        DataStreamSource<Student> source = env.addSource(new MysqlSource());
        source.print();

        env.execute("source_demo");
    }
}

运行结果

bash 复制代码
1> Student{id=3, name='name3', age=15}
8> Student{id=2, name='name2', age=30}
7> Student{id=1, name='name1', age=20}
相关推荐
Hello World......2 小时前
Java求职面试揭秘:从Spring到微服务的技术挑战
大数据·hadoop·spring boot·微服务·spark·java面试·互联网大厂
数据与人工智能律师8 小时前
虚拟主播肖像权保护,数字时代的法律博弈
大数据·网络·人工智能·算法·区块链
一只专注api接口开发的技术猿9 小时前
企业级电商数据对接:1688 商品详情 API 接口开发与优化实践
大数据·前端·爬虫
今天我又学废了11 小时前
Spark,SparkSQL操作Mysql, 创建数据库和表
大数据·mysql·spark
杰克逊的日记13 小时前
Flink运维要点
大数据·运维·flink
markuszhang16 小时前
Elasticsearch 官网阅读之 Term-level Queries
大数据·elasticsearch·搜索引擎
Hello World......17 小时前
Java求职面试:从核心技术到大数据与AI的场景应用
大数据·java面试·技术栈·互联网大厂·ai服务
张伯毅18 小时前
Flink SQL 将kafka topic的数据写到另外一个topic里面
sql·flink·kafka
python算法(魔法师版)19 小时前
.NET NativeAOT 指南
java·大数据·linux·jvm·.net
星川皆无恙19 小时前
大模型学习:Deepseek+dify零成本部署本地运行实用教程(超级详细!建议收藏)
大数据·人工智能·学习·语言模型·架构