1. Flink自定义Source

一. Source 简介

DataStream是Flink的低级API,用于进行数据的实时处理,Flink编程模型分为Source、Transformation、Sink三个部分,如下图所示。

默认Flink提供了大量的内置Source,常见的Source如下:

  • 基于文件的Source
  • 基于Socket的Source
  • 基于集合的Source
  • 基于Kafka消息队列的Source

当以上内置Source不能满足业务需要时,可以实现自定义Source。

Flink中有关Source的接口类的继承关系如下:

  • SourceFunction:单并行度Source的基类
  • RichSourceFunction:单并行度增强型Source的基类
  • ParallelSourceFunction:多并行度Source的基类
  • RichParallelSourceFunction:多并行度增强型Source的基类
二. 自定义单并行度Source

自定义单并行度的source需要实现SourceFunction接口。

代码实现

MySource.java

java 复制代码
package flink.basic.source;

import org.apache.flink.streaming.api.functions.source.SourceFunction;
import java.util.Random;

public class MySource implements SourceFunction<String> {
    boolean running = true;
    
    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        Random random = new Random();
        while (running) {
        	// "Num"加上0~100的随机数生成一个字符串
            ctx.collect("Num: " + random.nextInt(100));
            Thread.sleep(1000);
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}

SourceDemo.java

java 复制代码
package flink.basic.source;

import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SourceDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env= StreamExecutionEnvironment.getExecutionEnvironment();

        DataStreamSource<String> source = env.addSource(new MySource());
        source.print();

        env.execute("source_demo");
    }
}

运行结果

bash 复制代码
5> Num: 62
6> Num: 91
7> Num: 13
8> Num: 53
三. 自定义多并行度Source

自定义多并行度的source需要实现ParallelSourceFunction接口。

代码实现

MyParallelSource.java

java 复制代码
package flink.basic.source;

import org.apache.flink.streaming.api.functions.source.ParallelSourceFunction;
import java.util.Random;

public class MyParallelSource implements ParallelSourceFunction<String> {

    boolean running = true;

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        Random random = new Random();
        while (running) {
            ctx.collect("Num: " + random.nextInt(100));
            Thread.sleep(1000);
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}

SourceDemo.java

java 复制代码
package flink.basic.source;

import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SourceDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env= StreamExecutionEnvironment.getExecutionEnvironment();

        DataStreamSource<String> source = env.addSource(new MyParallelSource());
        source.print();

        env.execute("source_demo");
    }
}

运行结果

bash 复制代码
7> Num: 43
8> Num: 30
1> Num: 92
2> Num: 50
5> Num: 39
6> Num: 6
4> Num: 20
3> Num: 2
四. 自定义单并行度增强型Source

增强型Source额外提供了open和close方法,可以用于自定义Source的初始化和清理工作。单并行度增强型Source需要实现RichSourceFunction接口。下面演示实现读取mysql表的单并行度Source。

在mysql中创建student表,并插入三条数据。

sql 复制代码
create table student (
	id int primary key,
	name varchar(50),
	age int
);

insert into student values(1, "name1", 20),(2, "name2", 30), (3, "name3", 15);

实现代码

Student.java

java 复制代码
package flink.basic.source;

public class Student {
    private int id;
    private String name;
    private int age;

    public Student(int id, String name, int age) {
        this.id = id;
        this.name = name;
        this.age = age;
    }

    public Student() {}


    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        this.age = age;
    }

    @Override
    public String toString() {
        return "Student{" +
                "id=" + id +
                ", name='" + name + '\'' +
                ", age=" + age +
                '}';
    }
}

MysqlSource.java

java 复制代码
package flink.basic.source;


import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class MysqlSource extends RichSourceFunction<Student> {

    Connection conn;
    Statement stmt;
    @Override
    public void open(Configuration parameters) throws Exception {
        Class.forName("com.mysql.cj.jdbc.Driver");
        String url = "jdbc:mysql://192.168.47.130:3306/test";
        String user = "root";
        String password = "root";

        conn = DriverManager.getConnection(url,user,password);
        stmt = conn.createStatement();
    }

    @Override
    public void run(SourceContext<Student> ctx) throws Exception {
        ResultSet rs = stmt.executeQuery("select * from student");
        while (rs.next()) {
            int id = rs.getInt("id");
            String name = rs.getString("name");
            int age = rs.getInt("age");
            ctx.collect(new Student(id, name, age));
        }

        rs.close();
    }

    @Override
    public void cancel() {

    }

    @Override
    public void close() throws Exception {
        stmt.close();
        conn.close();
    }
}

SourceDemo.java

java 复制代码
package flink.basic.source;

import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SourceDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env= StreamExecutionEnvironment.getExecutionEnvironment();
		// 添加mysql Source
        DataStreamSource<Student> source = env.addSource(new MysqlSource());
        source.print();

        env.execute("source_demo");
    }
}

运行结果

bash 复制代码
1> Student{id=3, name='name3', age=15}
8> Student{id=2, name='name2', age=30}
7> Student{id=1, name='name1', age=20}
相关推荐
TOWE technology5 分钟前
从“制造”到“智造”:智能PDU如何成为智慧工厂的电力“神经中枢”
大数据·人工智能·制造·数据中心·电源管理·智能pdu
2401_8916558117 分钟前
Git误操作急救手册大纲
大数据·elasticsearch·搜索引擎
LaughingZhu29 分钟前
Product Hunt 每日热榜 | 2026-03-22
大数据·数据库·人工智能·经验分享·搜索引擎
进击的雷神38 分钟前
Trae AI IDE 完全指南:从入门到精通
大数据·ide·人工智能·trae
七夜zippoe1 小时前
OpenClaw 会话管理:单聊、群聊、多模型
大数据·人工智能·fastapi·token·openclaw
longxibo1 小时前
【Ubuntu datasophon1.2.1 二开之八:验证实时数据入湖】
大数据·linux·clickhouse·ubuntu·linq
一只努力的微服务1 小时前
【Calcite 系列】深入理解 Calcite 的 AggregateFilterTransposeRule
大数据·数据库·calcite·优化规则
小堃学编程2 小时前
【项目实战】基于protobuf的发布订阅式消息队列(1)—— 准备工作
java·大数据·开发语言
无忧智库2 小时前
破局与重构:大型集团财务共享业财一体化的数字基因革命(PPT)
大数据·架构
zxm85132 小时前
UV使用及UV与Anaconda的区别
大数据·学习·机器学习·uv