1. Flink自定义Source

一. Source 简介

DataStream是Flink的低级API,用于进行数据的实时处理,Flink编程模型分为Source、Transformation、Sink三个部分,如下图所示。

默认Flink提供了大量的内置Source,常见的Source如下:

  • 基于文件的Source
  • 基于Socket的Source
  • 基于集合的Source
  • 基于Kafka消息队列的Source

当以上内置Source不能满足业务需要时,可以实现自定义Source。

Flink中有关Source的接口类的继承关系如下:

  • SourceFunction:单并行度Source的基类
  • RichSourceFunction:单并行度增强型Source的基类
  • ParallelSourceFunction:多并行度Source的基类
  • RichParallelSourceFunction:多并行度增强型Source的基类
二. 自定义单并行度Source

自定义单并行度的source需要实现SourceFunction接口。

代码实现

MySource.java

java 复制代码
package flink.basic.source;

import org.apache.flink.streaming.api.functions.source.SourceFunction;
import java.util.Random;

public class MySource implements SourceFunction<String> {
    boolean running = true;
    
    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        Random random = new Random();
        while (running) {
        	// "Num"加上0~100的随机数生成一个字符串
            ctx.collect("Num: " + random.nextInt(100));
            Thread.sleep(1000);
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}

SourceDemo.java

java 复制代码
package flink.basic.source;

import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SourceDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env= StreamExecutionEnvironment.getExecutionEnvironment();

        DataStreamSource<String> source = env.addSource(new MySource());
        source.print();

        env.execute("source_demo");
    }
}

运行结果

bash 复制代码
5> Num: 62
6> Num: 91
7> Num: 13
8> Num: 53
三. 自定义多并行度Source

自定义多并行度的source需要实现ParallelSourceFunction接口。

代码实现

MyParallelSource.java

java 复制代码
package flink.basic.source;

import org.apache.flink.streaming.api.functions.source.ParallelSourceFunction;
import java.util.Random;

public class MyParallelSource implements ParallelSourceFunction<String> {

    boolean running = true;

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        Random random = new Random();
        while (running) {
            ctx.collect("Num: " + random.nextInt(100));
            Thread.sleep(1000);
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}

SourceDemo.java

java 复制代码
package flink.basic.source;

import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SourceDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env= StreamExecutionEnvironment.getExecutionEnvironment();

        DataStreamSource<String> source = env.addSource(new MyParallelSource());
        source.print();

        env.execute("source_demo");
    }
}

运行结果

bash 复制代码
7> Num: 43
8> Num: 30
1> Num: 92
2> Num: 50
5> Num: 39
6> Num: 6
4> Num: 20
3> Num: 2
四. 自定义单并行度增强型Source

增强型Source额外提供了open和close方法,可以用于自定义Source的初始化和清理工作。单并行度增强型Source需要实现RichSourceFunction接口。下面演示实现读取mysql表的单并行度Source。

在mysql中创建student表,并插入三条数据。

sql 复制代码
create table student (
	id int primary key,
	name varchar(50),
	age int
);

insert into student values(1, "name1", 20),(2, "name2", 30), (3, "name3", 15);

实现代码

Student.java

java 复制代码
package flink.basic.source;

public class Student {
    private int id;
    private String name;
    private int age;

    public Student(int id, String name, int age) {
        this.id = id;
        this.name = name;
        this.age = age;
    }

    public Student() {}


    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        this.age = age;
    }

    @Override
    public String toString() {
        return "Student{" +
                "id=" + id +
                ", name='" + name + '\'' +
                ", age=" + age +
                '}';
    }
}

MysqlSource.java

java 复制代码
package flink.basic.source;


import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class MysqlSource extends RichSourceFunction<Student> {

    Connection conn;
    Statement stmt;
    @Override
    public void open(Configuration parameters) throws Exception {
        Class.forName("com.mysql.cj.jdbc.Driver");
        String url = "jdbc:mysql://192.168.47.130:3306/test";
        String user = "root";
        String password = "root";

        conn = DriverManager.getConnection(url,user,password);
        stmt = conn.createStatement();
    }

    @Override
    public void run(SourceContext<Student> ctx) throws Exception {
        ResultSet rs = stmt.executeQuery("select * from student");
        while (rs.next()) {
            int id = rs.getInt("id");
            String name = rs.getString("name");
            int age = rs.getInt("age");
            ctx.collect(new Student(id, name, age));
        }

        rs.close();
    }

    @Override
    public void cancel() {

    }

    @Override
    public void close() throws Exception {
        stmt.close();
        conn.close();
    }
}

SourceDemo.java

java 复制代码
package flink.basic.source;

import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SourceDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env= StreamExecutionEnvironment.getExecutionEnvironment();
		// 添加mysql Source
        DataStreamSource<Student> source = env.addSource(new MysqlSource());
        source.print();

        env.execute("source_demo");
    }
}

运行结果

bash 复制代码
1> Student{id=3, name='name3', age=15}
8> Student{id=2, name='name2', age=30}
7> Student{id=1, name='name1', age=20}
相关推荐
2401_8914092618 分钟前
Level2逐笔成交逐笔委托毫秒记录:今日分享优质股票数据20250102
大数据·数据库·python·金融·数据库开发
JermeryBesian1 小时前
Flink系统知识讲解之:容错与State状态管理
java·大数据·flink
无奈ieq3 小时前
spark汇总
大数据·分布式·spark
weixin_307779133 小时前
理解Spark中运行程序时数据被分区的过程
大数据·分布式·spark
viperrrrrrrrrr73 小时前
大数据学习(33)-spark-transformation算子
大数据·hive·学习·spark·mapreduce
zfj3213 小时前
学英语学Elasticsearch:04 Elastic integrations 工具箱实现对第三方数据源的采集、存储、可视化,开箱即用
大数据·elasticsearch·搜索引擎·logstash·elastic agent·与第三方集成
无奈ieq3 小时前
Spark Streaming专题
大数据·spark
Whacky-u4 小时前
Hive SQL必刷练习题:连续问题 & 间断连续
大数据·数据库·数据仓库·sql
徐礼昭|商派软件市场负责人4 小时前
支持各大平台账单处理,支持复杂业财数据的精细化对账|商派OMS
大数据·数据库·人工智能·oms·财务对账
泡芙萝莉酱5 小时前
省级-农业科技创新(农业科技专利)数据(2010-2022年)-社科数据
大数据·人工智能·科技·深度学习·数据挖掘·毕业论文