1. Flink自定义Source

一. Source 简介

DataStream是Flink的低级API,用于进行数据的实时处理,Flink编程模型分为Source、Transformation、Sink三个部分,如下图所示。

默认Flink提供了大量的内置Source,常见的Source如下:

  • 基于文件的Source
  • 基于Socket的Source
  • 基于集合的Source
  • 基于Kafka消息队列的Source

当以上内置Source不能满足业务需要时,可以实现自定义Source。

Flink中有关Source的接口类的继承关系如下:

  • SourceFunction:单并行度Source的基类
  • RichSourceFunction:单并行度增强型Source的基类
  • ParallelSourceFunction:多并行度Source的基类
  • RichParallelSourceFunction:多并行度增强型Source的基类
二. 自定义单并行度Source

自定义单并行度的source需要实现SourceFunction接口。

代码实现

MySource.java

java 复制代码
package flink.basic.source;

import org.apache.flink.streaming.api.functions.source.SourceFunction;
import java.util.Random;

public class MySource implements SourceFunction<String> {
    boolean running = true;
    
    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        Random random = new Random();
        while (running) {
        	// "Num"加上0~100的随机数生成一个字符串
            ctx.collect("Num: " + random.nextInt(100));
            Thread.sleep(1000);
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}

SourceDemo.java

java 复制代码
package flink.basic.source;

import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SourceDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env= StreamExecutionEnvironment.getExecutionEnvironment();

        DataStreamSource<String> source = env.addSource(new MySource());
        source.print();

        env.execute("source_demo");
    }
}

运行结果

bash 复制代码
5> Num: 62
6> Num: 91
7> Num: 13
8> Num: 53
三. 自定义多并行度Source

自定义多并行度的source需要实现ParallelSourceFunction接口。

代码实现

MyParallelSource.java

java 复制代码
package flink.basic.source;

import org.apache.flink.streaming.api.functions.source.ParallelSourceFunction;
import java.util.Random;

public class MyParallelSource implements ParallelSourceFunction<String> {

    boolean running = true;

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        Random random = new Random();
        while (running) {
            ctx.collect("Num: " + random.nextInt(100));
            Thread.sleep(1000);
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}

SourceDemo.java

java 复制代码
package flink.basic.source;

import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SourceDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env= StreamExecutionEnvironment.getExecutionEnvironment();

        DataStreamSource<String> source = env.addSource(new MyParallelSource());
        source.print();

        env.execute("source_demo");
    }
}

运行结果

bash 复制代码
7> Num: 43
8> Num: 30
1> Num: 92
2> Num: 50
5> Num: 39
6> Num: 6
4> Num: 20
3> Num: 2
四. 自定义单并行度增强型Source

增强型Source额外提供了open和close方法,可以用于自定义Source的初始化和清理工作。单并行度增强型Source需要实现RichSourceFunction接口。下面演示实现读取mysql表的单并行度Source。

在mysql中创建student表,并插入三条数据。

sql 复制代码
create table student (
	id int primary key,
	name varchar(50),
	age int
);

insert into student values(1, "name1", 20),(2, "name2", 30), (3, "name3", 15);

实现代码

Student.java

java 复制代码
package flink.basic.source;

public class Student {
    private int id;
    private String name;
    private int age;

    public Student(int id, String name, int age) {
        this.id = id;
        this.name = name;
        this.age = age;
    }

    public Student() {}


    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        this.age = age;
    }

    @Override
    public String toString() {
        return "Student{" +
                "id=" + id +
                ", name='" + name + '\'' +
                ", age=" + age +
                '}';
    }
}

MysqlSource.java

java 复制代码
package flink.basic.source;


import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class MysqlSource extends RichSourceFunction<Student> {

    Connection conn;
    Statement stmt;
    @Override
    public void open(Configuration parameters) throws Exception {
        Class.forName("com.mysql.cj.jdbc.Driver");
        String url = "jdbc:mysql://192.168.47.130:3306/test";
        String user = "root";
        String password = "root";

        conn = DriverManager.getConnection(url,user,password);
        stmt = conn.createStatement();
    }

    @Override
    public void run(SourceContext<Student> ctx) throws Exception {
        ResultSet rs = stmt.executeQuery("select * from student");
        while (rs.next()) {
            int id = rs.getInt("id");
            String name = rs.getString("name");
            int age = rs.getInt("age");
            ctx.collect(new Student(id, name, age));
        }

        rs.close();
    }

    @Override
    public void cancel() {

    }

    @Override
    public void close() throws Exception {
        stmt.close();
        conn.close();
    }
}

SourceDemo.java

java 复制代码
package flink.basic.source;

import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SourceDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env= StreamExecutionEnvironment.getExecutionEnvironment();
		// 添加mysql Source
        DataStreamSource<Student> source = env.addSource(new MysqlSource());
        source.print();

        env.execute("source_demo");
    }
}

运行结果

bash 复制代码
1> Student{id=3, name='name3', age=15}
8> Student{id=2, name='name2', age=30}
7> Student{id=1, name='name1', age=20}
相关推荐
智慧化智能化数字化方案10 分钟前
2023 数据资产管理实践白皮书(6.0版)+解读
大数据
helloworld工程师28 分钟前
Dubbo的负载均衡及高性能RPC调用
java·大数据·人工智能
千瓜1 小时前
2024年特别报告,「十大生活方式」研究数据报告
大数据·数据挖掘·数据分析·业界资讯·新媒体
high20111 小时前
【Hadoop】-- hadoop3.x default port
大数据·hadoop·分布式
ueanaIU潇潇子1 小时前
Windows安装elasticsearch、Kibana以及IK分词器
大数据·elasticsearch·搜索引擎·kibana·ik分词器
一个略懂代码的程序员1 小时前
ES(elasticsearch)
大数据·elasticsearch·jenkins
说私域2 小时前
电商信任构建与创新模式:基于 2+1 链动模式与 S2B2C 商城小程序的分析
大数据·人工智能·微信小程序·小程序
奥顺2 小时前
从零开始:PHP基础教程系列-第3篇:控制结构:条件语句与循环
大数据·mysql·开源·php
图扑软件2 小时前
火电厂可视化助力提升运维效率
大数据·前端·javascript·人工智能·数字孪生·可视化·火电厂
ws2019073 小时前
AUTO TECH China 2025华南展:一起探索汽车零部件的未来
大数据·人工智能·汽车