Flink custom data source development workflow

1 Implement SourceFunction (or ParallelSourceFunction for a parallel source)

import org.apache.flink.streaming.api.functions.source.SourceFunction;

Override the run() and cancel() methods.
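The run()/cancel() pairing can be illustrated without a Flink dependency. The sketch below is only a stand-in under stated assumptions: `SimpleSource`, `CounterSource`, and `RunCancelDemo` are hypothetical names, and `SimpleSource` mirrors only the shape of the two methods, not Flink's actual `SourceFunction` interface. It shows run() looping on a flag while another thread calls cancel() to stop it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.function.Consumer;

class RunCancelDemo {
    // Runs the source on a worker thread, waits until at least `limit`
    // records have arrived, then cancels from the calling thread.
    static List<Integer> collectThenCancel(int limit) {
        CounterSource source = new CounterSource();
        List<Integer> seen = new ArrayList<>();
        CountDownLatch enough = new CountDownLatch(limit);
        Thread worker = new Thread(() -> {
            try {
                source.run(value -> {
                    seen.add(value);
                    enough.countDown();
                });
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
        worker.start();
        try {
            enough.await();   // block until `limit` records were emitted
            source.cancel();  // called from a different thread than run()
            worker.join();    // run() returns once it observes the flag
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(collectThenCancel(3));
    }
}

// Hypothetical stand-in for the run()/cancel() shape; NOT Flink's SourceFunction.
interface SimpleSource<T> {
    void run(Consumer<T> out) throws Exception;
    void cancel();
}

class CounterSource implements SimpleSource<Integer> {
    // volatile so that a cancel() from another thread is visible to the loop
    private volatile boolean running = true;

    @Override
    public void run(Consumer<Integer> out) throws Exception {
        int next = 0;
        while (running) {
            out.accept(next++);
            Thread.sleep(10); // pace the emission inside the loop
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}
```

The list may hold slightly more than `limit` records, because the worker can emit again before it sees the flag change. This is why the flag is `volatile` and why the sleep belongs inside the loop rather than after it.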

2 AccessSource code

package com.zyb.flink.basic.source;

import com.zyb.flink.basic.bean.Access;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

import java.util.Random;

public class AccessSource implements SourceFunction<Access> {
    // volatile: cancel() is invoked from a different thread than run()
    private volatile boolean isRunning = true;

    @Override
    public void run(SourceContext<Access> ctx) throws Exception {
        Random random = new Random();
        String[] domains = {"pk1.com","pk2.com","pk3.com","pk4.com","pk5."};

        while (isRunning) {
            long time = System.currentTimeMillis();
            // Emit one record with a random domain and a random traffic value in [0, 1000)
            ctx.collect(new Access(time, domains[random.nextInt(domains.length)], random.nextInt(1000)));
            // Sleep inside the loop so that a record is emitted about every 2 seconds
            Thread.sleep(2000);
        }
    }

    @Override
    public void cancel() {
        // Tell the loop in run() to exit
        isRunning = false;
    }
}

3 Access code

package com.zyb.flink.basic.bean;

public class Access {
    private long time;
    private String domain;
    private double traffic;

    @Override
    public String toString() {
        return "Access{" +
                "time=" + time +
                ", domain='" + domain + '\'' +
                ", traffic=" + traffic +
                '}';
    }

    public Access() {
    }

    public Access(long time, String domain, double traffic) {
        this.time = time;
        this.domain = domain;
        this.traffic = traffic;
    }

    public long getTime() {
        return time;
    }

    public void setTime(long time) {
        this.time = time;
    }

    public String getDomain() {
        return domain;
    }

    public void setDomain(String domain) {
        this.domain = domain;
    }

    public double getTraffic() {
        return traffic;
    }

    public void setTraffic(double traffic) {
        this.traffic = traffic;
    }
}

4 Test code

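A driver for this source can be small: register it with the execution environment and print what it emits. The following is a minimal sketch, not the author's test code. It assumes a Flink 1.x release in which `SourceFunction` and `StreamExecutionEnvironment.addSource` are available, with the Flink streaming dependency on the classpath. The class name `AccessSourceApp` and the job name are hypothetical.

```java
package com.zyb.flink.basic.source;

import com.zyb.flink.basic.bean.Access;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Hypothetical driver class; the name is chosen only for illustration.
public class AccessSourceApp {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // A plain SourceFunction is non-parallel, so this source runs with parallelism 1
        DataStreamSource<Access> stream = env.addSource(new AccessSource());

        // Each record is printed through Access.toString()
        stream.print();

        // Nothing runs until execute() is called; the job continues until it is cancelled
        env.execute("AccessSourceApp");
    }
}
```

To run the source with parallelism greater than 1, it would need to implement `ParallelSourceFunction` instead, as noted in step 1.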