Syncing Data to Doris with Flink-JDBC

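Data synchronization is a critical step in modern data analysis and processing pipelines. Apache Flink and Doris are two powerful tools: Flink handles real-time data processing, and Doris is a massively parallel processing (MPP) SQL database. This article shows how to use the Flink JDBC connector to sync data into Doris.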

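I. Background

1. Apache Flink: Flink is an open-source stream-processing framework for unbounded and bounded data streams. It offers high-throughput, low-latency processing and supports complex state management and fault tolerance.

2. Doris: Doris (formerly Baidu Palo) is a high-performance, real-time analytical database built on an MPP architecture. It supports high-concurrency queries and high-throughput complex analytics, delivers sub-second response times, and is compatible with the MySQL protocol.

3. JDBC: Java Database Connectivity (JDBC) is a Java API for connecting to and operating on databases. Flink provides a JDBC connector that can read from and write to a variety of relational databases.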

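II. Prerequisites

1. Install Flink: make sure Apache Flink is installed in your environment. The examples in this article use Flink 1.20.0.

2. Install Doris: make sure Doris is installed and configured in your environment.

3. Prepare the test dataset (Excel).

4. Dependencies: use flink-connector-jdbc together with a JDBC driver for Doris. Because Doris is compatible with the MySQL protocol, the MySQL driver (mysql-connector-java) serves as that driver here. If you use Maven, add the following dependencies to your project's pom.xml. The ${flink.version}, ${flink-kafka.version}, ${flink-jdbc.version}, and ${flink-doris.version} placeholders must be defined in the <properties> section of the pom: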

xml
<dependencies>
  <!--    flink-->
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java</artifactId>
    <version>${flink.version}</version>
  </dependency>

  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-clients</artifactId>
    <version>${flink.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-base</artifactId>
    <version>${flink.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka</artifactId>
    <version>${flink-kafka.version}</version>
  </dependency>
  <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-connector-jdbc -->
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-jdbc</artifactId>
    <version>${flink-jdbc.version}</version>
  </dependency>

  <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-connector-files -->
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-files</artifactId>
    <version>${flink.version}</version>
  </dependency>

  <!-- flink-doris-connector -->
  <dependency>
    <groupId>org.apache.doris</groupId>
    <artifactId>flink-doris-connector-1.16</artifactId>
    <version>${flink-doris.version}</version>
  </dependency>

  <!-- JSON processing -->
  <dependency>
    <groupId>com.alibaba.fastjson2</groupId>
    <artifactId>fastjson2</artifactId>
    <version>2.0.53</version>
  </dependency>
  <dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>easyexcel</artifactId>
    <version>4.0.3</version>
  </dependency>

  <dependency>
    <groupId>org.projectlombok</groupId>
    <artifactId>lombok</artifactId>
    <version>1.18.36</version>
  </dependency>
  <!-- https://mvnrepository.com/artifact/org.apache.logging.log4j/log4j-core -->
  <dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-core</artifactId>
    <version>2.24.3</version>
  </dependency>
  <dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>8.0.27</version>
  </dependency>

  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>3.8.1</version>
    <scope>test</scope>
  </dependency>
</dependencies>

III. Implementation Steps

1. Read and parse the test dataset from Excel

java
package org.example.day20250105;

import com.alibaba.excel.EasyExcel;
import com.alibaba.excel.context.AnalysisContext;
import com.alibaba.excel.read.listener.ReadListener;
import com.alibaba.excel.support.ExcelTypeEnum;
import org.example.day20250105.domain.*;
import org.example.day20250105.flink.jdbc.OdsXwDetailDataWriteByFlink;

import java.util.ArrayList;
import java.util.List;

public class ExcelDataHandle {
    private static final String filePath = "src/main/resources/20250105/WW2216069019-循环-90-3-1-20230104110446.xlsx";
    /**
     * Flush to the database every BATCH_COUNT rows (1000 here), then clear the list so the memory can be reclaimed.
     */
    private static final int BATCH_COUNT = 1000;
    /**
     * Cached rows
     */
    private static List<OdsXwCycleData> odsXwCycleDataList = new ArrayList<>(BATCH_COUNT);
    private static List<OdsXwDetailData> odsXwDetailDataList = new ArrayList<>(BATCH_COUNT);
    private static List<OdsXwStepData> odsXwStepDataList = new ArrayList<>(BATCH_COUNT);
    private static List<OdsTestFlow> odsTestFlowList = new ArrayList<>(BATCH_COUNT);
    private static final OdsTestFlow finalFlow = new OdsTestFlow();

    public static void main(String[] args) {
        EasyExcel.read(filePath, OdsTestFlowDataFormat.class, new OdsTestFlowDataHandle()).sheet(1).doRead();
        // doAfterAllAnalysed has already added finalFlow to the list; adding it again here would duplicate it.
        System.out.println(odsTestFlowList);
    }

    public static List<OdsXwCycleData> readCycleData() {
        EasyExcel.read(filePath, OdsXwCycleData.class, new OdsXwCycleDataHandle()).sheet(2).doRead();
        return odsXwCycleDataList;
    }

    public static List<OdsTestFlow> readTestFlowData(){
        EasyExcel.read(filePath, OdsTestFlowDataFormat.class, new OdsTestFlowDataHandle()).sheet(1).doRead();
        return odsTestFlowList;
    }
    public static List<OdsXwStepData> readStepData() {
        EasyExcel.read(filePath, OdsXwStepData.class, new OdsXwStepDataHandle()).excelType(ExcelTypeEnum.XLSX).sheet(3).doRead();
        return odsXwStepDataList;
    }

    public static List<OdsXwDetailData> readDetailData() {
        EasyExcel.read(filePath, OdsXwDetailData.class, new OdsXwDetailDataHandle()).sheet(4).doRead();
        return odsXwDetailDataList;
    }

    /**
     * Xinwei raw test-flow data (sheet 1)
     */
    public static class OdsTestFlowDataHandle implements ReadListener<OdsTestFlowDataFormat> {
        private static int rowCount = 0;

        @Override
        public void invoke(OdsTestFlowDataFormat format, AnalysisContext analysisContext) {
            if (rowCount > 5) {
                return;
            }
            if (rowCount == 0) {
                finalFlow.setStartStepNo(format.getV1());
                finalFlow.setVoltageUpperLimit(format.getV2());
                finalFlow.setBatteryBatchNumber(format.getV3());
            } else if (rowCount == 1) {
                finalFlow.setCycleTimes(format.getV1());
                finalFlow.setVoltageLowerLimit(format.getV2());
                finalFlow.setCreator(format.getV3());
            } else if (rowCount == 2) {
                finalFlow.setRecordCondition(format.getV1());
                finalFlow.setCurrentUpperLimit(format.getV2());
                finalFlow.setRemark(format.getV3());
            } else if (rowCount == 3) {
                finalFlow.setVoltageRange(format.getV1());
                finalFlow.setCurrentLowerLimit(format.getV2());
            } else if (rowCount == 4) {
                finalFlow.setCurrentRange(format.getV1());
                finalFlow.setStartDatetime(format.getV2());
                finalFlow.setTestBarcode(format.getV3());
            } else if (rowCount == 5) {
                finalFlow.setActiveSubstances(format.getV1());
                finalFlow.setNominalCapacity(format.getV2());
            }
            rowCount++;
        }

        @Override
        public void doAfterAllAnalysed(AnalysisContext analysisContext) {
            odsTestFlowList.add(finalFlow);
        }
    }

    /**
     * Cycle data handler (sheet 2)
     */
    public static class OdsXwCycleDataHandle implements ReadListener<OdsXwCycleData> {
        @Override
        public void invoke(OdsXwCycleData odsXwCycleData, AnalysisContext analysisContext) {
            odsXwCycleDataList.add(odsXwCycleData);
//            if (odsXwCycleDataList.size() >= BATCH_COUNT) {
//                // Clear the list once the batch has been stored
//                odsXwCycleDataList.clear();
//            }
        }

        @Override
        public void doAfterAllAnalysed(AnalysisContext analysisContext) {

        }
    }

    /**
     * Xinwei raw detail data (sheet 4)
     */
    public static class OdsXwDetailDataHandle implements ReadListener<OdsXwDetailData> {
        @Override
        public void invoke(OdsXwDetailData odsXwDetailData, AnalysisContext analysisContext) {
            odsXwDetailDataList.add(odsXwDetailData);
            if (odsXwDetailDataList.size() >= BATCH_COUNT) {
                // Call the save method once every BATCH_COUNT (1000) rows
                OdsXwDetailDataWriteByFlink.save(odsXwDetailDataList);
                // Clear the list once the batch has been stored
                odsXwDetailDataList.clear();
            }
        }

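        @Override
        public void doAfterAllAnalysed(AnalysisContext analysisContext) {
            // Save the final partial batch; without this, up to BATCH_COUNT - 1 trailing rows are never written.
            if (!odsXwDetailDataList.isEmpty()) {
                OdsXwDetailDataWriteByFlink.save(odsXwDetailDataList);
            }
        }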
    }

    /**
     * Xinwei raw step data (sheet 3)
     */
    public static class OdsXwStepDataHandle implements ReadListener<OdsXwStepData> {
        @Override
        public void invoke(OdsXwStepData odsXwStepData, AnalysisContext analysisContext) {
            odsXwStepDataList.add(odsXwStepData);
        }

        @Override
        public void doAfterAllAnalysed(AnalysisContext analysisContext) {

        }
    }
}
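
The batching idea used by the detail-data listener (save every BATCH_COUNT rows, then clear the list) can be sketched on its own, independent of EasyExcel. The class below is a minimal illustration, not the article's code: `BatchBuffer`, `add`, and `finish` are hypothetical names. It also shows one way to make sure the last, partially filled batch is handed over when reading ends.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: buffers rows and hands them to a sink in fixed-size batches. */
final class BatchBuffer<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private final List<T> buffer;

    BatchBuffer(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
        this.buffer = new ArrayList<>(batchSize);
    }

    /** Call once per parsed row (the role played by ReadListener.invoke). */
    void add(T row) {
        buffer.add(row);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Call once after the last row (the role played by doAfterAllAnalysed). */
    void finish() {
        flush();
    }

    private void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        // Pass a copy so that clearing the buffer cannot affect the data the sink received.
        sink.accept(new ArrayList<>(buffer));
        buffer.clear();
    }
}
```

In a listener like the ones above, the sink could plausibly be a method reference such as `OdsXwDetailDataWriteByFlink::save`, assuming that method accepts a `List` of rows as its call site suggests.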

2. Write to Doris through Flink-JDBC

java
package org.example.day20250105.flink;

public interface DorisConstant {
    // Fields declared in an interface are implicitly public static final.
    String FE_NODE_URL = "192.168.12.244:18030";
    String BE_NODE_URL = "192.168.12.244:18030";

    // Doris is compatible with the MySQL protocol, so the JDBC URL uses the jdbc:mysql scheme.
    String DORIS_JDBC_URL = "jdbc:mysql://192.168.12.244:19030/test_db_czq";

    String USERNAME = "root";
    String PASSWORD = "*******";
    String DB_NAME = "test_db_czq";

    String DRIVER_CLASS_NAME = "com.mysql.cj.jdbc.Driver";
}


package org.example.day20250105.flink.jdbc;

import org.apache.flink.connector.jdbc.JdbcConnectionOptions;
import org.apache.flink.connector.jdbc.JdbcSink;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.example.day20250105.ExcelDataHandle;
import org.example.day20250105.flink.DorisConstant;

/**
 * INSERT INTO `test_db_czq`.`ods_xw_step_data` (`cycle_id`, `step_id`, `step_no`, `step_type`, `step_time`, `start_absolute_time`, `end_absolute_time`, `capacity`, `specific_capacity`, `charge_capacity`, `charge_specific_capacity`, `discharge_capacity`, `discharge_specific_capacity`, `net_discharge_capacity`, `energy`, `specific_energy`, `charge_energy`, `charge_specific_energy`, `discharge_energy`, `discharge_specific_energy`, `net_discharge_energy`, `super_capacitor`, `initial_voltage`, `charge_initial_voltage`, `discharge_initial_voltage`, `end_voltage`, `charge_end_voltage`, `discharge_end_voltage`, `charge_median_voltage`, `discharge_median_voltage`, `initial_current`, `end_current`, `dcir`, `t1_start_temperature`, `t1_end_temperature`, `t1_max_temperature`, `t1_min_temperature`, `v1_start_voltage`, `v1_end_voltage`, `v1_max_voltage`, `v1_min_voltage`, `file_name`) VALUES ('5', '30', '45', '恒流放电', '02:00:19', '2023-01-03 03:47:37', '2023-01-03 05:47:56', '4.0266', '4026648', '0', '0', '4.0266', '4026648', '4.0266', '12.858', '12857977.78', '0', '0', '12.858', '12857977.78', '12.858', '10407.4004', '3.4573', '0', '3.4573', '1.9993', '0', '1.9993', '0', '3.2285', '0', '-2.0089', '0', '25.3', '25.5', '25.6', '25.2', '-0.0003', '-0.0003', '-0.0001', '-0.0003', NULL);
 */
public class OdsXwStepDataWriteByFlink {
    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        env.fromData(ExcelDataHandle.readStepData())
                .addSink(JdbcSink.sink(
                        "INSERT INTO `test_db_czq`.`ods_xw_step_data` " +
                                "(`cycle_id`, `step_id`, `step_no`, `step_type`, `step_time`, `start_absolute_time`, " +
                                "`end_absolute_time`, `capacity`, `specific_capacity`, `charge_capacity`, `charge_specific_capacity`," +
                                " `discharge_capacity`, `discharge_specific_capacity`, `net_discharge_capacity`, `energy`, `specific_energy`, " +
                                "`charge_energy`, `charge_specific_energy`, `discharge_energy`, `discharge_specific_energy`, `net_discharge_energy`, " +
                                "`super_capacitor`, `initial_voltage`, `charge_initial_voltage`, `discharge_initial_voltage`, `end_voltage`, `charge_end_voltage`," +
                                " `discharge_end_voltage`, `charge_median_voltage`, `discharge_median_voltage`, `initial_current`, `end_current`, `dcir`, " +
                                "`t1_start_temperature`, `t1_end_temperature`, `t1_max_temperature`, `t1_min_temperature`, `v1_start_voltage`, `v1_end_voltage`, " +
                                "`v1_max_voltage`, `v1_min_voltage`, `file_name`) " +
                                "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?,?, ?, ?, ?, ?, ?, ?, ?, ?, ?,?, ?, ?, ?, ?, ?, ?, ?, ?, ?,?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?);",
                        (ps, data) -> {
                            ps.setString(1, data.getCycle_id());
                            ps.setString(2, data.getStep_id());
                            ps.setString(3, data.getStep_no());
                            ps.setString(4, data.getStep_type());
                            ps.setString(5, data.getStep_time());
                            ps.setString(6, data.getStart_absolute_time());
                            ps.setString(7, data.getEnd_absolute_time());
                            ps.setString(8, data.getCapacity());
                            ps.setString(9, data.getSpecific_capacity());
                            ps.setString(10, data.getCharge_capacity());
                            ps.setString(11, data.getCharge_specific_capacity());
                            ps.setString(12, data.getDischarge_capacity());
                            ps.setString(13, data.getDischarge_specific_capacity());
                            ps.setString(14, data.getNet_discharge_capacity());
                            ps.setString(15, data.getEnergy());
                            ps.setString(16, data.getSpecific_energy());
                            ps.setString(17, data.getCharge_energy());
                            ps.setString(18, data.getCharge_specific_energy());
                            ps.setString(19, data.getDischarge_energy());
                            ps.setString(20, data.getDischarge_specific_energy());
                            ps.setString(21, data.getNet_discharge_energy());
                            ps.setString(22, data.getSuper_capacitor());
                            ps.setString(23, data.getInitial_voltage());
                            ps.setString(24, data.getCharge_initial_voltage());
                            ps.setString(25, data.getDischarge_initial_voltage());
                            ps.setString(26, data.getEnd_voltage());
                            ps.setString(27, data.getCharge_end_voltage());
                            ps.setString(28, data.getDischarge_end_voltage());
                            ps.setString(29, data.getCharge_median_voltage());
                            ps.setString(30, data.getDischarge_median_voltage());
                            ps.setString(31, data.getInitial_current());
                            ps.setString(32, data.getEnd_current());
                            ps.setString(33, data.getDcir());
                            ps.setString(34, data.getT1_start_temperature());
                            ps.setString(35, data.getT1_end_temperature());
                            ps.setString(36, data.getT1_max_temperature());
                            ps.setString(37, data.getT1_min_temperature());
                            ps.setString(38, data.getV1_start_voltage());
                            ps.setString(39, data.getV1_end_voltage());
                            ps.setString(40, data.getV1_max_voltage());
                            ps.setString(41, data.getV1_min_voltage());
                            ps.setString(42, data.getFile_name());
                        },
                        new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
                                .withUsername(DorisConstant.USERNAME)
                                .withPassword(DorisConstant.PASSWORD)
                                .withDriverName(DorisConstant.DRIVER_CLASS_NAME)
                                .withUrl(DorisConstant.DORIS_JDBC_URL)
                                .build()
                ));
        env.execute();
    }
}
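
Keeping 42 column names aligned with 42 `?` placeholders by hand is error-prone. As a rough sketch, the statement text could instead be generated from a column list, so the placeholder count always matches. `InsertSqlBuilder` and `build` are hypothetical names introduced here for illustration; they are not part of Flink or of the article's project.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: builds a parameterized INSERT statement from a column list. */
final class InsertSqlBuilder {
    private InsertSqlBuilder() {
    }

    static String build(String database, String table, List<String> columns) {
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("at least one column is required");
        }
        // Quote each column name with backticks, as the hand-written statement does.
        StringBuilder cols = new StringBuilder();
        for (String column : columns) {
            if (cols.length() > 0) {
                cols.append(", ");
            }
            cols.append('`').append(column).append('`');
        }
        // One "?" per column, so the two lists cannot drift apart.
        String placeholders = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO `" + database + "`.`" + table + "` (" + cols + ") VALUES (" + placeholders + ")";
    }
}
```

The resulting string would be passed as the first argument to `JdbcSink.sink(...)`, with the statement builder still binding parameters in the same order as the column list.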

IV. Results

After logging in to the Doris console, you can see that the data has been synced into the ods_xw_step_data table.
