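I. Basic Server Configuration
1. Set a static IP address
Run on all nodes:
# Switch to the administrator (root) user:
su root
# Find the NIC name
ip addr
# Edit the network configuration (ens33 is an example NIC name; change it to match your system)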
vi /etc/sysconfig/network-scripts/ifcfg-ens33
(Aside) Firewall management commands:
# Check firewall status
systemctl status firewalld
# Start the firewall
systemctl start firewalld
# Stop the firewall until the next reboot
systemctl stop firewalld
# Disable start at boot (combine with stop to turn the firewall off permanently)
systemctl disable firewalld
# Restart the firewall
systemctl restart firewalld
# Enable start at boot
systemctl enable firewalld
Configuration (contents of ifcfg-ens33):
shell
TYPE="Ethernet"
PROXY_METHOD="none"
BROWSER_ONLY="no"
BOOTPROTO="static"
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"
IPV6_ADDR_GEN_MODE="stable-privacy"
NAME="ens33"
UUID="92eb8451-49b9-4646-93e4-f37b78cf7fa2"
DEVICE="ens33"
ONBOOT="yes"
IPADDR=192.168.1.200
NETMASK=255.255.255.0
GATEWAY=192.168.1.2
DNS1=114.114.114.114
DNS2=8.8.8.8
DNS3=192.168.1.0
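Restart the network service:
systemctl restart network
**Note:** GATEWAY must match the actual gateway address of your network, and every DNSn entry must be a reachable DNS server (DNS3=192.168.1.0 above is the subnet's network address, not a host, so replace it with a real DNS server or remove it).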
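Before restarting the network, the GATEWAY rule in the note above can be checked mechanically: the gateway has to sit in the same subnet as IPADDR under NETMASK. A minimal bash sketch; `ip_to_int` and `same_subnet` are hypothetical helper names, not system commands:

```bash
# Convert a dotted-quad IPv4 address (e.g. 192.168.1.200) to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# same_subnet IP GATEWAY NETMASK: succeed (exit 0) if IP and GATEWAY are in the same network.
same_subnet() {
  local mask
  mask=$(ip_to_int "$3")
  [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$2") & mask )) ]
}
```

Usage with the values from the file above: `same_subnet 192.168.1.200 192.168.1.2 255.255.255.0 && echo "gateway OK"`.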
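II. Install Base System Packages
1. Switch the CentOS yum repositories to the Aliyun mirror. CentOS 7 reached end of life on June 30, 2024, and its official mirrors are no longer updated, so packages can no longer be downloaded reliably from them.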
curl -o /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
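2. Install ntp for time synchronization
yum install -y ntp
# Start the ntp service
systemctl start ntpd
systemctl enable ntpd
# Set the time zone
timedatectl set-timezone Asia/Shanghai
date
# Alternatively, install chrony (the recommended time-sync tool on CentOS 7).
# Run either ntpd or chronyd, not both: on CentOS 7 their systemd units conflict, so starting one stops the other.
sudo yum install -y chrony
# Start the chronyd service
sudo systemctl start chronyd
# Enable it at boot
sudo systemctl enable chronyd
# Install net-tools (provides ifconfig and netstat)
sudo yum install -y net-tools
# Install the vim editor
yum install -y vim-enhanced
# Install psmisc (provides pstree, used for process-tree analysis)
yum install -y psmisc
# Install the telnet client
yum install -y telnet
# Install wget (for downloading packages)
yum install -y wget
# Install tree (shows a directory's structure)
yum install -y tree
# Configure the VM hostname
vi /etc/hostname
# Replace the contents with:
linux1
vi /etc/hosts
# Append the following: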
192.168.1.201 linux1
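The vi edits above work once; if this step is scripted or re-run, a guard keeps /etc/hosts free of duplicate lines. A minimal sketch with a hypothetical helper `add_host_entry`, which takes the file path as an argument so it can be tried on a scratch copy first:

```bash
# add_host_entry FILE IP HOSTNAME: append "IP HOSTNAME" to FILE unless that mapping already exists.
add_host_entry() {
  local file=$1 ip=$2 name=$3 ip_re
  # Escape the dots so the address matches literally in the regular expression.
  ip_re=$(printf '%s' "$ip" | sed 's/\./\\./g')
  # Match lines that start with IP and list HOSTNAME as one of their names.
  if grep -Eq "^[[:space:]]*${ip_re}[[:space:]]+([^#]*[[:space:]])?${name}([[:space:]]|\$)" "$file" 2>/dev/null; then
    return 0
  fi
  printf '%s %s\n' "$ip" "$name" >> "$file"
}
```

Usage: `add_host_entry /etc/hosts 192.168.1.201 linux1` can be run any number of times and leaves a single entry.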
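Building a Single-Node Real-Time Data Sync Environment on CentOS 7
Pipeline: Kafka -> Flink 1.20 -> StarRocks (extensible to other databases)
Scheduler: DolphinScheduler
Use cases: single-node simulation environments, real-time data ingestion, multi-source data integration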
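1. Environment Preparation
1.1 Hardware and OS Requirements
| Item | Minimum | Recommended | Notes |
|---|---|---|---|
| OS | CentOS 7.6+ | CentOS 7.9 | A minimal installation is enough |
| CPU | 2 cores | 4 cores | Flink stream processing needs some CPU headroom |
| Memory | 4GB | 8GB | Too little memory causes Flink OOM errors |
| Disk | 20GB | 50GB | Holds logs and temporary data |
| Network | Internet access | - | Needed to download dependencies |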
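1.2 Create a unified directory layout
bash
# Root directory for ops software
mkdir -p /tools
# Software installation directory
mkdir -p /tools/server
# Installation package directory
mkdir -p /tools/soft
# Log directory
mkdir -p /tools/logs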
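1.3 Install JDK 1.8
bash
1. Switch to the root account:
su root
2. Extract the package (JDK packages are widely available online; this guide uses jdk-8u241-linux-x64.tar.gz)
tar -zxvf /tools/soft/jdk-8u241-linux-x64.tar.gz -C /tools/server/
3. Create a symlink with absolute paths so it lands where JAVA_HOME expects it (to remove it later: unlink /tools/server/jdk1.8)
ln -s /tools/server/jdk1.8.0_241 /tools/server/jdk1.8
4. Configure environment variables
vim /etc/profile
# The path must match the actual JDK location (here, the symlink)
export JAVA_HOME=/tools/server/jdk1.8
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
5. Reload environment variables
source /etc/profile
6. Verify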
java -version
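This guide appends export lines to /etc/profile for the JDK, ZooKeeper, Kafka, MySQL, StarRocks, Flink, and Maven; running a step twice leaves duplicate entries. A small sketch of an append-if-missing helper (the name `append_once` is hypothetical):

```bash
# append_once FILE LINE: append LINE to FILE only if FILE has no identical line yet.
append_once() {
  # -x: match the whole line; -F: fixed string, so "$" and "." are not treated as regex.
  grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}
```

Usage: `append_once /etc/profile 'export JAVA_HOME=/tools/server/jdk1.8'` and `append_once /etc/profile 'export PATH=$JAVA_HOME/bin:$PATH'` (single quotes keep `$JAVA_HOME` unexpanded). A common alternative is one file per component under /etc/profile.d/, which /etc/profile sources on CentOS.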
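2. Component Version List
| Component | Version | Package file |
|---|---|---|
| JDK | 1.8.0_241 | jdk-8u241-linux-x64.tar.gz |
| ZooKeeper | 3.8.4 | apache-zookeeper-3.8.4-bin.tar.gz |
| Kafka | 3.4.1 (Scala 2.13) | kafka_2.13-3.4.1.tgz |
| MySQL | 5.7.42 | mysql-5.7.42-linux-glibc2.12-x86_64.tar.gz |
| StarRocks | 3.1.17 | StarRocks-3.1.17-centos-amd64.tar.gz |
| DolphinScheduler | 3.2.1 | apache-dolphinscheduler-3.2.1-bin.tar.gz |
| MySQL JDBC driver | 8.0.16 | mysql-connector-java-8.0.16.jar |
| Flink | 1.20.3 (Scala 2.12) | flink-1.20.3-bin-scala_2.12.tgz |
| Dameng | DM8 | dm8_20260312_x86_CentOS7_64.zip |
| Maven | 3.9.15 | apache-maven-3.9.15-bin.tar.gz |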
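3. Base Component Installation
Package directories
# First download all of the packages listed above into these directories
# Packages
/tools/soft/
# Flink connectors
/tools/soft/flink_connector/
3.1 ZooKeeper Installation and Configuration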
bash
cd /tools/soft
# Extract
tar -zxvf /tools/soft/apache-zookeeper-3.8.4-bin.tar.gz -C /tools/server/
mv /tools/server/apache-zookeeper-3.8.4-bin /tools/server/zookeeper
# Copy the sample configuration
cp /tools/server/zookeeper/conf/zoo_sample.cfg /tools/server/zookeeper/conf/zoo.cfg
# Create the data directory
mkdir -p /tools/server/zookeeper/data
# Point dataDir at the new data directory
sed -i 's|dataDir=/tmp/zookeeper|dataDir=/tools/server/zookeeper/data|g' /tools/server/zookeeper/conf/zoo.cfg
# Start ZooKeeper
/tools/server/zookeeper/bin/zkServer.sh start
# Check status
/tools/server/zookeeper/bin/zkServer.sh status
# Expected output: Mode: standalone
# Stop ZooKeeper (when needed)
/tools/server/zookeeper/bin/zkServer.sh stop
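3.1.1 Configure environment variables
# Open the environment variable configuration
vim /etc/profile
# ZooKeeper environment variables
export ZOOKEEPER_HOME=/tools/server/zookeeper
export PATH=$PATH:$ZOOKEEPER_HOME/bin
# Apply the changes
source /etc/profile
| Action | Command |
|---|---|
| Start | zkServer.sh start |
| Stop | zkServer.sh stop |
| Restart | zkServer.sh restart |
| Check status | zkServer.sh status |
| Run in the foreground (debugging) | zkServer.sh start-foreground |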
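In scripts it helps to read the Mode value from `zkServer.sh status` instead of eyeballing it. A sketch, assuming the status output contains a line of the form `Mode: standalone` as shown above; `zk_mode` is a hypothetical helper:

```bash
# zk_mode: read `zkServer.sh status` output on stdin and print the value after "Mode:".
# Prints nothing if no Mode line is present (e.g. the server is not running).
zk_mode() {
  sed -n 's/^Mode:[[:space:]]*//p'
}
```

Usage: `[ "$(zkServer.sh status 2>&1 | zk_mode)" = "standalone" ] && echo "ZooKeeper is up"`.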
3.2 Kafka Installation and Configuration
bash
cd /tools/soft/
# Extract
tar -zxvf /tools/soft/kafka_2.13-3.4.1.tgz -C /tools/server/
mv /tools/server/kafka_2.13-3.4.1 /tools/server/kafka
3.2.1 Edit the configuration file
# Overwrite server.properties
cat > /tools/server/kafka/config/server.properties << 'EOF'
# Basic broker settings
broker.id=0
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://192.168.1.200:9092
# Log (data) directory
log.dirs=/tools/server/kafka/logs
# Topic settings (auto-creation disabled for easier management)
auto.create.topics.enable=false
# Single-node settings
num.partitions=1
default.replication.factor=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
EOF
# Set zookeeper.connect
echo "zookeeper.connect=localhost:2181" >> /tools/server/kafka/config/server.properties
# Check that it was written
grep "zookeeper.connect" /tools/server/kafka/config/server.properties
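The `echo ... >>` line above adds another `zookeeper.connect` entry every time it runs. A sketch of a set-or-replace helper (the name `set_property` is hypothetical) that keeps exactly one `key=value` line in a .properties file:

```bash
# set_property FILE KEY VALUE: make FILE contain exactly one "KEY=VALUE" line.
# Replaces the first existing KEY= line, drops later duplicates, or appends if missing.
# Values containing backslashes are not supported (awk -v interprets escapes).
set_property() {
  local file=$1 key=$2 value=$3 tmp
  tmp=$(mktemp) || return 1
  awk -v k="$key" -v v="$value" '
    index($0, k "=") == 1 { if (!done) print k "=" v; done = 1; next }
    { print }
    END { if (!done) print k "=" v }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  local rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

Usage: `set_property /tools/server/kafka/config/server.properties zookeeper.connect localhost:2181`. Writing back with `cat >` keeps the original file's owner and permissions.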
3.2.2 Start and troubleshoot
# Create the log directory
mkdir -p /tools/server/kafka/logs
# Start Kafka
/tools/server/kafka/bin/kafka-server-start.sh -daemon /tools/server/kafka/config/server.properties
# Check status:
# Option 1: check whether the Kafka process exists
ps aux | grep kafka.Kafka | grep -v grep
# Option 2: show the Kafka process ID (more concise)
jps | grep Kafka
# Option 3: check that Kafka is listening on its port
netstat -tlnp | grep 9092
# Option 4: check broker health through the admin API
# List all topics; if the command succeeds (even with an empty list), the broker is up
/tools/server/kafka/bin/kafka-topics.sh --list --bootstrap-server localhost:9092
# Create the test topic test_source (needed because auto.create.topics.enable=false)
/tools/server/kafka/bin/kafka-topics.sh --create \
--topic test_source \
--bootstrap-server localhost:9092 \
--partitions 1 \
--replication-factor 1
# Stop Kafka (when needed)
/tools/server/kafka/bin/kafka-server-stop.sh
# Test connectivity from another host (use the broker's advertised address)
/tools/server/kafka/bin/kafka-topics.sh --bootstrap-server 192.168.1.200:9092 --describe --topic test_source
/tools/server/kafka/bin/kafka-topics.sh --bootstrap-server 192.168.1.200:9092 --list
Quick reference
| Action | Command |
|---|---|
| Start | /tools/server/kafka/bin/kafka-server-start.sh -daemon /tools/server/kafka/config/server.properties |
| Stop | /tools/server/kafka/bin/kafka-server-stop.sh |
| Restart | /tools/server/kafka/bin/kafka-server-stop.sh && sleep 3 && /tools/server/kafka/bin/kafka-server-start.sh -daemon /tools/server/kafka/config/server.properties |
| Check status | ps aux \| grep kafka.Kafka \| grep -v grep |
| View logs | tail -f /tools/server/kafka/logs/server.log |
3.2.3 Configure environment variables
# Open the environment variable configuration
vim /etc/profile
# Kafka environment variables
export KAFKA_HOME=/tools/server/kafka
export PATH=$PATH:$KAFKA_HOME/bin
# Apply the changes
source /etc/profile
3.3 StarRocks Installation and Configuration
When StarRocks runs in a VM, Transparent Huge Pages (THP) must be disabled; otherwise the BE may fail to start or perform very poorly.
bash
cd /tools/soft
# 1. Disable Transparent Huge Pages (required)
echo 'never' | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
echo 'never' | sudo tee /sys/kernel/mm/transparent_hugepage/defrag
# Persist across reboots
echo 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' | sudo tee -a /etc/rc.d/rc.local
echo 'echo never > /sys/kernel/mm/transparent_hugepage/defrag' | sudo tee -a /etc/rc.d/rc.local
sudo chmod +x /etc/rc.d/rc.local
# 2. Raise vm.max_map_count (prevents failures caused by a too-low memory-map limit)
sudo sysctl -w vm.max_map_count=2000000
echo 'vm.max_map_count=2000000' | sudo tee -a /etc/sysctl.conf
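To confirm both settings are in effect (including after a reboot), the current values can be read back from /proc and /sys. A read-only sketch with hypothetical helpers `check_min` and `thp_mode`; it assumes the THP file marks the active mode in brackets, e.g. `always madvise [never]`:

```bash
# check_min FILE MIN: succeed if the integer stored in FILE is >= MIN.
check_min() {
  local current
  current=$(cat "$1" 2>/dev/null) || return 2
  if [ "$current" -ge "$2" ]; then
    echo "OK: $1 = $current"
  else
    echo "TOO LOW: $1 = $current (need >= $2)"
    return 1
  fi
}

# thp_mode FILE: print the active (bracketed) Transparent Huge Pages mode listed in FILE.
thp_mode() {
  sed -n 's/.*\[\([a-z]*\)\].*/\1/p' "$1"
}
```

Usage: `check_min /proc/sys/vm/max_map_count 2000000` and `thp_mode /sys/kernel/mm/transparent_hugepage/enabled`, which should print `never`.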
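3.3.1 MySQL 5.7 Installation and Configuration
StarRocks is managed through the MySQL client protocol, and DolphinScheduler (section 3.4) uses MySQL as its metadata database, so install MySQL first.
3.3.1.1 Extract MySQL
bash
# Create the installation directory
mkdir -p /tools/server
# Extract MySQL
cd /tools/soft
tar -zxvf mysql-5.7.42-linux-glibc2.12-x86_64.tar.gz -C /tools/server/
# Rename the directory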
mv /tools/server/mysql-5.7.42-linux-glibc2.12-x86_64 /tools/server/mysql5.7
3.3.1.2 Create the MySQL user and data directories
bash
# Create the mysql group and user
groupadd mysql
useradd -r -g mysql -s /bin/false mysql
# Create the data and log directories
mkdir -p /tools/server/mysql5.7/data
mkdir -p /tools/server/mysql5.7/logs
# Set ownership
chown -R mysql:mysql /tools/server/mysql5.7
3.3.1.3 Configure MySQL
bash
# Create the configuration file
cat > /tools/server/mysql5.7/my.cnf << 'EOF'
[client]
port=3306
socket=/tools/server/mysql5.7/mysql.sock
[mysqld]
user=mysql
port=3306
socket=/tools/server/mysql5.7/mysql.sock
basedir=/tools/server/mysql5.7
datadir=/tools/server/mysql5.7/data
log-error=/tools/server/mysql5.7/logs/mysqld.log
pid-file=/tools/server/mysql5.7/mysqld.pid
# Character set
character-set-server=utf8mb4
collation-server=utf8mb4_general_ci
# Memory (single-node testing)
innodb_buffer_pool_size=512M
key_buffer_size=64M
# Allow remote connections
bind-address=0.0.0.0
EOF
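3.3.1.4 Initialize MySQL
bash
# The MySQL 5.7 binaries depend on libaio; install it first if it is missing
yum install -y libaio
# Initialize the data directory (note down the temporary password it generates)
cd /tools/server/mysql5.7
bin/mysqld --defaults-file=/tools/server/mysql5.7/my.cnf --initialize --user=mysql
# Show the temporary password
grep 'temporary password' /tools/server/mysql5.7/logs/mysqld.log
# Output looks like: [Note] A temporary password is generated for root@localhost: mjLk:ND<I8uj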
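When the password change is scripted, the temporary password can be pulled out of the log instead of being copied by hand. A sketch, assuming the log line format shown above; `mysql_temp_password` is a hypothetical helper:

```bash
# mysql_temp_password LOGFILE: print the most recent temporary root password
# that `mysqld --initialize` wrote to LOGFILE.
mysql_temp_password() {
  grep 'temporary password' "$1" | tail -n 1 | sed 's/.*root@localhost: //'
}
```

The temporary password is marked as expired, so a non-interactive client needs `--connect-expired-password`, for example: `mysql --connect-expired-password -uroot -p"$(mysql_temp_password /tools/server/mysql5.7/logs/mysqld.log)" -S /tools/server/mysql5.7/mysql.sock -e "ALTER USER 'root'@'localhost' IDENTIFIED BY 'StarRocks@2026';"`.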
3.3.1.5 Configure MySQL environment variables
bash
vim /etc/profile
# MySQL 5.7 environment variables
export MYSQL_HOME=/tools/server/mysql5.7
export PATH=$PATH:$MYSQL_HOME/bin
# Apply the changes
source /etc/profile
3.3.1.6 Start MySQL
bash
# Start MySQL
/tools/server/mysql5.7/bin/mysqld_safe --defaults-file=/tools/server/mysql5.7/my.cnf &
# Wait 5 seconds
sleep 5
# Check that it started
ps aux | grep mysql | grep -v grep
netstat -tlnp | grep 3306
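3.3.1.7 Change the MySQL root password
bash
# Log in with the temporary password (enter the password from the log when prompted)
mysql -uroot -p -S /tools/server/mysql5.7/mysql.sock
# Then run the following SQL in the MySQL shell
# Change the password (if password validation is enabled, MySQL 5.7 requires upper- and lower-case letters, digits, and special characters)
ALTER USER 'root'@'localhost' IDENTIFIED BY 'StarRocks@2026';
# Allow remote connections (for remote access, e.g. from StarRocks)
CREATE USER 'root'@'%' IDENTIFIED BY 'StarRocks@2026';
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' WITH GRANT OPTION;
FLUSH PRIVILEGES;
# Exit
EXIT;
3.3.1.8 Create a MySQL management script
bash
# Create the script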
cat > /tools/server/mysql5.7/mysql-manager.sh << 'EOF'
#!/bin/bash
case "$1" in
start)
echo "Starting MySQL..."
/tools/server/mysql5.7/bin/mysqld_safe --defaults-file=/tools/server/mysql5.7/my.cnf --user=mysql &
;;
stop)
echo "Stopping MySQL..."
/tools/server/mysql5.7/bin/mysqladmin -S /tools/server/mysql5.7/mysql.sock -u root -p shutdown
;;
status)
if ps aux | grep -v grep | grep mysqld > /dev/null; then
echo "MySQL is running"
else
echo "MySQL is stopped"
fi
;;
*)
echo "Usage: $0 {start|stop|status}"
exit 1
;;
esac
EOF
chmod +x /tools/server/mysql5.7/mysql-manager.sh
3.3.2 StarRocks 3.1.17 Installation and Configuration
3.3.2.1 Extract StarRocks
bash
cd /tools/soft
# Extract StarRocks
tar -zxvf /tools/soft/StarRocks-3.1.17-centos-amd64.tar.gz -C /tools/server/
# Rename the directory
mv /tools/server/StarRocks-3.1.17-centos-amd64 /tools/server/starrocks
3.3.2.2 Create data directories
bash
# Create the FE and BE data directories
mkdir -p /tools/server/starrocks/fe/meta
mkdir -p /tools/server/starrocks/be/storage
# Create the log directory
mkdir -p /tools/server/starrocks/logs
3.3.2.3 Configure StarRocks environment variables
bash
vim /etc/profile
# StarRocks environment variables
export STARROCKS_HOME=/tools/server/starrocks
export PATH=$PATH:$STARROCKS_HOME/fe/bin:$STARROCKS_HOME/be/bin
# Apply the changes
source /etc/profile
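3.3.2.4 Configure FE (Frontend)
bash
# Edit the FE configuration file
vim /tools/server/starrocks/fe/conf/fe.conf
# Change (or add) the following settings
properties
# FE configuration file fe.conf
# HTTP port (Web UI)
http_port = 8030
# RPC port
rpc_port = 9020
# MySQL protocol port (for client connections)
query_port = 9030
# Edit log port
edit_log_port = 9010
# FE metadata directory. Use an absolute path: a path built from ${STARROCKS_HOME} failed here,
# probably because start_fe.sh sets STARROCKS_HOME to the fe directory itself, not the value from /etc/profile.
meta_dir = /tools/server/starrocks/fe/meta
# JVM options (single-node testing)
JAVA_OPTS = "-Xmx2048m -Xms2048m"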
3.3.2.5 Configure BE (Backend)
bash
# Edit the BE configuration file
vim /tools/server/starrocks/be/conf/be.conf
properties
# BE configuration file be.conf
# BE data storage directory
storage_root_path = /tools/server/starrocks/be/storage
# BE ports
be_port = 9060
be_http_port = 8040
be_rpc_port = 9070
# Memory limit (single-node testing: 80% of total memory)
mem_limit = 80%
# Subnet (CIDR) the BE should take its IP address from
priority_networks = 192.168.1.0/24
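`priority_networks` tells the BE which local address to use when the host has several. To see which of this host's IPv4 addresses fall inside 192.168.1.0/24 before starting BE, a sketch with hypothetical helpers `ip_to_int` and `ip_in_cidr`:

```bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# ip_in_cidr IP CIDR: succeed if IP lies inside CIDR (e.g. 192.168.1.0/24).
ip_in_cidr() {
  local net=${2%/*} bits=${2#*/} mask
  if [ "$bits" -eq 0 ]; then
    mask=0
  else
    # Keep the top $bits bits of a 32-bit mask.
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  fi
  [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}
```

Usage (IPv4 addresses only): `for a in $(hostname -I | tr ' ' '\n' | grep -E '^[0-9.]+$'); do ip_in_cidr "$a" 192.168.1.0/24 && echo "$a"; done`. If nothing is printed, the BE has no address in that subnet.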
3.3.2.6 Start FE
bash
# Start FE (note: the official docs recommend JDK 11 or 17; with JDK 8, as used here, FE prints a warning)
/tools/server/starrocks/fe/bin/start_fe.sh --daemon
# Wait for FE to start
sleep 10
# Check that FE started
ps aux | grep StarRocksFE | grep -v grep
netstat -tlnp | grep 9030
# View the FE log
tail -f /tools/server/starrocks/fe/log/fe.log
# Stop FE (when needed)
/tools/server/starrocks/fe/bin/stop_fe.sh
3.3.2.7 Start BE
bash
# Start BE
/tools/server/starrocks/be/bin/start_be.sh --daemon
# Wait for BE to start
sleep 5
# Check that BE started
ps aux | grep starrocks_be | grep -v grep
netstat -tlnp | grep 9060
# Stop BE (when needed)
/tools/server/starrocks/be/bin/stop_be.sh
3.3.2.8 Add the BE node to the cluster
bash
# Connect to FE with the MySQL client
mysql -h 127.0.0.1 -P 9030 -uroot
# Run the following SQL in the MySQL shell
-- Check FE status
SHOW FRONTENDS\G
-- Add the BE node (9050 is the BE heartbeat port, heartbeat_service_port, at its default value)
ALTER SYSTEM ADD BACKEND "192.168.1.200:9050";
-- Check BE status (Alive should be true)
SHOW BACKENDS\G
-- Create a test database
CREATE DATABASE test_db;
-- Use the database
USE test_db;
-- Create a test table
CREATE TABLE test_table (
id INT NOT NULL,
name VARCHAR(100) NULL,
create_time DATETIME NULL
) ENGINE=OLAP
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES ("replication_num" = "1");
-- Exit
EXIT;
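To check SHOW BACKENDS from a script (for example after a restart), the vertical `\G` output can be scanned for the Alive field. A sketch, assuming one `Alive: true` or `Alive: false` line per backend as in the output above; `backends_alive` is a hypothetical helper:

```bash
# backends_alive: read `SHOW BACKENDS\G` output on stdin; succeed only if at least
# one backend is listed and every "Alive:" field is "true".
backends_alive() {
  awk '
    /^[ \t]*Alive:/ { n++; if ($2 != "true") bad++ }
    END { exit (n > 0 && bad == 0) ? 0 : 1 }
  '
}
```

Usage: `mysql -h 127.0.0.1 -P 9030 -uroot -e 'SHOW BACKENDS\G' | backends_alive && echo "all BEs alive"` (add `-p...` once a root password is set).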
3.3.3 Create a StarRocks management script
bash
# Create the management script
cat > /tools/server/starrocks/starrocks-manager.sh << 'EOF'
#!/bin/bash
STARROCKS_HOME=/tools/server/starrocks
case "$1" in
start)
echo "Starting StarRocks FE..."
$STARROCKS_HOME/fe/bin/start_fe.sh --daemon
sleep 10
echo "Starting StarRocks BE..."
$STARROCKS_HOME/be/bin/start_be.sh --daemon
;;
stop)
echo "Stopping StarRocks BE..."
$STARROCKS_HOME/be/bin/stop_be.sh
echo "Stopping StarRocks FE..."
$STARROCKS_HOME/fe/bin/stop_fe.sh
;;
restart)
$0 stop
sleep 5
$0 start
;;
status)
echo "=== FE Status ==="
ps aux | grep StarRocksFE | grep -v grep || echo "FE is not running"
echo ""
echo "=== BE Status ==="
ps aux | grep starrocks_be | grep -v grep || echo "BE is not running"
echo ""
echo "=== MySQL Port ==="
netstat -tlnp | grep 9030 || echo "MySQL port 9030 not listening"
;;
*)
echo "Usage: $0 {start|stop|restart|status}"
exit 1
;;
esac
EOF
chmod +x /tools/server/starrocks/starrocks-manager.sh
3.3.4 Verify the installation
3.3.4.1 Check all services
bash
# Check MySQL
ps aux | grep mysql | grep -v grep
mysql -h 127.0.0.1 -P 3306 -uroot -pStarRocks@2026 -e "SELECT VERSION();"
# Check StarRocks FE
ps aux | grep StarRocksFE | grep -v grep
# Check StarRocks BE
ps aux | grep starrocks_be | grep -v grep
# Check StarRocks ports
netstat -tlnp | grep -E "8030|9030|9060"
3.3.4.2 Connect to StarRocks and verify
bash
# Connect to StarRocks
mysql -h 127.0.0.1 -P 9030 -uroot
# Run the following SQL to verify
SHOW DATABASES;
SHOW PROC '/backends'\G
SHOW PROC '/frontends'\G
# Set a root password for StarRocks
mysql -h127.0.0.1 -P9030 -uroot
# If you can log in without a password, run:
SET PASSWORD FOR 'root'@'%' = PASSWORD('root123');
# Or, on newer versions:
ALTER USER 'root'@'%' IDENTIFIED BY 'root123';
# Create a dedicated user for Flink
CREATE USER 'flink'@'%' IDENTIFIED BY 'flink2024';
GRANT ALL PRIVILEGES ON sr_test.* TO 'flink'@'%';
# Once the password is set, log in with:
mysql -h127.0.0.1 -P9030 -uroot -proot123
3.3.5 One-click start script (all components)
bash
# Create the one-click start script
cat > /tools/start-all.sh << 'EOF'
#!/bin/bash
echo "========================================="
echo "Starting all data components"
echo "========================================="
# 1. Start ZooKeeper
echo "1. Starting ZooKeeper..."
/tools/server/zookeeper/bin/zkServer.sh start
sleep 3
# 2. Start Kafka
echo "2. Starting Kafka..."
/tools/server/kafka/bin/kafka-server-start.sh -daemon /tools/server/kafka/config/server.properties
sleep 3
# 3. Start MySQL
echo "3. Starting MySQL..."
/tools/server/mysql5.7/bin/mysqld_safe --defaults-file=/tools/server/mysql5.7/my.cnf --user=mysql &
sleep 5
# 4. Start StarRocks FE
echo "4. Starting StarRocks FE..."
/tools/server/starrocks/fe/bin/start_fe.sh --daemon
sleep 10
# 5. Start StarRocks BE
echo "5. Starting StarRocks BE..."
/tools/server/starrocks/be/bin/start_be.sh --daemon
sleep 3
# 6. Start Flink (only if Flink is installed on this host; in section 4 it runs on a second VM)
if [ -x /tools/server/flink/bin/start-cluster.sh ]; then
  echo "6. Starting Flink..."
  /tools/server/flink/bin/start-cluster.sh
  sleep 3
fi
echo "========================================="
echo "All components started!"
echo "========================================="
echo ""
echo "Service endpoints:"
echo " - StarRocks MySQL: mysql -h 127.0.0.1 -P 9030 -uroot"
echo " - StarRocks Web: http://localhost:8030"
echo " - Flink Web: http://localhost:8081"
echo " - MySQL: mysql -h 127.0.0.1 -P 3306 -uroot -p"
echo " - Kafka port: 9092"
echo " - ZooKeeper port: 2181"
EOF
chmod +x /tools/start-all.sh
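The fixed `sleep` calls in start-all.sh guess how long each service needs; on a slow VM the next component may start before its dependency is ready. A common alternative is to poll a readiness check a bounded number of times. A generic sketch (the `retry` helper is hypothetical, not part of any of these packages):

```bash
# retry TRIES DELAY CMD [ARGS...]: run CMD until it succeeds, at most TRIES times,
# sleeping DELAY seconds between attempts. Returns 1 if every attempt fails.
retry() {
  local tries=$1 delay=$2 i=1
  shift 2
  while ! "$@"; do
    if [ "$i" -ge "$tries" ]; then
      echo "gave up after $tries attempts: $*" >&2
      return 1
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 0
}
```

Usage: `retry 20 3 mysql -h 127.0.0.1 -P 9030 -uroot -e 'SELECT 1'` waits up to about a minute for the FE query port before the script moves on to BE (add `-p...` once a root password is set). Any command that exits non-zero while its service is not ready can be used as the check.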
3.4 DolphinScheduler 3.2.1 Pseudo-Cluster Installation
This section follows the official pseudo-cluster deployment guide, adapted to this environment: it deploys as the root user, with /tools/server/dolphinscheduler as the installation directory.
I. Environment overview
| Item | Path / value |
|---|---|
| OS | CentOS 7.9 |
| Package | /tools/soft/apache-dolphinscheduler-3.2.1-bin.tar.gz |
| Target installation directory | /tools/server/dolphinscheduler |
| JDK 1.8 | /tools/server/jdk1.8 |
| ZooKeeper | /tools/server/zookeeper |
| MySQL 5.7 | /tools/server/mysql5.7 |
| Database | database dolphinscheduler / user dolphinscheduler / password dolphinscheduler |
| ZooKeeper address | localhost:2181 |
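II. Prerequisites
2.1 Configure passwordless SSH
The deployment script copies files to the target host over SSH, even when that host is the local machine, so passwordless SSH is required.
bash
# Generate a key pair and append it to authorized_keys
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
Verify (it works if no password is requested):
bash
ssh localhost
2.2 Start ZooKeeper
bash
cd /tools/server/zookeeper
# Copy zoo.cfg from the sample only if it does not exist (an unconditional copy would overwrite the dataDir set in 3.1)
[ -f conf/zoo.cfg ] || cp conf/zoo_sample.cfg conf/zoo.cfg
# Start
./bin/zkServer.sh start
# Check status
./bin/zkServer.sh status
Make sure ZooKeeper is running on localhost:2181.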
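III. Extract and deploy DolphinScheduler
3.1 Extract the binary package
bash
tar -zxf /tools/soft/apache-dolphinscheduler-3.2.1-bin.tar.gz -C /tools/server/
mv /tools/server/apache-dolphinscheduler-3.2.1-bin /tools/server/dolphinscheduler
3.2 Make the scripts executable
bash
chmod -R 755 /tools/server/dolphinscheduler/bin/
IV. Edit the configuration files
4.1 Edit install_env.sh
Location: /tools/server/dolphinscheduler/bin/env/install_env.sh
Set the following (deploy as root, and set installPath to your target directory):
bash
# Installation host(s); use localhost for a pseudo-cluster
ips="localhost"
sshPort="22"
# Master, Worker, Alert, and API all run on this host
masters="localhost"
workers="localhost:default"
alertServer="localhost"
apiServers="localhost"
# DolphinScheduler installation path (the deploy script installs each service under this directory)
installPath="/tools/server/dolphinscheduler"
# Deployment user (root)
deployUser="root"
⚠️ Important:
installPath is the directory where the DolphinScheduler services (master-server, worker-server, api-server, alert-server) are actually installed and run, not the directory the package was extracted to. The deploy script creates each service's directory structure under it automatically, so it must differ from the extracted package directory (the package directory is renamed in VI).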
4.2 Edit dolphinscheduler_env.sh
Location: /tools/server/dolphinscheduler/bin/env/dolphinscheduler_env.sh
Replace its settings with the following, adjusted to your environment:
bash
# JDK path
export JAVA_HOME=/tools/server/jdk1.8
# Database type (MySQL)
export DATABASE=mysql
export SPRING_PROFILES_ACTIVE=${DATABASE}
export SPRING_DATASOURCE_URL="jdbc:mysql://localhost:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8&useSSL=false&serverTimezone=Asia/Shanghai"
export SPRING_DATASOURCE_USERNAME=dolphinscheduler
export SPRING_DATASOURCE_PASSWORD=dolphinscheduler
# Registry (ZooKeeper)
export REGISTRY_TYPE=zookeeper
export REGISTRY_ZOOKEEPER_CONNECT_STRING=localhost:2181
# Flink client used by Flink tasks (set up in section 4; comment this out if you do not use Flink)
export FLINK_HOME=/tools/server/flink-client
Other task-related settings (Hadoop, Spark, etc.) can be left as-is or commented out if unused.
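V. Initialize the MySQL database
5.1 Add the MySQL JDBC driver
Download the MySQL 8.0 JDBC driver (mysql-connector-java-8.0.16.jar) into the following directory:
bash
# Create the lib directory and download the driver
mkdir -p /tools/server/dolphinscheduler/lib
wget -P /tools/server/dolphinscheduler/lib/ https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.16/mysql-connector-java-8.0.16.jar
Note: DolphinScheduler does not load the driver from this lib directory by itself. The jar must also be copied into the libs directory of every module that connects to the database: tools/libs for schema initialization (5.3), and master-server, worker-server, api-server, and alert-server (VI).
5.2 Create the database and user
Log in to MySQL (password: StarRocks@2026). This MySQL instance uses a non-default socket path, so pass it explicitly:
bash
mysql -S /tools/server/mysql5.7/mysql.sock -u root -pStarRocks@2026
Run the following SQL:
sql
CREATE DATABASE dolphinscheduler DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
CREATE USER 'dolphinscheduler'@'localhost' IDENTIFIED BY 'dolphinscheduler';
GRANT ALL PRIVILEGES ON dolphinscheduler.* TO 'dolphinscheduler'@'localhost';
FLUSH PRIVILEGES;
5.3 Run the schema initialization script
bash
cp /tools/server/dolphinscheduler/lib/mysql-connector-java-8.0.16.jar /tools/server/dolphinscheduler/tools/libs/
cd /tools/server/dolphinscheduler/tools/bin
./upgrade-schema.sh
Alternatively, import the SQL script directly:
bash
mysql -S /tools/server/mysql5.7/mysql.sock -udolphinscheduler -pdolphinscheduler dolphinscheduler < /tools/server/dolphinscheduler/tools/sql/sql/dolphinscheduler_mysql.sql
Once it succeeds, the database tables have been created.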
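VI. Start DolphinScheduler
By design, the extracted package directory (containing bin, conf, etc.) is the deployment source: install.sh copies its service files into installPath (the runtime directory), so the two must be different. Rename the package directory first, make its scripts executable, and put the JDBC driver into each service's libs directory:
bash
mv /tools/server/dolphinscheduler /tools/server/dolphinscheduler-package
find /tools/server/dolphinscheduler-package -name "*.sh" -exec chmod +x {} \;
cp /tools/server/dolphinscheduler-package/lib/mysql-connector-java-8.0.16.jar /tools/server/dolphinscheduler-package/master-server/libs/
cp /tools/server/dolphinscheduler-package/lib/mysql-connector-java-8.0.16.jar /tools/server/dolphinscheduler-package/worker-server/libs/
cp /tools/server/dolphinscheduler-package/lib/mysql-connector-java-8.0.16.jar /tools/server/dolphinscheduler-package/alert-server/libs/
cp /tools/server/dolphinscheduler-package/lib/mysql-connector-java-8.0.16.jar /tools/server/dolphinscheduler-package/api-server/libs/
Then edit each service's start script (start.sh) as needed:
vim /tools/server/dolphinscheduler-package/master-server/bin/start.sh
vim /tools/server/dolphinscheduler-package/alert-server/bin/start.sh
vim /tools/server/dolphinscheduler-package/api-server/bin/start.sh
vim /tools/server/dolphinscheduler-package/standalone-server/bin/start.sh
vim /tools/server/dolphinscheduler-package/worker-server/bin/start.sh
Run the installer from the package directory (the old path no longer exists after the rename):
cd /tools/server/dolphinscheduler-package
bash /tools/server/dolphinscheduler-package/bin/install.sh
Notes:
install.sh reads install_env.sh → creates the service directories under installPath → copies the service files from the package into them → starts all services.
After deployment, check the status of each service:
bash
bash /tools/server/dolphinscheduler/bin/status-all.sh
Master, Worker, Api, and Alert should all show RUNNING.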
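If a service does not reach RUNNING, a frequent cause is a missing MySQL driver in that service's libs directory. A sketch that lists which module directories lack the jar; the module list (api-server, alert-server, master-server, worker-server, tools) is assumed from the DolphinScheduler 3.x binary layout used in this guide, so adjust it if your layout differs. `check_jdbc_jar` is a hypothetical helper:

```bash
# check_jdbc_jar BASE_DIR JAR: report, per module, whether BASE_DIR/<module>/libs/JAR exists.
# Returns 1 if any module is missing the jar.
check_jdbc_jar() {
  local base=$1 jar=$2 missing=0 svc
  for svc in api-server alert-server master-server worker-server tools; do
    if [ -f "$base/$svc/libs/$jar" ]; then
      echo "OK      $svc/libs/$jar"
    else
      echo "MISSING $svc/libs/$jar"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `check_jdbc_jar /tools/server/dolphinscheduler mysql-connector-java-8.0.16.jar`, then copy the jar into any directory marked MISSING and restart that service.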
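VII. Log in to the Web UI
Open in a browser: http://<server-ip>:12345/dolphinscheduler/ui
Default credentials:
- Username: admin
- Password: dolphinscheduler123
If the page does not load:
bash
# Check that the port is listening
netstat -tlnp | grep 12345
# If the firewall is running, open the port
firewall-cmd --zone=public --add-port=12345/tcp --permanent
firewall-cmd --reload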
VIII. Common start/stop commands
The following commands refer to the installation directory /tools/server/dolphinscheduler:
| Action | Command |
|---|---|
| Start all services | bash /tools/server/dolphinscheduler/bin/start-all.sh |
| Stop all services | bash /tools/server/dolphinscheduler/bin/stop-all.sh |
| Show status of all services | bash /tools/server/dolphinscheduler/bin/status-all.sh |
| Start Master | bash /tools/server/dolphinscheduler/bin/dolphinscheduler-daemon.sh start master-server |
| Stop Master | bash /tools/server/dolphinscheduler/bin/dolphinscheduler-daemon.sh stop master-server |
| Start Worker | bash /tools/server/dolphinscheduler/bin/dolphinscheduler-daemon.sh start worker-server |
| Stop Worker | bash /tools/server/dolphinscheduler/bin/dolphinscheduler-daemon.sh stop worker-server |
| Start Api | bash /tools/server/dolphinscheduler/bin/dolphinscheduler-daemon.sh start api-server |
| Stop Api | bash /tools/server/dolphinscheduler/bin/dolphinscheduler-daemon.sh stop api-server |
| Start Alert | bash /tools/server/dolphinscheduler/bin/dolphinscheduler-daemon.sh start alert-server |
| Stop Alert | bash /tools/server/dolphinscheduler/bin/dolphinscheduler-daemon.sh stop alert-server |
IX. Reset reference (if needed)
To fully reset the environment, run:
bash
# Drop the database
mysql -S /tools/server/mysql5.7/mysql.sock -u root -pStarRocks@2026 -e "DROP DATABASE IF EXISTS dolphinscheduler;"
# Remove the installation directory
rm -rf /tools/server/dolphinscheduler
rm -rf /tmp/dolphinscheduler
# Remove environment and systemd service files (if created earlier)
rm -f /etc/profile.d/dolphinscheduler.sh
rm -f /etc/systemd/system/dolphinscheduler-*.service
systemctl daemon-reload
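X. Troubleshooting
| Problem | How to check |
|---|---|
| A service fails to start | Check that service's logs: /tools/server/dolphinscheduler/<service>/logs/ |
| Database connection fails | Check that MySQL is running: /tools/server/mysql5.7/mysql-manager.sh status (MySQL is not a systemd service in this setup) |
| ZooKeeper connection fails | Check ZooKeeper status: /tools/server/zookeeper/bin/zkServer.sh status |
| Web UI unreachable | Check that the API server is running: netstat -tlnp \| grep 12345 |
| Port already in use | Change the API port: edit api-server/conf/application.yaml |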
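4. Flink 1.20 Installation and Connector Configuration
4.1 Basic Flink 1.20 installation
In this setup Flink runs on a second VM, so open the ports it needs on the first VM:
9092 (Kafka), 9030 (StarRocks MySQL protocol), 8030 (StarRocks FE HTTP), 9060 and 8040 (StarRocks BE), 5432 (PostgreSQL) / 3306 (MySQL)
# Open a single port
sudo firewall-cmd --zone=public --add-port=8040/tcp --permanent
sudo firewall-cmd --reload
# Open several ports at once
sudo firewall-cmd --zone=public --add-port={9092,9030,8030,5432,3306,9060,8040}/tcp --permanent
sudo firewall-cmd --reload
# List open ports
firewall-cmd --zone=public --list-ports
# Show all rules
firewall-cmd --list-all
cd /tools/soft
# 2. Extract and rename the directory
tar -zxvf /tools/soft/flink-1.20.3-bin-scala_2.12.tgz -C /tools/server/
mv /tools/server/flink-1.20.3 /tools/server/flink
# 3. Configure Flink (allow external access to the Web UI)
cat >> /tools/server/flink/conf/flink-conf.yaml << 'EOF'
# Allow external access to the Web UI
rest.bind-address: 0.0.0.0
rest.port: 8081
# Task manager settings
taskmanager.memory.process.size: 2048m
jobmanager.memory.process.size: 1024m
EOF
Note: Flink 1.20 ships conf/config.yaml as its default configuration file. If conf/flink-conf.yaml also exists, Flink uses the legacy flink-conf.yaml instead and ignores config.yaml, so keep all settings in one file.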
4.2 Configure environment variables
bash
vim /etc/profile
# Flink 1.20 environment variables
export FLINK_HOME=/tools/server/flink
export PATH=$FLINK_HOME/bin:$PATH
# Apply the changes
source /etc/profile
Configure passwordless SSH:
bash
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# Verify
ssh localhost date
Edit the Flink configuration:
vim /tools/server/flink/conf/flink-conf.yaml
# JobManager address (this host's IP)
jobmanager.rpc.address: 192.168.1.201
jobmanager.bind-host: 0.0.0.0
# TaskManager resources (reduced for testing). If a key already exists in the file
# (e.g. taskmanager.memory.process.size from the append step above), change it in place instead of adding a duplicate.
taskmanager.numberOfTaskSlots: 4
taskmanager.memory.process.size: 1024m
parallelism.default: 2
# Allow external REST access
rest.address: 192.168.1.201
rest.bind-address: 0.0.0.0
# Keep everything else at defaults
Edit workers:
vim /tools/server/flink/conf/workers
localhost
Start the standalone Flink cluster:
cd /tools/server/flink
./bin/start-cluster.sh
Start Flink: /tools/server/flink/bin/start-cluster.sh
Stop Flink: /tools/server/flink/bin/stop-cluster.sh
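4.3 Verify network connectivity
The Flink connectors work only if these ports on the first VM are reachable from the Flink host:
bash
# StarRocks ports
telnet 192.168.1.200 8030 # FE HTTP port (Stream Load)
telnet 192.168.1.200 8040 # BE HTTP port (FE redirects Stream Load requests here)
telnet 192.168.1.200 9030 # FE MySQL port (JDBC)
telnet 192.168.1.200 9060 # BE port (data reads)
# Kafka port
telnet 192.168.1.200 9092 # Kafka broker port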
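telnet is interactive and has to be closed by hand for each port. For a quick non-interactive pass over all of them, bash's built-in `/dev/tcp` redirection combined with coreutils `timeout` can be used instead. A sketch; `check_ports` is a hypothetical helper and needs bash (not sh):

```bash
# check_ports HOST PORT...: print OPEN or CLOSED for each TCP port on HOST.
# Each attempt is limited to 3 seconds; returns 1 if any port is closed.
check_ports() {
  local host=$1 port rc=0
  shift
  for port in "$@"; do
    if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
      echo "$host:$port OPEN"
    else
      echo "$host:$port CLOSED"
      rc=1
    fi
  done
  return "$rc"
}
```

Usage from the Flink host: `check_ports 192.168.1.200 8030 8040 9030 9060 9092`.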
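Copy all connector JARs into /tools/server/flink/lib/, then restart Flink:
# Run from the directory holding the connector JARs (e.g. /tools/soft/flink_connector/)
cp *.jar /tools/server/flink/lib/
cd /tools/server/flink
./bin/stop-cluster.sh && ./bin/start-cluster.sh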
Deploy a Flink client on the first VM (for DolphinScheduler)
DolphinScheduler submits jobs to the remote Flink cluster through a local Flink client, so install a copy of Flink on the first VM as a client.
Extract the client
bash
cd /tools/server/
# Use the same package as on the Flink host
tar -zxvf /tools/soft/flink-1.20.3-bin-scala_2.12.tgz -C /tools/server/
mv /tools/server/flink-1.20.3 /tools/server/flink-client
Configure the client
Edit /tools/server/flink-client/conf/flink-conf.yaml; only the JobManager address is needed (no processes have to run on this host):
vim /tools/server/flink-client/conf/flink-conf.yaml
jobmanager.rpc.address: 192.168.1.201
rest.address: 192.168.1.201
# Because of permission issues, also set the JDK path in both of the following files
vim /tools/server/flink-client/conf/config.yaml
vim /tools/server/flink-client/conf/flink-conf.yaml
env.java.home: /tools/server/jdk1.8
# Keep other settings at defaults
# On VM linux1, grant permissions as root
chown -R dolphinscheduler:dolphinscheduler /tools/server/flink-client
chmod -R 755 /tools/server/flink-client
# Make sure the log directory exists and is writable
mkdir -p /tools/server/flink-client/log
chmod 775 /tools/server/flink-client/log # or 755, as long as the dolphinscheduler user can write to it
Copy the Flink lib directory (including the connector JARs) from VM 2 to VM 1 so the client uses the same JARs.
Run on VM 2:
# Copy the contents of a single directory
scp -r /tools/server/flink/lib/* root@192.168.1.200:/tools/server/flink-client/lib/
# Copy several directories
scp -r /path/to/folder1 /path/to/folder2 root@192.168.1.200:/path/to/destination/
# For example, copy the flink and flink_connector directories
scp -r /tools/server/flink /root/flink_connector root@192.168.1.200:/root/
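5. Dameng (DM) Database Installation
- Dameng official website: https://www.dameng.com/list_103.html
- Dameng eco-community downloads: https://eco.dameng.com/download/
When downloading from the community site, choose the build that matches the package name "dm8_20260312_x86_CentOS7_64.zip", i.e. the x86 architecture and CentOS 7 build.
Dameng installation steps
You need root privileges on the CentOS 7.9 system to complete these steps.
Step 1: Preparation
- Create a dedicated user and group: for security, do not run DM as root. Create a dedicated user damen and group damen_group.
bash
groupadd damen_group
useradd -g damen_group -m -d /home/damen -s /bin/bash damen
passwd damen   # password: damen_passwd
- Create the installation and data directories and give them to the damen user. Change ownership of the DM directory only: a recursive chown on all of /tools/server/ would also take over the MySQL directories owned by the mysql user.
bash
mkdir -p /tools/server/dmdbms
mkdir -p /tools/data/
chown -R damen:damen_group /tools/server/dmdbms
chown -R damen:damen_group /tools/data/
- Raise system resource limits: for stable operation, increase the open-file and process limits. Append the following to /etc/security/limits.conf:
text
* soft nofile 65535
* hard nofile 65535
* soft nproc 65535
* hard nproc 65535
- Install dependencies: DM needs a few system libraries.
bash
yum install -y gcc gcc-c++ make libaio-devel numactl-devel glibc-devel zlib-devel unzip
- Temporarily stop the firewall and SELinux to avoid connection problems during installation (configure them properly in production).
bash
systemctl stop firewalld
setenforce 0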
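Step 2: Extract the package and install
- Extract the package: the package is in /tools/soft/; unzip it and loop-mount the ISO it contains.
bash
unzip /tools/soft/dm8_20260312_x86_CentOS7_64.zip -d /tools/server/
mount -o loop /tools/server/dm8_20260312_x86_CentOS7_64.iso /mnt
- Switch user and install: switch to the damen user, go to the mount point, and run the installer in console mode (-i).
bash
su - damen
cd /mnt
./DMInstall.bin -i
The installer runs as an interactive text interface; answer the prompts as follows:
- Language: 1 (Chinese)
- Key file: press Enter to skip (trial edition, valid for one year)
- Time zone: 21 (China Standard Time)
- Installation type: 1 (Typical)
- Installation directory: /tools/server/dmdbms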
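Step 3: Initialize the database
After installation, initialize the database instance and its configuration files manually.
- Switch back to the damen user and go to the bin directory under the installation path.
bash
su - damen
cd /tools/server/dmdbms/bin
- Run dminit to initialize (adjust the parameters as needed; CHARSET=1 is UTF-8).
bash
./dminit PATH=/tools/data/ PAGE_SIZE=32 EXTENT_SIZE=32 CASE_SENSITIVE=y CHARSET=1 DB_NAME=DMDB INSTANCE_NAME=DMSERVER PORT_NUM=5236 SYSDBA_PWD=Damen123456 SYSAUDITOR_PWD=Damen123456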
Step 4: Register and start the service
- Register the system service: as root, run the registration script so the database is managed by systemd and can start at boot.
bash
cd /tools/server/dmdbms/script/root/
./dm_service_installer.sh -t dmserver -dm_ini /tools/data/DMDB/dm.ini -p DMSERVER
- Start the database service: manage DM with systemctl.
bash
# Start DM now
systemctl start DmServiceDMSERVER
# Check service status
systemctl status DmServiceDMSERVER
# Restart to apply configuration changes
systemctl restart DmServiceDMSERVER
# Stop the service
systemctl stop DmServiceDMSERVER
# Start DM at boot
systemctl enable DmServiceDMSERVER
# Do not start DM at boot
systemctl disable DmServiceDMSERVER
Step 5: Verify the installation
Switch back to the damen user and connect with DM's command-line tool disql.
bash
su - damen
/tools/server/dmdbms/bin/disql SYSDBA/Damen123456@localhost:5236
# On success you get the SQL prompt
Then copy the JDBC driver into Flink's lib directory (for Flink 1.20 with DM8, use DmJdbcDriver8.jar):
cp /tools/server/dmdbms/drivers/jdbc/DmJdbcDriver8.jar /tools/server/flink/lib/
# Open port 5236
# Permanent rule (use in production)
firewall-cmd --add-port=5236/tcp --permanent
firewall-cmd --reload
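Configure a DBeaver remote connection
Step 1: Get the JDBC driver
Copy DM's JDBC driver from the server (usually under /tools/server/dmdbms/drivers/jdbc).
Step 2: Configure the driver in DBeaver
- Open DBeaver and choose Database -> Driver Manager from the menu bar.
- Click New and fill in the dialog:
| Setting | Value |
|---|---|
| Driver name | Dameng DM8 (or any name) |
| Class name | dm.jdbc.driver.DmDriver |
| URL template | jdbc:dm://{host}:{port} |
| Default port | 5236 |
- On the Libraries tab, click Add File, select the driver file from step 1 (e.g. DmJdbcDriver8.jar), then click OK.
Step 3: Create a connection
- Click New Database Connection on the DBeaver toolbar, find and select the "Dameng DM8" driver you just created, and click Next.
- Fill in the connection details:
  - Host: IP address of the DM server
  - Port: 5236
  - Username/password: SYSDBA/Damen123456
- Click Test Connection; when it reports success, click Finish.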
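Step 4: Example table operations
- After connecting, expand Schemas -> SYSDBA in DBeaver's Database Navigator on the left.
- Right-click Tables, choose Create New Table, enter the table name and columns on the right, then click Save.
- You can also open the SQL editor and run SQL, for example:
sql
-- Create a test table
CREATE TABLE SYSDBA.EMPLOYEES (
ID INT PRIMARY KEY,
NAME VARCHAR(50) NOT NULL,
AGE INT
);
-- Insert test data
INSERT INTO SYSDBA.EMPLOYEES (ID, NAME, AGE) VALUES (1, '张三', 30);
-- Query the data
SELECT * FROM SYSDBA.EMPLOYEES;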
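6. Maven Project Setup
01. Install Maven
# Download Maven 3.9.15 (dlcdn.apache.org only hosts current releases; older versions are on archive.apache.org)
wget https://dlcdn.apache.org/maven/maven-3/3.9.15/binaries/apache-maven-3.9.15-bin.tar.gz
tar -xzf apache-maven-3.9.15-bin.tar.gz -C /tools/server/
# Configure environment variables
vim /etc/profile
export MAVEN_HOME=/tools/server/apache-maven-3.9.15
export PATH=$MAVEN_HOME/bin:$PATH
source /etc/profile
# Verify the installation
mvn -version
02. Custom connector
Create the following directory structure: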
starrocks-flink-connector/
├── pom.xml
└── src/main/java/org/starrocks/flink/
├── StarRocksDynamicTableSinkFactory.java
├── StarRocksDynamicTableSink.java
├── StarRocksSinkFunction.java
└── StarRocksOptions.java
pom.xml
xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>org.starrocks.flink</groupId>
<artifactId>starrocks-flink-sink</artifactId>
<version>1.0.0</version>
<packaging>jar</packaging>
<properties>
<flink.version>1.20.0</flink.version>
<scala.binary.version>2.12</scala.binary.version>
<mysql.connector.version>8.0.33</mysql.connector.version>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
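<!-- The bytecode level must not exceed the JDK that builds and runs Flink; this guide installs only JDK 1.8, which cannot compile or load Java 11 classes -->
<maven.compiler.source>11</maven.compiler.source>
<maven.compiler.target>11</maven.compiler.target>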
</properties>
<dependencies>
<!-- Flink Table API -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-table-api-java</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-table-planner_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<!-- Flink Streaming -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-java</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<!-- MySQL JDBC Driver (StarRocks-compatible) -->
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>${mysql.connector.version}</version>
</dependency>
<!-- Logging (provided scope to avoid conflicts) -->
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>1.7.36</version>
<scope>provided</scope>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.4.1</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<artifactSet>
<includes>
<include>mysql:mysql-connector-java</include>
</includes>
</artifactSet>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.11.0</version>
<configuration>
<source>11</source>
<target>11</target>
</configuration>
</plugin>
</plugins>
</build>
</project>
StarRocksDynamicTableSinkFactory.java
java
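package org.starrocks.flink;
import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;
import org.apache.flink.configuration.ReadableConfig;
import org.apache.flink.table.connector.sink.DynamicTableSink;
import org.apache.flink.table.factories.DynamicTableSinkFactory;
import org.apache.flink.table.factories.FactoryUtil;
import org.apache.flink.table.types.DataType;
import org.apache.flink.table.types.logical.LogicalType;
import java.time.Duration;
import java.util.*;
public class StarRocksDynamicTableSinkFactory implements DynamicTableSinkFactory {
public static final String IDENTIFIER = "starrocks-partial";
// Every key used in the WITH clause must be declared below, otherwise helper.validate() rejects the table
public static final ConfigOption<String> URL = ConfigOptions.key("url").stringType().noDefaultValue();
public static final ConfigOption<String> USERNAME = ConfigOptions.key("username").stringType().noDefaultValue();
public static final ConfigOption<String> PASSWORD = ConfigOptions.key("password").stringType().noDefaultValue();
public static final ConfigOption<String> TABLE_NAME = ConfigOptions.key("table-name").stringType().noDefaultValue();
public static final ConfigOption<String> PRIMARY_KEYS = ConfigOptions.key("primary-keys").stringType().noDefaultValue();
// Position of the op-type column; defaults to the first column
public static final ConfigOption<Integer> OP_TYPE_INDEX = ConfigOptions.key("op-type-index").intType().defaultValue(0);
// Name of the op-type column; when set, it takes precedence over op-type-index
public static final ConfigOption<String> OP_TYPE_FIELD = ConfigOptions.key("op-type-field").stringType().noDefaultValue();
// Standard operation -> raw code, e.g. 'INSERT:I,UPDATE:U,DELETE:D'
public static final ConfigOption<String> OP_TYPE_MAPPING = ConfigOptions.key("op-type-mapping").stringType()
.defaultValue("INSERT:INSERT,UPDATE:UPDATE,DELETE:DELETE");
public static final ConfigOption<Integer> BATCH_SIZE = ConfigOptions.key("sink.buffer-flush.max-rows").intType().defaultValue(100);
// durationType() parses values such as '3s' or '500ms'
public static final ConfigOption<Duration> FLUSH_INTERVAL = ConfigOptions.key("sink.buffer-flush.interval").durationType()
.defaultValue(Duration.ofSeconds(5));
@Override
public String factoryIdentifier() {
return IDENTIFIER;
}
@Override
public Set<ConfigOption<?>> requiredOptions() {
Set<ConfigOption<?>> options = new HashSet<>();
options.add(URL);
options.add(USERNAME);
options.add(PASSWORD);
options.add(TABLE_NAME);
options.add(PRIMARY_KEYS);
return options;
}
@Override
public Set<ConfigOption<?>> optionalOptions() {
Set<ConfigOption<?>> options = new HashSet<>();
options.add(OP_TYPE_INDEX);
options.add(OP_TYPE_FIELD);
options.add(OP_TYPE_MAPPING);
options.add(BATCH_SIZE);
options.add(FLUSH_INTERVAL);
return options;
}
@Override
public DynamicTableSink createDynamicTableSink(Context context) {
FactoryUtil.TableFactoryHelper helper = FactoryUtil.createTableFactoryHelper(this, context);
// Fails on missing required options and on undeclared keys
helper.validate();
ReadableConfig options = helper.getOptions();
List<String> columnNames = context.getCatalogTable().getResolvedSchema().getColumnNames();
String[] fieldNames = columnNames.toArray(new String[0]);
DataType physicalDataType = context.getCatalogTable().getResolvedSchema().toPhysicalRowDataType();
LogicalType[] fieldTypes = physicalDataType.getChildren().stream()
.map(DataType::getLogicalType).toArray(LogicalType[]::new);
List<String> primaryKeys = new ArrayList<>();
for (String pk : options.get(PRIMARY_KEYS).split(",")) {
primaryKeys.add(pk.trim());
}
// op-type-field (looked up by name) wins over op-type-index
int opTypeIndex = options.get(OP_TYPE_INDEX);
String opTypeField = options.getOptional(OP_TYPE_FIELD).orElse(null);
if (opTypeField != null) {
opTypeIndex = -1;
for (int i = 0; i < fieldNames.length; i++) {
if (fieldNames[i].equalsIgnoreCase(opTypeField)) {
opTypeIndex = i;
break;
}
}
if (opTypeIndex < 0) {
throw new IllegalArgumentException("op-type-field column not found: " + opTypeField);
}
}
Map<String, String> opTypeMapping = parseOpTypeMapping(options.get(OP_TYPE_MAPPING));
// Argument order must match the StarRocksDynamicTableSink constructor
return new StarRocksDynamicTableSink(
options.get(URL), options.get(USERNAME), options.get(PASSWORD), options.get(TABLE_NAME),
options.get(BATCH_SIZE), options.get(FLUSH_INTERVAL).toMillis(),
physicalDataType, primaryKeys, opTypeIndex,
fieldNames, fieldTypes, opTypeMapping);
}
// 'INSERT:I,UPDATE:U,DELETE:D' -> {INSERT=I, UPDATE=U, DELETE=D}
private static Map<String, String> parseOpTypeMapping(String spec) {
Map<String, String> mapping = new HashMap<>();
for (String pair : spec.split(",")) {
String[] kv = pair.split(":", 2);
if (kv.length != 2) {
throw new IllegalArgumentException("Invalid op-type-mapping entry: " + pair);
}
mapping.put(kv[0].trim().toUpperCase(), kv[1].trim());
}
return mapping;
}
}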
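helper.validate() compares the table's WITH clause with requiredOptions() and optionalOptions(). A missing required key or an undeclared key makes table creation fail. The JDK-only sketch below imitates just that rule with plain collections, so it is easy to see which keys a factory has to declare. OptionCheckSketch is a hypothetical name, and the real FactoryUtil logic does more (type conversion, fallback keys, richer error messages).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Simplified imitation of the key checks behind helper.validate(); not Flink's real code. */
final class OptionCheckSketch {
    /** Returns the problems found; an empty list means the WITH clause would pass. */
    static List<String> check(Map<String, String> withClause, Set<String> required, Set<String> optional) {
        List<String> problems = new ArrayList<>();
        // Every required key must be present
        for (String key : new TreeSet<>(required)) {
            if (!withClause.containsKey(key)) {
                problems.add("missing required option: " + key);
            }
        }
        // Every supplied key must be declared; 'connector' is consumed by Flink itself
        for (String key : new TreeSet<>(withClause.keySet())) {
            if (!key.equals("connector") && !required.contains(key) && !optional.contains(key)) {
                problems.add("unsupported option: " + key);
            }
        }
        return problems;
    }
}
```

For example, a factory that still required an after-start-index key, or that did not declare op-type-field and op-type-mapping, would reject the sr_test_table1 definition used later in this guide.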
StarRocksDynamicTableSink.java
java
package org.starrocks.flink;
import org.apache.flink.table.connector.ChangelogMode;
import org.apache.flink.table.connector.sink.DynamicTableSink;
import org.apache.flink.table.connector.sink.SinkFunctionProvider;
import org.apache.flink.table.types.DataType;
import org.apache.flink.table.types.logical.LogicalType;
import org.apache.flink.types.RowKind;
import java.util.List;
import java.util.Map;
public class StarRocksDynamicTableSink implements DynamicTableSink {
private final String url;
private final String username;
private final String password;
private final String tableName;
private final int batchSize;
private final long batchIntervalMs;
private final DataType physicalRowDataType;
private final List<String> primaryKeys;
private final int opTypeIndex;
private final String[] fieldNames;
private final LogicalType[] fieldTypes;
private final Map<String, String> opTypeMapping; // op-type mapping: standard operation -> raw code
public StarRocksDynamicTableSink(
String url, String username, String password, String tableName,
int batchSize, long batchIntervalMs,
DataType physicalRowDataType,
List<String> primaryKeys,
int opTypeIndex,
String[] fieldNames, LogicalType[] fieldTypes,
Map<String, String> opTypeMapping) {
this.url = url;
this.username = username;
this.password = password;
this.tableName = tableName;
this.batchSize = batchSize;
this.batchIntervalMs = batchIntervalMs;
this.physicalRowDataType = physicalRowDataType;
this.primaryKeys = primaryKeys;
this.opTypeIndex = opTypeIndex;
this.fieldNames = fieldNames;
this.fieldTypes = fieldTypes;
this.opTypeMapping = opTypeMapping;
}
@Override
public ChangelogMode getChangelogMode(ChangelogMode requestedMode) {
return ChangelogMode.newBuilder()
.addContainedKind(RowKind.INSERT)
.addContainedKind(RowKind.UPDATE_AFTER)
.addContainedKind(RowKind.DELETE)
.build();
}
@Override
public SinkRuntimeProvider getSinkRuntimeProvider(Context context) {
// Indexes of the primary-key columns within fieldNames
final int[] pkIndexes = primaryKeys.stream()
.mapToInt(pk -> {
for (int i = 0; i < fieldNames.length; i++) {
if (fieldNames[i].equalsIgnoreCase(pk)) return i;
}
throw new IllegalArgumentException("Primary key column not found: " + pk);
})
.toArray();
// The sink function drops the op-type column itself and writes all remaining columns
StarRocksSinkFunction sink = new StarRocksSinkFunction(
url, username, password, tableName, batchSize, batchIntervalMs,
fieldNames, fieldTypes, pkIndexes,
opTypeIndex, opTypeMapping);
return SinkFunctionProvider.of(sink);
}
@Override
public DynamicTableSink copy() {
return new StarRocksDynamicTableSink(url, username, password, tableName,
batchSize, batchIntervalMs,
physicalRowDataType, primaryKeys, opTypeIndex,
fieldNames, fieldTypes, opTypeMapping);
}
@Override
public String asSummaryString() {
return "StarRocks Sink";
}
}
StarRocksSinkFunction.java
java
package org.starrocks.flink;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import org.apache.flink.table.data.RowData;
import org.apache.flink.table.types.logical.LogicalType;
import org.apache.flink.table.types.logical.LogicalTypeRoot;
import java.sql.*;
import java.util.*;
import java.util.concurrent.*;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class StarRocksSinkFunction extends RichSinkFunction<RowData> {
private static final Logger LOG = LoggerFactory.getLogger(StarRocksSinkFunction.class);
private static final long serialVersionUID = 1L;
private final String url;
private final String username;
private final String password;
private final String tableName;
private final int batchSize;
private final long batchIntervalMs;
private final String[] fieldNames;
private final LogicalType[] fieldTypes;
private final int[] primaryKeyIndexes;
private final int opTypeIndex;
private final Map<String, String> opTypeMapping;
private transient Connection connection;
private transient LinkedBlockingQueue<RowData> buffer;
private transient ScheduledExecutorService scheduler;
private transient volatile boolean closed = false;
// Failure raised on the timer thread; rethrown on the task thread so the job fails instead of silently stalling
private transient volatile Exception flushException;
// Columns actually written (the op-type column is skipped)
private transient List<String> realColumns;
private transient List<LogicalType> realTypes;
private transient List<Integer> realRowPositions;
private transient Set<Integer> pkRealIndexes;
public StarRocksSinkFunction(String url, String username, String password, String tableName,
int batchSize, long batchIntervalMs,
String[] fieldNames, LogicalType[] fieldTypes,
int[] primaryKeyIndexes,
int opTypeIndex, Map<String, String> opTypeMapping) {
this.url = url;
this.username = username;
this.password = password;
this.tableName = tableName;
this.batchSize = batchSize;
this.batchIntervalMs = batchIntervalMs;
this.fieldNames = fieldNames;
this.fieldTypes = fieldTypes;
this.primaryKeyIndexes = primaryKeyIndexes;
this.opTypeIndex = opTypeIndex;
this.opTypeMapping = opTypeMapping;
}
@Override
public void open(Configuration parameters) throws Exception {
super.open(parameters);
connection = DriverManager.getConnection(url, username, password);
connection.setAutoCommit(false);
buffer = new LinkedBlockingQueue<>();
// Build the list of written columns, skipping the OP_TYPE column
realColumns = new ArrayList<>();
realTypes = new ArrayList<>();
realRowPositions = new ArrayList<>();
for (int i = 0; i < fieldNames.length; i++) {
if (i == opTypeIndex) continue; // skip the op-type column
realColumns.add(fieldNames[i]);
realTypes.add(fieldTypes[i]);
realRowPositions.add(i);
}
// Positions of the primary-key columns within realColumns
pkRealIndexes = new HashSet<>();
for (int pkIdx : primaryKeyIndexes) {
String pkName = fieldNames[pkIdx];
for (int r = 0; r < realColumns.size(); r++) {
if (realColumns.get(r).equalsIgnoreCase(pkName)) {
pkRealIndexes.add(r);
break;
}
}
}
scheduler = Executors.newSingleThreadScheduledExecutor();
scheduler.scheduleAtFixedRate(() -> {
if (closed) return;
try {
flush();
} catch (Exception e) {
// An exception escaping here would silently cancel the timer; keep it for invoke()
flushException = e;
}
}, batchIntervalMs, batchIntervalMs, TimeUnit.MILLISECONDS);
}
@Override
public void invoke(RowData row, Context context) throws Exception {
checkFlushException();
buffer.put(row);
if (buffer.size() >= batchSize) {
flush();
}
}
private void checkFlushException() {
if (flushException != null) {
throw new RuntimeException("Scheduled flush failed", flushException);
}
}
// synchronized: flush() runs on both the task thread and the timer thread, and the two must not
// interleave statements and commits on the same transactional Connection
private synchronized void flush() {
if (buffer == null || buffer.isEmpty()) return;
List<RowData> batch = new ArrayList<>(buffer.size());
buffer.drainTo(batch);
if (batch.isEmpty()) return;
try {
for (RowData row : batch) {
if (row.isNullAt(opTypeIndex)) {
LOG.warn("op_type is null, skipping row");
continue;
}
String rawOp = row.getString(opTypeIndex).toString().trim();
String effectiveOp = resolveOperation(rawOp);
switch (effectiveOp) {
case "INSERT":
executeInsert(row);
break;
case "UPDATE":
executeUpdate(row);
break;
case "DELETE":
executeDelete(row);
break;
default:
LOG.warn("Unknown op_type value: '{}', skipping row", rawOp);
}
}
connection.commit();
} catch (Exception e) {
LOG.error("Batch flush failed, rolling back", e);
try {
connection.rollback();
} catch (SQLException ex) {
LOG.error("Rollback failed", ex);
}
throw new RuntimeException("Batch flush failed", e);
}
}
/**
* Maps the raw op value to a standard operation: INSERT/UPDATE/DELETE.
* opTypeMapping is consulted first; if no entry matches, the upper-cased raw value is used.
*/
private String resolveOperation(String rawOp) {
// Try the configured mapping first
if (opTypeMapping != null) {
for (Map.Entry<String, String> entry : opTypeMapping.entrySet()) {
String key = entry.getKey().toUpperCase();
String expectedVal = entry.getValue().trim();
if (expectedVal.equalsIgnoreCase(rawOp)) {
return key; // INSERT/UPDATE/DELETE
}
}
}
// No match: fall back to the upper-cased raw value (backward-compatible behavior)
return rawOp.toUpperCase();
}
private void executeInsert(RowData row) throws SQLException {
StringBuilder sql = new StringBuilder("INSERT INTO ").append(tableName).append(" (");
StringBuilder values = new StringBuilder(" VALUES (");
for (int i = 0; i < realColumns.size(); i++) {
if (i > 0) {
sql.append(", ");
values.append(", ");
}
sql.append(realColumns.get(i));
values.append("?");
}
sql.append(")").append(values).append(")");
try (PreparedStatement ps = connection.prepareStatement(sql.toString())) {
for (int i = 0; i < realRowPositions.size(); i++) {
setValue(ps, i + 1, row, realRowPositions.get(i), realTypes.get(i));
}
ps.executeUpdate();
}
}
private void executeUpdate(RowData row) throws SQLException {
List<String> setClauses = new ArrayList<>();
List<Object> params = new ArrayList<>();
for (int i = 0; i < realColumns.size(); i++) {
if (pkRealIndexes.contains(i)) continue; // skip primary-key columns
int rowPos = realRowPositions.get(i);
// Partial update: only non-null columns go into the SET clause
if (!row.isNullAt(rowPos)) {
setClauses.add(realColumns.get(i) + " = ?");
params.add(getValue(row, rowPos, realTypes.get(i)));
}
}
if (setClauses.isEmpty()) {
LOG.debug("UPDATE with no fields to set, skipping");
return;
}
StringBuilder sql = new StringBuilder("UPDATE ").append(tableName).append(" SET ");
sql.append(String.join(", ", setClauses));
sql.append(" WHERE ");
for (int j = 0; j < primaryKeyIndexes.length; j++) {
if (j > 0) sql.append(" AND ");
sql.append(fieldNames[primaryKeyIndexes[j]]).append(" = ?");
}
try (PreparedStatement ps = connection.prepareStatement(sql.toString())) {
int idx = 1;
for (Object param : params) {
ps.setObject(idx++, param);
}
for (int pkIdx : primaryKeyIndexes) {
setValue(ps, idx++, row, pkIdx, fieldTypes[pkIdx]);
}
ps.executeUpdate();
}
}
private void executeDelete(RowData row) throws SQLException {
StringBuilder sql = new StringBuilder("DELETE FROM ").append(tableName).append(" WHERE ");
for (int i = 0; i < primaryKeyIndexes.length; i++) {
if (i > 0) sql.append(" AND ");
sql.append(fieldNames[primaryKeyIndexes[i]]).append(" = ?");
}
try (PreparedStatement ps = connection.prepareStatement(sql.toString())) {
for (int i = 0; i < primaryKeyIndexes.length; i++) {
setValue(ps, i + 1, row, primaryKeyIndexes[i], fieldTypes[primaryKeyIndexes[i]]);
}
ps.executeUpdate();
}
}
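private Object getValue(RowData row, int pos, LogicalType type) {
if (row.isNullAt(pos)) return null;
switch (type.getTypeRoot()) {
case BIGINT:
return row.getLong(pos);
case INTEGER:
return row.getInt(pos);
case BOOLEAN:
return row.getBoolean(pos);
case DOUBLE:
return row.getDouble(pos);
case DECIMAL: {
org.apache.flink.table.types.logical.DecimalType decimalType = (org.apache.flink.table.types.logical.DecimalType) type;
return row.getDecimal(pos, decimalType.getPrecision(), decimalType.getScale()).toBigDecimal();
}
case DATE:
// DATE is stored internally as days since 1970-01-01
return java.time.LocalDate.ofEpochDay(row.getInt(pos));
case CHAR:
case VARCHAR:
return row.getString(pos).toString();
case TIMESTAMP_WITHOUT_TIME_ZONE:
case TIMESTAMP_WITH_LOCAL_TIME_ZONE:
// Precision 3 matches the TIMESTAMP(3) columns used in this guide
return row.getTimestamp(pos, 3).toLocalDateTime();
default:
// getString() is only valid for CHAR/VARCHAR columns, so fail with a clear message instead
throw new UnsupportedOperationException("Unsupported column type: " + type);
}
}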
private void setValue(PreparedStatement ps, int index, RowData row, int pos, LogicalType type) throws SQLException {
Object val = getValue(row, pos, type);
if (val == null) {
ps.setNull(index, Types.NULL);
} else if (type.getTypeRoot() == LogicalTypeRoot.BIGINT) {
ps.setLong(index, (Long) val);
} else if (type.getTypeRoot() == LogicalTypeRoot.INTEGER) {
ps.setInt(index, (Integer) val);
} else if (type.getTypeRoot() == LogicalTypeRoot.VARCHAR) {
ps.setString(index, (String) val);
} else {
ps.setObject(index, val);
}
}
@Override
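public void close() throws Exception {
closed = true;
if (scheduler != null) {
scheduler.shutdown();
scheduler.awaitTermination(5, TimeUnit.SECONDS);
}
try {
// buffer is null if open() failed before creating it
if (buffer != null) {
flush();
}
} finally {
// Close the connection even if the final flush fails
if (connection != null) {
connection.close();
}
super.close();
}
}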
}
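StarRocksSinkFunction flushes when either batchSize rows are buffered or the timer fires. The JDK-only sketch below isolates that size-or-time pattern. flush() is synchronized because both the caller thread and the timer thread reach it. SizeOrTimeBuffer and its Consumer callback are illustrative stand-ins for the JDBC writes, not part of the connector.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Buffers items and hands them to a writer when the batch is full or the interval elapses. */
final class SizeOrTimeBuffer<T> implements AutoCloseable {
    private final LinkedBlockingQueue<T> buffer = new LinkedBlockingQueue<>();
    private final int batchSize;
    private final Consumer<List<T>> writer;
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    SizeOrTimeBuffer(int batchSize, long intervalMs, Consumer<List<T>> writer) {
        this.batchSize = batchSize;
        this.writer = writer;
        // Time-based trigger
        scheduler.scheduleAtFixedRate(this::flush, intervalMs, intervalMs, TimeUnit.MILLISECONDS);
    }

    void add(T item) {
        buffer.add(item); // unbounded queue, never blocks
        // Size-based trigger
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** synchronized: called from both the caller thread and the timer thread. */
    synchronized void flush() {
        List<T> batch = new ArrayList<>();
        buffer.drainTo(batch);
        if (!batch.isEmpty()) {
            writer.accept(batch);
        }
    }

    @Override
    public void close() {
        scheduler.shutdown(); // also cancels the periodic task
        try {
            scheduler.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        flush(); // write whatever is left
    }
}
```

As in the sink, close() stops the timer first and then performs one last flush, so rows that arrived after the last trigger are not dropped on a clean shutdown.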
StarRocksOptions.java
java
package org.starrocks.flink;
import java.util.Map;
public class StarRocksOptions {
public static final String URL = "url";
public static final String USERNAME = "username";
public static final String PASSWORD = "password";
public static final String TABLE_NAME = "table-name";
public static final String MAX_RETRIES = "max-retries";
public static final String BATCH_SIZE = "sink.buffer-flush.max-rows";
public static final String BATCH_INTERVAL_MS = "sink.buffer-flush.interval";
public static final String OP_TYPE_INDEX = "op-type-index";
public static final String OP_TYPE_FIELD = "op-type-field";
public static final String OP_TYPE_MAPPING = "op-type-mapping";
public static String getRequired(Map<String, String> props, String key) {
String value = props.get(key);
if (value == null || value.isEmpty()) {
throw new IllegalArgumentException("Missing required option: " + key);
}
return value;
}
public static int getInt(Map<String, String> props, String key, int defaultValue) {
return Integer.parseInt(props.getOrDefault(key, String.valueOf(defaultValue)));
}
}
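StarRocksOptions keeps 'sink.buffer-flush.interval' as a plain string key, while the table definition passes values such as '3s'. When options are read as raw strings rather than through a Flink durationType() option, something has to turn '3s' into milliseconds. The sketch below handles ms/s/min/h suffixes and treats a bare number as milliseconds. IntervalParser and the accepted suffix set are assumptions for illustration.

```java
import java.time.Duration;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: parses values such as "500ms", "3s", "1min" or "2h" into milliseconds. */
final class IntervalParser {
    private static final Pattern FORMAT = Pattern.compile("(\\d+)\\s*(ms|s|min|h)?");

    static long toMillis(String value) {
        Matcher m = FORMAT.matcher(value.trim().toLowerCase(Locale.ROOT));
        if (!m.matches()) {
            throw new IllegalArgumentException("Cannot parse interval: " + value);
        }
        long amount = Long.parseLong(m.group(1));
        String unit = m.group(2) == null ? "ms" : m.group(2); // bare number = milliseconds
        switch (unit) {
            case "s":
                return Duration.ofSeconds(amount).toMillis();
            case "min":
                return Duration.ofMinutes(amount).toMillis();
            case "h":
                return Duration.ofHours(amount).toMillis();
            default:
                return amount;
        }
    }
}
```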
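Building the project
Note: the connector is compiled for Java 11 (maven.compiler.source/target = 11). JDK 11 is therefore required to build it, and the Flink cluster that loads the jar must also run on Java 11 or newer.
Download JDK 11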
cd /tools/soft/
wget https://mirrors.tuna.tsinghua.edu.cn/Adoptium/11/jdk/x64/linux/OpenJDK11U-jdk_x64_linux_hotspot_11.0.31_11.tar.gz
Extract JDK 11
tar -xzf /tools/soft/OpenJDK11U-jdk_x64_linux_hotspot_11.0.31_11.tar.gz -C /tools/server/
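A jar built with maven.compiler.target 11 contains class files with major version 55. A Java 8 JVM (major version 52) refuses to load them with UnsupportedClassVersionError. To check which Java release a built class targets, read bytes 6-7 of the class-file header, which is what `javap -v` reports as "major version". The sketch below does that for raw bytes and for an entry inside a jar. ClassFileVersion and its methods are hypothetical names; the header layout (0xCAFEBABE magic, minor version, major version) comes from the JVM specification.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Reads the class-file major version and maps it to a Java release (Java 8 = 52, Java 11 = 55). */
final class ClassFileVersion {
    static int majorVersion(byte[] header) {
        if (header.length < 8
                || (header[0] & 0xFF) != 0xCA || (header[1] & 0xFF) != 0xFE
                || (header[2] & 0xFF) != 0xBA || (header[3] & 0xFF) != 0xBE) {
            throw new IllegalArgumentException("Not a class file");
        }
        // Bytes 4-5 hold the minor version, bytes 6-7 the major version (big-endian)
        return ((header[6] & 0xFF) << 8) | (header[7] & 0xFF);
    }

    static int javaRelease(int majorVersion) {
        return majorVersion - 44; // 52 -> 8, 55 -> 11, 61 -> 17
    }

    /** Major version of one class inside a jar, e.g. "org/starrocks/flink/StarRocksSinkFunction.class". */
    static int majorVersionInJar(String jarPath, String entryName) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            JarEntry entry = jar.getJarEntry(entryName);
            if (entry == null) {
                throw new IllegalArgumentException("No such entry: " + entryName);
            }
            try (InputStream in = jar.getInputStream(entry)) {
                byte[] header = new byte[8];
                new DataInputStream(in).readFully(header);
                return majorVersion(header);
            }
        }
    }
}
```

For the jar built in this section, `majorVersionInJar("target/starrocks-flink-sink-1.0.0.jar", "org/starrocks/flink/StarRocksSinkFunction.class")` should report 55, given the compiler settings in the pom.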
Build and deploy
# Build command (run in the project directory):
mvn clean package -DskipTests
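# Copy the built jar into Flink's lib directory
cp target/starrocks-flink-sink-1.0.0.jar $FLINK_HOME/lib/
# Restart the cluster so the new jar is loaded (the pkill lines clean up JVMs that stop-cluster.sh left behind)
/tools/server/flink/bin/stop-cluster.sh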
pkill -f "StandaloneSessionClusterEntrypoint" 2>/dev/null
pkill -f "TaskManagerRunner" 2>/dev/null
sleep 3
/tools/server/flink/bin/start-cluster.sh
/tools/server/flink/bin/sql-client.sh -f /tools/server/flink/sql/test/flink_cloud_kafka_to_cloud_starrocks_test1.sql
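Note on the WITH options: if the op-type column is not named op_type, set 'op-type-field'. If the operation codes are not literally INSERT, UPDATE and DELETE, map them with 'op-type-mapping' (format standard-operation:raw-code), for example:
'op-type-field' = 'op_type', -- column that carries the operation type (can be omitted if the first column is op_type)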
'op-type-mapping' = 'INSERT:I,UPDATE:U,DELETE:D'
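The mapping string pairs a standard operation with the raw code found in the data. The JDK-only sketch below parses such a string and applies the same rule as resolveOperation() in StarRocksSinkFunction: it matches the raw value case-insensitively against the mapped codes and otherwise falls back to the upper-cased raw value. OpTypeMappingSketch is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class OpTypeMappingSketch {
    /** 'INSERT:I,UPDATE:U,DELETE:D' -> {INSERT=I, UPDATE=U, DELETE=D} */
    static Map<String, String> parse(String spec) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (String pair : spec.split(",")) {
            String[] kv = pair.split(":", 2);
            if (kv.length != 2) {
                throw new IllegalArgumentException("Invalid op-type-mapping entry: " + pair);
            }
            mapping.put(kv[0].trim().toUpperCase(Locale.ROOT), kv[1].trim());
        }
        return mapping;
    }

    /** Raw op value -> INSERT/UPDATE/DELETE, mirroring resolveOperation() in the sink. */
    static String resolve(Map<String, String> mapping, String rawOp) {
        String op = rawOp.trim();
        for (Map.Entry<String, String> e : mapping.entrySet()) {
            if (e.getValue().equalsIgnoreCase(op)) {
                return e.getKey();
            }
        }
        return op.toUpperCase(Locale.ROOT); // no match: use the raw value as-is
    }
}
```

The WHERE op_type IN (...) filter in the INSERT statement compares raw values. With an I/U/D mapping, that filter would therefore have to list 'I', 'U' and 'D'.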
-- Flink SQL for the custom "starrocks-partial" connector
-- 1. Kafka source table
CREATE TABLE kafka_cdc_source (
`table` STRING,
op_type STRING,
op_ts STRING,
current_ts STRING,
pos STRING,
primary_keys ARRAY<STRING>,
before MAP<STRING, STRING>,
after MAP<STRING, STRING>
) WITH (
'connector' = 'kafka',
'topic' = 'test_topic_mhww_pre',
'properties.bootstrap.servers' = '100.92.3.61:19092',
'properties.group.id' = 'flink_sr_sync_group',
'format' = 'json',
'json.fail-on-missing-field' = 'false',
'json.ignore-parse-errors' = 'true',
'scan.startup.mode' = 'earliest-offset'
);
-- 2. Target table (custom connector)
CREATE TABLE sr_test_table1 (
op_type STRING, -- pseudo-column: only carries the operation type, not written to StarRocks
ID BIGINT,
TYPE STRING,
TITLE STRING,
SUBTITLE STRING,
CONTENT STRING,
CONTENT_TEMP STRING,
CREATE_TIME TIMESTAMP(3),
UPDATE_TIME TIMESTAMP(3),
TEXT_TYPE STRING
) WITH (
'connector' = 'starrocks-partial',
'url' = 'jdbc:mysql://100.92.3.61:9030/webdc_ods_test',
'table-name' = 'TEST_TABLE1',
'username' = 'webdc_ods_test_user',
'password' = 'User_4837_ffps',
'primary-keys' = 'ID', -- primary-key column(s), comma-separated
'op-type-index' = '0',
'sink.buffer-flush.max-rows' = '5000',
'sink.buffer-flush.interval' = '3s',
'op-type-field' = 'op_type', -- column that carries the operation type (can be omitted if the first column is op_type)
'op-type-mapping' = 'INSERT:INSERT,UPDATE:UPDATE,DELETE:DELETE'
);
-- 3. INSERT SQL
INSERT INTO sr_test_table1
SELECT
op_type,
COALESCE(CAST(after['ID'] AS BIGINT), CAST(before['ID'] AS BIGINT)) AS ID, -- ★ primary key from after, falling back to before (e.g. for DELETE)
after['TYPE'] AS TYPE,
after['TITLE'] AS TITLE,
CASE WHEN after['SUBTITLE'] IS NULL THEN NULL ELSE CAST(after['SUBTITLE'] AS STRING) END AS SUBTITLE,
CASE WHEN after['CONTENT'] IS NULL THEN NULL ELSE CAST(after['CONTENT'] AS STRING) END AS CONTENT,
CASE WHEN after['CONTENT_TEMP'] IS NULL THEN NULL ELSE CAST(after['CONTENT_TEMP'] AS STRING) END AS CONTENT_TEMP,
CAST(after['CREATE_TIME'] AS TIMESTAMP(3)) AS CREATE_TIME,
CAST(after['UPDATE_TIME'] AS TIMESTAMP(3)) AS UPDATE_TIME,
CASE WHEN after['TEXT_TYPE'] IS NULL THEN NULL ELSE after['TEXT_TYPE'] END AS TEXT_TYPE
FROM kafka_cdc_source
WHERE `table` = 'MHWW_PRE.TEST_TABLE1'
and op_type IN ('INSERT', 'UPDATE', 'DELETE');
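The statement above feeds the sink one row per Kafka message. The JDK-only sketch below reproduces, on plain strings, the SQL shapes StarRocksSinkFunction prepares for such a row:

- the op_type column (index 0) is dropped from the written columns;
- the primary key comes from after or, when after lacks it as in a DELETE message, from before;
- an UPDATE sets only the non-null, non-key columns.

StatementShapes and its methods are hypothetical. The real sink works on typed RowData values rather than strings, and binds every value as a ? parameter.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical helper reproducing, on plain strings, the SQL shapes built by StarRocksSinkFunction. */
final class StatementShapes {
    /** Columns actually written: every column except the op-type column. */
    static List<String> writtenColumns(String[] fieldNames, int opTypeIndex) {
        List<String> cols = new ArrayList<>();
        for (int i = 0; i < fieldNames.length; i++) {
            if (i != opTypeIndex) {
                cols.add(fieldNames[i]);
            }
        }
        return cols;
    }

    /** INSERT INTO t (c1, c2) VALUES (?, ?), the shape used by executeInsert(). */
    static String insertSql(String table, List<String> cols) {
        String placeholders = String.join(", ", Collections.nCopies(cols.size(), "?"));
        return "INSERT INTO " + table + " (" + String.join(", ", cols) + ") VALUES (" + placeholders + ")";
    }

    /** Key from after, falling back to before, like COALESCE(after['ID'], before['ID']). */
    static String primaryKey(Map<String, String> before, Map<String, String> after, String pk) {
        String fromAfter = after == null ? null : after.get(pk);
        if (fromAfter != null) {
            return fromAfter;
        }
        return before == null ? null : before.get(pk);
    }

    /**
     * UPDATE t SET c = ?, ... WHERE pk = ?, listing only non-null, non-key columns (as executeUpdate() does).
     * Bind values are appended to params in placeholder order; returns null when there is nothing to set.
     */
    static String updateSql(String table, Map<String, String> after, String pk, String pkValue, List<String> params) {
        List<String> sets = new ArrayList<>();
        for (Map.Entry<String, String> e : after.entrySet()) {
            if (e.getKey().equalsIgnoreCase(pk) || e.getValue() == null) {
                continue; // skip the key column and null values
            }
            sets.add(e.getKey() + " = ?");
            params.add(e.getValue());
        }
        if (sets.isEmpty()) {
            return null;
        }
        params.add(pkValue);
        return "UPDATE " + table + " SET " + String.join(", ", sets) + " WHERE " + pk + " = ?";
    }
}
```

Two consequences follow from the partial-update rule. First, an UPDATE message can never set a column back to NULL. Second, because the connector issues UPDATE and DELETE statements, TEST_TABLE1 needs to be a StarRocks Primary Key table, since StarRocks accepts UPDATE only on that table type.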
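The content below has not been updated yet (as of 2026-04-26 13:33).
5. Installing additional connectors (PostgreSQL, Dameng)
5.1 PostgreSQL CDC configuration
Flink CDC (Change Data Capture) captures data changes from PostgreSQL in real time. Flink CDC 3.6.0 officially supports Flink 1.20.
Version compatibility:
- Flink CDC 3.6.0 supports Flink 1.20.x and 2.2.x
- Supported PostgreSQL versions: 9.6, 10, 11, 12, 13, 14, 15, 16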
bash
cd /opt/flink/lib
# 1. Download the Postgres CDC SQL connector (the flink-sql-connector-* jar is the bundled variant intended for the SQL client)
wget https://repo1.maven.org/maven2/org/apache/flink/flink-sql-connector-postgres-cdc/3.6.0/flink-sql-connector-postgres-cdc-3.6.0.jar
# 2. PostgreSQL JDBC driver (if not installed yet)
wget https://jdbc.postgresql.org/download/postgresql-42.7.3.jar
# Restart Flink so the connector is loaded
/opt/flink/bin/stop-cluster.sh
/opt/flink/bin/start-cluster.sh
PostgreSQL CDC Flink SQL example:
sql
-- Create a PostgreSQL CDC source table
CREATE TABLE pg_orders (
order_id INT,
customer_name STRING,
order_amount DECIMAL(10,2),
order_status STRING,
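created_at TIMESTAMP(3),
PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
'connector' = 'postgres-cdc',
'hostname' = 'localhost',
'port' = '5432',
'username' = 'flink_user',
'password' = 'your_password',
'database-name' = 'order_db',
'schema-name' = 'public',
'table-name' = 'orders',
'decoding.plugin.name' = 'pgoutput',
'slot.name' = 'flink_orders_slot' -- logical replication slot (example name); each job needs its own slot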
);
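5.2 Dameng (DM) database configuration
Dameng (DM) is a Chinese relational database that Flink can reach through the Flink JDBC connector. The JDBC connector selects a SQL dialect from the URL prefix and ships no built-in dialect for jdbc:dm://, so a Dameng dialect implementation also has to be on the classpath.
Prerequisite: obtain the Dameng JDBC driver jar (usually DmJdbcDriver18.jar).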
bash
cd /opt/flink/lib
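# 1. Copy the Dameng JDBC driver into Flink's lib directory
# (assuming the driver is at /opt/software/DmJdbcDriver18.jar)
cp /opt/software/DmJdbcDriver18.jar /opt/flink/lib/
# 2. If the JDBC connector is not installed yet, install it as well
# (see the JDBC connector installation in section 4.2)
# 3. Restart Flink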
/opt/flink/bin/stop-cluster.sh
/opt/flink/bin/start-cluster.sh
Dameng Flink SQL example:
sql
-- Create a Dameng source table (via JDBC)
CREATE TABLE dm_user_info (
user_id INT,
user_name STRING,
phone STRING,
register_time TIMESTAMP(3)
) WITH (
'connector' = 'jdbc',
'url' = 'jdbc:dm://localhost:5236',
'table-name' = 'USER_INFO',
'username' = 'flink_user',
'password' = 'your_password',
'driver' = 'dm.jdbc.driver.DmDriver'
);
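A missing driver jar surfaces as ClassNotFoundException for the class named in the 'driver' option. Before restarting Flink, you can confirm that a jar actually contains dm.jdbc.driver.DmDriver by loading the class through a URLClassLoader without initializing it. The JDK-only sketch below does this; DriverCheck and canLoad are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;

final class DriverCheck {
    /** True if driverClass loads from the given jar (or, when jarPath is null, from the current classpath). */
    static boolean canLoad(String jarPath, String driverClass) {
        ClassLoader parent = DriverCheck.class.getClassLoader();
        try {
            if (jarPath == null) {
                Class.forName(driverClass, false, parent);
                return true;
            }
            URL jarUrl = new File(jarPath).toURI().toURL();
            try (URLClassLoader loader = new URLClassLoader(new URL[] {jarUrl}, parent)) {
                Class.forName(driverClass, false, loader); // false: do not run static initializers
                return true;
            }
        } catch (ClassNotFoundException | IOException e) {
            return false;
        }
    }
}
```

For example, `DriverCheck.canLoad("/opt/flink/lib/DmJdbcDriver18.jar", "dm.jdbc.driver.DmDriver")` returns false if the file is missing or does not contain that class.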
6. End-to-end sync pipeline (Kafka → Flink → StarRocks)
6.1 Prepare the StarRocks target table
sql
-- Log in to StarRocks (run in the shell)
mysql -h 127.0.0.1 -P 9030 -uroot
-- Create the database and table
CREATE DATABASE IF NOT EXISTS test_db;
USE test_db;
-- Create a duplicate key (detail) table for the real-time data
CREATE TABLE IF NOT EXISTS ods_user_behavior (
`user_id` INT NOT NULL COMMENT 'User ID',
`user_name` VARCHAR(100) NULL COMMENT 'User name',
`action_type` VARCHAR(50) NULL COMMENT 'Action type',
`action_time` DATETIME NULL COMMENT 'Action time',
`ext_info` JSON NULL COMMENT 'Extra info'
) ENGINE=OLAP
DUPLICATE KEY(`user_id`)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 1
PROPERTIES (
"replication_num" = "1",
"storage_format" = "DEFAULT"
);
-- Check the table schema
DESC ods_user_behavior;
6.2 Write the Flink SQL job
Create the Flink SQL job file /opt/flink_job_kafka_to_starrocks.sql:
sql
-- =====================================================
-- Job: Kafka to StarRocks real-time sync
-- Description: reads JSON from Kafka and writes it to StarRocks
-- =====================================================
-- 1. Streaming mode and checkpointing (required for the exactly-once guarantee)
SET 'execution.runtime-mode' = 'streaming';
SET 'execution.checkpointing.interval' = '10s';
SET 'execution.checkpointing.mode' = 'EXACTLY_ONCE';
SET 'execution.checkpointing.timeout' = '600s';
SET 'restart-strategy' = 'fixed-delay';
SET 'restart-strategy.fixed-delay.attempts' = '3';
SET 'restart-strategy.fixed-delay.delay' = '10s';
-- 2. Define the Kafka source table
CREATE TABLE kafka_user_behavior (
`user_id` INT,
`user_name` STRING,
`action_type` STRING,
`action_time` STRING,
`ext_info` STRING
) WITH (
'connector' = 'kafka',
'topic' = 'test_source',
'properties.bootstrap.servers' = 'localhost:9092',
'properties.group.id' = 'flink-sync-group',
'scan.startup.mode' = 'latest-offset',
'format' = 'json',
'json.fail-on-missing-field' = 'false',
'json.ignore-parse-errors' = 'true'
);
-- 3. Define the StarRocks result table (sink)
CREATE TABLE starrocks_user_behavior (
`user_id` INT,
`user_name` STRING,
`action_type` STRING,
`action_time` TIMESTAMP(3),
`ext_info` STRING
) WITH (
'connector' = 'starrocks',
'jdbc-url' = 'jdbc:mysql://127.0.0.1:9030',
'load-url' = '127.0.0.1:8030',
'database-name' = 'test_db',
'table-name' = 'ods_user_behavior',
'username' = 'root',
'password' = '',
'sink.semantic' = 'exactly-once',
'sink.buffer-flush.max-rows' = '1000',
'sink.buffer-flush.interval-ms' = '5000',
'sink.max-retries' = '3'
);
-- 4. Run the sync (with type conversion and cleansing)
INSERT INTO starrocks_user_behavior
SELECT
user_id,
COALESCE(user_name, 'anonymous') AS user_name,
action_type,
CAST(action_time AS TIMESTAMP(3)) AS action_time,
ext_info
FROM kafka_user_behavior
WHERE user_id IS NOT NULL;
6.3 Generate test data and verify
bash
# 1. Submit the Flink job first: the source uses 'scan.startup.mode' = 'latest-offset',
#    so only messages produced after the job has started are read
/opt/flink/bin/sql-client.sh -f /opt/flink_job_kafka_to_starrocks.sql
# 2. Produce test data to Kafka
echo '{"user_id":1001,"user_name":"张三","action_type":"click","action_time":"2024-05-20 10:30:00","ext_info":"{\"page\":\"home\"}"}' | \
/opt/kafka/bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic test_source
echo '{"user_id":1002,"user_name":"李四","action_type":"purchase","action_time":"2024-05-20 10:31:00","ext_info":"{\"amount\":299}"}' | \
/opt/kafka/bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic test_source
# 3. Verify the data in StarRocks (with the exactly-once sink, rows become visible after the next checkpoint, i.e. within about 10 s)
mysql -h 127.0.0.1 -P 9030 -uroot -e "SELECT * FROM test_db.ods_user_behavior;"
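The ext_info field is itself a JSON document carried as a string, so its quotes have to be escaped inside the outer message, as in the echo commands above. If more test rows are needed, the JDK-only sketch below builds messages in the same shape, one per line, ready to pipe into kafka-console-producer.sh. TestMessages is a hypothetical helper, and its escaping covers only quotes, backslashes and control characters.

```java
/** Hypothetical generator for kafka_user_behavior test messages. */
final class TestMessages {
    /** Minimal JSON string quoting: escapes quotes, backslashes and control characters. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':
                    sb.append("\\\"");
                    break;
                case '\\':
                    sb.append("\\\\");
                    break;
                case '\n':
                    sb.append("\\n");
                    break;
                case '\r':
                    sb.append("\\r");
                    break;
                case '\t':
                    sb.append("\\t");
                    break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    /** One message in the shape expected by the kafka_user_behavior source table. */
    static String message(int userId, String userName, String actionType, String actionTime, String extInfoJson) {
        return "{\"user_id\":" + userId
                + ",\"user_name\":" + quote(userName)
                + ",\"action_type\":" + quote(actionType)
                + ",\"action_time\":" + quote(actionTime)
                + ",\"ext_info\":" + quote(extInfoJson) + "}";
    }
}
```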
7. DolphinScheduler scheduling
7.1 Configure the Flink environment
In the DolphinScheduler web UI:
- Log in at http://<VM-IP>:12345/dolphinscheduler/ui (user admin, password dolphinscheduler123)
- Click Security → Environment Manage → Create Environment
- Fill in the environment details:
  - Environment name: flink-1.20-env
  - Environment config:
bash
export FLINK_HOME=/tools/server/flink
export PATH=$FLINK_HOME/bin:$PATH
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
7.2 Upload the SQL resource
- Click Resources → File Manage
- Create a folder named flink-jobs
- Upload /opt/flink_job_kafka_to_starrocks.sql
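7.3 Create the workflow
- Click Project → Workflow Definition → Create Workflow
- Drag a Flink node onto the canvas
- Configure the node:
  - Node name: Kafka_to_StarRocks_Sync
  - Flink version: Flink >=1.16 (select this option)
  - Deploy mode: Local
  - Program type: SQL
  - Main function/class: leave empty (SQL mode)
  - SQL script: select the uploaded flink_job_kafka_to_starrocks.sql
  - Custom Flink parameters: -D parallelism.default=1
- Save the workflow and bring it online
- Click Run to test the job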
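8. Common pitfalls and fixes
| Symptom | Cause | Fix |
|---|---|---|
| StarRocks BE fails to start with a max_map_count error | Not enough virtual memory map areas on Linux | sudo sysctl -w vm.max_map_count=2000000 |
| StarRocks BE starts slowly or fails | Transparent huge pages not disabled | echo never > /sys/kernel/mm/transparent_hugepage/enabled |
| Flink writes to StarRocks time out | Network ports not open | Check that ports 8030, 9030 and 9060 are reachable |
| Kafka connector throws ClassNotFoundException | Kafka connector jar missing | Put the Kafka SQL connector built for Flink 1.20 (e.g. flink-sql-connector-kafka-3.3.0-1.20.jar) into /opt/flink/lib |
| PostgreSQL CDC fails to connect | Logical replication not enabled | Set wal_level=logical in postgresql.conf |
| Dameng connection times out | JDBC driver version mismatch | Use the JDBC driver version recommended by Dameng |
| Flink Web UI unreachable | rest.bind-address set to localhost | Change it to 0.0.0.0 |
| DolphinScheduler fails to submit jobs | Insufficient user permissions | Create the Linux user ds and grant it access to /opt/flink |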
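The first two rows can be checked before starting the BE. /sys/kernel/mm/transparent_hugepage/enabled lists all modes with the active one in brackets (for example `always madvise [never]`), and /proc/sys/vm/max_map_count holds a single number. The JDK-only pre-flight sketch below reads both files and compares them with the values used in the table above (never and 2000000). BePreflight is a hypothetical name, and the thresholds come from this table rather than from StarRocks itself.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

final class BePreflight {
    /** Extracts the active mode from text such as "always madvise [never]". */
    static String activeThpMode(String content) {
        int start = content.indexOf('[');
        int end = content.indexOf(']', start + 1);
        if (start < 0 || end < 0) {
            throw new IllegalArgumentException("Unexpected format: " + content);
        }
        return content.substring(start + 1, end);
    }

    static boolean maxMapCountOk(String content, long required) {
        return Long.parseLong(content.trim()) >= required;
    }

    /** Reads the live values; only meaningful on Linux. */
    static void report() throws IOException {
        String thp = new String(Files.readAllBytes(Paths.get("/sys/kernel/mm/transparent_hugepage/enabled")), StandardCharsets.US_ASCII);
        String maps = new String(Files.readAllBytes(Paths.get("/proc/sys/vm/max_map_count")), StandardCharsets.US_ASCII);
        String mode = activeThpMode(thp);
        System.out.println("transparent_hugepage: " + mode + (mode.equals("never") ? " (OK)" : " (set to never)"));
        System.out.println("max_map_count: " + maps.trim()
                + (maxMapCountOk(maps, 2_000_000L) ? " (OK)" : " (raise with sysctl -w vm.max_map_count=2000000)"));
    }
}
```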
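9. Verification checklist
After configuration, verify each item in turn:
- ZooKeeper: jps shows QuorumPeerMain, and zkServer.sh status reports standalone
- Kafka: jps shows Kafka, and kafka-topics.sh --list shows the topic list
- StarRocks: SHOW BACKENDS reports Alive = true
- Flink: http://<IP>:8081 shows the TaskManager registered
- Flink version: /opt/flink/bin/flink --version shows 1.20.0
- Connectors: ls /opt/flink/lib/ | grep -E "starrocks|kafka|jdbc" lists the corresponding jars
- DolphinScheduler: http://<IP>:12345/dolphinscheduler/ui accepts the login
- End to end: after producing Kafka test data, the rows are queryable in StarRocks after the next checkpoint (the job in 6.2 checkpoints every 10 s and uses an exactly-once sink)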
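Most checklist items start with "is the service listening on its port?". The JDK-only sketch below tries a TCP connect with a short timeout to each port used in this guide: 2181 ZooKeeper (its default client port), 9092 Kafka, 8030/9030 StarRocks FE, 9060 StarRocks BE, 8081 Flink and 12345 DolphinScheduler. PortCheck is a hypothetical name, and the port list mirrors this document's defaults. A successful connect only shows that something is listening, not that the service is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.LinkedHashMap;
import java.util.Map;

final class PortCheck {
    /** True if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean isOpen(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /** Ports used in this guide; adjust if your configuration differs. */
    static Map<String, Integer> defaultPorts() {
        Map<String, Integer> ports = new LinkedHashMap<>();
        ports.put("ZooKeeper", 2181);
        ports.put("Kafka", 9092);
        ports.put("StarRocks FE HTTP", 8030);
        ports.put("StarRocks FE MySQL", 9030);
        ports.put("StarRocks BE", 9060);
        ports.put("Flink Web UI", 8081);
        ports.put("DolphinScheduler UI", 12345);
        return ports;
    }

    static void report(String host) {
        for (Map.Entry<String, Integer> e : defaultPorts().entrySet()) {
            boolean open = isOpen(host, e.getValue(), 1000);
            System.out.printf("%-20s %5d %s%n", e.getKey(), e.getValue(), open ? "open" : "CLOSED");
        }
    }
}
```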
10. Quick-start command summary
bash
# Start ZooKeeper
/opt/zookeeper/bin/zkServer.sh start
# Start Kafka
/opt/kafka/bin/kafka-server-start.sh -daemon /opt/kafka/config/server.properties
# Start StarRocks
/opt/starrocks/fe/bin/start_fe.sh --daemon
/opt/starrocks/be/bin/start_be.sh --daemon
# Start Flink
/opt/flink/bin/start-cluster.sh
# Start DolphinScheduler
bash /opt/dolphinscheduler/bin/dolphinscheduler-daemon.sh start server
# Check all services
jps
# Expected JVM processes: QuorumPeerMain, Kafka, StarRocksFE, StandaloneSessionClusterEntrypoint, TaskManagerRunner, ApiApplicationServer
# The StarRocks BE is a native (C++) process and does not appear in jps; check it with:
ps -ef | grep starrocks_be
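Appendix: version compatibility matrix
| Flink version | StarRocks connector | Kafka connector | Flink CDC | JDK version |
|---|---|---|---|---|
| 1.20.0 | 1.2.14_flink-1.20 | 3.3.0-1.20 | 3.6.0 | Java 8/11 |
Important:
- All connector jars must be placed in /opt/flink/lib, and Flink must be restarted before they take effect
- The custom starrocks-partial connector is compiled for Java 11, so Flink must run on Java 11 or newer to load it
- In production, use HDFS or S3 as checkpoint storage
- The Dameng driver must be obtained from official channels; this document does not provide a direct download link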