SeaTunnel Series: Deploying SeaTunnel
- I. Step 1: Prepare the Environment
- II. Step 2: Download SeaTunnel
- III. Step 3: Install Connector Plugins
- IV. quick-start-seatunnel-engine
  - 1. Add a job configuration file to define the job
  - 2. Run the SeaTunnel application
- V. quick-start-flink
  - 1. Deploy and configure Flink
  - 2. Add a job configuration file to define the job
  - 3. Run the SeaTunnel application
- VI. quick-start-spark
  - 1. Deploy and configure Spark
  - 2. Add a job configuration file to define the job
  - 3. Run the SeaTunnel application
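I. Step 1: Prepare the Environment
Before running SeaTunnel locally, make sure the following software that SeaTunnel requires is installed:
- Java (Java 8 or 11; other versions newer than Java 8 should also work in theory), with JAVA_HOME set.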
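The Java prerequisite can be checked from a shell before going further. The sketch below is only an illustration: the function name check_java_home is hypothetical, and it assumes the conventional layout in which the java binary lives at $JAVA_HOME/bin/java.

```shell
# Hypothetical helper (not part of SeaTunnel): succeeds when the given
# directory contains an executable bin/java, as a JDK or JRE home normally does.
check_java_home() {
  [ -n "$1" ] && [ -x "$1/bin/java" ]
}

if check_java_home "${JAVA_HOME:-}"; then
  echo "JAVA_HOME looks usable: ${JAVA_HOME}"
  "${JAVA_HOME}/bin/java" -version   # prints the installed Java version
else
  echo "JAVA_HOME is unset or does not contain bin/java" >&2
fi
```

If the check fails, install Java 8 or 11 and export JAVA_HOME before continuing with the steps below.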
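II. Step 2: Download SeaTunnel
Go to the SeaTunnel download page and download the latest release of the distribution package, apache-seatunnel-<version>-bin.tar.gz.
Alternatively, download it from a terminal: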
bash
export version="2.3.4"
wget "https://archive.apache.org/dist/seatunnel/${version}/apache-seatunnel-${version}-bin.tar.gz"
tar -xzvf "apache-seatunnel-${version}-bin.tar.gz"
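III. Step 3: Install Connector Plugins
Since 2.2.0-beta, the binary package no longer ships connector dependencies by default, so the connectors must be installed before first use with the command below. (You can also download connectors manually from the Apache Maven repository and move them into the connectors/seatunnel directory.)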
bash
sh bin/install-plugin.sh
To install a specific connector version, pass it as an argument. Taking 2.3.4 as an example:
bash
sh bin/install-plugin.sh 2.3.4
You usually do not need every connector plugin, so you can list only the ones you need in config/plugin_config. For example, if you only need the connector-console plugin, change plugin_config to:
bash
--seatunnel-connectors--
connector-console
--end--
For the sample application to work, add the following plugins:
bash
--seatunnel-connectors--
connector-fake
connector-console
--end--
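The plugin list can also be generated from a script instead of edited by hand. This is a minimal sketch, not part of SeaTunnel: write_plugin_config is a hypothetical helper, and it assumes the file needs only the two marker lines shown above around the connector names.

```shell
# Hypothetical helper: write a plugin_config-style file that lists only
# the given connectors between the marker lines used in this article.
# Usage: write_plugin_config <file> <connector>...
write_plugin_config() {
  target="$1"
  shift
  {
    echo "--seatunnel-connectors--"
    for name in "$@"; do
      echo "$name"
    done
    echo "--end--"
  } > "$target"   # note: this overwrites the target file
}

# Example (run from the SeaTunnel directory; back up the original file first):
# write_plugin_config config/plugin_config connector-fake connector-console
```

After rewriting the file, run the install script again so that only the listed connectors are downloaded.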
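All supported connectors and their corresponding plugin_config names are listed in ${SEATUNNEL_HOME}/connectors/plugins-mapping.properties.
Note:
If you install connector plugins by downloading them manually, pay particular attention to the following.
The connectors directory contains the following subdirectories; create any that do not exist: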
bash
flink
flink-sql
seatunnel
spark
To install a V2 connector plugin manually, download the V2 connector plugin you need and place it in the seatunnel directory.
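For a manual installation, the four subdirectories can be created in one step. The helper name ensure_connector_dirs below is hypothetical, and the sketch assumes that SEATUNNEL_HOME points at the unpacked distribution.

```shell
# Hypothetical helper: create the connector subdirectories listed above
# under <seatunnel_home>/connectors. mkdir -p leaves existing ones untouched.
ensure_connector_dirs() {
  for sub in flink flink-sql seatunnel spark; do
    mkdir -p "$1/connectors/$sub" || return 1
  done
}

# Usage (assumes SEATUNNEL_HOME is set):
# ensure_connector_dirs "${SEATUNNEL_HOME}"
```

Manually downloaded V2 connector jars then go into the connectors/seatunnel subdirectory.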
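IV. quick-start-seatunnel-engine
1. Add a job configuration file to define the job
Edit config/v2.batch.config.template. This file determines how data is read, processed, and written after SeaTunnel starts. The following example configuration is the same as the sample application mentioned above.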
bash
env {
  parallelism = 1
  job.mode = "BATCH"
}
source {
  FakeSource {
    result_table_name = "fake"
    row.num = 16
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}
transform {
  FieldMapper {
    source_table_name = "fake"
    result_table_name = "fake1"
    field_mapper = {
      age = age
      name = new_name
    }
  }
}
sink {
  Console {
    source_table_name = "fake1"
  }
}
2. Run the SeaTunnel application
Start the application with the following command:
bash
cd "apache-seatunnel-${version}"
./bin/seatunnel.sh --config ./config/v2.batch.config.template -e local
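Check the output: when you run the command, its output appears in the console, and you can use it to judge whether the command succeeded.
The SeaTunnel console prints logs such as the following: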
bash
2022-12-19 11:01:45,417 INFO org.apache.seatunnel.connectors.seatunnel.console.sink.ConsoleSinkWriter - output rowType: name<STRING>, age<INT>
2022-12-19 11:01:46,489 INFO org.apache.seatunnel.connectors.seatunnel.console.sink.ConsoleSinkWriter - subtaskIndex=0 rowIndex=1: SeaTunnelRow#tableId=-1 SeaTunnelRow#kind=INSERT: CpiOd, 8520946
2022-12-19 11:01:46,490 INFO org.apache.seatunnel.connectors.seatunnel.console.sink.ConsoleSinkWriter - subtaskIndex=0 rowIndex=2: SeaTunnelRow#tableId=-1 SeaTunnelRow#kind=INSERT: eQqTs, 1256802974
2022-12-19 11:01:46,490 INFO org.apache.seatunnel.connectors.seatunnel.console.sink.ConsoleSinkWriter - subtaskIndex=0 rowIndex=3: SeaTunnelRow#tableId=-1 SeaTunnelRow#kind=INSERT: UsRgO, 2053193072
2022-12-19 11:01:46,490 INFO org.apache.seatunnel.connectors.seatunnel.console.sink.ConsoleSinkWriter - subtaskIndex=0 rowIndex=4: SeaTunnelRow#tableId=-1 SeaTunnelRow#kind=INSERT: jDQJj, 1993016602
2022-12-19 11:01:46,490 INFO org.apache.seatunnel.connectors.seatunnel.console.sink.ConsoleSinkWriter - subtaskIndex=0 rowIndex=5: SeaTunnelRow#tableId=-1 SeaTunnelRow#kind=INSERT: rqdKp, 1392682764
2022-12-19 11:01:46,490 INFO org.apache.seatunnel.connectors.seatunnel.console.sink.ConsoleSinkWriter - subtaskIndex=0 rowIndex=6: SeaTunnelRow#tableId=-1 SeaTunnelRow#kind=INSERT: wCoWN, 986999925
2022-12-19 11:01:46,490 INFO org.apache.seatunnel.connectors.seatunnel.console.sink.ConsoleSinkWriter - subtaskIndex=0 rowIndex=7: SeaTunnelRow#tableId=-1 SeaTunnelRow#kind=INSERT: qomTU, 72775247
2022-12-19 11:01:46,490 INFO org.apache.seatunnel.connectors.seatunnel.console.sink.ConsoleSinkWriter - subtaskIndex=0 rowIndex=8: SeaTunnelRow#tableId=-1 SeaTunnelRow#kind=INSERT: jcqXR, 1074529204
2022-12-19 11:01:46,490 INFO org.apache.seatunnel.connectors.seatunnel.console.sink.ConsoleSinkWriter - subtaskIndex=0 rowIndex=9: SeaTunnelRow#tableId=-1 SeaTunnelRow#kind=INSERT: AkWIO, 1961723427
2022-12-19 11:01:46,490 INFO org.apache.seatunnel.connectors.seatunnel.console.sink.ConsoleSinkWriter - subtaskIndex=0 rowIndex=10: SeaTunnelRow#tableId=-1 SeaTunnelRow#kind=INSERT: hBoib, 929089763
2022-12-19 11:01:46,490 INFO org.apache.seatunnel.connectors.seatunnel.console.sink.ConsoleSinkWriter - subtaskIndex=0 rowIndex=11: SeaTunnelRow#tableId=-1 SeaTunnelRow#kind=INSERT: GSvzm, 827085798
2022-12-19 11:01:46,491 INFO org.apache.seatunnel.connectors.seatunnel.console.sink.ConsoleSinkWriter - subtaskIndex=0 rowIndex=12: SeaTunnelRow#tableId=-1 SeaTunnelRow#kind=INSERT: NNAYI, 94307133
2022-12-19 11:01:46,491 INFO org.apache.seatunnel.connectors.seatunnel.console.sink.ConsoleSinkWriter - subtaskIndex=0 rowIndex=13: SeaTunnelRow#tableId=-1 SeaTunnelRow#kind=INSERT: EexFl, 1823689599
2022-12-19 11:01:46,491 INFO org.apache.seatunnel.connectors.seatunnel.console.sink.ConsoleSinkWriter - subtaskIndex=0 rowIndex=14: SeaTunnelRow#tableId=-1 SeaTunnelRow#kind=INSERT: CBXUb, 869582787
2022-12-19 11:01:46,491 INFO org.apache.seatunnel.connectors.seatunnel.console.sink.ConsoleSinkWriter - subtaskIndex=0 rowIndex=15: SeaTunnelRow#tableId=-1 SeaTunnelRow#kind=INSERT: Wbxtm, 1469371353
2022-12-19 11:01:46,491 INFO org.apache.seatunnel.connectors.seatunnel.console.sink.ConsoleSinkWriter - subtaskIndex=0 rowIndex=16: SeaTunnelRow#tableId=-1 SeaTunnelRow#kind=INSERT: mIJDt, 995616438
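One rough way to confirm the run is to count the row lines written by the Console sink and compare the count with row.num = 16 from the job configuration. This is only a sketch: count_sink_rows and the log file name seatunnel.log are hypothetical, and the pattern assumes the log format shown above.

```shell
# Hypothetical helper: count the ConsoleSinkWriter row lines in a captured log.
# The "output rowType" header line does not match, so only data rows are counted.
count_sink_rows() {
  grep -c 'ConsoleSinkWriter - subtaskIndex=.* rowIndex=' "$1"
}

# Usage, assuming the run was captured with:
#   ./bin/seatunnel.sh --config ./config/v2.batch.config.template -e local | tee seatunnel.log
# [ "$(count_sink_rows seatunnel.log)" -eq 16 ] && echo "all 16 rows reached the sink"
```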
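V. quick-start-flink
1. Deploy and configure Flink
Download Flink first (version 1.12.0 or later is required).
Configure SeaTunnel: edit config/seatunnel-env.sh, whose settings depend on where the engine is installed. Set FLINK_HOME to the Flink deployment directory.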
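As an illustration, the relevant line in config/seatunnel-env.sh might look like the sketch below. The path /opt/flink is a hypothetical install location, not something this guide prescribes; replace it with your own Flink directory.

```shell
# Sketch of the FLINK_HOME setting in config/seatunnel-env.sh.
# /opt/flink is an assumed example path; point it at your Flink deployment.
# The ${VAR:-default} form keeps an already-set FLINK_HOME and
# otherwise falls back to the default path.
FLINK_HOME="${FLINK_HOME:-/opt/flink}"
```

With this form, a FLINK_HOME already exported in the shell takes precedence over the value in the file.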
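2. Add a job configuration file to define the job
Edit config/v2.streaming.conf.template. This file determines how data is read, processed, and written after SeaTunnel starts. The following example configuration is the same as the sample application mentioned above.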
bash
env {
  parallelism = 1
  job.mode = "BATCH"
}
source {
  FakeSource {
    result_table_name = "fake"
    row.num = 16
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}
transform {
  FieldMapper {
    source_table_name = "fake"
    result_table_name = "fake1"
    field_mapper = {
      age = age
      name = new_name
    }
  }
}
sink {
  Console {
    source_table_name = "fake1"
  }
}
3. Run the SeaTunnel application
Start the application with the command that matches your Flink version.
For Flink versions 1.12.x through 1.14.x:
bash
cd "apache-seatunnel-${version}"
./bin/start-seatunnel-flink-13-connector-v2.sh --config ./config/v2.streaming.conf.template
For Flink versions 1.15.x through 1.16.x:
bash
cd "apache-seatunnel-${version}"
./bin/start-seatunnel-flink-15-connector-v2.sh --config ./config/v2.streaming.conf.template
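The version-to-script mapping for the two Flink commands can be captured in a small helper, which may be convenient in deployment scripts. The function name flink_start_script is hypothetical; the mapping itself is taken from the two cases in this section, and any other version is treated as not covered here.

```shell
# Hypothetical helper: print the SeaTunnel start script for a Flink version.
flink_start_script() {
  case "$1" in
    1.12.*|1.13.*|1.14.*) echo "start-seatunnel-flink-13-connector-v2.sh" ;;
    1.15.*|1.16.*)        echo "start-seatunnel-flink-15-connector-v2.sh" ;;
    *) echo "Flink version not covered by this guide: $1" >&2; return 1 ;;
  esac
}

# Usage (run from the SeaTunnel directory):
# ./bin/"$(flink_start_script 1.13.6)" --config ./config/v2.streaming.conf.template
```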
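Check the output: when you run the command, its output appears in the console, and you can use it to judge whether the command succeeded.
The SeaTunnel console prints logs such as the following: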
bash
fields : name, age
types : STRING, INT
row=1 : elWaB, 1984352560
row=2 : uAtnp, 762961563
row=3 : TQEIB, 2042675010
row=4 : DcFjo, 593971283
row=5 : SenEb, 2099913608
row=6 : DHjkg, 1928005856
row=7 : eScCM, 526029657
row=8 : sgOeE, 600878991
row=9 : gwdvw, 1951126920
row=10 : nSiKE, 488708928
row=11 : xubpl, 1420202810
row=12 : rHZqb, 331185742
row=13 : rciGD, 1112878259
row=14 : qLhdI, 1457046294
row=15 : ZTkRx, 1240668386
row=16 : SGZCr, 94186144
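VI. quick-start-spark
1. Deploy and configure Spark
Download Spark first (version 2.4.0 or later is required).
Configure SeaTunnel: edit config/seatunnel-env.sh, whose settings depend on where the engine is installed. Set SPARK_HOME to the Spark deployment directory.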
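2. Add a job configuration file to define the job
Edit config/v2.streaming.conf.template. This file determines how data is read, processed, and written after SeaTunnel starts. The following example configuration is the same as the sample application mentioned above.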
bash
env {
  parallelism = 1
  job.mode = "BATCH"
}
source {
  FakeSource {
    result_table_name = "fake"
    row.num = 16
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}
transform {
  FieldMapper {
    source_table_name = "fake"
    result_table_name = "fake1"
    field_mapper = {
      age = age
      name = new_name
    }
  }
}
sink {
  Console {
    source_table_name = "fake1"
  }
}
3. Run the SeaTunnel application
Start the application with the command that matches your Spark version.
For Spark 2.4.x:
bash
cd "apache-seatunnel-${version}"
./bin/start-seatunnel-spark-2-connector-v2.sh \
  --master "local[4]" \
  --deploy-mode client \
  --config ./config/v2.streaming.conf.template
For Spark 3.x.x:
bash
cd "apache-seatunnel-${version}"
./bin/start-seatunnel-spark-3-connector-v2.sh \
  --master "local[4]" \
  --deploy-mode client \
  --config ./config/v2.streaming.conf.template
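The Spark start script can be chosen from the version string in the same way. The helper name spark_start_script is hypothetical, and the mapping covers only the two cases named in this section.

```shell
# Hypothetical helper: print the SeaTunnel start script for a Spark version.
spark_start_script() {
  case "$1" in
    2.4.*) echo "start-seatunnel-spark-2-connector-v2.sh" ;;
    3.*)   echo "start-seatunnel-spark-3-connector-v2.sh" ;;
    *) echo "Spark version not covered by this guide: $1" >&2; return 1 ;;
  esac
}

# Usage (run from the SeaTunnel directory). Quoting "local[4]" keeps the
# shell from treating the brackets as a filename pattern:
# ./bin/"$(spark_start_script 3.3.0)" --master "local[4]" --deploy-mode client \
#   --config ./config/v2.streaming.conf.template
```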
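Check the output: when you run the command, its output appears in the console, and you can use it to judge whether the command succeeded.
The SeaTunnel console prints logs such as the following: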
bash
fields : name, age
types : STRING, INT
row=1 : elWaB, 1984352560
row=2 : uAtnp, 762961563
row=3 : TQEIB, 2042675010
row=4 : DcFjo, 593971283
row=5 : SenEb, 2099913608
row=6 : DHjkg, 1928005856
row=7 : eScCM, 526029657
row=8 : sgOeE, 600878991
row=9 : gwdvw, 1951126920
row=10 : nSiKE, 488708928
row=11 : xubpl, 1420202810
row=12 : rHZqb, 331185742
row=13 : rciGD, 1112878259
row=14 : qLhdI, 1457046294
row=15 : ZTkRx, 1240668386
row=16 : SGZCr, 94186144