Seatunnel系列之:部署Seatunnel

Seatunnel系列之:部署Seatunnel

一、步骤一:准备环境

在开始本地运行之前,您需要确保您已经安装了 SeaTunnel 所需的以下软件:

  • 安装Java(Java 8或11,高于Java 8的其他版本理论上也可以工作)并设置JAVA_HOME。

二、步骤二:下载SeaTunnel

进入seatunnel下载页面,下载最新版本的分发包seatunnel--bin.tar.gz

或者您可以通过终端下载

bash 复制代码
export version="2.3.4"
wget "https://archive.apache.org/dist/seatunnel/${version}/apache-seatunnel-${version}-bin.tar.gz"
tar -xzvf "apache-seatunnel-${version}-bin.tar.gz"

三、步骤三:安装连接器插件

从2.2.0-beta开始,二进制包默认不提供连接器依赖,所以第一次使用时,需要执行以下命令安装连接器:(当然也可以手动下载连接器从 Apache Maven 存储库下载,然后手动移动到连接器/seatunnel 目录)。

bash 复制代码
sh bin/install-plugin.sh 2.3.4

如果需要指定连接器的版本,以2.3.4为例,需要执行

bash 复制代码
sh bin/install-plugin.sh 2.3.4

通常你不需要所有的connector插件,所以你可以通过配置config/plugin_config来指定你需要的插件,例如你只需要connector-console插件,那么你可以修改plugin.properties为

bash 复制代码
--seatunnel-connectors--
connector-console
--end--

如果您想让示例应用程序正常工作,您需要添加以下插件

bash 复制代码
--seatunnel-connectors--
connector-fake
connector-console
--end--

您可以在下面找到所有支持的连接器和相应的plugin_config配置名称

${SEATUNNEL_HOME}/connectors/plugins-mapping.properties.

注意:

如果您想通过手动下载连接器来安装连接器插件,则需要特别注意以下事项

Connectors目录包含以下子目录,如果不存在,需要手动创建

bash 复制代码
flink
flink-sql
seatunnel
spark

如果您想手动安装V2连接器插件,只需下载您需要的V2连接器插件并将其放在seatunnel目录中

四、quick-start-seatunnel-engine

1.添加作业配置文件来定义作业

编辑config/v2.batch.config.template,决定了seatunnel启动后数据输入、处理、输出的方式和逻辑。以下是配置文件的示例,与上面提到的示例应用程序相同。

bash 复制代码
env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  FakeSource {
    result_table_name = "fake"
    row.num = 16
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}

transform {
  FieldMapper {
    source_table_name = "fake"
    result_table_name = "fake1"
    field_mapper = {
      age = age
      name = new_name
    }
  }
}

sink {
  Console {
    source_table_name = "fake1"
  }
}

2.运行 SeaTunnel 应用程序

您可以通过以下命令启动应用程序

bash 复制代码
cd "apache-seatunnel-${version}"
./bin/seatunnel.sh --config ./config/v2.batch.config.template -e local

查看输出:运行命令时,您可以在控制台中看到其输出。您可以认为这是命令运行成功与否的标志。

SeaTunnel 控制台将打印一些日志,如下所示:

bash 复制代码
2022-12-19 11:01:45,417 INFO  org.apache.seatunnel.connectors.seatunnel.console.sink.ConsoleSinkWriter - output rowType: name<STRING>, age<INT>
2022-12-19 11:01:46,489 INFO  org.apache.seatunnel.connectors.seatunnel.console.sink.ConsoleSinkWriter - subtaskIndex=0 rowIndex=1:  SeaTunnelRow#tableId=-1 SeaTunnelRow#kind=INSERT: CpiOd, 8520946
2022-12-19 11:01:46,490 INFO  org.apache.seatunnel.connectors.seatunnel.console.sink.ConsoleSinkWriter - subtaskIndex=0 rowIndex=2: SeaTunnelRow#tableId=-1 SeaTunnelRow#kind=INSERT: eQqTs, 1256802974
2022-12-19 11:01:46,490 INFO  org.apache.seatunnel.connectors.seatunnel.console.sink.ConsoleSinkWriter - subtaskIndex=0 rowIndex=3: SeaTunnelRow#tableId=-1 SeaTunnelRow#kind=INSERT: UsRgO, 2053193072
2022-12-19 11:01:46,490 INFO  org.apache.seatunnel.connectors.seatunnel.console.sink.ConsoleSinkWriter - subtaskIndex=0 rowIndex=4: SeaTunnelRow#tableId=-1 SeaTunnelRow#kind=INSERT: jDQJj, 1993016602
2022-12-19 11:01:46,490 INFO  org.apache.seatunnel.connectors.seatunnel.console.sink.ConsoleSinkWriter - subtaskIndex=0 rowIndex=5: SeaTunnelRow#tableId=-1 SeaTunnelRow#kind=INSERT: rqdKp, 1392682764
2022-12-19 11:01:46,490 INFO  org.apache.seatunnel.connectors.seatunnel.console.sink.ConsoleSinkWriter - subtaskIndex=0 rowIndex=6: SeaTunnelRow#tableId=-1 SeaTunnelRow#kind=INSERT: wCoWN, 986999925
2022-12-19 11:01:46,490 INFO  org.apache.seatunnel.connectors.seatunnel.console.sink.ConsoleSinkWriter - subtaskIndex=0 rowIndex=7: SeaTunnelRow#tableId=-1 SeaTunnelRow#kind=INSERT: qomTU, 72775247
2022-12-19 11:01:46,490 INFO  org.apache.seatunnel.connectors.seatunnel.console.sink.ConsoleSinkWriter - subtaskIndex=0 rowIndex=8: SeaTunnelRow#tableId=-1 SeaTunnelRow#kind=INSERT: jcqXR, 1074529204
2022-12-19 11:01:46,490 INFO  org.apache.seatunnel.connectors.seatunnel.console.sink.ConsoleSinkWriter - subtaskIndex=0 rowIndex=9: SeaTunnelRow#tableId=-1 SeaTunnelRow#kind=INSERT: AkWIO, 1961723427
2022-12-19 11:01:46,490 INFO  org.apache.seatunnel.connectors.seatunnel.console.sink.ConsoleSinkWriter - subtaskIndex=0 rowIndex=10: SeaTunnelRow#tableId=-1 SeaTunnelRow#kind=INSERT: hBoib, 929089763
2022-12-19 11:01:46,490 INFO  org.apache.seatunnel.connectors.seatunnel.console.sink.ConsoleSinkWriter - subtaskIndex=0 rowIndex=11: SeaTunnelRow#tableId=-1 SeaTunnelRow#kind=INSERT: GSvzm, 827085798
2022-12-19 11:01:46,491 INFO  org.apache.seatunnel.connectors.seatunnel.console.sink.ConsoleSinkWriter - subtaskIndex=0 rowIndex=12: SeaTunnelRow#tableId=-1 SeaTunnelRow#kind=INSERT: NNAYI, 94307133
2022-12-19 11:01:46,491 INFO  org.apache.seatunnel.connectors.seatunnel.console.sink.ConsoleSinkWriter - subtaskIndex=0 rowIndex=13: SeaTunnelRow#tableId=-1 SeaTunnelRow#kind=INSERT: EexFl, 1823689599
2022-12-19 11:01:46,491 INFO  org.apache.seatunnel.connectors.seatunnel.console.sink.ConsoleSinkWriter - subtaskIndex=0 rowIndex=14: SeaTunnelRow#tableId=-1 SeaTunnelRow#kind=INSERT: CBXUb, 869582787
2022-12-19 11:01:46,491 INFO  org.apache.seatunnel.connectors.seatunnel.console.sink.ConsoleSinkWriter - subtaskIndex=0 rowIndex=15: SeaTunnelRow#tableId=-1 SeaTunnelRow#kind=INSERT: Wbxtm, 1469371353
2022-12-19 11:01:46,491 INFO  org.apache.seatunnel.connectors.seatunnel.console.sink.ConsoleSinkWriter - subtaskIndex=0 rowIndex=16: SeaTunnelRow#tableId=-1 SeaTunnelRow#kind=INSERT: mIJDt, 995616438

请先下载Flink(要求版本>=1.12.0)。

配置 SeaTunnel:更改 config/seatunnel-env.sh 中的设置,它基于引擎在部署时安装的路径。将 FLINK_HOME 更改为 Flink 部署目录。

2.添加作业配置文件来定义作业

编辑config/v2.streaming.conf.template,决定了seatunnel启动后数据输入、处理、输出的方式和逻辑。以下是配置文件的示例,与上面提到的示例应用程序相同。

bash 复制代码
env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  FakeSource {
    result_table_name = "fake"
    row.num = 16
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}

transform {
  FieldMapper {
    source_table_name = "fake"
    result_table_name = "fake1"
    field_mapper = {
      age = age
      name = new_name
    }
  }
}

sink {
  Console {
    source_table_name = "fake1"
  }
}

3.运行 SeaTunnel 应用程序

您可以通过以下命令启动应用程序

flink 版本在 1.12.x 和 1.14.x 之间

bash 复制代码
cd "apache-seatunnel-${version}"
./bin/start-seatunnel-flink-13-connector-v2.sh --config ./config/v2.streaming.conf.template

flink 版本在 1.15.x 和 1.16.x 之间

bash 复制代码
cd "apache-seatunnel-${version}"
./bin/start-seatunnel-flink-15-connector-v2.sh --config ./config/v2.streaming.conf.template

查看输出:运行命令时,您可以在控制台中看到其输出。您可以认为这是命令运行成功与否的标志。

SeaTunnel 控制台将打印一些日志,如下所示:

bash 复制代码
fields : name, age
types : STRING, INT
row=1 : elWaB, 1984352560
row=2 : uAtnp, 762961563
row=3 : TQEIB, 2042675010
row=4 : DcFjo, 593971283
row=5 : SenEb, 2099913608
row=6 : DHjkg, 1928005856
row=7 : eScCM, 526029657
row=8 : sgOeE, 600878991
row=9 : gwdvw, 1951126920
row=10 : nSiKE, 488708928
row=11 : xubpl, 1420202810
row=12 : rHZqb, 331185742
row=13 : rciGD, 1112878259
row=14 : qLhdI, 1457046294
row=15 : ZTkRx, 1240668386
row=16 : SGZCr, 94186144

六、quick-start-spark

1.Spark的部署和配置

请先下载Spark(要求版本>=2.4.0)。

配置 SeaTunnel:更改 config/seatunnel-env.sh 中的设置,它基于引擎在部署时安装的路径。将 SPARK_HOME 更改为 Spark 部署目录。

2.添加作业配置文件来定义作业

编辑config/seatunnel.streaming.conf.template,决定了seatunnel启动后数据输入、处理、输出的方式和逻辑。以下是配置文件的示例,与上面提到的示例应用程序相同。

bash 复制代码
env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  FakeSource {
    result_table_name = "fake"
    row.num = 16
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}

transform {
  FieldMapper {
    source_table_name = "fake"
    result_table_name = "fake1"
    field_mapper = {
      age = age
      name = new_name
    }
  }
}

sink {
  Console {
    source_table_name = "fake1"
  }
}

3.运行 SeaTunnel 应用程序

您可以通过以下命令启动应用程序

spark 2.4.x

bash 复制代码
cd "apache-seatunnel-${version}"
./bin/start-seatunnel-spark-2-connector-v2.sh \
--master local[4] \
--deploy-mode client \
--config ./config/v2.streaming.conf.template

spark3.x.x

bash 复制代码
cd "apache-seatunnel-${version}"
./bin/start-seatunnel-spark-3-connector-v2.sh \
--master local[4] \
--deploy-mode client \
--config ./config/v2.streaming.conf.template

查看输出:运行命令时,您可以在控制台中看到其输出。您可以认为这是命令运行成功与否的标志。

SeaTunnel 控制台将打印一些日志,如下所示:

bash 复制代码
fields : name, age
types : STRING, INT
row=1 : elWaB, 1984352560
row=2 : uAtnp, 762961563
row=3 : TQEIB, 2042675010
row=4 : DcFjo, 593971283
row=5 : SenEb, 2099913608
row=6 : DHjkg, 1928005856
row=7 : eScCM, 526029657
row=8 : sgOeE, 600878991
row=9 : gwdvw, 1951126920
row=10 : nSiKE, 488708928
row=11 : xubpl, 1420202810
row=12 : rHZqb, 331185742
row=13 : rciGD, 1112878259
row=14 : qLhdI, 1457046294
row=15 : ZTkRx, 1240668386
row=16 : SGZCr, 94186144