kyuubi整合flink yarn application model

目录

概述

flink 版本 1.17.1、kyuubi 1.8.0、hive 3.1.3、paimon 0.5

整合过程中,需要注意对应的版本。

注意以上版本

姊妹篇 kyuubi yarn session model 整合链接在此

配置

kyuubi flink yarn application mode 官网文档

bash 复制代码
#jobManager 的 IP 地址
jobmanager.rpc.address: localhost

#jobManager 的端口,默认为 6123
jobmanager.rpc.port: 6123

#jobManager 的 JVM heap 大小,生产环境4G起步
jobmanager.heap.size: 1600m

#taskManager 的 jvm heap 大小设置,低于 1024M 不能启动
taskmanager.memory.process.size: 8094m
taskmanager.memory.managed.size: 64m

#taskManager 中 taskSlots 个数,最好设置成 work 节点的 CPU 个数相等
taskmanager.numberOfTaskSlots: 2

#taskmanager 是否启动时管理所有的内存
taskmanager.memory.preallocate: false

#并行计算数
parallelism.default: 2

#控制类加载策略,可选项有 child-first(默认)和 parent-first
classloader.resolve-order: parent-first
classloader.check-leaked-classloader: false

state.backend.incremental: true
state.backend: rocksdb
execution.checkpointing.interval: 300000
state.checkpoints.dir: hdfs://ks2p-hadoop01:9000/dinky-ckps
state.savepoints.dir: hdfs://ks2p-hadoop01:9000/dinky-savepoints
heartbeat.timeout: 180000
akka.ask.timeout: 60s
web.timeout: 5000
execution.checkpointing.externalized-checkpoint-retention: RETAIN_ON_CANCELLATION

kyuubi 配置

kyuubi-defaults.conf

此处配置引擎类型, flink 的模式,这两个重要的。

xml 复制代码
kyuubi.engine.type                       FLINK_SQL
flink.execution.target                   yarn-application
kyuubi.ha.namespace                      kyuubi

kyuubi-env.sh

没有采用 hadoop 集群默认的配置,javaflink 使用的都是定制的版本。

xml 复制代码
export JAVA_HOME=/data/jdk-11.0.20
export FLINK_HOME=/data/soft/flink-1.17.1
export HADOOP_CLASSPATH=`hadoop classpath`
export HADOOP_CONF_DIR=/data/hadoop/etc/hadoop
export FLINK_HADOOP_CLASSPATH=${HADOOP_HOME}/share/hadoop/client/hadoop-client-runtime-3.2.4.jar:${HADOOP_HOME}/share/hadoop/client/hadoop-client-api-3.2.4.jar

hive

生产上 paimoncatlog 信息是 hive 存储的。

bash 复制代码
CREATE CATALOG paimon_hive WITH (
    'type' = 'paimon',
    'metastore' = 'hive',
    'uri' = 'thrift://10.xx.xx.22:9083',
    'warehouse' = 'hdfs:///data/hive/warehouse/paimon',
    'default-database'='test'
);

USE CATALOG paimon_hive;

验证

**注意:**下面启动相应的组件,进行相关的验证。

启动kyuubi


验证一下正常启动如下:

bash 复制代码
[root@ksxx-hadoop06 apache-kyuubi-1.8.0-bin]# netstat -nlp | grep :10009
tcp        0      0 10.xx.xx.22:10009       0.0.0.0:*               LISTEN      218311/java 

beeline 连接

bash 复制代码
[root@ks2p-hadoop06 apache-kyuubi-1.8.0-bin]# bin/beeline -u 'jdbc:hive2://10.xx.xx.22:10009/' -n tableau
Connecting to jdbc:hive2://10.xx.xx.22:10009/
2023-12-06 10:55:48.247 INFO KyuubiSessionManager-exec-pool: Thread-138 org.apache.kyuubi.operation.LaunchEngine: Processing tableau's query[6bab2d9e-c7f5-4438-bcd7-8f1e2fd98020]: PENDING_STATE -> RUNNING_STATE, statement:
LaunchEngine
2023-12-06 10:55:48.279 WARN KyuubiSessionManager-exec-pool: Thread-138 org.apache.kyuubi.shaded.curator.utils.ZKPaths: The version of ZooKeeper being used doesn't support Container nodes. CreateMode.PERSISTENT will be used instead.
2023-12-06 10:55:48.304 INFO KyuubiSessionManager-exec-pool: Thread-138 org.apache.kyuubi.engine.ProcBuilder: Creating tableau's working directory at /data/soft/apache-kyuubi-1.8.0-bin/work/tableau
2023-12-06 10:55:48.317 INFO KyuubiSessionManager-exec-pool: Thread-138 org.apache.kyuubi.engine.EngineRef: Launching engine:
/data/soft/flink-1.17.1/bin/flink run-application -t yarn-application -Dyarn.ship-files=/data/soft/flink-1.17.1/opt/flink-sql-client-1.17.1.jar;/data/soft/flink-1.17.1/opt/flink-sql-gateway-1.17.1.jar -Dyarn.application.name=kyuubi_USER_FLINK_SQL_tableau_default_e29cfc98-f864-4bb9-a430-2d3eceeeac24 -Dyarn.tags=KYUUBI,e29cfc98-f864-4bb9-a430-2d3eceeeac24 -Dcontainerized.master.env.FLINK_CONF_DIR=. -Dexecution.target=yarn-application -c org.apache.kyuubi.engine.flink.FlinkSQLEngine /data/soft/apache-kyuubi-1.8.0-bin/externals/engines/flink/kyuubi-flink-sql-engine_2.12-1.8.0.jar \
        --conf kyuubi.session.user=tableau \
        --conf kyuubi.client.ipAddress=10.xx.xx.22 \
        --conf kyuubi.client.version=1.8.0 \
        --conf kyuubi.engine.submit.time=1701831348298 \
        --conf kyuubi.engine.type=FLINK_SQL \
        --conf kyuubi.ha.addresses=10.xx.xx.22:2181 \
        --conf kyuubi.ha.engine.ref.id=e29cfc98-f864-4bb9-a430-2d3eceeeac24 \
        --conf kyuubi.ha.namespace=/kyuubi_1.8.0_USER_FLINK_SQL/tableau/default \
        --conf kyuubi.ha.zookeeper.auth.type=NONE \
        --conf kyuubi.server.ipAddress=10.xx.xx.22 \
        --conf kyuubi.session.connection.url=ks2p-hadoop06:10009 \
        --conf kyuubi.session.real.user=tableau
2023-12-06 10:55:48.321 INFO KyuubiSessionManager-exec-pool: Thread-138 org.apache.kyuubi.engine.ProcBuilder: Logging to /data/soft/apache-kyuubi-1.8.0-bin/work/tableau/kyuubi-flink-sql-engine.log.0

2023-12-06 10:55:59,647 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - YARN application has been deployed successfully.
2023-12-06 10:55:59,648 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Found Web Interface ks2p-hadoop06:1655 of application 'application_1694575187997_0427'.
Connected to: Apache Flink (version 1.17.1)
Driver: Kyuubi Project Hive JDBC Client (version 1.8.0)
Beeline version 1.8.0 by Apache Kyuubi
0: jdbc:hive2://10.xx.xx.22:10009/> 

使用hive catalog

sql测试

重要的步骤:

设置 flink 为批模式

bash 复制代码
SET execution.runtime-mode=batch;


业务稍大的数据处理:

结束

kyuubi整合flink yarn application model 至此结束,如有问题,欢迎评论区留言。

相关推荐
青云交24 分钟前
Java 大视界 -- Java 大数据在智能电网电力市场交易数据分析与策略制定中的关键作用(162)
java·大数据·数据分析·交易策略·智能电网·java 大数据·电力市场交易
宝哥大数据42 分钟前
Flink内存模型--flink1.19.1
大数据·flink
一个天蝎座 白勺 程序猿1 小时前
大数据(4.5)Hive聚合函数深度解析:从基础统计到多维聚合的12个生产级技巧
大数据·hive·hadoop
爱编程的王小美1 小时前
用户行为分析系统开发文档
大数据
白雪讲堂2 小时前
AI搜索品牌曝光资料包(精准适配文心一言/Kimi/DeepSeek等场景)
大数据·人工智能·搜索引擎·ai·文心一言·deepseek
浩浩kids3 小时前
Hadoop•踩过的SHIT
大数据·hadoop·分布式
qr9j422333 小时前
elasticsearch 如果按照日期进行筛选
大数据·elasticsearch·jenkins
DavidSoCool4 小时前
es分页边界数据重复问题处理
大数据·elasticsearch·搜索引擎
路由侠内网穿透4 小时前
本地部署开源流处理框架 Apache Flink 并实现外部访问
大数据·网络协议·tcp/ip·flink·服务发现·apache·consul
qq_5470261795 小时前
Elasticsearch 正排索引
大数据·elasticsearch·jenkins