Integrating Apache Impala with DataSophon

Note: this installation was done on Anolis 8.9 (CentOS 7 and CentOS 8 should behave the same).

The DataSophon version is DDP-1.2.1.

I have put the bundled installation packages on my netdisk:

Shared via netdisk: impala-4.4.1.tar.gz and one other file

Link: https://pan.baidu.com/s/18KfkO_BEFa5gVcc16I-Yew?pwd=za4k Extraction code: za4k

  1. For the Apache Impala version I chose 4.4.1, currently the latest release on GitHub

On GitHub, Impala offers both rpm and deb packages. Compiling from source pulls in Python and C++ dependencies that are hard to download, so I went with the rpm package provided on GitHub. (Note that the rpm does not include the shell directory, while the deb does; if you use the rpm, you must copy the shell folder in manually.)

  2. First, download apache-impala-4.4.1-RELEASE_hive-3.1.3-x86_64.el8.8.rpm

Move the rpm into /opt, then install it with yum:

bash
wget https://github.com/apache/impala/releases/download/4.4.1/apache-impala-4.4.1-RELEASE_hive-3.1.3-x86_64.el8.8.rpm
mv apache-impala-4.4.1-RELEASE_hive-3.1.3-x86_64.el8.8.rpm /opt
yum install -y apache-impala-4.4.1-RELEASE_hive-3.1.3-x86_64.el8.8.rpm

Remember to copy the shell folder in once the installation finishes.
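If you only have the rpm, the shell directory has to come from somewhere, and one option is to pull it out of the deb package: a .deb is an `ar` archive whose payload tarball contains the installed tree. This is a sketch of the mechanics on a toy archive, not the exact commands from this post; the real deb filename is whatever the GitHub release page gives you, and its payload may be data.tar.xz or data.tar.zst rather than .gz.

```shell
# Build a toy .deb-style ar archive, then extract shell/ from its payload.
# On the real package the extraction half would be roughly:
#   ar x apache-impala-<version>.deb data.tar.gz
#   tar -xzf data.tar.gz ./opt/impala/shell
cd /tmp && rm -rf debdemo && mkdir debdemo && cd debdemo
mkdir -p opt/impala/shell
echo '#!/usr/bin/env python3' > opt/impala/shell/impala-shell
tar -czf data.tar.gz opt              # payload tarball, as inside a real deb
ar r impala-demo.deb data.tar.gz      # wrap it in an ar archive
rm data.tar.gz
ar x impala-demo.deb data.tar.gz      # extract the payload back out
mkdir extracted && tar -xzf data.tar.gz -C extracted
ls extracted/opt/impala/shell
```

After extracting the real payload, `cp -r` the shell directory into the rpm-installed /opt/impala tree.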

  3. Installation creates the /opt/impala directory; rename it to /opt/impala-4.4.1
bash
cd /opt
mv impala impala-4.4.1
  4. Edit the conf/impala-env.sh script and set the following three values
bash
: ${JAVA_HOME:=/usr/local/jdk}

# Specify extra CLASSPATH.
: ${CLASSPATH:=${IMPALA_HOME}/conf/:${IMPALA_HOME}/lib/jars/*}

# Specify extra LD_LIBRARY_PATH.
: ${LD_LIBRARY_PATH:=${IMPALA_HOME}/lib/native/:${JAVA_HOME}/jre/lib/amd64/server/}
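One thing to watch in the LD_LIBRARY_PATH line: `${JAVA_HOME}/jre/lib/amd64/server` is a JDK 8 layout; on JDK 11 and later, libjvm.so lives in `${JAVA_HOME}/lib/server` instead. A small probe that checks both locations, demonstrated on a scratch JAVA_HOME so the logic is visible (on a real node, set JH="$JAVA_HOME" and drop the fake-layout lines):

```shell
# Probe both JDK directory layouts for libjvm.so.
JH=/tmp/jdk-demo
mkdir -p "$JH/lib/server"
touch "$JH/lib/server/libjvm.so"      # fake a JDK 11+ layout for the demo
found=""
for d in "$JH/jre/lib/amd64/server" "$JH/lib/server"; do
    if [ -e "$d/libjvm.so" ]; then found="$d"; fi
done
echo "libjvm.so found in: $found"
```

Whichever directory the probe reports is the one LD_LIBRARY_PATH should point at.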
  5. Copy Hadoop's core-site.xml and hdfs-site.xml, plus Hive's hive-site.xml, into conf

I only put the following into hive-site.xml:

XML
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>
    <property>
        <name>hive.metastore.port</name>
        <value>9083</value>
    </property>
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://node01:9083</value>
    </property>

    <property>
        <name>hive.metastore.event.db.notification.api.auth</name>
        <value>false</value>
    </property>
    <property>
        <name>hive.metastore.dml.events</name>
        <value>true</value>
    </property>
    <property>
	    <name>hive.metastore.transactional.event.listeners</name>
	    <value>org.apache.hive.hcatalog.listener.DbNotificationListener</value>
    </property>

</configuration>
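The value that matters most here is hive.metastore.uris, since catalogd reaches the metastore through it. A small sketch that pulls the host and port out of hive-site.xml so you can probe the port (with `nc -z host port`, say) before starting Impala; the file written below is just a copy of the relevant snippet, and node01:9083 comes from the configuration above.

```shell
# Extract host:port from hive.metastore.uris. On the cluster node you
# would follow up with: nc -z "$host" "$port"
cat > /tmp/hive-site-demo.xml <<'EOF'
<configuration>
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://node01:9083</value>
    </property>
</configuration>
EOF
uri=$(grep -A1 'hive.metastore.uris' /tmp/hive-site-demo.xml | grep -o 'thrift://[^<]*')
hostport=${uri#thrift://}
host=${hostport%:*}
port=${hostport##*:}
printf '%s:%s\n' "$host" "$port" > /tmp/metastore-endpoint
echo "metastore at $(cat /tmp/metastore-endpoint)"
```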
  6. Create the tar.gz, copy it into the DDP/packages directory, and generate the md5 file
bash
cd /opt
# Create the tar.gz archive
tar -zcvf impala-4.4.1.tar.gz impala-4.4.1
# Copy the tarball into the DDP/packages directory
cp impala-4.4.1.tar.gz /opt/datasophon/DDP/packages
cd /opt/datasophon/DDP/packages
# Generate the md5 file
java -jar file-md5-1.0-SNAPSHOT-jar-with-dependencies.jar impala-4.4.1.tar.gz
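The jar writes an impala-4.4.1.tar.gz.md5 file next to the tarball, which DataSophon uses to verify the package on the worker nodes. Assuming the .md5 file holds just the hex digest (an assumption; check the jar's output if unsure), the same mechanics can be reproduced with md5sum, shown here on a scratch file:

```shell
# Produce and verify a .md5 file for a tarball (scratch file stands in
# for impala-4.4.1.tar.gz under /opt/datasophon/DDP/packages).
pkg=/tmp/impala-demo.tar.gz
printf 'demo-payload' > "$pkg"
md5sum "$pkg" | awk '{print $1}' > "$pkg.md5"   # digest only, no filename
fresh=$(md5sum "$pkg" | awk '{print $1}')
stored=$(cat "$pkg.md5")
[ "$fresh" = "$stored" ] && echo "md5 OK"
```

If the digests ever disagree after copying, the tarball was corrupted in transit and should be re-copied.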
  7. Write the service_ddl.json configuration file for IMPALA
bash
cd /opt/datasophon/datasophon-manager-1.2.1/conf/meta/DDP-1.2.1/
mkdir IMPALA
cd IMPALA
# Create the json file and add the content below
vi service_ddl.json
json
{
	"name": "IMPALA",
	"label": "Impala",
	"description": "MPP(大规模并行处理)SQL查询引擎",
	"version": "4.4.1",
	"sortNum": 22,
	"dependencies": ["HDFS", "HIVE"],
	"packageName": "impala-4.4.1.tar.gz",
	"decompressPackageName": "impala-4.4.1",
	"roles": [{
		"name": "StateStored",
		"label": "StateStored",
		"roleType": "master",
		"runAs": {
			"user": "impala",
			"group": "hadoop"
		},
		"cardinality": "1+",
		"sortNum": 1,
		"logFile": "/var/log/impala/statestored.INFO",
		"jmxPort": 2191,
		"startRunner": {
			"timeout": "60",
			"program": "bin/impala.sh",
			"args": ["start", "statestored"]
		},
		"stopRunner": {
			"timeout": "600",
			"program": "bin/impala.sh",
			"args": ["stop", "statestored"]
		},
		"statusRunner": {
			"timeout": "60",
			"program": "bin/impala.sh",
			"args": ["status", "statestored"]
		},
		"restartRunner": {
			"timeout": "60",
			"program": "bin/impala.sh",
			"args": ["restart", "statestored"]
		},
		"externalLink": {
			"name": "StateStored Ui",
			"label": "StateStored Ui",
			"url": "http://${host}:25010"
		}
	}, {
		"name": "Catalogd",
		"label": "Catalogd",
		"roleType": "master",
		"runAs": {
			"user": "impala",
			"group": "hadoop"
		},
		"cardinality": "1+",
		"sortNum": 2,
		"logFile": "/var/log/impala/catalogd.INFO",
		"jmxPort": 2191,
		"startRunner": {
			"timeout": "60",
			"program": "bin/impala.sh",
			"args": ["start", "catalogd"]
		},
		"stopRunner": {
			"timeout": "600",
			"program": "bin/impala.sh",
			"args": ["stop", "catalogd"]
		},
		"statusRunner": {
			"timeout": "60",
			"program": "bin/impala.sh",
			"args": ["status", "catalogd"]
		},
		"restartRunner": {
			"timeout": "60",
			"program": "bin/impala.sh",
			"args": ["restart", "catalogd"]
		},
		"externalLink": {
			"name": "Catalogd Ui",
			"label": "Catalogd Ui",
			"url": "http://${host}:25020"
		}
	}, {
		"name": "Impalad",
		"label": "Impalad",
		"roleType": "worker",
		"runAs": {
			"user": "impala",
			"group": "hadoop"
		},
		"cardinality": "1+",
		"sortNum": 3,
		"logFile": "/var/log/impala/impalad.INFO",
		"jmxPort": 2191,
		"startRunner": {
			"timeout": "60",
			"program": "bin/impala.sh",
			"args": ["start", "impalad", "--enable_legacy_avx_support"]
		},
		"stopRunner": {
			"timeout": "600",
			"program": "bin/impala.sh",
			"args": ["stop", "impalad"]
		},
		"statusRunner": {
			"timeout": "60",
			"program": "bin/impala.sh",
			"args": ["status", "impalad"]
		},
		"restartRunner": {
			"timeout": "60",
			"program": "bin/impala.sh",
			"args": ["restart", "impalad", "--enable_legacy_avx_support"]
		}
	}],
	"configWriter": {
		"generators": [{
			"filename": "statestored_flags",
			"configFormat": "properties",
			"outputDirectory": "conf",
			"includeParams": ["-hostname", "-log_dir", "-minidump_path", "custom.statestored_flags"]
		}, {
			"filename": "catalogd_flags",
			"configFormat": "properties",
			"outputDirectory": "conf",
			"includeParams": ["-hostname", "-state_store_host", "-log_dir", "-minidump_path", "custom.catalogd_flags"]
		}, {
			"filename": "impalad_flags",
			"configFormat": "properties",
			"outputDirectory": "conf",
			"includeParams": ["-hostname", "-state_store_host", "-catalog_service_host", "-log_dir", "-minidump_path", "-mem_limit", "custom.impalad_flags"]
		}]
	},
	"parameters": [{
		"name": "-hostname",
		"label": "impalad部署节点IP",
		"description": "impalad部署节点IP",
		"required": true,
		"type": "input",
		"value": "${host}",
		"configurableInWizard": true,
		"hidden": false,
		"defaultValue": "${host}"
	}, {
		"name": "-catalog_service_host",
		"label": "catalog_service_host部署节点IP",
		"description": "catalog_service_host部署节点IP",
		"required": true,
		"type": "input",
		"value": "node01",
		"configurableInWizard": true,
		"hidden": false,
		"defaultValue": "${host}"
	}, {
		"name": "-state_store_host",
		"label": "statestore部署节点IP",
		"description": "statestore部署节点IP",
		"required": true,
		"type": "input",
		"value": "node01",
		"configurableInWizard": true,
		"hidden": false,
		"defaultValue": "${host}"
	}, {
		"name": "-log_dir",
		"label": "log_dir日志路径",
		"description": "log_dir日志路径",
		"required": true,
		"type": "input",
		"value": "/var/log/impala",
		"configurableInWizard": true,
		"hidden": false,
		"defaultValue": "/var/log/impala"
	}, {
		"name": "-minidump_path",
		"label": "minidump_path路径",
		"description": "minidump_path路径",
		"required": true,
		"type": "input",
		"value": "/var/log/impala/minidumps",
		"configurableInWizard": true,
		"hidden": false,
		"defaultValue": "/var/log/impala/minidumps"
	}, {
		"name": "-mem_limit",
		"label": "mem_limit",
		"description": "mem_limit",
		"required": true,
		"type": "input",
		"value": "80%",
		"configurableInWizard": true,
		"hidden": false,
		"defaultValue": "80%"
	}, {
		"name": "custom.statestored_flags",
		"label": "自定义配置statestored_flags",
		"description": "自定义配置",
		"configType": "custom",
		"required": true,
		"type": "multipleWithKey",
		"value": [{
				"-v": "1"
			},
			{
				"-log_filename": "statestored"
			},
			{
				"-max_log_files": "10"
			},
			{
				"-max_log_size": "200"
			}
		],
		"configurableInWizard": true,
		"hidden": false,
		"defaultValue": ""
	}, {
		"name": "custom.catalogd_flags",
		"label": "自定义配置catalogd_flags",
		"description": "自定义配置",
		"configType": "custom",
		"required": true,
		"type": "multipleWithKey",
		"value": [{
				"-v": "1"
			},
			{
				"-log_filename": "catalogd"
			},
			{
				"-max_log_files": "10"
			},
			{
				"-max_log_size": "200"
			}
		],
		"configurableInWizard": true,
		"hidden": false,
		"defaultValue": ""
	}, {
		"name": "custom.impalad_flags",
		"label": "自定义配置impalad_flags",
		"description": "自定义配置",
		"configType": "custom",
		"required": true,
		"type": "multipleWithKey",
		"value": [{
				"-v": "1"
			},
			{
				"-log_filename": "impalad"
			},
			{
				"-max_log_files": "10"
			},
			{
				"-max_log_size": "200"
			},
			{
				"-scratch_dirs": "/data/impala/impalad"
			}
		],
		"configurableInWizard": true,
		"hidden": false,
		"defaultValue": ""
	}]
}
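A malformed service_ddl.json tends to fail silently: the service simply never appears in the web UI after the restart. It is worth piping the file through a JSON parser first; `python3 -m json.tool` is enough. Shown here on a minimal stand-in file, since the real path only exists on the manager node:

```shell
# Syntax-check a service_ddl.json before restarting the API. On the
# manager node, point this at conf/meta/DDP-1.2.1/IMPALA/service_ddl.json.
f=/tmp/service_ddl_demo.json
printf '{"name": "IMPALA", "version": "4.4.1"}\n' > "$f"
if python3 -m json.tool "$f" > /dev/null; then
    echo "service_ddl.json: syntax OK"
fi
```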
  8. Restart the DataSophon API service
bash
cd /opt/datasophon/datasophon-manager-1.2.1/
# start
sh bin/datasophon-api.sh start api
# stop
sh bin/datasophon-api.sh stop api
# restart
sh bin/datasophon-api.sh restart api
  9. Back in the web UI you can now install the service (during installation, remember to set the IP or hostname of the servers where catalogd and statestored run)

Because I set the log directory to /var/log/impala, the impala user has no permission to create it, so it must be created manually; do this before installing the service.

bash
mkdir -p /var/log/impala/minidumps
chmod 777 /var/log/impala
chmod 777 /var/log/impala/minidumps
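chmod 777 works but leaves the directory wide open. Since service_ddl.json runs the daemons as user impala, group hadoop, a narrower fix is to hand the directory to that account (assuming the user already exists on the node). Shown on a scratch path; substitute /var/log/impala and run as root on the real node:

```shell
# Create the log tree and restrict it to the service account instead of 777.
logdir=/tmp/impala-log-demo            # stands in for /var/log/impala
mkdir -p "$logdir/minidumps"
# chown -R impala:hadoop "$logdir"     # on the node, as root
chmod -R 755 "$logdir"
stat -c '%a' "$logdir"                 # prints the resulting mode, 755
```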
  10. Also remember to update the environment variables; DDP keeps them in /etc/profile.d/datasophon-env.sh
bash
cat /etc/profile.d/datasophon-env.sh

# Contents are as follows
export JAVA_HOME=/usr/local/jdk1.8.0_333
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export JAVA_HOME CLASSPATH

export KYUUBI_HOME=/opt/datasophon/kyuubi-1.7.3
export SPARK_HOME=/opt/datasophon/spark-3.1.3
export PYSPARK_ALLOW_INSECURE_GATEWAY=1
export HIVE_HOME=/opt/datasophon/hive-3.1.0
export IMPALA_HOME=/opt/datasophon/impala-4.4.1
export KAFKA_HOME=/opt/datasophon/kafka-2.4.1
export HBASE_HOME=/opt/datasophon/hbase-2.4.16
export FLINK_HOME=/opt/datasophon/flink-1.17.1
export HADOOP_HOME=/opt/datasophon/hadoop-3.3.3
export HADOOP_CONF_DIR=/opt/datasophon/hadoop-3.3.3/etc/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$SPARK_HOME/bin:$HADOOP_HOME/bin:$HIVE_HOME/bin:$IMPALA_HOME/bin:$IMPALA_HOME/sbin:$IMPALA_HOME/shell:$FLINK_HOME/bin:$KAFKA_HOME/bin:$HBASE_HOME/bin
export HADOOP_CLASSPATH=`hadoop classpath`
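After editing the profile, a quick way to confirm the variables take effect is to source the file and echo the Impala entries. Demonstrated on a scratch copy of the two Impala-related lines (the real file is /etc/profile.d/datasophon-env.sh):

```shell
# Source a profile fragment and verify IMPALA_HOME landed on PATH.
cat > /tmp/datasophon-env-demo.sh <<'EOF'
export IMPALA_HOME=/opt/datasophon/impala-4.4.1
export PATH=$PATH:$IMPALA_HOME/bin:$IMPALA_HOME/shell
EOF
. /tmp/datasophon-env-demo.sh
echo "IMPALA_HOME=$IMPALA_HOME"
case ":$PATH:" in *":$IMPALA_HOME/shell:"*) echo "shell dir on PATH";; esac
```

With the shell directory on PATH, impala-shell should resolve from any login shell on the node.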

If you want to build and package it yourself, the following posts may help:

Compiling and deploying apache-impala | 子崖说

Integrating impala with Datasophon | 子崖说

https://zhuanlan.zhihu.com/p/348344999
