Integrating Apache Impala into DataSophon

Note: this installation was done on Anolis 8.9 (CentOS 7 and CentOS 8 should behave the same).

The DataSophon version is DDP-1.2.1.

I have uploaded the bundled installation packages to a network drive:

Shared files: impala-4.4.1.tar.gz and one other file (2 files in total)

Link: https://pan.baidu.com/s/18KfkO_BEFa5gVcc16I-Yew?pwd=za4k Extraction code: za4k

1. For Apache Impala I chose 4.4.1, currently the latest release on GitHub.

On GitHub, Impala ships both rpm and deb packages. Compiling from source pulls in Python and C++ dependencies that are hard for the build machine to download, so I went with the rpm package GitHub provides. (Note: the rpm package does not include the shell directory, while the deb package does; if you use the rpm package, you need to copy that shell folder in manually.)

2. First, download apache-impala-4.4.1-RELEASE_hive-3.1.3-x86_64.el8.8.rpm.

Move the rpm package to /opt, then install it with yum:

```bash
wget https://github.com/apache/impala/releases/download/4.4.1/apache-impala-4.4.1-RELEASE_hive-3.1.3-x86_64.el8.8.rpm
mv apache-impala-4.4.1-RELEASE_hive-3.1.3-x86_64.el8.8.rpm /opt
yum install -y /opt/apache-impala-4.4.1-RELEASE_hive-3.1.3-x86_64.el8.8.rpm
```

Remember to copy the shell folder in after the install, as sketched below.
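
A minimal sketch of pulling the shell folder out of the deb package; the deb filename and its internal /opt/impala layout are assumptions, so check the GitHub release page for the actual artifact:

```bash
# NOTE: the deb filename below is an assumption -- use the real one from the release page.
ar x apache-impala-4.4.1-RELEASE_hive-3.1.3-x86_64.deb     # unpacks debian-binary, control.tar.*, data.tar.*
mkdir deb-contents && tar -xf data.tar.* -C deb-contents   # tar auto-detects the compression
cp -r deb-contents/opt/impala/shell /opt/impala/           # drop shell/ into the rpm install tree
```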

3. The install produces the /opt/impala folder; rename it to /opt/impala-4.4.1:
```bash
cd /opt
mv impala impala-4.4.1
```
4. Edit the conf/impala-env.sh script and adjust the following three values (a note on the `: ${VAR:=default}` idiom follows the snippet):
```bash
# Specify JAVA_HOME.
: ${JAVA_HOME:=/usr/local/jdk}

# Specify extra CLASSPATH.
: ${CLASSPATH:=${IMPALA_HOME}/conf/:${IMPALA_HOME}/lib/jars/*}

# Specify extra LD_LIBRARY_PATH.
: ${LD_LIBRARY_PATH:=${IMPALA_HOME}/lib/native/:${JAVA_HOME}/jre/lib/amd64/server/}
```
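
For context, `: ${VAR:=default}` assigns the default only when the variable is unset or empty, so anything already exported in the environment wins over these values:

```bash
unset JAVA_HOME
: ${JAVA_HOME:=/usr/local/jdk}
echo "$JAVA_HOME"              # prints /usr/local/jdk

JAVA_HOME=/usr/local/jdk1.8.0_333
: ${JAVA_HOME:=/usr/local/jdk}
echo "$JAVA_HOME"              # still /usr/local/jdk1.8.0_333
```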
5. Copy Hadoop's core-site.xml and hdfs-site.xml, plus Hive's hive-site.xml, into conf/ (a copy sketch follows the XML below).

My hive-site.xml contains only the following:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>
    <property>
        <name>hive.metastore.port</name>
        <value>9083</value>
    </property>
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://node01:9083</value>
    </property>

    <property>
        <name>hive.metastore.event.db.notification.api.auth</name>
        <value>false</value>
    </property>
    <property>
        <name>hive.metastore.dml.events</name>
        <value>true</value>
    </property>
    <property>
	    <name>hive.metastore.transactional.event.listeners</name>
	    <value>org.apache.hive.hcatalog.listener.DbNotificationListener</value>
    </property>

</configuration>
```
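
A minimal sketch of the copy step, assuming the DataSophon install paths that appear in the environment file in step 10 (adjust if your layout differs):

```bash
# Source paths follow /etc/profile.d/datasophon-env.sh (see step 10).
cp /opt/datasophon/hadoop-3.3.3/etc/hadoop/core-site.xml /opt/impala-4.4.1/conf/
cp /opt/datasophon/hadoop-3.3.3/etc/hadoop/hdfs-site.xml /opt/impala-4.4.1/conf/
cp /opt/datasophon/hive-3.1.0/conf/hive-site.xml /opt/impala-4.4.1/conf/
```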
6. Package it as a tar.gz, copy it to the DDP/packages directory, and generate the md5 file:
```bash
cd /opt
# Package as tar.gz
tar -zcvf impala-4.4.1.tar.gz impala-4.4.1
# Copy the tar.gz into the DDP/packages directory
cp impala-4.4.1.tar.gz /opt/datasophon/DDP/packages
cd /opt/datasophon/DDP/packages
# Generate the md5 file
java -jar file-md5-1.0-SNAPSHOT-jar-with-dependencies.jar impala-4.4.1.tar.gz
```
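
To sanity-check the result (assuming the md5 jar writes impala-4.4.1.tar.gz.md5 next to the tarball, which is the layout the packages/ directory uses):

```bash
md5sum impala-4.4.1.tar.gz      # compute the checksum ourselves
cat impala-4.4.1.tar.gz.md5     # should contain the same hash
```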
7. Write the service_ddl.json configuration file for IMPALA (an example of the flagfiles it generates follows the JSON):
```bash
cd /opt/datasophon/datasophon-manager-1.2.1/conf/meta/DDP-1.2.1/
mkdir IMPALA
cd IMPALA
# Create the json file and fill in the content below
vi service_ddl.json
```
```json
{
	"name": "IMPALA",
	"label": "Impala",
	"description": "MPP(大规模并行处理)SQL查询引擎",
	"version": "4.4.1",
	"sortNum": 22,
	"dependencies": ["HDFS", "HIVE"],
	"packageName": "impala-4.4.1.tar.gz",
	"decompressPackageName": "impala-4.4.1",
	"roles": [{
		"name": "StateStored",
		"label": "StateStored",
		"roleType": "master",
		"runAs": {
			"user": "impala",
			"group": "hadoop"
		},
		"cardinality": "1+",
		"sortNum": 1,
		"logFile": "/var/log/impala/statestored.INFO",
		"jmxPort": 2191,
		"startRunner": {
			"timeout": "60",
			"program": "bin/impala.sh",
			"args": ["start", "statestored"]
		},
		"stopRunner": {
			"timeout": "600",
			"program": "bin/impala.sh",
			"args": ["stop", "statestored"]
		},
		"statusRunner": {
			"timeout": "60",
			"program": "bin/impala.sh",
			"args": ["status", "statestored"]
		},
		"restartRunner": {
			"timeout": "60",
			"program": "bin/impala.sh",
			"args": ["restart", "statestored"]
		},
		"externalLink": {
			"name": "StateStored Ui",
			"label": "StateStored Ui",
			"url": "http://${host}:25010"
		}
	}, {
		"name": "Catalogd",
		"label": "Catalogd",
		"roleType": "master",
		"runAs": {
			"user": "impala",
			"group": "hadoop"
		},
		"cardinality": "1+",
		"sortNum": 2,
		"logFile": "/var/log/impala/catalogd.INFO",
		"jmxPort": 2191,
		"startRunner": {
			"timeout": "60",
			"program": "bin/impala.sh",
			"args": ["start", "catalogd"]
		},
		"stopRunner": {
			"timeout": "600",
			"program": "bin/impala.sh",
			"args": ["stop", "catalogd"]
		},
		"statusRunner": {
			"timeout": "60",
			"program": "bin/impala.sh",
			"args": ["status", "catalogd"]
		},
		"restartRunner": {
			"timeout": "60",
			"program": "bin/impala.sh",
			"args": ["restart", "catalogd"]
		},
		"externalLink": {
			"name": "Catalogd Ui",
			"label": "Catalogd Ui",
			"url": "http://${host}:25020"
		}
	}, {
		"name": "Impalad",
		"label": "Impalad",
		"roleType": "worker",
		"runAs": {
			"user": "impala",
			"group": "hadoop"
		},
		"cardinality": "1+",
		"sortNum": 3,
		"logFile": "/var/log/impala/impalad.INFO",
		"jmxPort": 2191,
		"startRunner": {
			"timeout": "60",
			"program": "bin/impala.sh",
			"args": ["start", "impalad", "--enable_legacy_avx_support"]
		},
		"stopRunner": {
			"timeout": "600",
			"program": "bin/impala.sh",
			"args": ["stop", "impalad"]
		},
		"statusRunner": {
			"timeout": "60",
			"program": "bin/impala.sh",
			"args": ["status", "impalad"]
		},
		"restartRunner": {
			"timeout": "60",
			"program": "bin/impala.sh",
			"args": ["restart", "impalad", "--enable_legacy_avx_support"]
		}
	}],
	"configWriter": {
		"generators": [{
			"filename": "statestored_flags",
			"configFormat": "properties",
			"outputDirectory": "conf",
			"includeParams": ["-hostname", "-log_dir", "-minidump_path", "custom.statestored_flags"]
		}, {
			"filename": "catalogd_flags",
			"configFormat": "properties",
			"outputDirectory": "conf",
			"includeParams": ["-hostname", "-state_store_host", "-log_dir", "-minidump_path", "custom.catalogd_flags"]
		}, {
			"filename": "impalad_flags",
			"configFormat": "properties",
			"outputDirectory": "conf",
			"includeParams": ["-hostname", "-state_store_host", "-catalog_service_host", "-log_dir", "-minidump_path", "-mem_limit", "custom.impalad_flags"]
		}]
	},
	"parameters": [{
		"name": "-hostname",
		"label": "impalad部署节点IP",
		"description": "impalad部署节点IP",
		"required": true,
		"type": "input",
		"value": "${host}",
		"configurableInWizard": true,
		"hidden": false,
		"defaultValue": "${host}"
	}, {
		"name": "-catalog_service_host",
		"label": "catalog_service_host部署节点IP",
		"description": "catalog_service_host部署节点IP",
		"required": true,
		"type": "input",
		"value": "node01",
		"configurableInWizard": true,
		"hidden": false,
		"defaultValue": "${host}"
	}, {
		"name": "-state_store_host",
		"label": "statestore部署节点IP",
		"description": "statestore部署节点IP",
		"required": true,
		"type": "input",
		"value": "node01",
		"configurableInWizard": true,
		"hidden": false,
		"defaultValue": "${host}"
	}, {
		"name": "-log_dir",
		"label": "log_dir日志路径",
		"description": "log_dir日志路径",
		"required": true,
		"type": "input",
		"value": "/var/log/impala",
		"configurableInWizard": true,
		"hidden": false,
		"defaultValue": "/var/log/impala"
	}, {
		"name": "-minidump_path",
		"label": "minidump_path路径",
		"description": "minidump_path路径",
		"required": true,
		"type": "input",
		"value": "/var/log/impala/minidumps",
		"configurableInWizard": true,
		"hidden": false,
		"defaultValue": "/var/log/impala/minidumps"
	}, {
		"name": "-mem_limit",
		"label": "mem_limit",
		"description": "mem_limit",
		"required": true,
		"type": "input",
		"value": "80%",
		"configurableInWizard": true,
		"hidden": false,
		"defaultValue": "80%"
	}, {
		"name": "custom.statestored_flags",
		"label": "自定义配置statestored_flags",
		"description": "自定义配置",
		"configType": "custom",
		"required": true,
		"type": "multipleWithKey",
		"value": [{
				"-v": "1"
			},
			{
				"-log_filename": "statestored"
			},
			{
				"-max_log_files": "10"
			},
			{
				"-max_log_size": "200"
			}
		],
		"configurableInWizard": true,
		"hidden": false,
		"defaultValue": ""
	}, {
		"name": "custom.catalogd_flags",
		"label": "自定义配置catalogd_flags",
		"description": "自定义配置",
		"configType": "custom",
		"required": true,
		"type": "multipleWithKey",
		"value": [{
				"-v": "1"
			},
			{
				"-log_filename": "catalogd"
			},
			{
				"-max_log_files": "10"
			},
			{
				"-max_log_size": "200"
			}
		],
		"configurableInWizard": true,
		"hidden": false,
		"defaultValue": ""
	}, {
		"name": "custom.impalad_flags",
		"label": "自定义配置impalad_flags",
		"description": "自定义配置",
		"configType": "custom",
		"required": true,
		"type": "multipleWithKey",
		"value": [{
				"-v": "1"
			},
			{
				"-log_filename": "impalad"
			},
			{
				"-max_log_files": "10"
			},
			{
				"-max_log_size": "200"
			},
			{
				"-scratch_dirs": "/data/impala/impalad"
			}
		],
		"configurableInWizard": true,
		"hidden": false,
		"defaultValue": ""
	}]
}
```
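
For reference, the configWriter renders each role's flag list into a properties-format file under conf/. With the parameters above, conf/impalad_flags would come out roughly like this (illustrative only; actual values depend on what you enter in the install wizard):

```properties
-hostname=node01
-state_store_host=node01
-catalog_service_host=node01
-log_dir=/var/log/impala
-minidump_path=/var/log/impala/minidumps
-mem_limit=80%
-v=1
-log_filename=impalad
-max_log_files=10
-max_log_size=200
-scratch_dirs=/data/impala/impalad
```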
8. Restart the datasophon api service:
```bash
cd /opt/datasophon/datasophon-manager-1.2.1/
# Start:
sh bin/datasophon-api.sh start api
# Stop:
sh bin/datasophon-api.sh stop api
# Restart:
sh bin/datasophon-api.sh restart api
```
9. Back in the web UI you can now install the service (during installation, remember to set the IP or hostname of the servers running catalogd and statestored).

Since I set the log directory to /var/log/impala and the impala user has no permission to create it, the directories have to be created by hand; do this before installing the service (a less heavy-handed ownership-based fix is sketched after the commands below).

```bash
mkdir -p /var/log/impala/minidumps
chmod 777 /var/log/impala
chmod 777 /var/log/impala/minidumps
```
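
If you would rather not open the directories up with 777, handing ownership to the service account from the runAs config (user impala, group hadoop) should work as well:

```bash
# Assumes the impala user and hadoop group referenced in service_ddl.json exist on the node.
chown -R impala:hadoop /var/log/impala
```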
10. Also remember to update the environment variables; DDP keeps them in /etc/profile.d/datasophon-env.sh:
```bash
cat /etc/profile.d/datasophon-env.sh

# Contents:
export JAVA_HOME=/usr/local/jdk1.8.0_333
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export JAVA_HOME CLASSPATH

export KYUUBI_HOME=/opt/datasophon/kyuubi-1.7.3
export SPARK_HOME=/opt/datasophon/spark-3.1.3
export PYSPARK_ALLOW_INSECURE_GATEWAY=1
export HIVE_HOME=/opt/datasophon/hive-3.1.0
export IMPALA_HOME=/opt/datasophon/impala-4.4.1
export KAFKA_HOME=/opt/datasophon/kafka-2.4.1
export HBASE_HOME=/opt/datasophon/hbase-2.4.16
export FLINK_HOME=/opt/datasophon/flink-1.17.1
export HADOOP_HOME=/opt/datasophon/hadoop-3.3.3
export HADOOP_CONF_DIR=/opt/datasophon/hadoop-3.3.3/etc/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$SPARK_HOME/bin:$HADOOP_HOME/bin:$HIVE_HOME/bin:$IMPALA_HOME/bin:$IMPALA_HOME/sbin:$IMPALA_HOME/shell:$FLINK_HOME/bin:$KAFKA_HOME/bin:$HBASE_HOME/bin
export HADOOP_CLASSPATH=`hadoop classpath`
```
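
Once the service is up, a quick smoke test from any node (assuming impalad's default HiveServer2-protocol port 21050):

```bash
source /etc/profile.d/datasophon-env.sh                # pick up the PATH entries above
impala-shell -i node01:21050 -q 'select version();'    # should print the Impala 4.4.1 version string
```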

If you want to build and package the installation yourself, the following posts may help:

Compiling and deploying apache-impala | 子崖说

Integrating impala with Datasophon | 子崖说

https://zhuanlan.zhihu.com/p/348344999
