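Download the build source
This project provides a Chinese localization of Superset that works out of the box. Thanks to the author.
GitHub - lutinglt/superset-zh: Superset Chinese localization, Superset Chinese edition
Switch apt to a mirror in China
Check the Debian version first, because the apt source entries differ between versions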
bash
cat /etc/debian_version
Mine is version 11.9.
The apt source file is /etc/apt/sources.list.
Aliyun mirror site: https://developer.aliyun.com/mirror/debian?spm=a2c6h.13651102.0.0.509b1b11NQaUy0
Below is a collection of apt sources for Debian 11.x (bullseye)
# =============================== Tsinghua mirror ===================================
# The source-package (deb-src) entries are commented out by default to speed up apt update; uncomment them if needed
deb https://mirrors.tuna.tsinghua.edu.cn/debian/ bullseye main contrib non-free
# deb-src https://mirrors.tuna.tsinghua.edu.cn/debian/ bullseye main contrib non-free
deb https://mirrors.tuna.tsinghua.edu.cn/debian/ bullseye-updates main contrib non-free
# deb-src https://mirrors.tuna.tsinghua.edu.cn/debian/ bullseye-updates main contrib non-free
deb https://mirrors.tuna.tsinghua.edu.cn/debian/ bullseye-backports main contrib non-free
# deb-src https://mirrors.tuna.tsinghua.edu.cn/debian/ bullseye-backports main contrib non-free
# The security-update source below can point at either the official host or the mirror; adjust the comments to switch if needed
deb https://security.debian.org/debian-security bullseye-security main contrib non-free
# deb-src https://security.debian.org/debian-security bullseye-security main contrib non-free
# =============================== Aliyun mirror ===================================
deb https://mirrors.aliyun.com/debian/ bullseye main non-free contrib
deb-src https://mirrors.aliyun.com/debian/ bullseye main non-free contrib
deb https://mirrors.aliyun.com/debian-security/ bullseye-security main
deb-src https://mirrors.aliyun.com/debian-security/ bullseye-security main
deb https://mirrors.aliyun.com/debian/ bullseye-updates main non-free contrib
deb-src https://mirrors.aliyun.com/debian/ bullseye-updates main non-free contrib
deb https://mirrors.aliyun.com/debian/ bullseye-backports main non-free contrib
deb-src https://mirrors.aliyun.com/debian/ bullseye-backports main non-free contrib
# =============================== Original (shipped with the image) ===================================
# deb http://snapshot.debian.org/archive/debian/20240423T150000Z bullseye main
deb http://deb.debian.org/debian bullseye main
# deb http://snapshot.debian.org/archive/debian-security/20240423T150000Z bullseye-security main
deb http://deb.debian.org/debian-security bullseye-security main
# deb http://snapshot.debian.org/archive/debian/20240423T150000Z bullseye-updates main
deb http://deb.debian.org/debian bullseye-updates main
In the directory containing the Dockerfile, create a sources.list file and copy in either the Tsinghua or the Aliyun entries
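Since the entries depend on the Debian release, the file can also be generated instead of copied by hand. The following is only a rough sketch: `codename_for` and `render_sources` are hypothetical helper names, the version-to-codename table covers only the releases listed in it, and the layout mirrors the Tsinghua block above (security updates stay on the official host).

```python
# Hypothetical helpers, not part of superset-zh.
# Only releases whose suites follow the "<codename>-security" layout are listed.
DEBIAN_CODENAMES = {"11": "bullseye", "12": "bookworm"}


def codename_for(version: str) -> str:
    """Return the codename for the content of /etc/debian_version, e.g. '11.9'."""
    major = version.strip().split(".")[0]
    return DEBIAN_CODENAMES[major]


def render_sources(mirror: str, codename: str) -> str:
    """Render deb lines for the main, updates and backports suites of one mirror."""
    components = "main contrib non-free"
    suites = [codename, f"{codename}-updates", f"{codename}-backports"]
    lines = [f"deb {mirror}/debian/ {suite} {components}" for suite in suites]
    # Security updates are kept on the official host, as in the Tsinghua list.
    lines.append(
        f"deb https://security.debian.org/debian-security "
        f"{codename}-security {components}"
    )
    return "\n".join(lines) + "\n"
```

Calling `render_sources("https://mirrors.tuna.tsinghua.edu.cn", codename_for("11.9"))` and writing the result to sources.list should give the same four active lines as the Tsinghua block.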
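Modify the Dockerfile
The changes are:
Replace the base image with apache/superset:3.1.3-py39
Switch apt to a mirror in China
Add the SASL-related dependencies
Add the kinit, klist and related commands
Add the vim, ping and telnet commands
Add Python dependencies such as pyhive[hive] and sasl
Add tools such as ez_setup and setuptools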
dockerfile
# Build the translation files (do not replace this FROM)
FROM python:3.12.6-slim-bookworm AS builder
COPY . /app
WORKDIR /app
RUN pip install --no-cache-dir --upgrade pip &&\
pip install --no-cache-dir -r requirements.txt &&\
python generate_locales.py && python generate_messages.py
# Copy the translations into the image (put the official version you need here)
FROM apache/superset:3.1.3-py39
COPY --from=builder /app/messages.json /app/superset/translations/zh/LC_MESSAGES/messages.json
COPY --from=builder /app/target/messages.mo /app/superset/translations/zh/LC_MESSAGES/messages.mo
# Replace the apt sources
COPY --from=builder /app/sources.list /etc/apt/sources.list
USER root
RUN ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime &&\
export DEBIAN_FRONTEND=noninteractive &&\
apt-get -y clean &&\
apt-get update &&\
apt-get -y install --no-install-recommends --no-install-suggests wget pkg-config gcc &&\
apt-get -y install build-essential python3-dev libsasl2-dev libssl-dev &&\
apt-get update &&\
apt-get -y install libsasl2-2 libsasl2-modules libsasl2-modules-gssapi-mit &&\
apt-get -y install libsasl2-modules-gssapi-heimdal libsasl2-modules-ldap &&\
apt-get -y install krb5-user &&\
apt-get -y install vim &&\
apt-get -y install iputils-ping telnet &&\
apt-get -y clean && rm -rf /var/lib/apt/lists/* &&\
# Install database drivers
pip install psycopg2 mysqlclient &&\
pip install pyhive[hive] &&\
pip install sasl &&\
pip install setuptools==47.3.1 &&\
pip install ez_setup &&\
# Out-of-the-box configuration: set the secret key and turn off the security checks
sed -i "s/SECRET_KEY =.*/SECRET_KEY = \"superset\"/" /app/superset/config.py &&\
sed -i "s/WTF_CSRF_ENABLED = True/WTF_CSRF_ENABLED = False/" /app/superset/config.py &&\
sed -i "s/TALISMAN_ENABLED =.*/TALISMAN_ENABLED = False/" /app/superset/config.py &&\
# Default language
sed -i "s/BABEL_DEFAULT_LOCALE = \"en\"/BABEL_DEFAULT_LOCALE = \"zh\"/" /app/superset/config.py &&\
# Enable the language switcher
sed -i "s/LANGUAGES = {}/LANGUAGES = {\"zh\": {\"flag\": \"cn\", \"name\": \"简体中文\"}, \"en\": {\"flag\": \"us\", \"name\": \"English\"}}/" /app/superset/config.py
USER superset
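To make the sed lines in the Dockerfile easier to follow, here is roughly the same set of substitutions written in Python and applied to a string instead of /app/superset/config.py. It is an illustration only: `apply_overrides` is a hypothetical name, and `re.sub` replaces every match whereas sed without the `g` flag replaces the first match on each line, which should make no difference for settings that appear once per line.

```python
import re


def apply_overrides(config_text: str) -> str:
    """Apply the Dockerfile's sed-style substitutions to a config string."""
    rules = [
        # (pattern, replacement), in the same order as the sed commands
        (r"SECRET_KEY =.*", 'SECRET_KEY = "superset"'),
        (r"WTF_CSRF_ENABLED = True", "WTF_CSRF_ENABLED = False"),
        (r"TALISMAN_ENABLED =.*", "TALISMAN_ENABLED = False"),
        (r'BABEL_DEFAULT_LOCALE = "en"', 'BABEL_DEFAULT_LOCALE = "zh"'),
    ]
    for pattern, replacement in rules:
        # "." does not match a newline, so ".*" stops at the end of the line
        config_text = re.sub(pattern, replacement, config_text)
    return config_text
```

Running it on a few sample lines before building is a cheap way to check that a pattern still matches after a Superset version bump.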
Personal notes:
bash
# Open-source Hive
pip install pyhive[hive]
# Installing pyhive[hive] alone is actually enough; the packages below are fallbacks
pip install thrift
pip install thrift-sasl
pip install sqlalchemy
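Build the image
bash
# Run these commands in the directory containing the Dockerfile
# A direct build may fail to fetch the base images, so pull the images it uses first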
docker pull python:3.12.6-slim-bookworm
docker pull apache/superset:3.1.3-py39
docker pull postgres:15
docker build -t mujinye/superset-zh:3.1.3-py39 .
Export and import the image
If you need to move to another machine, export the image there and import it on the new one
bash
docker save -o <file_name.tar> <new_image_name>:<tag>
docker save -o superset_zh.tar mujinye/superset-zh:3.1.3-py39 postgres:15
docker load -i <file_name.tar>
docker load -i superset_zh.tar
Write docker-compose.yml
Create a directory named superset_zh_3.1.3-py39 somewhere
Create docker-compose.yml in that directory
Make sure the superset service uses the image built above
yaml
services:
  db:
    image: postgres:15
    container_name: superset_postgres
    hostname: postgres
    environment:
      POSTGRES_USER: superset
      POSTGRES_PASSWORD: superset
      POSTGRES_DB: superset
    ports:
      - 35432:5432
    restart: unless-stopped
    volumes:
      - ./postgres/data:/var/lib/postgresql/data
  superset:
    image: mujinye/superset-zh:3.1.3-py39
    container_name: superset
    hostname: superset
    user: root
    restart: unless-stopped
    ports:
      - 38088:8088
    volumes:
      - ./superset_config.py:/app/pythonpath/superset_config.py
      - ./pyhive:/app/pyhive
Create superset_config.py at the path referenced in docker-compose.yml
python
SECRET_KEY = 'superset'
SQLALCHEMY_DATABASE_URI = 'postgresql://superset:superset@postgres:5432/superset'
WTF_CSRF_ENABLED = False
TALISMAN_ENABLED = False
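If you would rather not hard-code the credentials, the URI can be assembled from environment variables. This is a sketch under assumptions: `build_db_uri` is a hypothetical helper, `POSTGRES_HOST` and `POSTGRES_PORT` are made-up variable names, and none of these variables reach the superset container unless you add them to its `environment:` section yourself. The defaults reproduce the URI above.

```python
import os
from urllib.parse import quote_plus


def build_db_uri(env=os.environ) -> str:
    """Build the metadata-database URI, falling back to the values used above."""
    user = env.get("POSTGRES_USER", "superset")
    password = env.get("POSTGRES_PASSWORD", "superset")
    host = env.get("POSTGRES_HOST", "postgres")  # hypothetical variable name
    port = env.get("POSTGRES_PORT", "5432")      # hypothetical variable name
    database = env.get("POSTGRES_DB", "superset")
    # Percent-encode the password so characters such as "@" do not break the URI
    return f"postgresql://{user}:{quote_plus(password)}@{host}:{port}/{database}"


SQLALCHEMY_DATABASE_URI = build_db_uri()
```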
Build pyhive
Download the latest MRS source package: https://github.com/huaweicloud/huaweicloud-mrs-example/tree/mrs-3.5.0
Locate the python3-examples module under the hive-examples module and extract it
Go to its root directory and run
bash
python setup.py sdist bdist_wheel
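The built tar.gz package and whl package are generated in the dist directory
Copy both packages to the pyhive path referenced in docker-compose.yml
Create the containers with docker compose
Run docker compose up -d to create the containers, then run the three initialization commands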
bash
# Create and start the containers
docker compose up -d
# Remove or restart the containers
docker compose down
docker compose restart
# Run the initialization commands
docker exec -it superset superset fab create-admin --username admin --firstname admin --lastname admin --email admin@superset.apache.org --password admin
docker exec -it superset superset db upgrade
docker exec -it superset superset init
# Enter the container
docker exec -it superset /bin/bash
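The initialization commands only work once the containers are actually up, so a small wait loop can save some guessing. A minimal sketch, assuming the port mapping 38088 from the docker-compose.yml above and Superset's `/health` endpoint; `superset_is_up` and `wait_until` are hypothetical helper names.

```python
import time
import urllib.request
from urllib.error import URLError


def superset_is_up(url: str = "http://localhost:38088/health") -> bool:
    """Return True if the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=3) as response:
            return response.status == 200
    except (URLError, OSError):
        # Connection refused, timeout, HTTP error, and so on
        return False


def wait_until(probe, attempts: int = 30, delay: float = 2.0) -> bool:
    """Call probe() up to `attempts` times, sleeping `delay` seconds in between."""
    for _ in range(attempts):
        if probe():
            return True
        time.sleep(delay)
    return False
```

For example, `wait_until(superset_is_up)` returns True as soon as the web server responds, or False after about a minute.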
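Replace pyhive
bash
# Enter the container
docker exec -it superset /bin/bash
# The installed pyhive version is currently 0.7.0
pip list
# Uninstall pyhive
pip uninstall PyHive
# Install the pyhive built from the Huawei source; the package is in the pyhive directory mounted through docker-compose.yml (/app/pyhive)
pip install pyhive0.6.2xxx.tar.gz
# A restart is required before the change takes effect in Superset
# Leave the container
exit
# Find the ID of the superset container
docker ps
# Restart the superset container
docker restart <container_id>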
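Keytab authentication
pyhive authenticates at the OS level, so Kerberos authentication has to be performed inside the container
Steps:
- Outside the container, download the credential files: user.keytab and krb5.conf
- Put both files in the mounted pyhive directory; because that directory is mounted, they appear inside the container directly
- Enter the container
- Copy krb5.conf to /etc
- Run kinit -kt user.keytab user to authenticate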
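The last two steps can be scripted if the ticket has to be renewed regularly. This is only a sketch meant to run inside the container: the function names are hypothetical, the keytab path and principal are whatever you use in the steps above, and it relies on `klist -s`, which in MIT Kerberos prints nothing and reports through its exit status whether a valid ticket exists.

```python
import subprocess


def kinit_command(keytab: str, principal: str) -> list[str]:
    """Build the same command as the manual step: kinit -kt <keytab> <principal>."""
    return ["kinit", "-kt", keytab, principal]


def has_valid_ticket() -> bool:
    """Return True if klist -s reports a valid ticket cache."""
    try:
        return subprocess.run(["klist", "-s"]).returncode == 0
    except FileNotFoundError:
        # klist is not installed (krb5-user missing)
        return False


def ensure_ticket(keytab: str, principal: str) -> None:
    """Run kinit only when no valid ticket is present."""
    if not has_valid_ticket():
        subprocess.run(kinit_command(keytab, principal), check=True)
```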
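Verification
The Huawei source ships a pyCLI_sec.py file under python3-examples. Edit the parameters in that file, copy it into the container, and run python3 pyCLI_sec.py to test whether the connection works.
Note to self: (this can also be tested in a local environment to see whether it raises an error. If it does, check whether a dependency is missing and download it in advance on a machine with internet access. If the only result is a connection timeout, nothing is wrong.)
Connect to the Hive database
For a Hive database, the connection test succeeds but creating the database reports an error. Ignore it: the database has in fact been created, so just go back.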
bash
hive://hive@{hostname}:{port}/{database}
# Open-source Hive can use this connection string directly
hive://hive@hadoop102:10000/default
# MRS needs extra parameters; adjust them to your environment
hive://hive@hadoop102:10000/default?auth=KERBEROS&kerberos_service_name=hive&krbhost=hadoop.dlk.cc.cmbc.com.cn
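When several environments differ only in a few parameters, it can be easier to assemble the connection string than to edit it by hand. A small sketch: `hive_uri` is a hypothetical helper, and the parameter names are simply those from the strings above.

```python
from urllib.parse import urlencode


def hive_uri(host: str, port: int, database: str, user: str = "hive", **params) -> str:
    """Build a hive:// SQLAlchemy URI, appending any extra query parameters."""
    base = f"hive://{user}@{host}:{port}/{database}"
    # Keyword arguments keep their order, so the query string is predictable
    return f"{base}?{urlencode(params)}" if params else base


# Plain open-source Hive
print(hive_uri("hadoop102", 10000, "default"))
# MRS with Kerberos
print(hive_uri("hadoop102", 10000, "default",
               auth="KERBEROS",
               kerberos_service_name="hive",
               krbhost="hadoop.dlk.cc.cmbc.com.cn"))
```

The two calls print the same strings as the open-source and MRS examples above.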