Data Visualization and Analysis Platform: Superset

Superset

Overview

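Apache Superset is a modern data exploration and visualization platform. It is powerful yet easy to use. It connects to a wide range of data sources, including many modern big-data analytics engines. It also offers a rich set of chart types and supports custom dashboards.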

Official site: https://superset.apache.org/

Docs: https://superset.apache.org/docs/intro

GitHub: https://github.com/apache/superset

Install the Python environment

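Superset is a web application written in Python and requires Python 3.7 or later.
Linux servers usually ship with Python 2.x, and many system tools (yum, for example) depend on it. Python 2 and Python 3 are not compatible, so install a separate Python 3 environment rather than replacing the system Python.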

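Note:

Deleting or upgrading the system Python 2 can have unpredictable consequences. If that has already happened, see the article "Fixing an unusable yum command after accidentally deleting the bundled Python 2 or a yum error".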

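Conda is used here to manage Python virtual environments. For details, see the article "Anaconda: Installing and Configuring Conda and Managing Python Virtual Environments".

The basic workflow is to create an environment and then activate it. After activation, the prompt shows the environment name:

```bash
conda create -n superset
```

```bash
[root@master ~]# conda activate superset
(superset) [root@master ~]#
```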

Create the superset environment with a pinned Python version:

```bash
conda create --name superset python=3.10.9
```

Activate the environment and check the Python version:

```bash
[root@node01 ~]# conda activate superset
(superset) [root@node01 ~]# python -V
Python 3.10.9
```
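To confirm from inside Python that the active interpreter meets the minimum stated above (3.7 or later), you can run a small check in the activated environment. It also prints `sys.executable`, which shows whether the Conda environment's interpreter is the one in use. `meets_minimum` is an illustrative helper name, not part of Superset.

```python
import sys

# Minimum Python version stated above for Superset
MINIMUM = (3, 7)

def meets_minimum(version_info=sys.version_info, minimum=MINIMUM):
    """Return True if the (major, minor, ...) tuple is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum

version = sys.version.split()[0]
if meets_minimum():
    print(f"Python {version} OK, interpreter: {sys.executable}")
else:
    print(f"Python {version} is too old for Superset")
```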

Superset deployment

Install dependencies

Before installing Superset, install the following system packages:

```bash
yum install -y gcc gcc-c++ libffi-devel python-devel python-pip python-wheel python-setuptools openssl-devel cyrus-sasl-devel openldap-devel
```

Install Superset

pip and setuptools may need to be upgraded first for the installation to work:

```bash
pip install --upgrade pip -i https://pypi.douban.com/simple/

pip install --upgrade setuptools pip -i https://pypi.douban.com/simple/
```

Install Superset:

```bash
pip install apache-superset -i https://pypi.douban.com/simple/
```

Or install Superset from a different mirror. Note that `--trusted-host` takes a host name, not a URL:

```bash
pip install apache-superset --trusted-host repo.huaweicloud.com -i https://repo.huaweicloud.com/repository/pypi/simple
```

Install a specific version:

```bash
pip install apache-superset==2.1.0 -i https://pypi.douban.com/simple/
```

The installation may fail with an error like this (here from the Huawei Cloud mirror):

```text
  ERROR: HTTP error 404 while getting https://repo.huaweicloud.com/repository/pypi/packages/18/b9/cb8d519ea0094b9b8fe7480225c14937517729f8ec927643dc7379904f64/celery-5.3.1-py3-none-any.whl.metadata
ERROR: 404 Client Error: Not Found for url: https://repo.huaweicloud.com/repository/pypi/packages/18/b9/cb8d519ea0094b9b8fe7480225c14937517729f8ec927643dc7379904f64/celery-5.3.1-py3-none-any.whl.metadata
```

In that case, install from the Tsinghua University mirror instead:

```bash
pip install apache-superset==2.1.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
```
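To see which versions actually ended up installed, you can query the standard-library `importlib.metadata` module (Python 3.8+) from inside the activated environment. This is useful for the dependency errors that follow. Below is a small sketch; `installed_version` is an illustrative helper name, and the package list is just an example.

```python
from importlib import metadata

def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

# Distribution (pip) names, not import names
for name in ("apache-superset", "sqlparse", "gunicorn"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```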

Configure the Superset metadata database

Superset can store its metadata in MySQL or PostgreSQL; MySQL is used here.

Create the superset metadata database:

```sql
CREATE DATABASE superset DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
```

Create the superset user and grant it privileges:

```sql
create user superset@'%' identified WITH mysql_native_password BY 'superset';

grant all privileges on *.* to superset@'%' with grant option;

flush privileges;
```

Edit the Superset configuration file (the path depends on where Miniconda is installed):

```bash
vim  /usr/local/program/miniconda3/envs/superset/lib/python3.10/site-packages/superset/config.py
```

Turn on line numbers in vim:

```text
:set nu
```

Go to around line 197:

```python
 197 # The SQLAlchemy connection string.
 198 SQLALCHEMY_DATABASE_URI = "sqlite:///" + os.path.join(DATA_DIR, "superset.db")
 199 # SQLALCHEMY_DATABASE_URI = 'mysql://myapp@localhost/myapp'
 200 # SQLALCHEMY_DATABASE_URI = 'postgresql://root:password@localhost/myapp'
```

Replace the default SQLite connection string with one that points at the MySQL metadata database:

```python
SQLALCHEMY_DATABASE_URI = 'mysql://superset:superset@node01:3306/superset?charset=utf8'
```
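If the password contains characters such as `@`, `:` or `/`, writing it directly into the URI breaks parsing. The SQLAlchemy documentation recommends escaping it with `urllib.parse.quote_plus`. Below is a minimal sketch of building the value above. `mysql_uri` is a hypothetical helper; the `mysql://` scheme uses the mysqlclient (MySQLdb) driver installed in the next step.

```python
from urllib.parse import quote_plus

def mysql_uri(user, password, host, port, database, charset="utf8"):
    """Build a SQLAlchemy MySQL URI, escaping special characters in the credentials."""
    return (
        f"mysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}?charset={charset}"
    )

# Same values as the configuration above
SQLALCHEMY_DATABASE_URI = mysql_uri("superset", "superset", "node01", 3306, "superset")
print(SQLALCHEMY_DATABASE_URI)
```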

Install the Python MySQL driver:

```bash
conda install mysqlclient
```

Initialize the Superset metadata:

```bash
export FLASK_APP=superset

superset db upgrade
```

The following error may appear:

```text
(superset) [root@master ~]# superset db upgrade
Traceback (most recent call last):
  File "/usr/local/program/miniconda3/envs/superset/bin/superset", line 5, in <module>
    from superset.cli.main import superset
  File "/usr/local/program/miniconda3/envs/superset/lib/python3.10/site-packages/superset/__init__.py", line 21, in <module>
    from superset.app import create_app
  File "/usr/local/program/miniconda3/envs/superset/lib/python3.10/site-packages/superset/app.py", line 23, in <module>
    from superset.initialization import SupersetAppInitializer
  File "/usr/local/program/miniconda3/envs/superset/lib/python3.10/site-packages/superset/initialization/__init__.py", line 33, in <module>
    from superset.extensions import (
  File "/usr/local/program/miniconda3/envs/superset/lib/python3.10/site-packages/superset/extensions/__init__.py", line 32, in <module>
    from superset.utils.async_query_manager import AsyncQueryManager
  File "/usr/local/program/miniconda3/envs/superset/lib/python3.10/site-packages/superset/utils/async_query_manager.py", line 26, in <module>
    from superset.utils.core import get_user_id
  File "/usr/local/program/miniconda3/envs/superset/lib/python3.10/site-packages/superset/utils/core.py", line 106, in <module>
    from superset.sql_parse import sanitize_clause
  File "/usr/local/program/miniconda3/envs/superset/lib/python3.10/site-packages/superset/sql_parse.py", line 67, in <module>
    re.compile(r"'(''|\\\\|\\|[^'])*'", sqlparse.keywords.FLAGS).match,
AttributeError: module 'sqlparse.keywords' has no attribute 'FLAGS'
```

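Cause:

The installed sqlparse library is incompatible with this Superset version. Newer sqlparse releases removed the `FLAGS` attribute from `sqlparse.keywords`. Superset's `superset/sql_parse.py` (line 67 in the traceback) still uses that attribute.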

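Possible solutions:

Upgrade Superset: a newer Superset release may already fix this.

Downgrade sqlparse: install an older sqlparse version that still provides `FLAGS`, using the commands below.

Patch Superset's code so it no longer relies on `sqlparse.keywords.FLAGS`.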

This article downgrades sqlparse:

```bash
(superset) [root@master ~]# conda list | grep sqlparse
sqlparse                  0.4.4                    pypi_0    pypi

(superset) [root@master ~]# pip install sqlparse==0.4.1

(superset) [root@master ~]# conda list | grep sqlparse
sqlparse                  0.4.1                    pypi_0    pypi
```
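To confirm that the installed sqlparse provides the attribute Superset needs, you can test for it directly. This checks the actual failure point instead of relying on version numbers. Below is a small sketch; `sqlparse_has_flags` is an illustrative helper name.

```python
import importlib

def sqlparse_has_flags():
    """True/False for whether sqlparse.keywords defines FLAGS; None if sqlparse is missing."""
    try:
        keywords = importlib.import_module("sqlparse.keywords")
    except ImportError:
        return None
    return hasattr(keywords, "FLAGS")

status = sqlparse_has_flags()
if status is None:
    print("sqlparse is not installed")
elif status:
    print("sqlparse.keywords.FLAGS exists; superset/sql_parse.py can use it")
else:
    print("sqlparse.keywords.FLAGS is missing; expect the AttributeError above")
```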

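Run the initialization again. The error is gone, but Superset now refuses to start because of its SECRET_KEY check.

The message is about Superset's default SECRET_KEY. SECRET_KEY is used to encrypt data and compute hashes, which protects the application. Superset ships with a default SECRET_KEY, but that key is insecure: it is published in the public code repository, so anyone could misuse it.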

```text
(superset) [root@master ~]# superset db upgrade
--------------------------------------------------------------------------------
                                    WARNING
--------------------------------------------------------------------------------
A Default SECRET_KEY was detected, please use superset_config.py to override it.
Use a strong complex alphanumeric string and use a tool to help you generate
a sufficiently random sequence, ex: openssl rand -base64 42
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Refusing to start due to insecure SECRET_KEY
```

To fix this, override the default with a strong, random SECRET_KEY.

Change to the Superset installation directory:

```bash
cd /usr/local/program/miniconda3/envs/superset/lib/python3.10/site-packages/superset
```

Generate a sufficiently random string:

```bash
(superset) [root@master lib]# openssl rand -base64 42
m9y2X0JSOhZBPafQE8JVJtqtzESXXIeFg8opUOLom04k7EucpYCEb4Ts
```
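`openssl rand -base64 42` base64-encodes 42 random bytes into a 56-character string. If openssl is not available, the Python standard library can produce an equivalent value with `secrets` plus `base64`. `generate_secret_key` is an illustrative helper name.

```python
import base64
import secrets

def generate_secret_key(num_bytes=42):
    """Equivalent of `openssl rand -base64 42`: cryptographically random bytes, base64-encoded."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")

print(generate_secret_key())
```

42 bytes is a multiple of 3, so the output has no `=` padding and is always 56 characters long.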

Edit the Superset configuration file again:

```bash
vim  /usr/local/program/miniconda3/envs/superset/lib/python3.10/site-packages/superset/config.py
```

Set SECRET_KEY (around line 195), commenting out the default:

```python
# Your App secret key. Make sure you override it on superset_config.py
# or use `SUPERSET_SECRET_KEY` environment variable.
# Use a strong complex alphanumeric string and use a tool to help you generate
# a sufficiently random sequence, ex: openssl rand -base64 42"
#SECRET_KEY = os.environ.get("SUPERSET_SECRET_KEY") or CHANGE_ME_SECRET_KEY
SECRET_KEY ='m9y2X0JSOhZBPafQE8JVJtqtzESXXIeFg8opUOLom04k7EucpYCEb4Ts'
```
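The warning recommends overriding SECRET_KEY in a `superset_config.py` file rather than editing `config.py` inside site-packages. Edits to `config.py` are lost whenever the package is reinstalled or upgraded. Superset loads `superset_config.py` in either of two cases:

- the module `superset_config` is importable (it is on `PYTHONPATH`);
- the environment variable `SUPERSET_CONFIG_PATH` points at the file.

Below is a minimal sketch following the pattern in the upstream docs. The file location (for example `/root/superset_config.py`) is an assumption. The SECRET_KEY value is a placeholder; replace it with your own `openssl rand -base64 42` output.

```python
# superset_config.py: overrides for superset/config.py, loaded at startup

# Placeholder: paste the output of `openssl rand -base64 42` here
SECRET_KEY = "YOUR_OWN_RANDOM_GENERATED_SECRET_KEY"

# Metadata database, same value as configured in config.py above
SQLALCHEMY_DATABASE_URI = "mysql://superset:superset@node01:3306/superset?charset=utf8"
```

Then run `export SUPERSET_CONFIG_PATH=/root/superset_config.py` before `superset db upgrade`. If you only need to set the key, exporting `SUPERSET_SECRET_KEY` is also enough, as the comment in `config.py` notes.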

Running the initialization again fails with:

```text
Traceback (most recent call last):
  File "/usr/local/program/miniconda3/envs/superset/bin/superset", line 8, in <module>
    sys.exit(superset())
  File "/usr/local/program/miniconda3/envs/superset/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/program/miniconda3/envs/superset/lib/python3.10/site-packages/flask/cli.py", line 567, in main
    return super().main(*args, **kwargs)
  File "/usr/local/program/miniconda3/envs/superset/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/program/miniconda3/envs/superset/lib/python3.10/site-packages/click/core.py", line 1685, in invoke
    super().invoke(ctx)
  File "/usr/local/program/miniconda3/envs/superset/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/program/miniconda3/envs/superset/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/program/miniconda3/envs/superset/lib/python3.10/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/program/miniconda3/envs/superset/lib/python3.10/site-packages/flask/cli.py", line 406, in decorator
    with __ctx.ensure_object(ScriptInfo).load_app().app_context():
  File "/usr/local/program/miniconda3/envs/superset/lib/python3.10/site-packages/flask/cli.py", line 369, in load_app
    app = locate_app(import_name, name)
  File "/usr/local/program/miniconda3/envs/superset/lib/python3.10/site-packages/flask/cli.py", line 231, in locate_app
    return find_best_app(module)
  File "/usr/local/program/miniconda3/envs/superset/lib/python3.10/site-packages/flask/cli.py", line 57, in find_best_app
    app = app_factory()
  File "/usr/local/program/miniconda3/envs/superset/lib/python3.10/site-packages/superset/app.py", line 44, in create_app
    raise ex
  File "/usr/local/program/miniconda3/envs/superset/lib/python3.10/site-packages/superset/app.py", line 37, in create_app
    app_initializer.init_app()
  File "/usr/local/program/miniconda3/envs/superset/lib/python3.10/site-packages/superset/initialization/__init__.py", line 493, in init_app
    self.init_app_in_ctx()
  File "/usr/local/program/miniconda3/envs/superset/lib/python3.10/site-packages/superset/initialization/__init__.py", line 425, in init_app_in_ctx
    self.configure_data_sources()
  File "/usr/local/program/miniconda3/envs/superset/lib/python3.10/site-packages/superset/initialization/__init__.py", line 519, in configure_data_sources
    __import__(module_name, fromlist=class_names)
  File "/usr/local/program/miniconda3/envs/superset/lib/python3.10/site-packages/superset/connectors/sqla/__init__.py", line 17, in <module>
    from . import models, views
  File "/usr/local/program/miniconda3/envs/superset/lib/python3.10/site-packages/superset/connectors/sqla/views.py", line 32, in <module>
    from superset.connectors.base.views import DatasourceModelView
  File "/usr/local/program/miniconda3/envs/superset/lib/python3.10/site-packages/superset/connectors/base/views.py", line 24, in <module>
    from superset.views.base import SupersetModelView
  File "/usr/local/program/miniconda3/envs/superset/lib/python3.10/site-packages/superset/views/__init__.py", line 17, in <module>
    from . import (
  File "/usr/local/program/miniconda3/envs/superset/lib/python3.10/site-packages/superset/views/access_requests.py", line 24, in <module>
    from superset.views.base import DeleteMixin, SupersetModelView
  File "/usr/local/program/miniconda3/envs/superset/lib/python3.10/site-packages/superset/views/base.py", line 67, in <module>
    from superset.db_engine_specs.gsheets import GSheetsEngineSpec
  File "/usr/local/program/miniconda3/envs/superset/lib/python3.10/site-packages/superset/db_engine_specs/gsheets.py", line 33, in <module>
    from superset.databases.schemas import encrypted_field_properties, EncryptedString
  File "/usr/local/program/miniconda3/envs/superset/lib/python3.10/site-packages/superset/databases/schemas.py", line 28, in <module>
    from marshmallow_enum import EnumField
ModuleNotFoundError: No module named 'marshmallow_enum'
```

Solution: install marshmallow_enum:

```bash
pip install marshmallow_enum
```
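Missing modules like this surface one at a time, each after a full traceback. A quick pre-flight check can list every absent import at once. Below is a sketch using `importlib.util.find_spec`, which locates a top-level module without importing it. The `REQUIRED` list is an illustrative assumption drawn from the steps in this article, not an official Superset dependency list. It uses import names rather than pip names, for example mysqlclient provides `MySQLdb`.

```python
import importlib.util

# Import names (not pip package names) used by the steps in this article
REQUIRED = ["superset", "MySQLdb", "marshmallow_enum", "sqlparse", "gunicorn"]

def missing_modules(names):
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_modules(REQUIRED)
print("missing:", ", ".join(missing) if missing else "none")
```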
Run the initialization again; this time the database migrations complete:

```text
(superset) [root@master ~]# superset db upgrade
logging was configured successfully
2023-08-23 10:43:23,481:INFO:superset.utils.logging_configurator:logging was configured successfully
2023-08-23 10:43:23,494:INFO:root:Configured event logger of type <class 'superset.utils.log.DBEventLogger'>

INFO  [alembic.runtime.migration] Running upgrade a39867932713 -> 409c7b420ab0, add created_by_fk as owner
INFO  [alembic.runtime.migration] Running upgrade 409c7b420ab0 -> ffa79af61a56, rename report_schedule.extra to extra_json
INFO  [alembic.runtime.migration] Running upgrade ffa79af61a56 -> 6d3c6f9d665d, fix_table_chart_conditional_formatting_colors
INFO  [alembic.runtime.migration] Running upgrade 6d3c6f9d665d -> 291f024254b5, drop_column_allow_multi_schema_metadata_fetch
INFO  [alembic.runtime.migration] Running upgrade 291f024254b5 -> deb4c9d4a4ef, parameters in saved queries
INFO  [alembic.runtime.migration] Running upgrade deb4c9d4a4ef -> 4ce1d9b25135, remove_filter_bar_orientation
INFO  [alembic.runtime.migration] Running upgrade 4ce1d9b25135 -> f3c2d8ec8595, create_ssh_tunnel_credentials_tbl
```

Initialization succeeded. The metadata database now contains Superset's tables:

```text
mysql> use superset
Database changed
mysql> show tables;
+----------------------------+
| Tables_in_superset         |
+----------------------------+
| ab_permission              |
| ab_permission_view         |
| ab_permission_view_role    |
| ab_register_user           |
| ab_role                    |
| ab_user                    |
| ab_user_role               |
| ab_view_menu               |
| access_request             |
| alembic_version            |
| alert_logs                 |
| alert_owner                |
| alerts                     |
| annotation                 |
```

Superset initialization

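Create an admin user:

```bash
superset fab create-admin
```

Answer the prompts. The first line below is the tail of a warning printed beforehand:

```text
  for prop in class_mapper(obj).iterate_properties:
Username [admin]: // press Enter to accept the default user admin, the administrator who logs in to the web UI
User first name [admin]: // press Enter
User last name [user]: // press Enter
Email [admin@fab.org]: // press Enter
Password: // set the password for logging in to the web UI
Repeat for confirmation: // confirm the password
Recognized Database Authentications. 
Admin User admin created.
```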

Initialize Superset (this sets up the default roles and permissions):

```bash
superset init
```

Start Superset

Install gunicorn, a Python WSGI HTTP server that plays a role comparable to Tomcat in the Java world:

```bash
pip install gunicorn -i https://pypi.douban.com/simple/
```

Start Superset:

```bash
gunicorn --workers 5 --timeout 120 --bind node01:8787  "superset.app:create_app()" --daemon 
```
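The options mean:

--workers: number of worker processes

--timeout: worker timeout in seconds; a worker that stays silent longer than this is killed and restarted

--bind: host and port to bind, i.e. the address used to reach Superset

--daemon: run in the background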
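The same options can also live in a Gunicorn configuration file. Gunicorn reads this file as Python, and its settings `workers`, `timeout`, `bind` and `daemon` correspond to the flags above. Below is a sketch with the values used in this article; the file name `gunicorn.conf.py` is a choice, passed explicitly with `-c`.

```python
# gunicorn.conf.py: same settings as the command-line flags above
workers = 5            # --workers: number of worker processes
timeout = 120          # --timeout: seconds before a silent worker is killed and restarted
bind = "node01:8787"   # --bind: address Superset listens on
daemon = True          # --daemon: detach and run in the background
```

Start with `gunicorn -c gunicorn.conf.py "superset.app:create_app()"`. Keeping the settings in one file means the start command and the start/stop script cannot drift apart.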

Log in to Superset

Open http://IP:8787 (the host and port given to --bind) and log in with the admin account created earlier.
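If the login page does not load, you can check from the server whether gunicorn is answering. This sketch assumes Superset's `/health` endpoint (present in recent versions, where it returns `OK`) and the `node01:8787` bind address used above. `superset_is_up` is a hypothetical helper name.

```python
import http.client
import urllib.request

def superset_is_up(base_url="http://node01:8787", timeout=5):
    """Return True if GET <base_url>/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + "/health", timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError (both OSError subclasses), refused connections, timeouts
        return False
```

Usage: `print(superset_is_up())` right after starting gunicorn. `False` means the process is not listening yet, or the address differs from the one passed to `--bind`.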

Stop the gunicorn processes:

```bash
ps -ef | awk '/superset/ && !/awk/{print $2}' | xargs kill -9
```
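`kill -9` sends SIGKILL, which gives gunicorn no chance to finish in-flight requests. Gunicorn treats SIGTERM as a graceful shutdown. The sketch below uses the same "match on the command line, then signal" approach in Python, but sends SIGTERM. It runs `ps` directly rather than through a shell pipeline, so no `awk` helper process can match. The helper names are hypothetical, and the sample `ps` lines in the usage note are illustrative.

```python
import os
import signal
import subprocess

def find_pids(ps_output, pattern):
    """Return PIDs from `ps -eo pid=,args=` output whose command line contains `pattern`."""
    pids = []
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)  # ["<pid>", "<full command line>"]
        if len(parts) == 2 and pattern in parts[1]:
            pids.append(int(parts[0]))
    return pids

def stop_processes(pattern="gunicorn", sig=signal.SIGTERM):
    """Send `sig` to every process whose command line contains `pattern`; return the PIDs signalled."""
    out = subprocess.run(
        ["ps", "-eo", "pid=,args="], capture_output=True, text=True, check=True
    ).stdout
    signalled = []
    for pid in find_pids(out, pattern):
        if pid == os.getpid():
            continue  # never signal this Python process itself
        try:
            os.kill(pid, sig)
            signalled.append(pid)
        except ProcessLookupError:
            pass  # the process exited between ps and kill
    return signalled
```

Usage: `stop_processes()` gracefully stops every gunicorn process. `find_pids(" 101 gunicorn: master\n 200 -bash", "gunicorn")` returns `[101]`.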

Leave the superset environment:

```bash
conda deactivate
```

Superset start/stop script

Create the file superset.sh (for example with vim superset.sh):

```bash
#!/bin/bash

# Returns 0 if NO gunicorn process is running, 1 if at least one is running
superset_status(){
    result=$(ps -ef | awk '/gunicorn/ && !/awk/{print $2}' | wc -l)
    if [[ $result -eq 0 ]]; then
        return 0
    else
        return 1
    fi
}

superset_start(){
        # Load conda's shell setup (assumes `conda init` wrote it to ~/.bashrc)
        source ~/.bashrc
        superset_status >/dev/null 2>&1
        if [[ $? -eq 0 ]]; then
            # Bind address matches the manual start command above
            conda activate superset ; gunicorn --workers 5 --timeout 120 --bind node01:8787 --daemon 'superset.app:create_app()'
        else
            echo "superset is already running"
        fi
}

superset_stop(){
    superset_status >/dev/null 2>&1
    if [[ $? -eq 0 ]]; then
        echo "superset is not running"
    else
        ps -ef | awk '/gunicorn/ && !/awk/{print $2}' | xargs kill -9
    fi
}

case $1 in
    start )
        echo "Starting Superset"
        superset_start
    ;;
    stop )
        echo "Stopping Superset"
        superset_stop
    ;;
    restart )
        echo "Restarting Superset"
        superset_stop
        superset_start
    ;;
    status )
        superset_status >/dev/null 2>&1
        if [[ $? -eq 0 ]]; then
            echo "superset is not running"
        else
            echo "superset is running"
        fi
    ;;
esac
```

Make it executable:

```bash
chmod +x superset.sh
```

Start superset:

```bash
./superset.sh start
```

Stop superset:

```bash
./superset.sh stop
```

Using Superset

Connecting Superset to a MySQL data source

Install the dependency:

```bash
conda install mysqlclient
```

Note: each type of data source needs its own driver package.

Official documentation:

https://superset.apache.org/docs/databases/installing-database-drivers/

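Database configuration

Click Database Connections, click DATABASE, and select the database to connect to.

There are two ways to connect. Method 1: enter the credentials field by field. Method 2: connect with a URI.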

Note:

SQLAlchemy URI format: mysql://username:password@hostname:port/database_name

Here, enter mysql://superset:superset@master:3306/demo?charset=utf8 and click Test Connection. The message "Connection looks good" means the connection succeeded.
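The two connection methods carry the same information. Splitting the URI shows which part goes into each field when you enter credentials one by one. Below is a sketch using the standard library's `urlsplit`; `uri_fields` is an illustrative helper. Percent-escaped credentials are returned as-is, not decoded.

```python
from urllib.parse import urlsplit

def uri_fields(uri):
    """Split a SQLAlchemy-style URI into the fields of the credential-entry form."""
    parts = urlsplit(uri)
    return {
        "driver": parts.scheme,
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
        "query": parts.query,
    }

print(uri_fields("mysql://superset:superset@master:3306/demo?charset=utf8"))
```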

Table configuration

Click Datasets.

Click DATASET and configure the table, then click Create Dataset and Create Chart. This brings you back to Datasets.

Create an empty dashboard

Click Dashboards.

Name the dashboard and save it.

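Create a chart

Click Charts.

Select the data source and the chart type, then create the chart.

Configure the chart following the on-screen instructions and create it.

If the configuration is correct, the chart is rendered. Save it to the dashboard.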

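Edit the dashboard

Open the dashboard and click the edit button.

Adjust the chart sizes and the dashboard layout.

Set the dashboard's auto-refresh interval, then save.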
