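Model Overview

Fuzi-Mingcha (夫子•明察) is a Chinese judicial large language model jointly developed by Shandong University, Inspur Cloud, and China University of Political Science and Law. It uses ChatGLM-6B as its base model and is trained on a large corpus of unsupervised Chinese judicial text (court judgments, laws and regulations, and similar documents) together with supervised judicial fine-tuning data (legal question answering and similar-case retrieval). The model supports statute retrieval, case analysis, syllogistic judgment reasoning, and judicial dialogue, and aims to provide comprehensive, accurate legal consultation and answers.

In LawBench, the benchmark for the legal capabilities of large language models released in September 2023 by Shanghai AI Laboratory together with Nanjing University, the model ranked first in the zero-shot setting among law-specific models (Law Specific LLMs). It also improved substantially over ChatGLM-6B, which has had no legal-domain training.

With a major vendor and leading universities behind it, plus its strong LawBench results, it is one of the front-runners in this field.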

Official site: ir.sdu.edu.cn/application...

Project repository: github.com/irlab-sdu/f...

Closed-beta application: docs.qq.com/form/page/D...

ChatGLM-6B

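ChatGLM-6B, jointly developed by Tsinghua University and Zhipu AI, is an open-source conversational language model that supports both Chinese and English. It is based on the General Language Model (GLM) architecture and has 6.2 billion parameters. With model quantization, it can be deployed locally on consumer-grade graphics cards (as little as 6 GB of GPU memory at the INT4 quantization level). ChatGLM-6B uses techniques similar to ChatGPT and is optimized for Chinese question answering and dialogue. After training on about 1T tokens of Chinese and English text, supplemented by supervised fine-tuning, feedback bootstrap, and reinforcement learning from human feedback, the 6.2-billion-parameter model can already generate answers that align well with human preferences.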

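Deployment Preparation

The official project describes two deployment options. The first uses PyLucene; the project provides installation steps for it, so most deployment write-ups online follow this route. The second uses Elasticsearch; the official description is brief, and this route mostly shows up in the issues of the official GitHub repository.

Before deploying, check the hardware requirements. Because Fuzi-Mingcha is based on ChatGLM-6B, the figures below come directly from the ChatGLM-6B project page.

Hardware Requirements

Quantization level	Minimum GPU memory (inference)	Minimum GPU memory (parameter-efficient fine-tuning)
FP16 (no quantization)	13 GB	14 GB
INT8	8 GB	9 GB
INT4	6 GB	7 GB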
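As a rough sanity check on the table, the weight-only footprint of a 6.2-billion-parameter model can be estimated from the bits stored per parameter. The sketch below is illustrative arithmetic, not the ChatGLM-6B project's own calculation; the name weight_gib is made up for this example, and the estimate ignores activations, the KV cache, and the CUDA context, which is presumably why the published minimums are higher.

```python
# Weight-only memory estimate for a 6.2B-parameter model.
# Assumption: only the weights are counted; runtime overhead is ignored.

PARAMS = 6.2e9  # parameter count quoted for ChatGLM-6B


def weight_gib(params: float, bits_per_param: int) -> float:
    """Return the size of the weights in GiB at the given precision."""
    return params * bits_per_param / 8 / 2**30


if __name__ == "__main__":
    for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
        print(f"{name}: about {weight_gib(PARAMS, bits):.1f} GiB of weights")
```

At FP16 this comes to about 11.5 GiB of weights, which sits just under the 13 GB minimum in the table.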

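Environment Setup

Install the dependencies with pip: pip install -r requirements.txt. The recommended transformers version is 4.27.1, although in principle any version from 4.23.1 onward should work. To run the quantized model on a CPU, gcc and OpenMP are also required; most Linux distributions ship with them. For Windows and macOS, refer to the official documentation.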
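The version bounds above can be checked before starting a long deployment. This is a minimal sketch using only the standard library; parse_version and check are hypothetical helper names, and the simple numeric parsing assumes ordinary release strings such as 4.27.1.

```python
from importlib import metadata

MIN_VERSION = (4, 23, 1)   # lower bound quoted above
RECOMMENDED = (4, 27, 1)   # recommended version quoted above


def parse_version(text: str) -> tuple:
    """Turn '4.27.1' (or '4.27.1.dev0') into a tuple of its leading integers."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def check(installed: str) -> str:
    """Classify an installed transformers version against the bounds."""
    version = parse_version(installed)
    if version < MIN_VERSION:
        return "too old"
    if version == RECOMMENDED:
        return "recommended"
    return "acceptable"


if __name__ == "__main__":
    try:
        print("transformers:", check(metadata.version("transformers")))
    except metadata.PackageNotFoundError:
        print("transformers is not installed")
```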

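Deploying with PyLucene

Because installing PyLucene is fairly tedious, the project provides a Singularity image with PyLucene preinstalled. Install Singularity itself by following the Singularity installation documentation.

I had previously set up qwen:latest with Ollama on the Ubuntu 20 subsystem of Windows 11, so I hoped to reuse that experience.

However, Fuzi-Mingcha is not available on the Ollama Hub, and it is not distributed as a Docker container either, so the only option was to follow the official instructions and the order used in blog posts online.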

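Introduction to Singularity

Like Docker, Singularity is a container technology.

Singularity does have real weaknesses: it is less mature than Docker and has a smaller user community. Its strengths show when deploying analysis pipelines on high-performance servers: it runs without root privileges, is designed with security in mind, has a gentle learning curve, and can convert Docker containers into Singularity containers seamlessly. Docker, by contrast, needs root privileges by default to build and run containers, and the result files it produces are owned by root, which is inconvenient for ordinary users.

A beginner's guide to Singularity: segmentfault.com/a/119000004...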

Installing Singularity

1. Install the build dependencies

sudo apt-get update

sudo apt-get install -y build-essential libssl-dev uuid-dev libgpgme11-dev squashfs-tools libseccomp-dev wget pkg-config git cryptsetup debootstrap

2. Install Go

Singularity is written in Go, so Go must be installed first.

wget https://dl.google.com/go/go1.13.linux-amd64.tar.gz
sudo tar --directory=/usr/local -xzvf go1.13.linux-amd64.tar.gz

Add Go to the PATH environment variable by appending the following line to the end of ~/.bashrc:

export PATH=/usr/local/go/bin:$PATH

Reload the environment variables and check the installation:

source ~/.bashrc
go version

3. Download and extract Singularity

The official PyLucene image was built with Singularity 3.8.0, so that is the version used here.

wget https://github.com/singularityware/singularity/releases/download/v3.8.0/singularity-3.8.0.tar.gz
tar -xzvf singularity-3.8.0.tar.gz

4. Build and install Singularity

cd singularity-3.8.0

sudo ./mconfig --prefix=/mnt/workspace/singularity-3.8.0 && \
    sudo make -C ./builddir && \
    sudo make -C ./builddir install

Add Singularity to PATH by appending the following line to ~/.bashrc:

export PATH=/mnt/workspace/singularity-3.8.0/bin:$PATH

Reload the environment variables and check the installation:

source ~/.bashrc
singularity --version
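If either version command reports "command not found", the usual cause is that the new PATH entry has not been picked up by the current shell. The following sketch checks whether both tools are visible on PATH; missing_tools is a hypothetical helper name, and the check says nothing about whether the installed versions are the right ones.

```python
import shutil


def missing_tools(names):
    """Return the command names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(["go", "singularity"])
    if missing:
        print("Not on PATH:", ", ".join(missing))
    else:
        print("go and singularity are both on PATH")
```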

5. Test the installation

singularity run library://godlovedc/funny/lolcow

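If the installation works, the container runs and prints its output (for this test image, an ASCII-art cow).

Downloading the PyLucene Image

The Singularity image for Fuzi-Mingcha can be downloaded from Baidu Netdisk (extraction code: jhhl):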

https://pan.baidu.com/s/1PqUnX7YRNMt9co3RKUwExw

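Windows 11 and the Ubuntu subsystem share files: the Windows drives are visible under /mnt in Ubuntu, so a file downloaded on Windows can be used directly from the subsystem.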
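The mapping between the two views of a file is mechanical: drive E: on Windows appears as /mnt/e in the subsystem. The sketch below converts one form to the other, assuming the default WSL automount location of /mnt/<drive letter>; windows_to_wsl is a hypothetical helper written for this example.

```python
from pathlib import PurePosixPath, PureWindowsPath


def windows_to_wsl(path: str) -> str:
    """Convert a drive-letter Windows path to its default WSL mount path."""
    win = PureWindowsPath(path)
    drive = win.drive.rstrip(":").lower()
    if len(drive) != 1:
        raise ValueError(f"expected a drive-letter path, got: {path!r}")
    # parts[0] is the drive root (for example 'E:\\'); the rest are folders.
    return str(PurePosixPath("/mnt", drive, *win.parts[1:]))


if __name__ == "__main__":
    print(windows_to_wsl(r"E:\fuzimingcha\model\pylucene_singularity.sif"))
```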

Downloading the Fuzi-Mingcha Source Code

cd ~
mkdir fuzimingcha
cd fuzimingcha
git clone https://github.com/irlab-sdu/fuzi.mingcha.git

Getting the Model Dependencies

Install pip for Python 3

sudo apt install python3-pip

Install the model dependencies

cd /mnt/e/fuzimingcha/fuzi_mingcha/src
pip install -r requirements.txt

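Once the dependencies install successfully, the model's environment is essentially configured.

The following warning appeared during installation, but it does not affect the deployment: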

WARNING: The script lit is installed in '/home/asyyr/.local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.

Downloading the Pretrained Model

The pretrained model files for Fuzi-Mingcha can be downloaded from Baidu Netdisk (extraction code: ygm1):

https://pan.baidu.com/s/1kEDpBKTj9jX7k1NNhOtEIQ#list/path=%2F

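Pinning the numpy Version

Deployment aborts if the installed numpy is version 2 or later, because the model's software stack was built for numpy 1.x.x. The project does not say which exact version to use.

1. Check the installed numpy version.

2. Uninstall numpy 2.x.x.

3. Install numpy pinned to 1.26.4. Other versions I tried first still produced errors when deploying the model; after reinstalling 1.26.4, deployment worked.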
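The three steps can be sketched as a small check that reports the installed version and prints the pip commands to run when a downgrade is needed. The helper name needs_downgrade is made up for this example, and the pin to 1.26.4 simply follows the version that worked in this walkthrough rather than any official requirement.

```python
from importlib import metadata

PINNED = "1.26.4"  # the version that worked in this walkthrough


def needs_downgrade(installed: str) -> bool:
    """Return True when the installed numpy is a 2.x (or later) release."""
    major = int(installed.split(".")[0])
    return major >= 2


if __name__ == "__main__":
    try:
        installed = metadata.version("numpy")  # step 1: check the version
    except metadata.PackageNotFoundError:
        print(f'numpy is not installed; run: pip install "numpy=={PINNED}"')
    else:
        print("installed numpy:", installed)
        if needs_downgrade(installed):
            print("pip uninstall -y numpy")           # step 2: remove 2.x
            print(f'pip install "numpy=={PINNED}"')   # step 3: pin 1.26.4
```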

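Deploying the Retrieval Module

The project's retrieval module is built with PyLucene and consists of two databases: task1 (the laws and regulations dataset) and task2 (the case judgment dataset).

Starting task1

In /mnt/e/fuzimingcha/fuzi_mingcha/src/pylucene_task1/api.py, change csv_folder_path to an absolute path:

# Set the CSV folder path and the Lucene index folder path
# csv_folder_path = "./csv_files/"
csv_folder_path = "/mnt/e/fuzimingcha/fuzi_mingcha/src/pylucene_task1/csv_files/"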

Run api.py:

singularity exec -B "/mnt/e/fuzimingcha/fuzi_mingcha/src/pylucene_task1":/mnt "/mnt/e/fuzimingcha/model/pylucene_singularity.sif" python /mnt/api.py --port 9991

Starting task2

In /mnt/e/fuzimingcha/fuzi_mingcha/src/pylucene_task2/api.py, change csv_folder_path to an absolute path:

# Set the CSV folder path and the Lucene index folder path
# csv_folder_path = "./csv_files/"
csv_folder_path = "/mnt/e/fuzimingcha/fuzi_mingcha/src/pylucene_task2/csv_files/"

Run api.py:

singularity exec -B "/mnt/e/fuzimingcha/fuzi_mingcha/src/pylucene_task2":/mnt "/mnt/e/fuzimingcha/model/pylucene_singularity.sif" python /mnt/api.py --port 9992
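An alternative to hard-coding an absolute path in each api.py is to derive the folder from the script's own location. This is a sketch of that idea, not what the project ships: csv_folder_for is a hypothetical helper, and it assumes the csv_files directory sits next to api.py, as the paths above suggest.

```python
from pathlib import Path


def csv_folder_for(script_path: str) -> str:
    """Return the absolute csv_files directory located next to the script."""
    folder = Path(script_path).resolve().parent / "csv_files"
    return str(folder) + "/"  # keep the trailing slash used in api.py


if __name__ == "__main__":
    # Inside the container the script is bound at /mnt/api.py.
    print(csv_folder_for("/mnt/api.py"))
```

Inside api.py this would be used as csv_folder_path = csv_folder_for(__file__), so the path follows the script wherever the task directory is bound.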

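Starting the Model Demo

Run cli_demo.py under src to start the command-line demo. First edit cli_demo.py so that the tokenizer and model paths point to where the model parameters are stored:

# SDUIRLab/fuzi-mingcha-v1_0 stands for the path where the parameters are stored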
# tokenizer = AutoTokenizer.from_pretrained("SDUIRLab/fuzi-mingcha-v1_0", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("/mnt/e/fuzimingcha/model", trust_remote_code=True)

# model = AutoModel.from_pretrained("SDUIRLab/fuzi-mingcha-v1_0", trust_remote_code=True).half().cuda()
model = AutoModel.from_pretrained("/mnt/e/fuzimingcha/model", trust_remote_code=True).half().cuda()

Run cli_demo.py:

python3 /mnt/e/fuzimingcha/fuzi_mingcha/src/cli_demo.py --url_lucene_task1 "http://127.0.0.1:9991" --url_lucene_task2 "http://127.0.0.1:9992"
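Because the demo depends on both retrieval services, it can help to confirm that something is listening on ports 9991 and 9992 before loading the model. The sketch below only tests whether a TCP connection succeeds; port_open is a hypothetical helper, and a successful connection does not prove the service answers queries correctly.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Ports taken from the two api.py commands above.
    for name, port in [("task1", 9991), ("task2", 9992)]:
        state = "listening" if port_open("127.0.0.1", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```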

While running, the model may be killed during loading. Checking resource usage showed the CPU and memory were almost maxed out, so deploying on the local laptop was a dead end.
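A process that is killed while loading has often run out of memory, so checking the available RAM beforehand can save a failed attempt. The sketch below parses the MemAvailable field of /proc/meminfo on Linux; mem_available_gib is a hypothetical helper, and how much memory the model actually needs while loading is not stated in the source.

```python
def mem_available_gib(meminfo_text: str) -> float:
    """Extract MemAvailable from /proc/meminfo content and convert kB to GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kib = int(line.split()[1])  # the value is reported in kB
            return kib / 2**20
    raise ValueError("MemAvailable not found")


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as handle:
            print(f"available memory: {mem_available_gib(handle.read()):.1f} GiB")
    except (OSError, ValueError):
        print("could not read MemAvailable from /proc/meminfo")
```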

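Alibaba Cloud Server

The local laptop cannot run a model of this size; the integrated graphics in particular are the bottleneck. A public cloud was the next option. Of Alibaba Cloud, Tencent Cloud, and Huawei Cloud, only Alibaba Cloud offered trial resources for AI workloads, with a GPU that meets the ChatGLM-6B hardware requirements.

Enabling Object Storage Service (OSS)

Alibaba Cloud DSW lets you upload files up to 5 GB through its web interface; larger files have to be transferred through OSS.

Follow the Alibaba Cloud ossutil configuration guide at help.aliyun.com/zh/oss/deve... to create an AccessKey, and record the AccessKey ID and AccessKey Secret.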

Uploading Large Files

Following the Alibaba Cloud OSS documentation, download the Windows build of ossutil, configure the bucket, AccessKey ID, and AccessKey Secret, and then run the ossutil command to upload:

ossutil cp E:\fuzimingcha\model\pylucene_singularity.sif oss://asyyr/model/pylucene_singularity.sif

Singularity Is Not Supported

I repeated the earlier deployment steps. When deploying the retrieval module, the following error appeared:

FATAL: while extracting /mnt/workspace/model/pylucene_singularity.sif: root filesystem extraction failed: extract command failed: ERROR  : Failed to create user namespace: user namespace requires to set /proc/sys/kernel/unprivileged_userns_clone to 1

I checked /proc/sys/kernel/unprivileged_userns_clone, and it was already set to 1.
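The same check can be scripted, together with a second kernel setting, user.max_user_namespaces, which also disables user namespaces when it is 0. This is only a diagnostic sketch: read_flag is a hypothetical helper, and on a managed platform the restriction may come from the container's security profile rather than from either file.

```python
from pathlib import Path

USERNS_FLAG = "/proc/sys/kernel/unprivileged_userns_clone"
MAX_USERNS = "/proc/sys/user/max_user_namespaces"


def read_flag(path: str = USERNS_FLAG):
    """Return the integer stored in the file, or None if absent or unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    for path in (USERNS_FLAG, MAX_USERNS):
        print(f"{path} = {read_flag(path)}")
```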

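The AI assistant on DSW still recommended setting /proc/sys/kernel/unprivileged_userns_clone to 1 and otherwise contacting after-sales support. I opened a ticket; after an exchange that lasted from 11 a.m. to 4 p.m., the engineer's final answer was that the platform's security policy does not allow this.

Trying the Quantized ChatGLM-6B-Int4

There is no way around the security policy on Alibaba Cloud, so this route had to be abandoned in favor of another. It occurred to me that, if ChatGLM-6B needs so many resources, the lighter ChatGLM-6B-Int4 variant might need fewer.

I looked up the ChatGLM-6B-Int4 description on ModelScope (魔搭).

The model files were much smaller, which looked promising, so I downloaded them right away.

When deploying the retrieval module, however, compilation errors and other problems appeared (I forgot to take screenshots). Search results suggested rebuilding the PyLucene image for ChatGLM-6B-Int4, which is a fairly difficult task.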

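Deploying with Elasticsearch

Back on the official deployment guide: I had tried nearly every PyLucene-based approach I could think of, and none worked. That left the Elasticsearch route, for which the project gives little guidance and few similar articles exist online. With no other option, I pressed on.

Installing Elasticsearch on Ubuntu

1. Prerequisites

A server running Ubuntu 22.04.

A root password set on the server.

2. Install Java

Elasticsearch is Java-based, so Java must be installed on the server: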

apt install default-jdk -y

After Java is installed, check the version with:

java --version

3. Add the Elasticsearch repository

The Elasticsearch package is not included in the default Ubuntu 22.04 repositories, so the official repository has to be added to APT.

First, install the required dependencies:

apt install curl wget gnupg2 -y

Once the dependencies are installed, add the Elasticsearch GPG key:

curl -fsSL https://artifacts.elastic.co/GPG-KEY-elasticsearch | gpg --dearmor -o /usr/share/keyrings/elastic.gpg

Next, add the Elasticsearch repository to APT:

echo "deb [signed-by=/usr/share/keyrings/elastic.gpg] https://artifacts.elastic.co/packages/7.x/apt stable main" | tee -a /etc/apt/sources.list.d/elastic-7.x.list

Finally, refresh the package cache:

apt update -y

4. Install Elasticsearch

Install Elasticsearch with:

apt install elasticsearch -y

After the package is installed, edit the Elasticsearch configuration file:

vim /etc/elasticsearch/elasticsearch.yml

Change the following line:

network.host: localhost

Start the Elasticsearch service:

systemctl start elasticsearch

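The following error appeared:

System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to connect to bus: Host is down

The message means the system is not using systemd as its init system. On Linux, the init system is the first process started at boot and is responsible for starting and managing services. The error shows up whenever systemctl is used to start or query a service in such an environment. The fix was: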

# Update the apt package index
sudo apt-get update

# Install systemd and systemctl
sudo apt-get install systemd -y
sudo apt-get install systemctl -y

After upgrading the instance's CPU and memory, start the service:

systemctl start elasticsearch

Check the status of Elasticsearch:

systemctl status elasticsearch

The service started successfully:

elasticsearch.service - Elasticsearch
    Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service, enabled)
    Active: active (running)
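Beyond the service status, a quick HTTP request confirms that Elasticsearch itself is answering. The sketch below assumes the default HTTP port 9200 on localhost with no authentication, which matches a fresh 7.x package install but is not confirmed by the source; es_info is a hypothetical helper name.

```python
import http.client
import json
import urllib.error
import urllib.request


def es_info(url: str = "http://localhost:9200", timeout: float = 2.0):
    """Return the JSON document served at the root URL, or None on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.load(response)
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        return None


if __name__ == "__main__":
    info = es_info()
    if info is None:
        print("Elasticsearch did not answer on port 9200")
    else:
        print("Elasticsearch version:", info.get("version", {}).get("number"))
```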

Deploying the Retrieval Module

cd /mnt/workspace/fuzi_mingcha/src

python ./es_task1/api.py --port 9001 &
python ./es_task2/api.py --port 9002 &

Check the processes:

ps -ef | grep python

Starting the Model Demo

python cli_demo.py --url_lucene_task1 "localhost:9001" --url_lucene_task2 "localhost:9002"

Encoding Error: UnicodeDecodeError

/usr/local/lib/python3.10/site-packages/transformers/generation/utils.py:1259: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
  warnings.warn(
The dtype of attention mask (torch.int64) is not bool
2024-08-02 16:18:22.979290: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-08-02 16:18:23.015454: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-08-02 16:18:23.694707: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Traceback (most recent call last):
  File "/mnt/workspace/fuzi_mingcha/src/cli_demo.py", line 134, in <module>
    main()
  File "/mnt/workspace/fuzi_mingcha/src/cli_demo.py", line 100, in main
    generate_law, _ = chat(prompt1_task1.replace("@用户输入@", query))
  File "/mnt/workspace/fuzi_mingcha/src/cli_demo.py", line 54, in chat
    response, history = model.chat(tokenizer, prompt, history=history if history else [], max_length=4096, max_time=100, top_p=0.7, temperature=0.95)
  File "/usr/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/model/modeling_chatglm.py", line 1287, in chat
    response = tokenizer.decode(outputs)
  File "/usr/local/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3507, in decode
    token_ids = to_py_obj(token_ids)
  File "/usr/local/lib/python3.10/site-packages/transformers/utils/generic.py", line 206, in to_py_obj
    return [to_py_obj(o) for o in obj]
  File "/usr/local/lib/python3.10/site-packages/transformers/utils/generic.py", line 206, in <listcomp>
    return [to_py_obj(o) for o in obj]
  File "/usr/local/lib/python3.10/site-packages/transformers/utils/generic.py", line 207, in to_py_obj
    elif is_tf_tensor(obj):
  File "/usr/local/lib/python3.10/site-packages/transformers/utils/generic.py", line 166, in is_tf_tensor
    return False if not is_tf_available() else _is_tensorflow(x)
  File "/usr/local/lib/python3.10/site-packages/transformers/utils/generic.py", line 157, in _is_tensorflow
    import tensorflow as tf
  File "/usr/local/lib/python3.10/site-packages/tensorflow/__init__.py", line 45, in <module>
    from tensorflow._api.v2 import __internal__
  File "/usr/local/lib/python3.10/site-packages/tensorflow/_api/v2/__internal__/__init__.py", line 13, in <module>
    from tensorflow._api.v2.__internal__ import feature_column
  File "/usr/local/lib/python3.10/site-packages/tensorflow/_api/v2/__internal__/feature_column/__init__.py", line 8, in <module>
    from tensorflow.python.feature_column.feature_column_v2 import DenseColumn # line: 1777
  File "/usr/local/lib/python3.10/site-packages/tensorflow/python/feature_column/feature_column_v2.py", line 38, in <module>
    from tensorflow.python.feature_column import feature_column as fc_old
  File "/usr/local/lib/python3.10/site-packages/tensorflow/python/feature_column/feature_column.py", line 41, in <module>
    from tensorflow.python.layers import base
  File "/usr/local/lib/python3.10/site-packages/tensorflow/python/layers/base.py", line 16, in <module>
    from tensorflow.python.keras.legacy_tf_layers import base
  File "/usr/local/lib/python3.10/site-packages/tensorflow/python/keras/__init__.py", line 25, in <module>
    from tensorflow.python.keras import models
  File "/usr/local/lib/python3.10/site-packages/tensorflow/python/keras/models.py", line 25, in <module>
    from tensorflow.python.keras.engine import training_v1
  File "/usr/local/lib/python3.10/site-packages/tensorflow/python/keras/engine/training_v1.py", line 46, in <module>
    from tensorflow.python.keras.engine import training_arrays_v1
  File "/usr/local/lib/python3.10/site-packages/tensorflow/python/keras/engine/training_arrays_v1.py", line 37, in <module>
    from scipy.sparse import issparse  # pylint: disable=g-import-not-at-top
  File "/usr/local/lib/python3.10/site-packages/scipy/sparse/__init__.py", line 294, in <module>
    from ._base import *
  File "/usr/local/lib/python3.10/site-packages/scipy/sparse/_base.py", line 5, in <module>
    from scipy._lib._util import VisibleDeprecationWarning
  File "/usr/local/lib/python3.10/site-packages/scipy/_lib/_util.py", line 18, in <module>
    from scipy._lib._array_api import array_namespace
  File "/usr/local/lib/python3.10/site-packages/scipy/_lib/_array_api.py", line 15, in <module>
    from numpy.testing import assert_
  File "/usr/local/lib/python3.10/site-packages/numpy/testing/__init__.py", line 11, in <module>
    from ._private.utils import *
  File "/usr/local/lib/python3.10/site-packages/numpy/testing/_private/utils.py", line 1253, in <module>
    _SUPPORTS_SVE = check_support_sve()
  File "/usr/local/lib/python3.10/site-packages/numpy/testing/_private/utils.py", line 1247, in check_support_sve
    output = subprocess.run(cmd, capture_output=True, text=True)
  File "/usr/local/lib/python3.10/subprocess.py", line 505, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "/usr/local/lib/python3.10/subprocess.py", line 1154, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "/usr/local/lib/python3.10/subprocess.py", line 2059, in _communicate
    stdout = self._translate_newlines(stdout,
  File "/usr/local/lib/python3.10/subprocess.py", line 1031, in _translate_newlines
    data = data.decode(encoding, errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 0: ordinal not in range(128)

Suspecting an encoding problem, I edited the file named in the traceback, /mnt/workspace/fuzi_mingcha/src/cli_demo.py, and added the following encoding declaration:

# -*- coding: utf-8 -*-

Rerunning the demo produced the same error.

According to the traceback, the failure is in the data.decode(encoding, errors) call on line 1031 of subprocess.py under /usr/local/lib/python3.10. A colleague suggested patching that function in Python 3.10 directly.

# Go to the Python 3.10 library directory
cd /usr/local/lib/python3.10
# Edit subprocess.py with vim
vim subprocess.py

Change the data.decode(encoding, errors) call on line 1031 so that non-ASCII output no longer raises an error, then save the file.

Rerunning the demo, I could finally chat with the model.
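The failure itself is easy to reproduce: byte 0xe6 is the first byte of many UTF-8 encoded Chinese characters, and the ASCII codec rejects it. The sketch below shows the error and one tolerant way to decode such output. It is an illustration under that assumption, not the exact change made to subprocess.py, and decode_output is a hypothetical helper.

```python
# "未" encodes to b'\xe6\x9c\xaa' in UTF-8, so its first byte is 0xe6,
# the same byte reported in the traceback.
raw = "未".encode("utf-8")


def decode_output(data: bytes, encoding: str = "ascii") -> str:
    """Decode with the given codec, falling back to UTF-8 on failure."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        # Undecodable bytes become U+FFFD instead of raising.
        return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    try:
        raw.decode("ascii")
    except UnicodeDecodeError as error:
        print("ASCII decode fails:", error)
    print("Fallback result:", decode_output(raw))
```

A less invasive option that may avoid editing the standard library is to run Python in UTF-8 mode, for example by setting the PYTHONUTF8=1 environment variable before starting the demo, so that text-mode subprocess output is decoded as UTF-8.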

The C, M, and G1 indicators in the upper-left corner show CPU, memory, and GPU usage. Inference clearly consumes a large share of the GPU.
