"OpenAI 不 'Open',DeepSeek 真 'Deep'。"
性能匹敌世界顶尖的闭源模型 GPT-4o 和 Claude-3.5-Sonnet,训练成本为 558 万美元,是 GPT-4o 的十分之一不到;API 价格约为 GPT-4o 的百分之一;成本利润率达到 545%,最高日赚 346 万元......
"出道即巅峰"的国产 AI 顶流------DeepSeek,于 2025 年 2 月 24 日至 28 日开展为期 5 天的"开源周"行动,并于收官当日开源了 Fire-Flyer File System(3FS) 并行文件系统,完整揭秘 3FS 如何"榨干" SSD 和远程直接内存访问(RDMA)网络,为 AI 训练和推理工作负载带来低成本高吞吐量的数据访问体验。
基于上海川源存储系统硬件平台,资深研发工程师军哥连夜"爆肝"部署了 DeepSeek 3FS,并整理出一份企业级部署最佳实践指南, "毫无保留地分享我们微小但真诚的进展",助力更多企业率先体验这场全球 AI 普惠实践。
01 Test Environment
02 Environment Configuration
2.1 Network Configuration
The cluster uses two networks: a management network and a data network. Ordinary gigabit Ethernet is fine for management, but the data network must support RDMA (InfiniBand or RoCE). Here is the netplan config from one node:
```yaml
root@storage-145:~# cat /etc/netplan/00-installer-config.yaml
# This is the network config written by 'subiquity'
network:
  ethernets:
    ens10f0np0:
      dhcp4: false
      addresses: [10.11.12.145/24]   # data network
    ens10f1np1:
      dhcp4: false
    ens129f0:
      dhcp4: false
      addresses: [172.150.211.145/16]   # management network
      gateway4: 172.150.0.1
      nameservers:
        addresses: [114.114.114.114]
    ens129f1:
      dhcp4: false
    ens3f0np0:
      dhcp4: false
    ens3f1np1:
      dhcp4: false
  version: 2
```
Add host mappings on every node:

```bash
vi /etc/hosts
10.11.12.143 meta
```
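For reference, a fuller /etc/hosts might look like the sketch below. Only the meta line comes from this guide; the storage entries are illustrative placeholders whose names echo the shell prompts (storage-145, storage-146) seen elsewhere in this post — adapt them to your own cluster.

```shell
# Hypothetical /etc/hosts entries for the whole cluster, written to a
# scratch file here; on a real node you would append to /etc/hosts itself.
tmphosts=$(mktemp)
cat >> "$tmphosts" << 'EOF'
10.11.12.143 meta
10.11.12.145 storage-145
10.11.12.146 storage-146
EOF
# every node should resolve every other node by name
grep -c '^10\.11\.12\.' "$tmphosts"   # prints: 3
```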
2.2 Install OFED (OpenFabrics Enterprise Distribution)
Install this on every node.
```bash
# 1. Install OFED (shipped in the DOCA host package)
wget https://www.mellanox.com/downloads/DOCA/DOCA_v2.10.0/host/doca-host_2.10.0-093000-25.01-ubuntu2204_amd64.deb
dpkg -i doca-host_2.10.0-093000-25.01-ubuntu2204_amd64.deb
apt-get update
apt-get -y install doca-ofed

# 2. Verify OFED
# Run ibdev2netdev; output like the following means the install succeeded
# (this also requires hardware support):
root@storage-145:~# ibdev2netdev
mlx5_0 port 1 ==> ens3f0np0 (Up)
mlx5_1 port 1 ==> ens3f1np1 (Up)
mlx5_2 port 1 ==> ens10f0np0 (Up)
mlx5_3 port 1 ==> ens10f1np1 (Up)
```
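If scripts need to consume this mapping (for example, to pick the data-network interface automatically), the output is easy to parse. The helper below is our own sketch, not part of OFED or 3FS, and assumes the field layout `<ibdev> port <n> ==> <netdev> (<state>)`:

```shell
# Map each RDMA device to its bound netdev from ibdev2netdev-style output.
parse_rdma_netdevs() {
  # field 1 is the RDMA device, field 5 the bound netdev
  awk '{ print $1, $5 }'
}

sample='mlx5_2 port 1 ==> ens10f0np0 (Up)'
echo "$sample" | parse_rdma_netdevs   # prints: mlx5_2 ens10f0np0
```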
2.3 System Configuration
Apply this on every node.
```bash
# 1. Set a root password (the rest of the installation runs as root)
sudo passwd root

# 2. Switch to root
su - root

# 3. Allow root login over SSH
vi /etc/ssh/sshd_config
PermitRootLogin yes      # change this line
systemctl restart sshd   # restart the service

# 4. Raise system limits
echo "fs.file-max = 20000000" | tee -a /etc/sysctl.conf
echo "fs.nr_open = 20000000" | tee -a /etc/sysctl.conf
sysctl -p   # apply immediately
cat << EOF >> /etc/security/limits.conf
* - nofile 10000000
root soft nofile 10000000
root hard nofile 10000000
* soft nproc 204800
* hard nproc 204800
* soft nofile 10000000
* hard nofile 10000000
* soft memlock unlimited
* hard memlock unlimited
EOF

# 5. Reboot the OS once the changes are in place
```
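After the reboot, it is worth confirming the limits actually took effect before continuing. A minimal sketch (the `limit_ok` helper is our own, not a system command):

```shell
# limit_ok: compare a current limit against the intended minimum.
limit_ok() { [ "$1" -ge "$2" ] && echo ok || echo "too low"; }

# After reboot, run on each node:
limit_ok "$(ulimit -n)" 10000000
limit_ok "$(sysctl -n fs.file-max)" 20000000
```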
03 Installation
3.1 Install Dependencies
Install these on every node.
```bash
apt install -y cmake libuv1-dev liblz4-dev liblzma-dev libdouble-conversion-dev libprocps-dev libdwarf-dev libunwind-dev \
  libaio-dev libgflags-dev libgoogle-glog-dev libgtest-dev libgmock-dev clang-format-14 clang-14 clang-tidy-14 lld-14 \
  libgoogle-perftools-dev google-perftools libssl-dev ccache gcc-12 g++-12 libboost-all-dev python3-pip
```
Install libfuse on every node:

```bash
apt install -y meson   # meson (which pulls in ninja) is needed for the build
wget https://github.com/libfuse/libfuse/releases/download/fuse-3.16.1/fuse-3.16.1.tar.gz
tar vzxf fuse-3.16.1.tar.gz
cd fuse-3.16.1/
mkdir build && cd build
meson setup ..
ninja && ninja install
```
Install Rust on every node:

```bash
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```

At the prompt, type "1" and press Enter:

```text
1) Proceed with standard installation (default - just press enter)
2) Customize installation
3) Cancel installation
>1
```
A successful install ends with:

```text
Rust is installed now. Great!

To get started you may need to restart your current shell.
This would reload your PATH environment variable to include
Cargo's bin directory ($HOME/.cargo/bin).

To configure your current shell, you need to source
the corresponding env file under $HOME/.cargo.

This is usually done by running one of the following (note the leading DOT):
. "$HOME/.cargo/env"            # For sh/bash/zsh/ash/dash/pdksh
source "$HOME/.cargo/env.fish"  # For fish
```

Note: after installation, log out of the shell and log back in.
Install FoundationDB on every node:

```bash
wget https://github.com/apple/foundationdb/releases/download/7.3.63/foundationdb-clients_7.3.63-1_amd64.deb
wget https://github.com/apple/foundationdb/releases/download/7.3.63/foundationdb-server_7.3.63-1_amd64.deb
dpkg -i foundationdb-clients_7.3.63-1_amd64.deb
dpkg -i foundationdb-server_7.3.63-1_amd64.deb
```
Build 3FS on every node:

```bash
git clone https://github.com/deepseek-ai/3fs
cd 3fs
git submodule update --init --recursive
./patches/apply.sh
cmake -S . -B build -DCMAKE_CXX_COMPILER=clang++-14 -DCMAKE_C_COMPILER=clang-14 -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
cmake --build build -j 32
```

Adjust the configs (lower max_sge to 1 in every file that sets it):

```bash
cd ~/3fs/configs
sed -i 's/max_sge = 16/max_sge = 1/g' $(grep -rl max_sge)
```
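The sed one-liner only touches files that literally contain `max_sge = 16`. Here is a quick dry run on a scratch copy (the file name is hypothetical) showing exactly what it does:

```shell
# Demonstrate the max_sge rewrite on a throwaway directory.
tmpdir=$(mktemp -d)
printf 'max_sge = 16\ntimeout = 5\n' > "$tmpdir/ib_devices.toml"

# same pattern as above: find files mentioning max_sge, rewrite in place
( cd "$tmpdir" && sed -i 's/max_sge = 16/max_sge = 1/g' $(grep -rl max_sge) )

grep max_sge "$tmpdir/ib_devices.toml"   # prints: max_sge = 1
```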
Install and configure ClickHouse on the meta node.

```bash
# 1. Install
sudo apt-get install -y apt-transport-https ca-certificates curl gnupg
curl -fsSL 'https://packages.clickhouse.com/rpm/lts/repodata/repomd.xml.key' | sudo gpg --dearmor -o /usr/share/keyrings/clickhouse-keyring.gpg
ARCH=$(dpkg --print-architecture)
echo "deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg arch=${ARCH}] https://packages.clickhouse.com/deb stable main" | sudo tee /etc/apt/sources.list.d/clickhouse.list
sudo apt-get update
# The installer prompts you to set a password for the default user,
# e.g. Shcy123!!
sudo apt-get install -y clickhouse-server clickhouse-client

# 2. Start
clickhouse start

# 3. Log in and verify
clickhouse-client --password 'Shcy123!!'

# 4. Create the metric tables
clickhouse-client --password 'Shcy123!!' -n < ~/3fs/deploy/sql/3fs-monitor.sql

# 5. Inspect the tables. ClickHouse speaks SQL, as shown in the figure below:
```
Configure the monitoring service

The monitoring service can live on a dedicated server; since this is a test environment, we run it on the meta node.

```bash
# 1. Copy the binary and config file
mkdir -p /opt/3fs/{bin,etc}
mkdir -p /var/log/3fs
cp ~/3fs/build/bin/monitor_collector_main /opt/3fs/bin
cp ~/3fs/configs/monitor_collector_main.toml /opt/3fs/etc

# 2. Edit the config
vim /opt/3fs/etc/monitor_collector_main.toml
# Change the following settings:
[server.monitor_collector.reporter.clickhouse]
db = '3fs'           # defaults to 3fs
host = '127.0.0.1'   # the node running clickhouse
passwd = '9'         # the password set during installation
port = '9000'        # clickhouse listens on 9000 by default
user = 'default'     # defaults to the default user

# 3. Start the monitoring service
cp ~/3fs/deploy/systemd/monitor_collector_main.service /usr/lib/systemd/system
systemctl start monitor_collector_main

# 4. Check the service
systemctl status monitor_collector_main
```
3.2 Configure the Admin Client

Install and configure admin_cli on every node.

```bash
# 1. Copy the binary and config files
mkdir -p /opt/3fs/{bin,etc}
rsync -avz meta:~/3fs/build/bin/admin_cli /opt/3fs/bin
rsync -avz meta:~/3fs/configs/admin_cli.toml /opt/3fs/etc
rsync -avz meta:/etc/foundationdb/fdb.cluster /opt/3fs/etc

# 2. Edit the config
vim /opt/3fs/etc/admin_cli.toml
cluster_id = "stage"
[fdb]
clusterFile = '/opt/3fs/etc/fdb.cluster'

# 3. Use it
/opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml help
```
3.3 Configure the mgmtd Service

The mgmtd (management) service can run on its own node; here we co-locate it with the meta node.

```bash
# 1. Copy the binary and config files
cp ~/3fs/build/bin/mgmtd_main /opt/3fs/bin
cp ~/3fs/configs/{mgmtd_main.toml,mgmtd_main_launcher.toml,mgmtd_main_app.toml} /opt/3fs/etc

# 2. Edit the configs
vim /opt/3fs/etc/mgmtd_main_app.toml
node_id = 1   # set the ID to 1

vim /opt/3fs/etc/mgmtd_main_launcher.toml
cluster_id = "stage"   # cluster ID, globally unique
[fdb]
clusterFile = '/opt/3fs/etc/fdb.cluster'

vim /opt/3fs/etc/mgmtd_main.toml
[common.monitor.reporters.monitor_collector]
remote_ip = "10.11.12.143:10000"   # the IP of the monitoring node

# 3. Initialize the cluster
/opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml "init-cluster --mgmtd /opt/3fs/etc/mgmtd_main.toml 1 1048576 16"
```

Here the argument 1 is the chain table ID, 1048576 is the chunk size (1 MiB), and 16 is the file stripe size. Then start the service and verify.
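To build intuition for those init-cluster numbers, here is some illustrative arithmetic, assuming chunks are spread round-robin across the chains of a stripe (a simplification of 3FS's actual layout, not its documented behavior):

```shell
chunk_bytes=1048576   # 1 MiB chunk size, as passed to init-cluster
stripe=16             # file stripe size, as passed to init-cluster

file_bytes=$((64 * 1024 * 1024))       # a hypothetical 64 MiB file
chunks=$((file_bytes / chunk_bytes))   # number of 1 MiB chunks
per_chain=$((chunks / stripe))         # chunks landing on each chain

echo "chunks=$chunks chains=$stripe chunks_per_chain=$per_chain"
# prints: chunks=64 chains=16 chunks_per_chain=4
```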
```bash
# 4. Start the service
cp ~/3fs/deploy/systemd/mgmtd_main.service /usr/lib/systemd/system
systemctl start mgmtd_main

# 5. Check the service
systemctl status mgmtd_main

# 6. Verify
/opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://10.11.12.143:8000"]' "list-nodes"
```
3.4 Configure the meta Service

Run these steps on the meta node only.

```bash
# 1. Copy the binary and config files
cp ~/3fs/build/bin/meta_main /opt/3fs/bin
cp ~/3fs/configs/{meta_main_launcher.toml,meta_main.toml,meta_main_app.toml} /opt/3fs/etc

# 2. Edit the configs
vi /opt/3fs/etc/meta_main_app.toml
node_id = 100

vi /opt/3fs/etc/meta_main_launcher.toml
cluster_id = "stage"
[mgmtd_client]
mgmtd_server_addresses = ["RDMA://10.11.12.143:8000"]

vi /opt/3fs/etc/meta_main.toml
[server.mgmtd_client]
mgmtd_server_addresses = ["RDMA://10.11.12.143:8000"]
[common.monitor.reporters.monitor_collector]
remote_ip = "10.11.12.143:10000"
[server.fdb]
clusterFile = '/opt/3fs/etc/fdb.cluster'

# 3. Push the config to the management service
/opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://10.11.12.143:8000"]' "set-config --type META --file /opt/3fs/etc/meta_main.toml"

# 4. Start the service
cp ~/3fs/deploy/systemd/meta_main.service /usr/lib/systemd/system
systemctl start meta_main

# 5. Check the service status
systemctl status meta_main

# 6. Check the cluster status
/opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://10.11.12.143:8000"]' "list-nodes"
```
3.5 Configure the Storage Nodes

Run these steps on the 5 nodes that will serve storage.

```bash
# 1. Format and mount the disks (adjust to your disk count; this setup has
#    4 disks per node, hence 0..3)
mkdir -p /storage/data{0..3}
mkdir -p /var/log/3fs
for i in {0..3}; do mkfs.xfs -f -L data${i} /dev/nvme${i}n1; mount -o noatime,nodiratime -L data${i} /storage/data${i}; done
mkdir -p /storage/data{0..3}/3fs

# 2. Update sysctl.conf
echo "fs.aio-max-nr=67108864" >> /etc/sysctl.conf
sysctl -p

# 3. On the meta node, edit the original config files
vi ~/3fs/configs/storage_main_launcher.toml
cluster_id = "stage"
[mgmtd_client]
mgmtd_server_addresses = ["RDMA://10.11.12.143:8000"]

vi ~/3fs/configs/storage_main.toml
[server.mgmtd]
mgmtd_server_address = ["RDMA://10.11.12.143:8000"]
[common.monitor.reporters.monitor_collector]
remote_ip = "10.11.12.143:10000"
[server.targets]
target_paths = ["/storage/data0/3fs","/storage/data1/3fs","/storage/data2/3fs","/storage/data3/3fs"]   # one entry per mount point

# 4. On every storage node, pull the binary and configs from the meta node
rsync -avz meta:~/3fs/build/bin/storage_main /opt/3fs/bin
rsync -avz meta:~/3fs/configs/{storage_main_launcher.toml,storage_main.toml,storage_main_app.toml} /opt/3fs/etc

# 5. Set each storage node's ID; every ID must be globally unique.
#    Here the storage nodes use 10001-10005.
vi /opt/3fs/etc/storage_main_app.toml
node_id = 10001
```
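Editing node_id by hand on five nodes is error-prone. One convenience (our own idea, not part of 3FS) is to derive the ID from the node's address, e.g. the last octet of its data-network IP, and patch the file with sed. Demonstrated on a scratch copy with placeholder octets:

```shell
# Derive a unique node_id from the host's last IP octet and patch the toml.
tmpdir=$(mktemp -d)
echo 'node_id = 10001' > "$tmpdir/storage_main_app.toml"

host_octet=147   # e.g. from 10.11.12.147; placeholder value
base_octet=145   # the first storage node's octet in this cluster
new_id=$((10001 + host_octet - base_octet))

sed -i "s/^node_id = .*/node_id = $new_id/" "$tmpdir/storage_main_app.toml"
cat "$tmpdir/storage_main_app.toml"   # prints: node_id = 10003
```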
```bash
# 6. Push the config from each storage node
/opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://10.11.12.143:8000"]' "set-config --type STORAGE --file /opt/3fs/etc/storage_main.toml"

# 7. Start the service on each storage node
rsync -avz meta:~/3fs/deploy/systemd/storage_main.service /usr/lib/systemd/system
systemctl start storage_main

# 8. Check the service status on each storage node
systemctl status storage_main

# 9. Check the cluster status
/opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://10.11.12.143:8000"]' "list-nodes"
```
3.6 Configure 3FS

Run the following on the management node.

```bash
# 1. Create a user
/opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://10.11.12.143:8000"]' "user-add --root --admin 0 root"

# 2. Save the token generated by the previous step in /opt/3fs/etc/token.txt
echo "your_token" > /opt/3fs/etc/token.txt

# 3. Generate the chain table
pip3 install -r ~/3fs/deploy/data_placement/requirements.txt
python3 ~/3fs/deploy/data_placement/src/model/data_placement.py \
  -ql -relax -type CR --num_nodes 5 --replication_factor 3 --min_targets_per_disk 6
# Watch out for two things in this step:
# 1. set num_disks_per_node to your actual disk count
# 2. set node_id_begin and node_id_end to your actual IDs
python3 ~/3fs/deploy/data_placement/src/setup/gen_chain_table.py \
  --chain_table_type CR --node_id_begin 10001 --node_id_end 10005 \
  --num_disks_per_node 4 --num_targets_per_disk 6 \
  --target_id_prefix 1 --chain_id_prefix 9 \
  --incidence_matrix_path output/DataPlacementModel-v_5-b_10-r_6-k_3-λ_2-lb_1-ub_1/incidence_matrix.pickle
```
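The generator's parameters imply the cluster's target and chain counts, so it's worth checking the math before running it. The sketch below assumes each CR chain holds replication_factor targets (a simplifying assumption on our part):

```shell
nodes=5; disks_per_node=4; targets_per_disk=6; replication=3

total_targets=$((nodes * disks_per_node * targets_per_disk))
chains=$((total_targets / replication))

echo "targets=$total_targets chains=$chains"   # prints: targets=120 chains=40
```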
```bash
# 4. Create the storage targets
/opt/3fs/bin/admin_cli --cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://10.11.12.143:8000"]' --config.user_info.token $(<"/opt/3fs/etc/token.txt") < output/create_target_cmd.txt

# 5. Upload the chains and the chain table to the mgmtd service
/opt/3fs/bin/admin_cli --cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://10.11.12.143:8000"]' --config.user_info.token $(<"/opt/3fs/etc/token.txt") "upload-chains output/generated_chains.csv"
/opt/3fs/bin/admin_cli --cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://10.11.12.143:8000"]' --config.user_info.token $(<"/opt/3fs/etc/token.txt") "upload-chain-table --desc stage 1 output/generated_chain_table.csv"

# 6. Confirm the uploads succeeded
/opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://10.11.12.143:8000"]' "list-chains"
/opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://10.11.12.143:8000"]' "list-chain-tables"
```
3.7 Configure the FUSE Client

Here we reuse the storage nodes as clients.

```bash
# 1. Copy the binary and config files
cp ~/3fs/build/bin/hf3fs_fuse_main /opt/3fs/bin
cp ~/3fs/configs/{hf3fs_fuse_main_launcher.toml,hf3fs_fuse_main.toml,hf3fs_fuse_main_app.toml} /opt/3fs/etc

# 2. Create the mount point
mkdir -p /3fs/stage

# 3. Edit the configs
vi /opt/3fs/etc/hf3fs_fuse_main_launcher.toml
cluster_id = "stage"
mountpoint = '/3fs/stage'
token_file = '/opt/3fs/etc/token.txt'
[mgmtd_client]
mgmtd_server_addresses = ["RDMA://10.11.12.143:8000"]

vi /opt/3fs/etc/hf3fs_fuse_main.toml
[mgmtd]
mgmtd_server_addresses = ["RDMA://10.11.12.143:8000"]
[common.monitor.reporters.monitor_collector]
remote_ip = "10.11.12.143:10000"

# 4. Copy the token from the meta node
scp meta:/opt/3fs/etc/token.txt /opt/3fs/etc/token.txt

# 5. Push the FUSE client config to the mgmtd service
/opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://10.11.12.143:8000"]' "set-config --type FUSE --file /opt/3fs/etc/hf3fs_fuse_main.toml"

# 6. Start the FUSE client
cp ~/3fs/deploy/systemd/hf3fs_fuse_main.service /usr/lib/systemd/system
systemctl start hf3fs_fuse_main

# 7. Check the service status
systemctl status hf3fs_fuse_main

# 8. Check the mount point
df -h /3fs/stage
```
04 Testing

Benchmark with FIO.

```bash
# Install
apt install -y fio

# Run the test
fio -numjobs=128 -fallocate=none -iodepth=2 -ioengine=libaio -direct=1 -rw=read -bs=4M --group_reporting -size=100M -time_based -runtime=3000 -name=2depth_128file_4M_direct_read_bw -directory=/3fs/stage
```
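Translating the fio flags into numbers makes the load easier to reason about; the figures below follow directly from the flags (illustrative arithmetic only):

```shell
numjobs=128; file_size_mib=100; bs_mib=4; iodepth=2

working_set=$((numjobs * file_size_mib))   # total data touched across jobs, MiB
inflight=$((numjobs * iodepth * bs_mib))   # worst-case I/O in flight, MiB

echo "working_set=${working_set}MiB inflight=${inflight}MiB"
# prints: working_set=12800MiB inflight=1024MiB
```

So 128 jobs each stream sequential 4 MiB direct reads from a 100 MiB file under /3fs/stage, giving a 12.5 GiB working set with up to 1 GiB of reads outstanding at once.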