Deploying a Docker Swarm Cluster on RK3588 + Kylin OS (银河麒麟)

I. Prerequisites (all nodes)

1) Set the hostname (on each device)

	# Run on master-node
	sudo hostnamectl set-hostname master-node
	# Run on worker-1
	sudo hostnamectl set-hostname worker-1
	# Run on worker-2
	sudo hostnamectl set-hostname worker-2
	
	# Verify
	hostname
	
	# Edit the hosts file (all nodes)
	sudo vi /etc/hosts
	# Add the following entries (adjust to your actual IPs):
	192.168.137.224 master-node
	192.168.137.225 worker-1  # replace with worker-1's actual IP
	192.168.137.226 worker-2  # replace with worker-2's actual IP
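
Editing /etc/hosts by hand on three boards is easy to get wrong, so here is a minimal sketch of an idempotent alternative. The helper name add_host_entry is hypothetical (it is not part of the original steps), and the demo writes to a temporary file rather than the real /etc/hosts.

```shell
#!/bin/sh
# add_host_entry FILE IP NAME  (hypothetical helper)
# Appends "IP NAME" to FILE unless NAME already appears on a non-comment line.
add_host_entry() {
    file=$1; ip=$2; name=$3
    if awk -v n="$name" '
        { sub(/#.*/, "") }    # ignore comments, including trailing ones
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$file"
    then
        return 0              # already present, nothing to do
    fi
    printf '%s %s\n' "$ip" "$name" >> "$file"
}

# Demo on a scratch file; running it twice must not duplicate entries.
demo_hosts=$(mktemp)
for round in 1 2; do
    add_host_entry "$demo_hosts" 192.168.137.224 master-node
    add_host_entry "$demo_hosts" 192.168.137.225 worker-1
    add_host_entry "$demo_hosts" 192.168.137.226 worker-2
done
cat "$demo_hosts"
```

To apply it for real, run the three add_host_entry calls as root with /etc/hosts as the first argument, using the IPs of your own boards.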

2) Enable sshd (allow root login)

	sudo vi /etc/ssh/sshd_config
	# Set: PermitRootLogin yes
	sudo systemctl restart ssh

3) Change the APT sources

	sudo vi /etc/apt/sources.list
	# Main repository
	deb http://archive.kylinos.cn/kylin/KYLIN-ALL 10.1 main restricted universe multiverse
	# Update repositories
	deb http://archive.kylinos.cn/kylin/KYLIN-ALL 10.1-2107-updates main restricted universe multiverse
	deb http://archive.kylinos.cn/kylin/KYLIN-ALL 10.1-2203-updates main restricted universe multiverse

4) Update the system and install basic tools

	sudo apt update && sudo apt upgrade -y
	sudo apt install libcurl4=7.68.0-1kylin2.12
	sudo apt install -y curl wget net-tools ssh iputils-ping

5) Ensure time synchronization

	sudo apt install -y chrony
	sudo systemctl enable chrony --now

	# Check the time synchronization status
	chronyc sources -v

6) Disable the firewall

	# Find out which firewall is in use
	systemctl list-units | grep -E "(ufw|firewall)"
	# If ufw is in use
	sudo ufw disable
	# If firewalld is in use
	sudo systemctl stop firewalld && sudo systemctl disable firewalld
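
If you would rather keep the firewall running, an alternative is to open only the ports Swarm uses: 2377/tcp (cluster management, needed on managers), 7946/tcp and 7946/udp (node-to-node communication), and 4789/udp (overlay network traffic). The sketch below only prints the commands so they can be reviewed first; the function name swarm_firewall_rules is made up for this example.

```shell
#!/bin/sh
# Ports used by Docker Swarm mode.
swarm_ports="2377/tcp 7946/tcp 7946/udp 4789/udp"

# swarm_firewall_rules ufw|firewalld  (hypothetical helper)
# Prints, but does not execute, the commands that open the Swarm ports.
swarm_firewall_rules() {
    case $1 in
        ufw)
            for p in $swarm_ports; do
                echo "ufw allow $p"
            done
            ;;
        firewalld)
            for p in $swarm_ports; do
                echo "firewall-cmd --permanent --add-port=$p"
            done
            echo "firewall-cmd --reload"
            ;;
        *)
            echo "usage: swarm_firewall_rules ufw|firewalld" >&2
            return 1
            ;;
    esac
}

swarm_firewall_rules ufw
```

Once the printed commands look right, they can be applied with something like `swarm_firewall_rules ufw | sudo sh` on every node.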

7) Working directory

	mkdir -p /home/swarm

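8) Faking modprobe

	# Why: the running kernel's version does not match the kernel modules shipped
	# in the Kylin root filesystem, and some drivers are compiled directly into
	# the kernel, so modprobe fails for them. This guide only sets up Swarm;
	# for k3s and k8s this workaround is mandatory.
	# 1. Back up the real modprobe
	sudo mv /sbin/modprobe /sbin/modprobe.real

	# 2. Create the wrapper script (<<- strips the leading tabs of each line)
	cat <<-'EOF' | sudo tee /sbin/modprobe
	#!/bin/sh
	# Pretend that br_netfilter and overlay load/unload fine;
	# pass everything else through to the real modprobe.
	MODULE="$1"
	if [ "$MODULE" = "br_netfilter" ] || [ "$MODULE" = "overlay" ]; then
	    exit 0
	elif [ "$MODULE" = "-r" ] || [ "$MODULE" = "--remove" ]; then
	    if [ "$2" = "br_netfilter" ] || [ "$2" = "overlay" ]; then
	        exit 0
	    fi
	    exec /sbin/modprobe.real "$@"
	else
	    exec /sbin/modprobe.real "$@"
	fi
	EOF
	# 3. Make it executable
	sudo chmod +x /sbin/modprobe

	# 4. Make sure the sysctl settings are in place (important!)
	cat <<-'EOF' | sudo tee /etc/sysctl.d/99-k8s-bridge-netfilter.conf
	net.bridge.bridge-nf-call-iptables = 1
	net.bridge.bridge-nf-call-ip6tables = 1
	net.bridge.bridge-nf-call-arptables = 1
	EOF

	# --system also loads the files under /etc/sysctl.d/ (plain -p only reads /etc/sysctl.conf)
	sudo sysctl --system

	# 5. Verify that the fake works (each command should print: ✅ Success)
	sudo modprobe br_netfilter && echo "✅ Success" || echo "❌ Failed"
	sudo modprobe overlay && echo "✅ Success" || echo "❌ Failed"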
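
The wrapper only makes modprobe report success; it does not prove that the kernel actually provides these features. A rough check, sketched below, is to look for the features themselves. It assumes that an overlay entry in /proc/filesystems and the presence of /proc/sys/net/bridge/bridge-nf-call-iptables indicate that overlay and br_netfilter are active. The function names and the optional root-prefix argument are hypothetical, and the prefix exists only so the check can be tried against a fake directory tree.

```shell
#!/bin/sh
# Each helper takes an optional root prefix (default: the real filesystem).
has_overlay() {
    grep -qw overlay "${1:-}/proc/filesystems" 2>/dev/null
}

has_br_netfilter() {
    [ -e "${1:-}/proc/sys/net/bridge/bridge-nf-call-iptables" ]
}

# Print one status line per feature.
report_kernel_features() {
    root=${1:-}
    for feature in overlay br_netfilter; do
        if "has_$feature" "$root"; then
            echo "$feature: available"
        else
            echo "$feature: MISSING"
        fi
    done
}

report_kernel_features
```

If either line reports MISSING on a board, the faked modprobe is hiding a real gap and Docker networking or the overlay storage driver may still fail later.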

II. Installing Docker

1) Install

	sudo apt update
	sudo apt install docker.io
	sudo usermod -aG docker $USER
	
	docker -v
	Docker version 20.10.7, build 20.10.7-0kylin5~20.04.2

2) Configure

	sudo tee /etc/docker/daemon.json <<-'EOF'
	{
	  "registry-mirrors": [
	    "https://docker.mirrors.ustc.edu.cn",
	    "https://hub-mirror.c.163.com",
	    "https://mirror.baidubce.com",
	    "https://dockerproxy.com",
	    "https://mirrors.ustc.edu.cn",
	    "https://docker-cf.registry.cyou",
	    "https://dockercf.jsdelivr.fyi",
	    "https://docker.jsdelivr.fyi",
	    "https://dockertest.jsdelivr.fyi",
	    "https://mirror.aliyuncs.com",
	    "https://docker.m.daocloud.io",
	    "https://docker.nju.edu.cn",
	    "https://docker.mirrors.sjtug.sjtu.edu.cn",
	    "https://mirror.iscas.ac.cn",
	    "https://docker.rainbond.cc",
	    "https://docker.registry.cyou",
	    "https://do.nark.eu.org",
	    "https://dc.j8.work",
	    "https://gst6rzl9.mirror.aliyuncs.com",
	    "https://registry.docker-cn.com",
	    "https://mirrors.tuna.tsinghua.edu.cn",
	    "https://registry.cn-beijing.aliyuncs.com"
	  ],
	 "insecure-registries" : [
	    "192.168.137.224:5000",
	    "registry.docker-cn.com",
	    "http://mirrors.sohu.com",
	    "http://hub-mirror.c.163.com",
	    "docker.mirrors.ustc.edu.cn"
	    ],
	 "debug": true,
	 "experimental": false
	}
	EOF

	systemctl daemon-reload
	systemctl restart docker.service
	docker info

3) Test

	# Clean up leftovers from earlier downloads
	docker system prune -f
	docker pull hello-world
	docker run hello-world
	
	Output:
	Hello from Docker!
	This message shows that your installation appears to be working correctly.

	To generate this message, Docker took the following steps:
	 1. The Docker client contacted the Docker daemon.
	 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
		(arm64v8)
	 3. The Docker daemon created a new container from that image which runs the
		executable that produces the output you are currently reading.
	 4. The Docker daemon streamed that output to the Docker client, which sent it
		to your terminal.

	To try something more ambitious, you can run an Ubuntu container with:
	 $ docker run -it ubuntu bash

	Share images, automate workflows, and more with a free Docker ID:
	 https://hub.docker.com/

	For more examples and ideas, visit:
	 https://docs.docker.com/get-started/

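4) Some useful Docker commands

	# Clean up dangling images (also removes stopped containers, unused networks and build cache)
	docker system prune -f
	# Check whether an image exists
	docker images | grep portainer
	# Pull Portainer through the domestic registry mirrors configured above
	docker pull portainer/portainer-ce:latest

	# --- Exporting an image ---
	# Check that the hello-world image exists
	docker images | grep hello-world
	# Save the image to a tar file
	docker save -o hello-world.tar hello-world:latest
	# Or use output redirection
	docker save hello-world:latest > hello-world-v2.tar
	# List the generated files
	ls -lh hello-world*.tar

	# --- Importing a local image ---
	# Load the image from a tar file
	docker load -i hello-world.tar
	# Or use input redirection
	docker load < hello-world.tar
	# Verify that the image was loaded
	docker images | grep hello-world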

III. Setting up Swarm

1) Initialize the manager and inspect it

	docker swarm init --advertise-addr 192.168.137.224
	Output:
	Swarm initialized: current node (quj6te9i5wbhnvlukhr8umyyw) is now a manager.
	To add a worker to this swarm, run the following command:
	docker swarm join --token SWMTKN-1-09sgobht9vrzc0kzh80lvejpvyp2suda5akgmbf96dvrbiszpi-al15wyhpzlr2cyd68291mup3q 192.168.137.224:2377
	To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.

	# Show the Swarm cluster status
	docker info

	# List the nodes
	docker node ls
	Output:
	root@master-node:/home/swarm# docker node ls
	ID                            HOSTNAME      STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
	quj6te9i5wbhnvlukhr8umyyw *   master-node   Ready     Active         Leader           20.10.7

	# Print the join command that workers need
	docker swarm join-token worker
	Output:
	docker swarm join --token SWMTKN-1-09sgobht9vrzc0kzh80lvejpvyp2suda5akgmbf96dvrbiszpi-al15wyhpzlr2cyd68291mup3q 192.168.137.224:2377
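
Instead of copying the long token by hand, the join command can be assembled in a script. This is only a sketch: build_join_cmd is a made-up helper and the token below is a placeholder. On a real manager the token would come from `docker swarm join-token -q worker`, which prints only the token.

```shell
#!/bin/sh
# build_join_cmd TOKEN MANAGER_IP [PORT]  (hypothetical helper)
# Prints the join command for a worker; 2377 is the default Swarm management port.
build_join_cmd() {
    printf 'docker swarm join --token %s %s:%s\n' "$1" "$2" "${3:-2377}"
}

# On the manager (requires Docker, so it is left commented out here):
#   token=$(docker swarm join-token -q worker)
token="SWMTKN-1-example"   # placeholder token, for illustration only

build_join_cmd "$token" 192.168.137.224
```

With root SSH login enabled as in step I.2, the printed command could then be run remotely, for example `ssh root@worker-1 "$(build_join_cmd "$token" 192.168.137.224)"`.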

2) Join the workers and verify

	# Run on worker-1 and worker-2:
	docker swarm join --token SWMTKN-1-09sgobht9vrzc0kzh80lvejpvyp2suda5akgmbf96dvrbiszpi-al15wyhpzlr2cyd68291mup3q 192.168.137.224:2377
	Output:
	This node joined a swarm as a worker.

	# On the manager, list the nodes:
	docker node ls
	Output:
	ID                            HOSTNAME      STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
	quj6te9i5wbhnvlukhr8umyyw *   master-node   Ready     Active         Leader           20.10.7
	v7n81ykd7mvine0ul6npshbyz     worker-1      Ready     Active                          20.10.7
	r9zl96awq209m8kjpffr9a1es     worker-2      Ready     Active                          20.10.7

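IV. Deploying the hello-world service

Note: the hello-world image only prints a message and then exits. Because the cluster automatically recovers tasks whose container has exited (Swarm's default restart condition is "any"), the service keeps being restarted over and over.

1) Procedure and commands

	# 1. Create a service from the existing hello-world image
	docker service create --name test-service --replicas 3 hello-world

	# 2. Check the service status
	docker service ps test-service

	# 3. View the service logs
	docker service logs test-service

	# 4. Scale the service to more replicas
	docker service scale test-service=6

	# 5. Check the distribution again
	docker service ps test-service

	# 6. Remove the test service
	docker service rm test-service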
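
If the endless restarts of hello-world get in the way, one option is to create the service with `--restart-condition none`, so a task that exits is not started again. The sketch below is a dry run by default: it only prints the command. The service name test-once and the DOCKER variable are assumptions made for this example. `--detach` is added so the CLI does not wait for a service whose tasks finish immediately.

```shell
#!/bin/sh
# Dry-run by default: the command is printed, not executed.
# Set DOCKER=docker in the environment to really run it on a manager node.
DOCKER=${DOCKER:-"echo docker"}

# create_one_shot NAME REPLICAS IMAGE  (hypothetical helper)
create_one_shot() {
    $DOCKER service create --detach --name "$1" --replicas "$2" \
        --restart-condition none "$3"
}

create_one_shot test-once 3 hello-world
```

With this setting, `docker service ps test-once` should show each task once in the Complete state instead of a growing history of restarts.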

2) Resulting output

root@master-node:~# docker service ps test-service
ID             NAME                 IMAGE                NODE          DESIRED STATE   CURRENT STATE             ERROR     PORTS
anemuatoff61   test-service.1       hello-world:latest   master-node   Ready           Ready 3 seconds ago                 
0iwhwbqm1w0i    \_ test-service.1   hello-world:latest   master-node   Shutdown        Complete 3 seconds ago              
ylqqghxb9jd6    \_ test-service.1   hello-world:latest   master-node   Shutdown        Complete 9 seconds ago              
5zh3m6cf1a6n    \_ test-service.1   hello-world:latest   master-node   Shutdown        Complete 15 seconds ago             
jvec6zvc3ti0    \_ test-service.1   hello-world:latest   master-node   Shutdown        Complete 21 seconds ago             
d5j4mbjai5ay   test-service.2       hello-world:latest   worker-2      Ready           Ready 2 seconds ago                 
v6d2one9fs0c    \_ test-service.2   hello-world:latest   worker-2      Shutdown        Complete 2 seconds ago              
hbj46wuzzva2    \_ test-service.2   hello-world:latest   worker-2      Shutdown        Complete 8 seconds ago              
uq0nq6z3wh0k    \_ test-service.2   hello-world:latest   worker-2      Shutdown        Complete 15 seconds ago             
mjfrz9ie9qks    \_ test-service.2   hello-world:latest   worker-2      Shutdown        Complete 21 seconds ago             
0bti65ql81y5   test-service.3       hello-world:latest   worker-1      Ready           Ready 2 seconds ago                 
fvr1thptrn5y    \_ test-service.3   hello-world:latest   worker-1      Shutdown        Complete 3 seconds ago              
za0b02yrakt3    \_ test-service.3   hello-world:latest   worker-1      Shutdown        Complete 9 seconds ago              
rmgiomcz54g1    \_ test-service.3   hello-world:latest   worker-1      Shutdown        Complete 15 seconds ago             
i4bh61dg0o4q    \_ test-service.3   hello-world:latest   worker-1      Shutdown        Complete 21 seconds ago             
root@master-node:~# docker service logs test-service	

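3) Evidence of success

1. Service scheduling works
	The 3 replicas run on 3 different nodes:
	test-service.1 on master-node
	test-service.2 on worker-2
	test-service.3 on worker-1
2. Load balancing and high availability
	Each task slot has several history entries (Shutdown/Complete, followed by a new Ready task). This shows that Swarm automatically reschedules a task after its container exits and keeps the service available, which demonstrates Swarm's self-healing capability.
3. Cross-node communication works
	The output shows that all 3 nodes ran their tasks successfully.
	The network is working, and logs can be aggregated on the manager node.
4. Image distribution works
	Every node can access and run the hello-world image.
	ARM64 compatibility is good.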
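
The same evidence can be summarized mechanically rather than read off the table by eye. The sketch below counts tasks per node from lines of the form "NODE DESIRED_STATE", which on a manager could be produced with `docker service ps test-service --format '{{.Node}} {{.DesiredState}}'`. The function name count_tasks_per_node is hypothetical, and the sample input is transcribed from the output shown in the previous step.

```shell
#!/bin/sh
# count_tasks_per_node: reads "NODE DESIRED_STATE" lines on stdin and prints,
# per node, the total number of tasks and how many are already shut down.
count_tasks_per_node() {
    awk '{ total[$1]++; if ($2 == "Shutdown") finished[$1]++ }
         END { for (n in total)
                   printf "%s total=%d finished=%d\n", n, total[n], finished[n] }' |
        sort
}

# Sample input: one Ready task plus four Shutdown tasks per node.
for node in master-node worker-2 worker-1; do
    echo "$node Ready"
    for i in 1 2 3 4; do
        echo "$node Shutdown"
    done
done | count_tasks_per_node
```

An even total per node indicates that the scheduler spread the replicas across the cluster, and a growing finished count is the restart loop described in the note at the start of this section.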

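4) 🎯 Cluster functionality verified

Feature                 Status   Result
Service creation        ✅       Multi-replica service created successfully
Node scheduling         ✅       Evenly distributed across the 3 nodes
Load balancing          ✅       Tasks assigned automatically
High availability       ✅       Automatic recovery after failure
Log collection          ✅       Logs aggregated across nodes
Network communication   ✅       Communication between nodes works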