使用 RKE 方式搭建 K8s 集群并部署 NebulaGraph

本文由社区用户 Albert 贡献，首发于 NebulaGraph 论坛，旨在提供多一种的部署方式使用 NebulaGraph。

在本文，我将会详细地记录下我用 K8s 部署分布式图数据库 NebulaGraph 的过程。下面是本次实践的内容规划：

一到十章节为 K8s 集群搭建过程；
十一到十五章节为参考 NebulaGraph 官方文档安装部署 NebulaGraph的过程；

本文所有实践是在本地虚拟机 3 台 CentOS 实例上完成

一、集群环境准备

本文所有集群都遵循以下的部署，仅供参考。

首先，集群规划 方面：

shell 复制代码

#把 xxx 替换为对应的主机名
192.168.222.141 node1
192.168.222.142 node2
192.168.222.143 node3

配置静态 IP

shell 复制代码

# vi /etc/sysconfig/network-scripts/ifcfg-enss3

IPADDR="192.168.222.XXX" # XXX 是自己规划的 IP
PREFIX="24"  # 掩码 4 个 255
GATEWAY="192.168.222.XXX" # 网关需要自己指定
DNS1="114.114.114.114" # DNS 也可以设置为其他，能用即可

配置主机名

shell 复制代码

hostnamectl set-hostname xxx
192.168.222.141 node1
192.168.222.142 node2
192.168.222.143 node3

配置 ip_forward 及过滤机制

shell 复制代码

# vim /etc/sysctl.conf
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
modprobe br_netfilter
sysctl -p /etc/sysctl.conf

防火墙设置

shell 复制代码

# systemctl stop firewalld
# systemctl disable firewalld

SELinux 设置

shell 复制代码

#永久关闭，一定要重启操作系统后生效。
sed -ri 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config

主机 Swap 分区设置

shell 复制代码

sed -ri 's/.*swap.*/#&/' /etc/fstab

集群时间同步设置

shell 复制代码

# yum -y install ntpdate
# crontab -e
0 */1 * * *  ntpdate time1.aliyun.com

二、Docker 部署

⚠️ 所有主机均要按照下面方法进行配置。

配置 Docker YUM 源

shell 复制代码

# wget -O /etc/yum.repos.d/docker-ce.repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo

安装 Docker CE

通过下面命令安装 Docker 社区版本：

shell 复制代码

#  yum install docker-ce-18.09.8-3.el7.x86_64 -y

⚠️ 这里需要指定 Docker 版本，否则后面 RKE 安装会报 Docker 版本不兼容问题。

启动 Docker 服务

shell 复制代码

# systemctl enable docker
# systemctl start docker

配置 Docker 容器镜像加速器

登录个人阿里云账号，在镜像加速中查看配置，将 xxxxx 替换为自己的 ID：

shell 复制代码

# vim /etc/docker/daemon.json
# cat /etc/docker/daemon.json
{
  "registry-mirrors": ["https://xxxxx.mirror.aliyuncs.com"]
}

三、Docker Compose 安装

通过以下方式安装 Docker Compose：

shell 复制代码

# curl -L "https://github.com/docker/compose/releases/download/1.28.5/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
# chmod +x /usr/local/bin/docker-compose
# ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose
# docker-compose --version

四、添加 Rancher 用户

在使用 CentOS 时，不能使用 root 账号，因此我们要添加专用的账号进行 Docker 相关操作。同之前一样，下面操作需要在所有机器上执行：

shell 复制代码

# useradd rancher
# usermod -aG docker rancher
# echo 123 | passwd --stdin rancher

五、生成 SSH 证书用于部署集群

RKE 二进制文件安装主机上创建密钥，即为 control 主机，用于部署集群。

生成 SSH 证书

shell 复制代码

# ssh-keygen

复制证书到集群中所有主机

shell 复制代码

# 修改文件夹所属用户及所属组
chown -R rancher:rancher /home/rancher
# ssh-copy-id rancher@node1
# ssh-copy-id rancher@node2
# ssh-copy-id rancher@node3

验证 SSH 证书是否可用

本次实践主要在 node1 上部署 RKE 二进制文件，并在 RKE 二进制文件安装主机机测试连接其它集群主机，可通过 Docker 的命令 docker ps 来查看服务是否可用。

shell 复制代码

# ssh rancher@主机名
远程主机# docker ps

六、RKE 工具下载

依旧是在 node1 上部署 RKE 二进制文件。

shell 复制代码

# wget https://github.com/rancher/rke/releases/download/v1.3.7/rke_linux-amd64
# mv rke_linux-amd64 /usr/local/bin/rke
# chmod +x /usr/local/bin/rke
# rke --version
rke version v1.3.7

七、初始化 RKE 配置文件

shell 复制代码

# mkdir -p /app/rancher
# cd /app/rancher
# rke config --name cluster.yml
[+] Cluster Level SSH Private Key Path [~/.ssh/id_rsa]: 集群私钥路径
[+] Number of Hosts [1]: 3 集群中有 3 个节点
[+] SSH Address of host (1) [none]: 192.168.10.10 第一个节点 IP 地址
[+] SSH Port of host (1) [22]: 22 第一个节点 SSH 访问端口
[+] SSH Private Key Path of host (192.168.10.10) [none]: ~/.ssh/id_rsa 第一个节点私钥路径
[+] SSH User of host (192.168.10.10) [ubuntu]: rancher 远程用户名
[+] Is host (192.168.10.10) a Control Plane host (y/n)? [y]: y 是否为 K8s 集群控制节点
[+] Is host (192.168.10.10) a Worker host (y/n)? [n]: n 不是 worker 节点
[+] Is host (192.168.10.10) an etcd host (y/n)? [n]: n 不是 etcd 节点
[+] Override Hostname of host (192.168.10.10) [none]: 不覆盖现有主机名
[+] Internal IP of host (192.168.10.10) [none]: 主机局域网 IP 地址
[+] Docker socket path on host (192.168.10.10) [/var/run/docker.sock]: 主机上 docker.sock 路径
[+] SSH Address of host (2) [none]: 192.168.10.12 第二个节点
[+] SSH Port of host (2) [22]: 22 远程端口
[+] SSH Private Key Path of host (192.168.10.12) [none]: ~/.ssh/id_rsa 私钥路径
[+] SSH User of host (192.168.10.12) [ubuntu]: rancher 远程访问用户
[+] Is host (192.168.10.12) a Control Plane host (y/n)? [y]: n 不是控制节点
[+] Is host (192.168.10.12) a Worker host (y/n)? [n]: y 是 worker 节点
[+] Is host (192.168.10.12) an etcd host (y/n)? [n]: n 不是 etcd 节点
[+] Override Hostname of host (192.168.10.12) [none]: 不覆盖现有主机名
[+] Internal IP of host (192.168.10.12) [none]: 主机局域网 IP 地址
[+] Docker socket path on host (192.168.10.12) [/var/run/docker.sock]: 主机上 docker.sock 路径
[+] SSH Address of host (3) [none]: 192.168.10.14 第三个节点
[+] SSH Port of host (3) [22]: 22 远程端口
[+] SSH Private Key Path of host (192.168.10.14) [none]: ~/.ssh/id_rsa 私钥路径
[+] SSH User of host (192.168.10.14) [ubuntu]: rancher 远程访问用户
[+] Is host (192.168.10.14) a Control Plane host (y/n)? [y]: n 不是控制节点
[+] Is host (192.168.10.14) a Worker host (y/n)? [n]: n 不是 worker 节点
[+] Is host (192.168.10.14) an etcd host (y/n)? [n]: y 是 etcd 节点
[+] Override Hostname of host (192.168.10.14) [none]: 不覆盖现有主机名
[+] Internal IP of host (192.168.10.14) [none]: 主机局域网 IP 地址
[+] Docker socket path on host (192.168.10.14) [/var/run/docker.sock]: 主机上 docker.sock 路径
[+] Network Plugin Type (flannel, calico, weave, canal, aci) [canal]: 使用的网络插件
[+] Authentication Strategy [x509]: 认证策略
[+] Authorization Mode (rbac, none) [rbac]: 认证模式
[+] Kubernetes Docker image [rancher/hyperkube:v1.21.9-rancher1]: 集群容器镜像
[+] Cluster domain [cluster.local]: 集群域名
[+] Service Cluster IP Range [10.43.0.0/16]: 集群中 Servic IP 地址范围
[+] Enable PodSecurityPolicy [n]: 是否开启 Pod 安装策略
[+] Cluster Network CIDR [10.42.0.0/16]: 集群 Pod 网络
[+] Cluster DNS Service IP [10.43.0.10]: 集群 DNS Service IP 地址
[+] Add addon manifest URLs or YAML files [no]: 是否增加插件 manifest URL 或配置文件

在部署过程中，需要注意 37 行版本的设置，这里设置为：rancher/hyperkube:v1.21.9-rancher1

shell 复制代码

[root@node1 rancher]# ls
cluster.yml

此外，在 cluster.yaml 文件中修改以下配置：

shell 复制代码

kube-controller:
image: ""
extra_args:
# 如果后面需要部署 kubeflow 或 istio 则一定要配置以下参数
cluster-signing-cert-file: "/etc/kubernetes/ssl/kube-ca.pem"
cluster-signing-key-file: "/etc/kubernetes/ssl/kube-ca-key.pem"

八、集群部署

shell 复制代码

# pwd
/app/rancher
# rke up
输出：
INFO[0000] Running RKE version: v1.3.7
INFO[0000] Initiating Kubernetes cluster
INFO[0000] [dialer] Setup tunnel for host [192.168.10.14]
INFO[0000] [dialer] Setup tunnel for host [192.168.10.10]
INFO[0000] [dialer] Setup tunnel for host [192.168.10.12]
INFO[0000] Checking if container [cluster-state-deployer] is running on host [192.168.10.14], try #1
INFO[0000] Checking if container [cluster-state-deployer] is running on host [192.168.10.10], try #1
INFO[0000] Checking if container [cluster-state-deployer] is running on host [192.168.10.12], try #1
INFO[0000] [certificates] Generating CA kubernetes certificates
INFO[0000] [certificates] Generating Kubernetes API server aggregation layer requestheader client CA certificates
INFO[0000] [certificates] GenerateServingCertificate is disabled, checking if there are unused kubelet certificates
INFO[0000] [certificates] Generating Kubernetes API server certificates
INFO[0000] [certificates] Generating Service account token key
INFO[0000] [certificates] Generating Kube Controller certificates
INFO[0000] [certificates] Generating Kube Scheduler certificates
INFO[0000] [certificates] Generating Kube Proxy certificates
INFO[0001] [certificates] Generating Node certificate
INFO[0001] [certificates] Generating admin certificates and kubeconfig
INFO[0001] [certificates] Generating Kubernetes API server proxy client certificates
INFO[0001] [certificates] Generating kube-etcd-192-168-10-14 certificate and key
INFO[0001] Successfully Deployed state file at [./cluster.rkestate]
INFO[0001] Building Kubernetes cluster
INFO[0001] [dialer] Setup tunnel for host [192.168.10.12]
INFO[0001] [dialer] Setup tunnel for host [192.168.10.14]
INFO[0001] [dialer] Setup tunnel for host [192.168.10.10]
INFO[0001] [network] Deploying port listener containers
INFO[0001] Image [rancher/rke-tools:v0.1.78] exists on host [192.168.10.14]
INFO[0001] Starting container [rke-etcd-port-listener] on host [192.168.10.14], try #1
INFO[0001] [network] Successfully started [rke-etcd-port-listener] container on host [192.168.10.14]
INFO[0001] Image [rancher/rke-tools:v0.1.78] exists on host [192.168.10.10]
INFO[0001] Starting container [rke-cp-port-listener] on host [192.168.10.10], try #1
INFO[0002] [network] Successfully started [rke-cp-port-listener] container on host [192.168.10.10]
INFO[0002] Image [rancher/rke-tools:v0.1.78] exists on host [192.168.10.12]
INFO[0002] Starting container [rke-worker-port-listener] on host [192.168.10.12], try #1
INFO[0002] [network] Successfully started [rke-worker-port-listener] container on host [192.168.10.12]
INFO[0002] [network] Port listener containers deployed successfully
INFO[0002] [network] Running control plane -> etcd port checks
INFO[0002] [network] Checking if host [192.168.10.10] can connect to host(s) [192.168.10.14] on port(s) [2379], try #1
INFO[0002] Image [rancher/rke-tools:v0.1.78] exists on host [192.168.10.10]
INFO[0002] Starting container [rke-port-checker] on host [192.168.10.10], try #1
INFO[0002] [network] Successfully started [rke-port-checker] container on host [192.168.10.10]
INFO[0002] Removing container [rke-port-checker] on host [192.168.10.10], try #1
INFO[0002] [network] Running control plane -> worker port checks
INFO[0002] [network] Checking if host [192.168.10.10] can connect to host(s) [192.168.10.12] on port(s) [10250], try #1
INFO[0002] Image [rancher/rke-tools:v0.1.78] exists on host [192.168.10.10]
INFO[0003] Starting container [rke-port-checker] on host [192.168.10.10], try #1
INFO[0003] [network] Successfully started [rke-port-checker] container on host [192.168.10.10]
INFO[0003] Removing container [rke-port-checker] on host [192.168.10.10], try #1
INFO[0003] [network] Running workers -> control plane port checks
INFO[0003] [network] Checking if host [192.168.10.12] can connect to host(s) [192.168.10.10] on port(s) [6443], try #1
INFO[0003] Image [rancher/rke-tools:v0.1.78] exists on host [192.168.10.12]
INFO[0003] Starting container [rke-port-checker] on host [192.168.10.12], try #1
INFO[0003] [network] Successfully started [rke-port-checker] container on host [192.168.10.12]
INFO[0003] Removing container [rke-port-checker] on host [192.168.10.12], try #1
INFO[0003] [network] Checking KubeAPI port Control Plane hosts
INFO[0003] [network] Removing port listener containers
INFO[0003] Removing container [rke-etcd-port-listener] on host [192.168.10.14], try #1
INFO[0003] [remove/rke-etcd-port-listener] Successfully removed container on host [192.168.10.14]
INFO[0003] Removing container [rke-cp-port-listener] on host [192.168.10.10], try #1
INFO[0003] [remove/rke-cp-port-listener] Successfully removed container on host [192.168.10.10]
INFO[0003] Removing container [rke-worker-port-listener] on host [192.168.10.12], try #1
INFO[0003] [remove/rke-worker-port-listener] Successfully removed container on host [192.168.10.12]
INFO[0003] [network] Port listener containers removed successfully
INFO[0003] [certificates] Deploying kubernetes certificates to Cluster nodes
INFO[0003] Checking if container [cert-deployer] is running on host [192.168.10.14], try #1
INFO[0003] Checking if container [cert-deployer] is running on host [192.168.10.10], try #1
INFO[0003] Checking if container [cert-deployer] is running on host [192.168.10.12], try #1
INFO[0003] Image [rancher/rke-tools:v0.1.78] exists on host [192.168.10.14]
INFO[0003] Image [rancher/rke-tools:v0.1.78] exists on host [192.168.10.12]
INFO[0003] Image [rancher/rke-tools:v0.1.78] exists on host [192.168.10.10]
INFO[0004] Starting container [cert-deployer] on host [192.168.10.14], try #1
INFO[0004] Starting container [cert-deployer] on host [192.168.10.12], try #1
INFO[0004] Starting container [cert-deployer] on host [192.168.10.10], try #1
INFO[0004] Checking if container [cert-deployer] is running on host [192.168.10.14], try #1
INFO[0004] Checking if container [cert-deployer] is running on host [192.168.10.12], try #1
INFO[0004] Checking if container [cert-deployer] is running on host [192.168.10.10], try #1
INFO[0009] Checking if container [cert-deployer] is running on host [192.168.10.14], try #1
INFO[0009] Removing container [cert-deployer] on host [192.168.10.14], try #1
INFO[0009] Checking if container [cert-deployer] is running on host [192.168.10.12], try #1
INFO[0009] Removing container [cert-deployer] on host [192.168.10.12], try #1
INFO[0009] Checking if container [cert-deployer] is running on host [192.168.10.10], try #1
INFO[0009] Removing container [cert-deployer] on host [192.168.10.10], try #1
INFO[0009] [reconcile] Rebuilding and updating local kube config
INFO[0009] Successfully Deployed local admin kubeconfig at [./kube_config_cluster.yml]
WARN[0009] [reconcile] host [192.168.10.10] is a control plane node without reachable Kubernetes API endpoint in the cluster
WARN[0009] [reconcile] no control plane node with reachable Kubernetes API endpoint in the cluster found
INFO[0009] [certificates] Successfully deployed kubernetes certificates to Cluster nodes
INFO[0009] [file-deploy] Deploying file [/etc/kubernetes/audit-policy.yaml] to node [192.168.10.10]
INFO[0009] Image [rancher/rke-tools:v0.1.78] exists on host [192.168.10.10]
INFO[0009] Starting container [file-deployer] on host [192.168.10.10], try #1
INFO[0009] Successfully started [file-deployer] container on host [192.168.10.10]
INFO[0009] Waiting for [file-deployer] container to exit on host [192.168.10.10]
INFO[0009] Waiting for [file-deployer] container to exit on host [192.168.10.10]
INFO[0009] Container [file-deployer] is still running on host [192.168.10.10]: stderr: [], stdout: []
INFO[0010] Waiting for [file-deployer] container to exit on host [192.168.10.10]
INFO[0010] Removing container [file-deployer] on host [192.168.10.10], try #1
INFO[0010] [remove/file-deployer] Successfully removed container on host [192.168.10.10]
INFO[0010] [/etc/kubernetes/audit-policy.yaml] Successfully deployed audit policy file to Cluster control nodes
INFO[0010] [reconcile] Reconciling cluster state
INFO[0010] [reconcile] This is newly generated cluster
INFO[0010] Pre-pulling kubernetes images
INFO[0010] Pulling image [rancher/hyperkube:v1.21.9-rancher1] on host [192.168.10.10], try #1
INFO[0010] Pulling image [rancher/hyperkube:v1.21.9-rancher1] on host [192.168.10.14], try #1
INFO[0010] Pulling image [rancher/hyperkube:v1.21.9-rancher1] on host [192.168.10.12], try #1
INFO[0087] Image [rancher/hyperkube:v1.21.9-rancher1] exists on host [192.168.10.10]
INFO[0090] Image [rancher/hyperkube:v1.21.9-rancher1] exists on host [192.168.10.12]
INFO[0092] Image [rancher/hyperkube:v1.21.9-rancher1] exists on host [192.168.10.14]
INFO[0092] Kubernetes images pulled successfully
INFO[0092] [etcd] Building up etcd plane..
INFO[0092] Image [rancher/rke-tools:v0.1.78] exists on host [192.168.10.14]
INFO[0092] Starting container [etcd-fix-perm] on host [192.168.10.14], try #1
INFO[0092] Successfully started [etcd-fix-perm] container on host [192.168.10.14]
INFO[0092] Waiting for [etcd-fix-perm] container to exit on host [192.168.10.14]
INFO[0092] Waiting for [etcd-fix-perm] container to exit on host [192.168.10.14]
INFO[0092] Container [etcd-fix-perm] is still running on host [192.168.10.14]: stderr: [], stdout: []
INFO[0093] Waiting for [etcd-fix-perm] container to exit on host [192.168.10.14]
INFO[0093] Removing container [etcd-fix-perm] on host [192.168.10.14], try #1
INFO[0093] [remove/etcd-fix-perm] Successfully removed container on host [192.168.10.14]
INFO[0093] Image [rancher/mirrored-coreos-etcd:v3.5.0] exists on host [192.168.10.14]
INFO[0093] Starting container [etcd] on host [192.168.10.14], try #1
INFO[0093] [etcd] Successfully started [etcd] container on host [192.168.10.14]
INFO[0093] [etcd] Running rolling snapshot container [etcd-snapshot-once] on host [192.168.10.14]
INFO[0093] Image [rancher/rke-tools:v0.1.78] exists on host [192.168.10.14]
INFO[0094] Starting container [etcd-rolling-snapshots] on host [192.168.10.14], try #1
INFO[0094] [etcd] Successfully started [etcd-rolling-snapshots] container on host [192.168.10.14]
INFO[0099] Image [rancher/rke-tools:v0.1.78] exists on host [192.168.10.14]
INFO[0099] Starting container [rke-bundle-cert] on host [192.168.10.14], try #1
INFO[0099] [certificates] Successfully started [rke-bundle-cert] container on host [192.168.10.14]
INFO[0099] Waiting for [rke-bundle-cert] container to exit on host [192.168.10.14]
INFO[0099] Container [rke-bundle-cert] is still running on host [192.168.10.14]: stderr: [], stdout: []
INFO[0100] Waiting for [rke-bundle-cert] container to exit on host [192.168.10.14]
INFO[0100] [certificates] successfully saved certificate bundle [/opt/rke/etcd-snapshots//pki.bundle.tar.gz] on host [192.168.10.14]
INFO[0100] Removing container [rke-bundle-cert] on host [192.168.10.14], try #1
INFO[0100] Image [rancher/rke-tools:v0.1.78] exists on host [192.168.10.14]
INFO[0100] Starting container [rke-log-linker] on host [192.168.10.14], try #1
INFO[0100] [etcd] Successfully started [rke-log-linker] container on host [192.168.10.14]
INFO[0100] Removing container [rke-log-linker] on host [192.168.10.14], try #1
INFO[0100] [remove/rke-log-linker] Successfully removed container on host [192.168.10.14]
INFO[0100] Image [rancher/rke-tools:v0.1.78] exists on host [192.168.10.14]
INFO[0101] Starting container [rke-log-linker] on host [192.168.10.14], try #1
INFO[0101] [etcd] Successfully started [rke-log-linker] container on host [192.168.10.14]
INFO[0101] Removing container [rke-log-linker] on host [192.168.10.14], try #1
INFO[0101] [remove/rke-log-linker] Successfully removed container on host [192.168.10.14]
INFO[0101] [etcd] Successfully started etcd plane.. Checking etcd cluster health
INFO[0101] [etcd] etcd host [192.168.10.14] reported healthy=true
INFO[0101] [controlplane] Building up Controller Plane..
INFO[0101] Checking if container [service-sidekick] is running on host [192.168.10.10], try #1
INFO[0101] Image [rancher/rke-tools:v0.1.78] exists on host [192.168.10.10]
INFO[0101] Image [rancher/hyperkube:v1.21.9-rancher1] exists on host [192.168.10.10]
INFO[0101] Starting container [kube-apiserver] on host [192.168.10.10], try #1
INFO[0101] [controlplane] Successfully started [kube-apiserver] container on host [192.168.10.10]
INFO[0101] [healthcheck] Start Healthcheck on service [kube-apiserver] on host [192.168.10.10]
INFO[0106] [healthcheck] service [kube-apiserver] on host [192.168.10.10] is healthy
INFO[0106] Image [rancher/rke-tools:v0.1.78] exists on host [192.168.10.10]
INFO[0107] Starting container [rke-log-linker] on host [192.168.10.10], try #1
INFO[0107] [controlplane] Successfully started [rke-log-linker] container on host [192.168.10.10]
INFO[0107] Removing container [rke-log-linker] on host [192.168.10.10], try #1
INFO[0107] [remove/rke-log-linker] Successfully removed container on host [192.168.10.10]
INFO[0107] Image [rancher/hyperkube:v1.21.9-rancher1] exists on host [192.168.10.10]
INFO[0107] Starting container [kube-controller-manager] on host [192.168.10.10], try #1
INFO[0107] [controlplane] Successfully started [kube-controller-manager] container on host [192.168.10.10]
INFO[0107] [healthcheck] Start Healthcheck on service [kube-controller-manager] on host [192.168.10.10]
INFO[0112] [healthcheck] service [kube-controller-manager] on host [192.168.10.10] is healthy
INFO[0112] Image [rancher/rke-tools:v0.1.78] exists on host [192.168.10.10]
INFO[0113] Starting container [rke-log-linker] on host [192.168.10.10], try #1
INFO[0113] [controlplane] Successfully started [rke-log-linker] container on host [192.168.10.10]
INFO[0113] Removing container [rke-log-linker] on host [192.168.10.10], try #1
INFO[0113] [remove/rke-log-linker] Successfully removed container on host [192.168.10.10]
INFO[0113] Image [rancher/hyperkube:v1.21.9-rancher1] exists on host [192.168.10.10]
INFO[0113] Starting container [kube-scheduler] on host [192.168.10.10], try #1
INFO[0113] [controlplane] Successfully started [kube-scheduler] container on host [192.168.10.10]
INFO[0113] [healthcheck] Start Healthcheck on service [kube-scheduler] on host [192.168.10.10]
INFO[0118] [healthcheck] service [kube-scheduler] on host [192.168.10.10] is healthy
INFO[0118] Image [rancher/rke-tools:v0.1.78] exists on host [192.168.10.10]
INFO[0119] Starting container [rke-log-linker] on host [192.168.10.10], try #1
INFO[0119] [controlplane] Successfully started [rke-log-linker] container on host [192.168.10.10]
INFO[0119] Removing container [rke-log-linker] on host [192.168.10.10], try #1
INFO[0119] [remove/rke-log-linker] Successfully removed container on host [192.168.10.10]
INFO[0119] [controlplane] Successfully started Controller Plane..
INFO[0119] [authz] Creating rke-job-deployer ServiceAccount
INFO[0119] [authz] rke-job-deployer ServiceAccount created successfully
INFO[0119] [authz] Creating system:node ClusterRoleBinding
INFO[0119] [authz] system:node ClusterRoleBinding created successfully
INFO[0119] [authz] Creating kube-apiserver proxy ClusterRole and ClusterRoleBinding
INFO[0119] [authz] kube-apiserver proxy ClusterRole and ClusterRoleBinding created successfully
INFO[0119] Successfully Deployed state file at [./cluster.rkestate]
INFO[0119] [state] Saving full cluster state to Kubernetes
INFO[0119] [state] Successfully Saved full cluster state to Kubernetes ConfigMap: full-cluster-state
INFO[0119] [worker] Building up Worker Plane..
INFO[0119] Checking if container [service-sidekick] is running on host [192.168.10.10], try #1
INFO[0119] Image [rancher/rke-tools:v0.1.78] exists on host [192.168.10.12]
INFO[0119] Image [rancher/rke-tools:v0.1.78] exists on host [192.168.10.14]
INFO[0119] [sidekick] Sidekick container already created on host [192.168.10.10]
INFO[0119] Image [rancher/hyperkube:v1.21.9-rancher1] exists on host [192.168.10.10]
INFO[0119] Starting container [kubelet] on host [192.168.10.10], try #1
INFO[0119] [worker] Successfully started [kubelet] container on host [192.168.10.10]
INFO[0119] [healthcheck] Start Healthcheck on service [kubelet] on host [192.168.10.10]
INFO[0119] Starting container [nginx-proxy] on host [192.168.10.14], try #1
INFO[0119] [worker] Successfully started [nginx-proxy] container on host [192.168.10.14]
INFO[0119] Image [rancher/rke-tools:v0.1.78] exists on host [192.168.10.14]
INFO[0119] Starting container [nginx-proxy] on host [192.168.10.12], try #1
INFO[0119] [worker] Successfully started [nginx-proxy] container on host [192.168.10.12]
INFO[0119] Image [rancher/rke-tools:v0.1.78] exists on host [192.168.10.12]
INFO[0119] Starting container [rke-log-linker] on host [192.168.10.14], try #1
INFO[0120] Starting container [rke-log-linker] on host [192.168.10.12], try #1
INFO[0120] [worker] Successfully started [rke-log-linker] container on host [192.168.10.14]
INFO[0120] Removing container [rke-log-linker] on host [192.168.10.14], try #1
INFO[0120] [remove/rke-log-linker] Successfully removed container on host [192.168.10.14]
INFO[0120] Checking if container [service-sidekick] is running on host [192.168.10.14], try #1
INFO[0120] Image [rancher/rke-tools:v0.1.78] exists on host [192.168.10.14]
INFO[0120] [worker] Successfully started [rke-log-linker] container on host [192.168.10.12]
INFO[0120] Removing container [rke-log-linker] on host [192.168.10.12], try #1
INFO[0120] Image [rancher/hyperkube:v1.21.9-rancher1] exists on host [192.168.10.14]
INFO[0120] [remove/rke-log-linker] Successfully removed container on host [192.168.10.12]
INFO[0120] Checking if container [service-sidekick] is running on host [192.168.10.12], try #1
INFO[0120] Image [rancher/rke-tools:v0.1.78] exists on host [192.168.10.12]
INFO[0120] Starting container [kubelet] on host [192.168.10.14], try #1
INFO[0120] [worker] Successfully started [kubelet] container on host [192.168.10.14]
INFO[0120] [healthcheck] Start Healthcheck on service [kubelet] on host [192.168.10.14]
INFO[0120] Image [rancher/hyperkube:v1.21.9-rancher1] exists on host [192.168.10.12]
INFO[0120] Starting container [kubelet] on host [192.168.10.12], try #1
INFO[0120] [worker] Successfully started [kubelet] container on host [192.168.10.12]
INFO[0120] [healthcheck] Start Healthcheck on service [kubelet] on host [192.168.10.12]
INFO[0124] [healthcheck] service [kubelet] on host [192.168.10.10] is healthy
INFO[0124] Image [rancher/rke-tools:v0.1.78] exists on host [192.168.10.10]
INFO[0124] Starting container [rke-log-linker] on host [192.168.10.10], try #1
INFO[0125] [worker] Successfully started [rke-log-linker] container on host [192.168.10.10]
INFO[0125] Removing container [rke-log-linker] on host [192.168.10.10], try #1
INFO[0125] [remove/rke-log-linker] Successfully removed container on host [192.168.10.10]
INFO[0125] Image [rancher/hyperkube:v1.21.9-rancher1] exists on host [192.168.10.10]
INFO[0125] Starting container [kube-proxy] on host [192.168.10.10], try #1
INFO[0125] [worker] Successfully started [kube-proxy] container on host [192.168.10.10]
INFO[0125] [healthcheck] Start Healthcheck on service [kube-proxy] on host [192.168.10.10]
INFO[0125] [healthcheck] service [kubelet] on host [192.168.10.14] is healthy
INFO[0125] Image [rancher/rke-tools:v0.1.78] exists on host [192.168.10.14]
INFO[0125] Starting container [rke-log-linker] on host [192.168.10.14], try #1
INFO[0125] [healthcheck] service [kubelet] on host [192.168.10.12] is healthy
INFO[0125] Image [rancher/rke-tools:v0.1.78] exists on host [192.168.10.12]
INFO[0125] [worker] Successfully started [rke-log-linker] container on host [192.168.10.14]
INFO[0125] Starting container [rke-log-linker] on host [192.168.10.12], try #1
INFO[0125] Removing container [rke-log-linker] on host [192.168.10.14], try #1
INFO[0126] [remove/rke-log-linker] Successfully removed container on host [192.168.10.14]
INFO[0126] Image [rancher/hyperkube:v1.21.9-rancher1] exists on host [192.168.10.14]
INFO[0126] Starting container [kube-proxy] on host [192.168.10.14], try #1
INFO[0126] [worker] Successfully started [rke-log-linker] container on host [192.168.10.12]
INFO[0126] Removing container [rke-log-linker] on host [192.168.10.12], try #1
INFO[0126] [worker] Successfully started [kube-proxy] container on host [192.168.10.14]
INFO[0126] [healthcheck] Start Healthcheck on service [kube-proxy] on host [192.168.10.14]
INFO[0126] [remove/rke-log-linker] Successfully removed container on host [192.168.10.12]
INFO[0126] Image [rancher/hyperkube:v1.21.9-rancher1] exists on host [192.168.10.12]
INFO[0126] Starting container [kube-proxy] on host [192.168.10.12], try #1
INFO[0126] [worker] Successfully started [kube-proxy] container on host [192.168.10.12]
INFO[0126] [healthcheck] Start Healthcheck on service [kube-proxy] on host [192.168.10.12]
INFO[0130] [healthcheck] service [kube-proxy] on host [192.168.10.10] is healthy
INFO[0130] Image [rancher/rke-tools:v0.1.78] exists on host [192.168.10.10]
INFO[0130] Starting container [rke-log-linker] on host [192.168.10.10], try #1
INFO[0130] [worker] Successfully started [rke-log-linker] container on host [192.168.10.10]
INFO[0130] Removing container [rke-log-linker] on host [192.168.10.10], try #1
INFO[0130] [remove/rke-log-linker] Successfully removed container on host [192.168.10.10]
INFO[0131] [healthcheck] service [kube-proxy] on host [192.168.10.14] is healthy
INFO[0131] Image [rancher/rke-tools:v0.1.78] exists on host [192.168.10.14]
INFO[0131] Starting container [rke-log-linker] on host [192.168.10.14], try #1
INFO[0131] [healthcheck] service [kube-proxy] on host [192.168.10.12] is healthy
INFO[0131] Image [rancher/rke-tools:v0.1.78] exists on host [192.168.10.12]
INFO[0131] [worker] Successfully started [rke-log-linker] container on host [192.168.10.14]
INFO[0131] Removing container [rke-log-linker] on host [192.168.10.14], try #1
INFO[0131] Starting container [rke-log-linker] on host [192.168.10.12], try #1
INFO[0131] [remove/rke-log-linker] Successfully removed container on host [192.168.10.14]
INFO[0131] [worker] Successfully started [rke-log-linker] container on host [192.168.10.12]
INFO[0131] Removing container [rke-log-linker] on host [192.168.10.12], try #1
INFO[0131] [remove/rke-log-linker] Successfully removed container on host [192.168.10.12]
INFO[0131] [worker] Successfully started Worker Plane..
INFO[0131] Image [rancher/rke-tools:v0.1.78] exists on host [192.168.10.12]
INFO[0131] Image [rancher/rke-tools:v0.1.78] exists on host [192.168.10.14]
INFO[0131] Image [rancher/rke-tools:v0.1.78] exists on host [192.168.10.10]
INFO[0132] Starting container [rke-log-cleaner] on host [192.168.10.14], try #1
INFO[0132] Starting container [rke-log-cleaner] on host [192.168.10.12], try #1
INFO[0132] Starting container [rke-log-cleaner] on host [192.168.10.10], try #1
INFO[0132] [cleanup] Successfully started [rke-log-cleaner] container on host [192.168.10.14]
INFO[0132] Removing container [rke-log-cleaner] on host [192.168.10.14], try #1
INFO[0132] [cleanup] Successfully started [rke-log-cleaner] container on host [192.168.10.12]
INFO[0132] Removing container [rke-log-cleaner] on host [192.168.10.12], try #1
INFO[0132] [cleanup] Successfully started [rke-log-cleaner] container on host [192.168.10.10]
INFO[0132] Removing container [rke-log-cleaner] on host [192.168.10.10], try #1
INFO[0132] [remove/rke-log-cleaner] Successfully removed container on host [192.168.10.14]
INFO[0132] [remove/rke-log-cleaner] Successfully removed container on host [192.168.10.12]
INFO[0132] [remove/rke-log-cleaner] Successfully removed container on host [192.168.10.10]
INFO[0132] [sync] Syncing nodes Labels and Taints
INFO[0132] [sync] Successfully synced nodes Labels and Taints
INFO[0132] [network] Setting up network plugin: canal
INFO[0132] [addons] Saving ConfigMap for addon rke-network-plugin to Kubernetes
INFO[0132] [addons] Successfully saved ConfigMap for addon rke-network-plugin to Kubernetes
INFO[0132] [addons] Executing deploy job rke-network-plugin
INFO[0137] [addons] Setting up coredns
INFO[0137] [addons] Saving ConfigMap for addon rke-coredns-addon to Kubernetes
INFO[0137] [addons] Successfully saved ConfigMap for addon rke-coredns-addon to Kubernetes
INFO[0137] [addons] Executing deploy job rke-coredns-addon
INFO[0142] [addons] CoreDNS deployed successfully
INFO[0142] [dns] DNS provider coredns deployed successfully
INFO[0142] [addons] Setting up Metrics Server
INFO[0142] [addons] Saving ConfigMap for addon rke-metrics-addon to Kubernetes
INFO[0142] [addons] Successfully saved ConfigMap for addon rke-metrics-addon to Kubernetes
INFO[0142] [addons] Executing deploy job rke-metrics-addon
INFO[0147] [addons] Metrics Server deployed successfully
INFO[0147] [ingress] Setting up nginx ingress controller
INFO[0147] [ingress] removing admission batch jobs if they exist
INFO[0147] [addons] Saving ConfigMap for addon rke-ingress-controller to Kubernetes
INFO[0147] [addons] Successfully saved ConfigMap for addon rke-ingress-controller to Kubernetes
INFO[0147] [addons] Executing deploy job rke-ingress-controller
INFO[0152] [ingress] removing default backend service and deployment if they exist
INFO[0152] [ingress] ingress controller nginx deployed successfully
INFO[0152] [addons] Setting up user addons
INFO[0152] [addons] no user addons defined
INFO[0152] Finished building Kubernetes cluster successfully

假如你遇到部署失败，报错：

shell 复制代码

WARN[0114] [etcd] host [xxxxxxx] failed to check etcd health: failed to get /health for host [xxxxxxx]: Get "https://xxxxxxxx:2379/health": remote error: tls: bad certificate
FATA[0114] [etcd] Failed to bring up Etcd Plane: etcd cluster is unhealthy: hosts [xxxxxxxx] failed to report healthy. Check etcd container logs on each host for more information

可以对所有节点执行（无法删除就重启机器）：rm -rf /etc/kubernetes/ /var/lib/kubelet/ /var/lib/etcd/。

九、安装 kubectl 客户端

还是在 node1（控制节点）主机上操作。

kubectl 客户端安装

shell 复制代码

# wget https://storage.googleapis.com/kubernetes-release/release/v1.21.9/bin/linux/amd64/kubectl
# chmod +x kubectl 
# mv kubectl /usr/local/bin/kubectl
# kubectl version --client
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.9", GitCommit:"f59f5c2fda36e4036b49ec027e556a15456108f0", GitTreeState:"clean", BuildDate:"2022-01-19T17:33:06Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"linux/amd64"}

kubectl 客户端配置集群管理文件及应用验证

shell 复制代码

[root@node1 ~]# ls /app/rancher/
cluster.rkestate  cluster.yml  kube_config_cluster.yml
[root@node1 ~]# mkdir ./.kube
[root@node1 ~]# cp /app/rancher/kube_config_cluster.yml /root/.kube/config
[root@node1 ~]# kubectl get nodes
NAME            STATUS   ROLES          AGE     VERSION
192.168.10.10   Ready    controlplane   9m13s   v1.21.9
192.168.10.12   Ready    worker         9m12s   v1.21.9
192.168.10.14   Ready    etcd           9m12s   v1.21.9
[root@node1 ~]# kubectl get pods -n kube-system
NAME                                         READY   STATUS      RESTARTS   AGE
calico-kube-controllers-5685fbd9f7-gcwj7     1/1     Running     0          9m36s
canal-fz2bg                                  2/2     Running     0          9m36s
canal-qzw4n                                  2/2     Running     0          9m36s
canal-sstjn                                  2/2     Running     0          9m36s
coredns-8578b6dbdd-ftnf6                     1/1     Running     0          9m30s
coredns-autoscaler-f7b68ccb7-fzdgc           1/1     Running     0          9m30s
metrics-server-6bc7854fb5-kwppz              1/1     Running     0          9m25s
rke-coredns-addon-deploy-job--1-x56w2        0/1     Completed   0          9m31s
rke-ingress-controller-deploy-job--1-wzp2b   0/1     Completed   0          9m21s
rke-metrics-addon-deploy-job--1-ltlgn        0/1     Completed   0          9m26s
rke-network-plugin-deploy-job--1-nsbfn       0/1     Completed   0          9m41s

十、集群 Web 管理 Rancher

Rancher 控制面板主要方便用于控制 K8s 集群，查看集群状态，编辑集群等操作。

依旧在（控制节点）node1 运行以下命令：

使用 docker run 启动一个 rancher

shell 复制代码

# 注意映射端口改了
[root@node1 ~]# docker run -d --restart=unless-stopped --privileged --name rancher -p 1080:80 -p 1443:443 rancher/rancher:v2.5.9
[root@node1 ~]# docker ps
CONTAINER ID   IMAGE                                COMMAND                  CREATED          STATUS          PORTS                                                                      NAMES
0fd46ee77655   rancher/rancher:v2.5.9               "entrypoint.sh"          5 seconds ago    Up 3 seconds    0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp   rancher

访问 Rancher【这里映射端口 1080，1443】

shell 复制代码

[root@node1 ~]# ss -anput | grep ":1080"
tcp    LISTEN     0      128       *:1080                    *:*                   users:(("docker-proxy",pid=29564,fd=4))
tcp    LISTEN     0      128    [::]:1080                 [::]:*                   users:(("docker-proxy",pid=29570,fd=4))

以上便是 RKE 方式搭建 K8s 集群。

下面来讲讲，如何搞定 NebulaGraph 部署。

十一、搭建 NFS 服务器

shell 复制代码

[root@nfsserver ~]# mkdir -p /data/nfs
[root@nfsserver ~]# vim /etc/exports
/data/nfs       *(rw,no_root_squash,sync)

在所有节点中安装 NFS 客户端：

shell 复制代码

yum install nfs-utils -y

下面在工作节点中验证是否服务可用：

shell 复制代码

showmount -e 192.168.222.143 #NFS 服务器的地址

十二、使用 NFS 文件系统创建存储动态供给（Storage Class 安装）

PV 对存储系统的支持可通过其插件来实现，目前，Kubernetes 支持如下类型的插件。

官方地址：kubernetes.io/docs/concep...；

但是，官方插件是不支持 NFS 动态供给的，我们可以用第三方的插件来实现，第三方插件地址：github.com/kubernetes-...。

下载并创建 Storage Class

shell 复制代码

[root@k8s-master1 ~]# wget https://raw.githubusercontent.com/kubernetes-sigs/nfs-subdir-external-provisioner/master/deploy/class.yaml

[root@k8s-master1 ~]# mv class.yaml storageclass-nfs.yml
[root@k8s-master1 ~]# cat storageclass-nfs.yml
apiVersion: storage.k8s.io/v1
kind: StorageClass                                # 类型
metadata:
  name: nfs-client                # 名称，要使用就需要调用此名称
provisioner: k8s-sigs.io/nfs-subdir-external-provisioner         # 动态供给插件
parameters:
  archiveOnDelete: "false"                # 删除数据时是否存档，false 表示不存档，true 表示存档
[root@k8s-master1 ~]# kubectl apply -f storageclass-nfs.yml
storageclass.storage.k8s.io/managed-nfs-storage created
[root@k8s-master1 ~]# kubectl get storageclass
NAME         PROVISIONER                                   RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
nfs-client   k8s-sigs.io/nfs-subdir-external-provisioner   Delete          Immediate           false                  10s

# RECLAIMPOLICY PV 回收策略，Pod 或 PVC 被删除后，PV 是否删除还是保留。
# VOLUMEBINDINGMODE Immediate 模式下 PVC 与 PV 立即绑定，主要是不等待相关 Pod 调度完成，不关心其运行节点，直接完成绑定。相反的 WaitForFirstConsumer 模式下需要等待 Pod 调度完成后进行 PV 绑定。
# ALLOWVOLUMEEXPANSION PVC 扩容

下载并创建 RBAC

因为 Storage 自动创建 PV 需要经过 kube-apiserver，所以需要授权。

yaml 复制代码

[root@k8s-master1 ~]# wget https://raw.githubusercontent.com/kubernetes-sigs/nfs-subdir-external-provisioner/master/deploy/rbac.yaml

[root@k8s-master1 ~]# mv rbac.yaml storageclass-nfs-rbac.yaml
[root@k8s-master1 ~]# cat storageclass-nfs-rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: default
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: nfs-client-provisioner-runner
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "update", "patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: run-nfs-client-provisioner
subjects:
  - kind: ServiceAccount
    name: nfs-client-provisioner
    # replace with namespace where provisioner is deployed
    namespace: default
roleRef:
  kind: ClusterRole
  name: nfs-client-provisioner-runner
  apiGroup: rbac.authorization.k8s.io
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: default
rules:
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: default
subjects:
  - kind: ServiceAccount
    name: nfs-client-provisioner
    # replace with namespace where provisioner is deployed
    namespace: default
roleRef:
  kind: Role
  name: leader-locking-nfs-client-provisioner
  apiGroup: rbac.authorization.k8s.io
[root@k8s-master1 ~]# kubectl apply -f rbac.yaml
serviceaccount/nfs-client-provisioner created
clusterrole.rbac.authorization.k8s.io/nfs-client-provisioner-runner created
clusterrolebinding.rbac.authorization.k8s.io/run-nfs-client-provisioner created
role.rbac.authorization.k8s.io/leader-locking-nfs-client-provisioner created
rolebinding.rbac.authorization.k8s.io/leader-locking-nfs-client-provisioner created

创建动态供给的 deployment

这里需要一个 deployment 来专门实现 PV 与 PVC 的自动创建：

yaml 复制代码

[root@k8s-master1 ~]# vim deploy-nfs-client-provisioner.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-client-provisioner
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: nfs-client-provisioner
  template:
    metadata:
      labels:
        app: nfs-client-provisioner
    spec:
      serviceAccount: nfs-client-provisioner
      containers:
        - name: nfs-client-provisioner
          image: registry.cn-beijing.aliyuncs.com/pylixm/nfs-subdir-external-provisioner:v4.0.0
          volumeMounts:
            - name: nfs-client-root
              mountPath: /persistentvolumes
          env:
            - name: PROVISIONER_NAME
              value: k8s-sigs.io/nfs-subdir-external-provisioner
            - name: NFS_SERVER
              value: 192.168.10.129
            - name: NFS_PATH
              value: /data/nfs
      volumes:
        - name: nfs-client-root
          nfs:
            server: 192.168.10.129
            path: /data/nfs
[root@k8s-master1 ~]# kubectl apply -f deploy-nfs-client-provisioner.yml
deployment.apps/nfs-client-provisioner created
[root@k8s-master1 ~]# kubectl get pods |grep nfs-client-provisioner
nfs-client-provisioner-5b5ddcd6c8-b6zbq   1/1     Running   0          34s

十三、NebulaGraph Operator 安装部署

这里参考官方文档来安装 NebulaGraph Operator 前，用户需要安装以下软件并确保安装版本的正确性（NebulaGraph Operator 不负责处理安装这些软件过程中出现的问题）。

软件	版本要求
Kubernetes	>= 1.16
Helm	>= 3.2.0
CoreDNS	>= 1.6.0

这里需要安装下 Helm：

shell 复制代码

[root@a ~]# curl https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 | bash
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                Dload  Upload   Total   Spent    Left  Speed
100 11345  100 11345    0     0  18830      0 --:--:-- --:--:-- --:--:-- 18845
[WARNING] Could not find git. It is required for plugin installation.
Downloading https://get.helm.sh/helm-v3.12.1-linux-amd64.tar.gz
Verifying checksum... Done.
Preparing to install helm into /usr/local/bin
helm installed into /usr/local/bin/helm

而 CoreDNS 的安装，可以网上查阅资料，其实上面 RKE 安装过程中已经安装了 CoreDNS。再来一遍安装 NebulaGraph Operator 的具体流程：

第一步，添加 NebulaGraph Operator Helm 仓库。

shell 复制代码

helm repo add nebula-operator https://vesoft-inc.github.io/nebula-operator/charts

第二步，拉取最新的 Operator Helm 仓库。

shell 复制代码

helm repo update

参考 Helm 仓库以获取更多 helm repo 相关信息。

第三步，创建命名空间用于安装 NebulaGraph Operator。

shell 复制代码

kubectl create namespace <namespace_name>

例如，创建 operator 命名空间。

shell 复制代码

kubectl create namespace operator

nebula-operator Chart 中的所有资源都会安装在该命名空间下。
用户也可自行创建其他命名空间。

第四步，安装 NebulaGraph Operator。

shell 复制代码

helm install nebula-operator nebula-operator/nebula-operator --namespace=<namespace_name> --version=${chart_version}

目前 1.5.0 无法安装，可以不指定版本，默认安装最新版本。由于 gcr.io 无法下载镜像使用 kubesphere/kube-rbac-proxy:v0.8.0 代替。

shell 复制代码

helm install nebula-operator nebula-operator/nebula-operator --namespace=nebula-operator-system --set image.kubeRBACProxy.image=kubesphere/kube-rbac-proxy:v0.8.0

卸载 NebulaGraph Operator（如果需要卸载重新安装）

卸载 NebulaGraph Operator chart

shell 复制代码

helm uninstall nebula-operator --namespace=operator

删除 CRD

shell 复制代码

kubectl delete crd nebulaclusters.apps.nebula-graph.io

十四、部署 NebulaGraph（kubectl）

创建集群配置文件：创建名为 apps_v1alpha1_nebulacluster.yaml 的文件。官网提供了模板可以直接下载：

yaml 复制代码

apiVersion: apps.nebula-graph.io/v1alpha1
kind: NebulaCluster
metadata:
  name: nebula
spec:
  graphd:
    resources:
      requests:
        cpu: "500m"
        memory: "500Mi"
      limits:
        cpu: "1"
        memory: "1Gi"
    replicas: 1
    image: vesoft/nebula-graphd
    version: v3.5.0
    service:
      type: NodePort
      externalTrafficPolicy: Local
    logVolumeClaim:
      resources:
        requests:
          storage: 1Gi
      storageClassName: managed-nfs-storage
 metad:
  #    license:
   #      secretName: "nebula-license"
#      licenseKey: "nebula.license"
    resources:
      requests:
        cpu: "500m"
        memory: "500Mi"
      limits:
        cpu: "1"
        memory: "1Gi"
    replicas: 1
    image: vesoft/nebula-metad
    version: v3.5.0
    dataVolumeClaim:
      resources:
        requests:
          storage: 5Gi
      storageClassName:  managed-nfs-storage
    logVolumeClaim:
      resources:
        requests:
          storage: 1Gi
      storageClassName:  managed-nfs-storage
  storaged:
    resources:
      requests:
        cpu: "500m"
        memory: "500Mi"
      limits:
        cpu: "1"

注意：所有的 storageClassName 需要和上面安装 storageclass 时名称相同。即，storageclass-nfs.yml 中的 metadata.name: nfs-client。

来查看下服务状态：

shell 复制代码

nebula-exporter-66457984-w6zpn            1/1     Running   0          31m
nebula-graphd-0                           2/2     Running   0          31m
nebula-metad-0                            2/2     Running   0          35m
nebula-storaged-0                         2/2     Running   0          35m
nebula-storaged-1                         0/2     Pending   0          35m
nebula-storaged-2                         0/2     Pending   0          35m
nfs-client-provisioner-6bd7f48698-m4v94   1/1     Running   0          57m

十五、连接 NebulaGraph

按照官网提供的模板修改：

yaml 复制代码

apiVersion: v1
kind: Service
metadata:
  labels:
    # modify the cluster name
    app.kubernetes.io/cluster: "nebula"
    app.kubernetes.io/component: graphd
    app.kubernetes.io/managed-by: nebula-operator
    app.kubernetes.io/name: nebula-graph
  name: nebula-graphd-nodeport-svc
  namespace: default
spec:
  externalTrafficPolicy: Cluster #改为local，无法连接到数据库，这里改为Cluster
  ports:
  - name: thrift
    port: 9669
    protocol: TCP
    targetPort: 9669
  - name: http
    port: 19669
    protocol: TCP
    targetPort: 19669
  selector:
    # modify the cluster name
    app.kubernetes.io/cluster: "nebula"
    app.kubernetes.io/component: graphd
    app.kubernetes.io/managed-by: nebula-operator
    app.kubernetes.io/name: nebula-graph
  type: NodePort

执行以下命令使 Service 服务在集群中生效。

shell 复制代码

kubectl create -f graphd-nodeport-service.yaml

查看 Service 中 NebulaGraph 映射至集群节点的端口。

shell 复制代码

kubectl get services -l app.kubernetes.io/cluster=<nebula>  #<nebula>为变量值，请用实际集群名称替换。

shell 复制代码

NAME                           TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                                          AGE
nebula-graphd-svc-nodeport     NodePort    10.107.153.129 <none>        9669:32236/TCP,19669:31674/TCP,19670:31057/TCP   24h
...

NodePort 类型的 Service 中，映射至集群节点的端口为 32236。

使用节点 IP 和上述映射的节点端口连接 NebulaGraph。

shell 复制代码

kubectl run -ti --image vesoft/nebula-console:v3.5.0 --restart=Never -- <nebula_console_name> -addr <node_ip> -port <node_port> -u <username> -p <password>

示例如下：

shell 复制代码

kubectl run -ti --image vesoft/nebula-console:v3.5.0 --restart=Never -- nebula-console -addr 192.168.8.24 -port 32236 -u root -p vesoft
If you don't see a command prompt, try pressing enter.

(root@nebula) [(none)]>

--image：为连接 NebulaGraph 的工具 NebulaGraph Console 的镜像。
<nebula-console>：自定义的 Pod 名称，本示例为 nebula-console。
-addr：NebulaGraph 集群中任一节点 IP 地址，本示例为 192.168.8.24。
-port：NebulaGraph 映射至节点的端口，本示例为 32236。
-u：NebulaGraph 账号的用户名。未启用身份认证时，可以使用任意已存在的用户名（默认为 root）。
-p：用户名对应的密码。未启用身份认证时，密码可以填写任意字符。以上为本次实践，有任何问题欢迎论坛留言交流。

参考资料：

谢谢你读完本文 (///▽///)

如果你想尝鲜图数据库 NebulaGraph，记得去 GitHub 下载、使用、(^з^)-☆ star 它 -> GitHub；和其他的 NebulaGraph 用户一起交流图数据库技术和应用技能，留下「你的名片」一起玩耍呀~