Deploying Spark on K8s

I. Environment Preparation

1. Install JDK 1.8

(1) JDK 1.8 download page: https://www.oracle.com/java/technologies/downloads/archive/

Extract the archive to the /opt/ directory:

shell
tar zxf jdk-8u212-linux-x64.tar.gz -C /opt/

(2) Configure environment variables

Edit the profile with vi /etc/profile and append the following:

shell
#jdk1.8.0_212
export JAVA_HOME=/opt/jdk1.8.0_212
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin

Apply the changes:

shell
source /etc/profile
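
To confirm the JDK is active, check the version (the exact build string will vary with your download):

shell
java -version
# should report a 1.8.0_xxx version, e.g. "1.8.0_212"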

(3) Add the JDK security provider

Add the following three JARs to the JDK (the Bouncy Castle provider JARs implied by the configuration below):

Then add this line to the java.security file (under JDK 8: $JAVA_HOME/jre/lib/security/java.security): security.provider.10=org.bouncycastle.jce.provider.BouncyCastleProvider
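
A minimal sketch of the two steps, assuming the three Bouncy Castle JARs (illustrative file names below) have been downloaded to the current directory:

shell
# Illustrative names: use the exact JARs you downloaded
cp bcprov-jdk15on-*.jar bcpkix-jdk15on-*.jar bcutil-jdk15on-*.jar $JAVA_HOME/jre/lib/ext/
# Register the provider in slot 10, as described above
echo 'security.provider.10=org.bouncycastle.jce.provider.BouncyCastleProvider' >> $JAVA_HOME/jre/lib/security/java.security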

2. Get the Spark distribution

(1) Download the Spark 3.2.3 distribution with wget:

shell
wget https://archive.apache.org/dist/spark/spark-3.2.3/spark-3.2.3-bin-hadoop3.2.tgz

(2) Extract and rename

shell
mkdir -p /opt/module
tar -zxvf spark-3.2.3-bin-hadoop3.2.tgz -C /opt/module
mv /opt/module/spark-3.2.3-bin-hadoop3.2 /opt/module/spark-3.2.3
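
Optionally, you can also export SPARK_HOME so later commands can be shortened (the steps below use the full /opt/module/spark-3.2.3 path, so this is purely a convenience):

shell
# Append to /etc/profile alongside the JDK variables, then source it
export SPARK_HOME=/opt/module/spark-3.2.3
export PATH=$PATH:$SPARK_HOME/bin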

3. Initialize the K8s environment

(1) Create the metasphere Namespace

Create metaSphere-namespace.yaml:

shell
vi metaSphere-namespace.yaml
yaml
apiVersion: v1
kind: Namespace
metadata:
  name: metasphere
  labels:
    app.kubernetes.io/name: metasphere
    app.kubernetes.io/instance: metasphere

Apply the YAML to create the namespace:

shell
kubectl apply -f metaSphere-namespace.yaml

Verify the namespace:

shell
kubectl get ns
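
As an aside, the same namespace could be created imperatively and labeled afterwards; this is an equivalent alternative to the YAML above, not an extra step:

shell
kubectl create namespace metasphere
kubectl label ns metasphere app.kubernetes.io/name=metasphere app.kubernetes.io/instance=metasphere
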
(2) Create the ServiceAccount

Create spark-service-account.yaml:

shell
vi spark-service-account.yaml
yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  namespace: metasphere
  name: spark-service-account
  labels:
    app.kubernetes.io/name: metasphere
    app.kubernetes.io/instance: metasphere
    app.kubernetes.io/version: v3.2.3

Apply the YAML to create the ServiceAccount:

shell
kubectl apply -f spark-service-account.yaml

Verify the ServiceAccount:

shell
kubectl get sa -n metasphere

(3) Create the Role and RoleBinding

Create spark-role.yaml:

shell
vi spark-role.yaml
yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  labels:
    app.kubernetes.io/name: metasphere
    app.kubernetes.io/instance: metasphere
    app.kubernetes.io/version: v3.2.3
  namespace: metasphere
  name: spark-role
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "watch", "list", "create", "delete"]
  - apiGroups: ["extensions", "apps"]
    resources: ["deployments"]
    verbs: ["get", "watch", "list", "create", "delete"]
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "create", "update", "delete"]
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get"]
  - apiGroups: [""]
    resources: ["services"]
    verbs: ["get", "list", "create", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    app.kubernetes.io/name: metasphere
    app.kubernetes.io/instance: metasphere
    app.kubernetes.io/version: v3.2.3
  name: spark-role-binding
  namespace: metasphere
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: spark-role
subjects:
  - kind: ServiceAccount
    name: spark-service-account
    namespace: metasphere

Apply the YAML to create the Role and RoleBinding:

shell
kubectl apply -f spark-role.yaml

Verify the Role and RoleBinding:

shell
kubectl get role -n metasphere
kubectl get rolebinding -n metasphere
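
Optionally, kubectl auth can-i can confirm that the ServiceAccount actually received the expected permissions:

shell
# Should print "yes" once the RoleBinding is applied
kubectl auth can-i create pods \
  --as=system:serviceaccount:metasphere:spark-service-account \
  -n metasphere
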
(4) Create the ClusterRole and ClusterRoleBinding

Create cluster-role.yaml:

shell
vi cluster-role.yaml
yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/name: metasphere
    app.kubernetes.io/instance: metasphere
    app.kubernetes.io/version: v3.2.3
  name: apache-spark-clusterrole
rules:
  - apiGroups:
      - ''
    resources:
      - configmaps
      - endpoints
      - nodes
      - pods
      - secrets
      - namespaces
    verbs:
      - list
      - watch
      - get
  - apiGroups:
      - ''
    resources:
      - services
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ''
    resources:
      - events
    verbs:
      - create
      - patch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app.kubernetes.io/name: metasphere
    app.kubernetes.io/instance: metasphere
    app.kubernetes.io/version: v3.2.3
  name: apache-spark-clusterrole-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: apache-spark-clusterrole
subjects:
  - kind: ServiceAccount
    name: spark-service-account
    namespace: metasphere

Apply the YAML to create the ClusterRole and ClusterRoleBinding:

shell
kubectl apply -f cluster-role.yaml

Verify the ClusterRole and ClusterRoleBinding:

shell
kubectl get clusterrole | grep spark
kubectl get clusterrolebinding | grep spark
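
The cluster-scoped permissions can be spot-checked the same way:

shell
# Should print "yes": listing nodes is granted by the ClusterRole
kubectl auth can-i list nodes \
  --as=system:serviceaccount:metasphere:spark-service-account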

II. Basic Spark on K8s Test

1. Pull the Apache Spark image

Search Docker Hub for the apache/spark image and pull it locally:

shell
docker pull apache/spark:v3.2.3

If network restrictions prevent pulling the image, use this mirror instead:

shell
docker pull registry.cn-hangzhou.aliyuncs.com/cm_ns01/apache-spark:v3.2.3
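
If you pulled the mirror, you can optionally retag it under the upstream name so either image reference works locally:

shell
docker tag registry.cn-hangzhou.aliyuncs.com/cm_ns01/apache-spark:v3.2.3 apache/spark:v3.2.3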

2. Find the K8s master URL

Get the Kubernetes control plane URL:

shell
kubectl cluster-info
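
For scripting, the same URL can be read straight from the active kubeconfig context:

shell
kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'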

3. Submit the Spark application to K8s

shell
/opt/module/spark-3.2.3/bin/spark-submit \
 --name SparkPi \
 --verbose \
 --master k8s://https://localhost:6443 \
 --deploy-mode cluster \
 --conf spark.network.timeout=300 \
 --conf spark.executor.instances=3 \
 --conf spark.driver.cores=1 \
 --conf spark.executor.cores=1 \
 --conf spark.driver.memory=1024m \
 --conf spark.executor.memory=1024m \
 --conf spark.kubernetes.namespace=metasphere \
 --conf spark.kubernetes.container.image.pullPolicy=IfNotPresent \
 --conf spark.kubernetes.container.image=registry.cn-hangzhou.aliyuncs.com/cm_ns01/apache-spark:v3.2.3 \
 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-service-account \
 --conf spark.kubernetes.authenticate.executor.serviceAccountName=spark-service-account \
 --conf spark.driver.extraJavaOptions="-Dio.netty.tryReflectionSetAccessible=true" \
 --conf spark.executor.extraJavaOptions="-Dio.netty.tryReflectionSetAccessible=true" \
 --class org.apache.spark.examples.SparkPi \
 local:///opt/spark/examples/jars/spark-examples_2.12-3.2.3.jar \
 3000

Parameter notes:

--master is the Kubernetes control plane URL; localhost:6443 works here because spark-submit runs on the control plane node itself, otherwise use the URL reported by kubectl cluster-info

--deploy-mode cluster runs both the driver and the executors inside K8s

--conf spark.kubernetes.namespace is the metasphere namespace created earlier

--conf spark.kubernetes.container.image is the Spark image address

--conf spark.kubernetes.authenticate.driver.serviceAccountName and --conf spark.kubernetes.authenticate.executor.serviceAccountName are the spark-service-account created earlier

--class is the main class of the Spark application

local:///opt/spark/examples/jars/spark-examples_2.12-3.2.3.jar is the JAR containing the application; spark-examples_2.12-3.2.3.jar ships inside the Spark image, hence the local:// scheme (a path inside the container)

3000 is the argument passed to the application's main class (for SparkPi, the number of slices to compute)
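
In cluster mode, spark-submit can also query or kill a running application by its submission ID, which on K8s takes the form namespace:driver-pod-name (a sketch; substitute your actual driver pod name):

shell
/opt/module/spark-3.2.3/bin/spark-submit --status "metasphere:sparkpi-b9de1a887b1163f1-driver" \
  --master k8s://https://localhost:6443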

4. Watch the driver and executor pods

shell
watch -n 1 kubectl get all -owide -n metasphere

5. Check the log output

The driver pod name below is from this example run; substitute the name shown in the previous step:

shell
kubectl logs sparkpi-b9de1a887b1163f1-driver -n metasphere
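
If the run succeeded, the driver log contains the computed result; for example (the value varies slightly between runs):

shell
kubectl logs sparkpi-b9de1a887b1163f1-driver -n metasphere | grep "Pi is roughly"
# e.g. Pi is roughly 3.141628...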

6. Clean up the driver pod

When the application finishes, the executor pods are removed automatically, but the completed driver pod remains and must be deleted manually:

shell
kubectl delete pod sparkpi-b9de1a887b1163f1-driver -n metasphere
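
Since Spark labels the pods it creates (spark-role=driver on drivers), leftover driver pods can also be cleaned up by label instead of by name, a small sketch:

shell
kubectl delete pod -n metasphere -l spark-role=driver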