macos安装local模式spark

文章目录

配置说明

Scala - 3.18+

Spark - 3.5.0

Hadoop - 3.3.6

安装hadoop

  1. 这里下载相应版本的hadoop
  2. 下载后解压,配置系统环境变量
shell 复制代码
> sudo vim /etc/profile

添加以下两行

shell 复制代码
export HADOOP_HOME=/Users/collinsliu/hadoop-3.3.6/
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

请自行替换位置

然后执行并生效系统环境变量

shell 复制代码
> source /etc/profile

安装Spark

  1. 这里下载相应版本的Spark
  2. 下载后解压,同时类似于hadoop,配置系统环境变量
shell 复制代码
> sudo vim /etc/profile

添加以下两行

shell 复制代码
export SPARK_HOME=/Users/collinsliu/spark-3.5.0
export PATH=$PATH:$SPARK_HOME/bin

请自行替换位置

然后执行并生效系统环境变量

shell 复制代码
> source /etc/profile
  1. 然后配置spark连接hadoop,形成local模式:
    a. 首先进入conf文件夹
shell 复制代码
> cd /Users/collinsliu/spark-3.5.0/conf

b. 其次替换配置文件

shell 复制代码
> cp spark-env.sh.template spark-env.sh
> vim spark-env.sh

c. 添加以下三条连接,使得spark能够找到对应的hadoop和相应的包

shell 复制代码
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_311.jdk/Contents/Home
export HADOOP_CONF_DIR=/Users/collinsliu/hadoop-3.3.6/etc/hadoop
export SPARK_DIST_CLASSPATH=$(/Users/collinsliu/hadoop-3.3.6/bin/hadoop classpath)

测试安装成功

  1. 使用内置命令测试
shell 复制代码
> cd /Users/collinsliu/spark-3.5.0/
> ./run-example SparkPi

可以看到很多输出,最后找到

shell 复制代码
...
24/02/07 00:31:33 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks resource profile 0
24/02/07 00:31:33 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0) (192.168.0.100, executor driver, partition 0, PROCESS_LOCAL, 8263 bytes) 
24/02/07 00:31:33 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1) (192.168.0.100, executor driver, partition 1, PROCESS_LOCAL, 8263 bytes) 
24/02/07 00:31:33 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
24/02/07 00:31:33 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
24/02/07 00:31:34 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 1101 bytes result sent to driver
24/02/07 00:31:34 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1101 bytes result sent to driver
24/02/07 00:31:34 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 1120 ms on 192.168.0.100 (executor driver) (1/2)
24/02/07 00:31:34 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 923 ms on 192.168.0.100 (executor driver) (2/2)
24/02/07 00:31:34 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
24/02/07 00:31:34 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 1.737 s
24/02/07 00:31:34 INFO DAGScheduler: Job 0 is finished. Cancelling potential speculative or zombie tasks for this job
24/02/07 00:31:34 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage finished
24/02/07 00:31:34 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 1.807145 s
Pi is roughly 3.1405357026785135

说明安装成功

  1. 打开sparkshell
shell 复制代码
> spark-shell

出现以下内容

shell 复制代码
24/02/07 00:48:12 WARN Utils: Your hostname, Collinss-MacBook-Air.local resolves to a loopback address: 127.0.0.1; using 192.168.0.100 instead (on interface en0)
24/02/07 00:48:12 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.5.0
      /_/
         
Using Scala version 2.13.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_311)
Type in expressions to have them evaluated.
Type :help for more information.
24/02/07 00:48:22 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context Web UI available at http://192.168.0.100:4040
Spark context available as 'sc' (master = local[*], app id = local-1707238103536).
Spark session available as 'spark'.

scala> 

说明安装成功

相关推荐
成长之路51418 分钟前
【实证分析】数智化转型对制造企业全要素生产率的影响及机制探究(1999-2023年)
大数据
黑眼圈的小熊猫1 小时前
Git开发
大数据·git·elasticsearch
涛思数据(TDengine)1 小时前
虚拟表、TDgpt、JDBC 异步写入…TDengine 3.3.6.0 版本 8 大升级亮点
大数据·数据库·tdengine
goTsHgo2 小时前
Flink的 RecordWriter 数据通道 详解
大数据·flink
Romantic Rose3 小时前
你所拨打的电话是空号?手机状态查询API
大数据·人工智能
随缘而动,随遇而安3 小时前
第四十六篇 人力资源管理数据仓库架构设计与高阶实践
大数据·数据库·数据仓库·sql·数据库架构
小宋10214 小时前
Linux安装Elasticsearch详细教程
大数据·elasticsearch·搜索引擎
友善的猴子4 小时前
AlDente Pro for Mac电脑 充电限制保护工具
macos·电脑
程序员老周6664 小时前
数据仓库标准库模型架构相关概念浅讲
大数据·数据仓库·hive·数仓·拉链抽取·增量抽取·数据仓库架构
世界尽头与你5 小时前
MacOS红队常用攻击命令
安全·macos·网络安全