- Install Scala, apache-spark, and Hadoop
```bash
brew install scala
brew install apache-spark
brew install hadoop
pip install pyspark
```
Note: do not install a separate JDK yourself; a mismatched version will cause errors. The apache-spark install already pulls in openjdk automatically.
- Configure environment variables
```bash
export JAVA_HOME=/opt/homebrew/Cellar/openjdk@11/11.0.26/libexec/openjdk.jdk/Contents/Home
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=$JAVA_HOME/lib
export SCALA_HOME=/opt/homebrew/opt/scala
export PATH=$PATH:$SCALA_HOME/bin
export SPARK_PATH=/opt/homebrew/Cellar/apache-spark/3.5.4
export PATH=$PATH:$SPARK_PATH/bin
export HADOOP_HOME=/opt/homebrew/Cellar/hadoop/3.4.1
export PATH=$PATH:$HADOOP_HOME/bin
```
Note: running against the wrong JDK version causes `Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.: java.lang.UnsupportedOperationException: getSubject is supported only if a security manager is allowed`
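If several JDKs are installed, you can also pin the one PySpark launches with from inside the driver script, by setting `JAVA_HOME` before the session is created. A minimal sketch, assuming the Homebrew openjdk@11 path used above:

```python
import os

# Assumed Homebrew path; adjust to the openjdk version you actually installed.
# Must run BEFORE SparkSession.builder...getOrCreate(), because PySpark reads
# JAVA_HOME when it launches the JVM gateway.
os.environ["JAVA_HOME"] = (
    "/opt/homebrew/Cellar/openjdk@11/11.0.26/libexec/openjdk.jdk/Contents/Home"
)
os.environ["PATH"] = os.environ["JAVA_HOME"] + "/bin:" + os.environ["PATH"]
```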
- Download the JDBC driver
From the download link, choose "Platform Independent".
Copy the jar file from the extracted archive into apache-spark's
jars directory: /opt/homebrew/Cellar/apache-spark/3.5.4/libexec/jars
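The copy step can be done like this; the download location and extracted folder name are assumptions, so adjust them to the archive you actually downloaded (the connector version should match the jar referenced in the test code below):

```shell
# Assumed download location and connector version -- edit to match yours
JAR=~/Downloads/mysql-connector-j-9.2.0/mysql-connector-j-9.2.0.jar
DEST=/opt/homebrew/Cellar/apache-spark/3.5.4/libexec/jars
# Copy only if the jar is actually there, so a typo fails loudly at the ls step
if [ -f "$JAR" ]; then
  cp "$JAR" "$DEST"
fi
ls "$DEST" | grep mysql-connector || echo "connector jar not found in $DEST"
```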
- Test with the following code
```python
from pyspark.sql import SparkSession

# Point the driver at the Connector/J jar copied into Spark's jars directory
sparkdriver = (
    SparkSession.builder
    .appName('demo')
    .master('local[*]')
    .config('spark.driver.extraClassPath',
            '/opt/homebrew/Cellar/apache-spark/3.5.4/libexec/jars/mysql-connector-j-9.2.0.jar')
    .getOrCreate()
)

df_mysql = (
    sparkdriver.read.format('jdbc')
    .option('url', 'jdbc:mysql://localhost:3306')
    # Connector/J 8+ renamed the driver class; the old com.mysql.jdbc.Driver is gone
    .option('driver', 'com.mysql.cj.jdbc.Driver')
    .option('user', 'root')
    .option('password', '123')
    .option('query', 'select * from tablename')
    .load()
)
df_mysql.show(10)
```