已解决:spark代码中sqlContext.createDataframe空指针异常

这段代码是使用local模式运行spark代码。但是在获取了spark.sqlContext之后,用sqlContext将rdd算子转换为Dataframe的时候报错空指针异常

java 复制代码
Exception in thread "main" org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: java.lang.NullPointerException;
	at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
	at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:194)
	at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:114)
	at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:102)
	at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:39)
	at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog$lzycompute(HiveSessionStateBuilder.scala:54)
	at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:52)
	at org.apache.spark.sql.hive.HiveSessionStateBuilder$$anon$1.<init>(HiveSessionStateBuilder.scala:69)
	at org.apache.spark.sql.hive.HiveSessionStateBuilder.analyzer(HiveSessionStateBuilder.scala:69)
	at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$2.apply(BaseSessionStateBuilder.scala:293)
	at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$2.apply(BaseSessionStateBuilder.scala:293)
	at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:79)
	at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:79)
	at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
	at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
	at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:74)
	at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:300)
	at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:272)
	at cn.itcast.xc.dimen.AreaDimInsert$.main(AreaDimInsert.scala:39)
	at cn.itcast.xc.dimen.AreaDimInsert.main(AreaDimInsert.scala)
Caused by: java.lang.RuntimeException: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
	at org.apache.spark.sql.hive.client.HiveClientImpl.newState(HiveClientImpl.scala:180)
	at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:114)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
	at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:385)
	at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:287)
	at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:66)
	at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:65)
	at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply$mcZ$sp(HiveExternalCatalog.scala:195)
	at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:195)
	at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:195)
	at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
	... 20 more
Caused by: java.lang.NullPointerException
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1012)
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:482)
	at org.apache.hadoop.util.Shell.run(Shell.java:455)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
	at org.apache.hadoop.util.Shell.execCommand(Shell.java:808)
	at org.apache.hadoop.util.Shell.execCommand(Shell.java:791)
	at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:656)
	at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:444)
	at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:293)
	at org.apache.hadoop.hive.ql.session.SessionState.createPath(SessionState.java:639)
	at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:567)
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
	... 35 more

sqlContext不为空指针,area也不为空指针,这个错的排查还是比较难的。

经发现,是本地模式下,如果在windows环境下运行该代码,并且windows没有配置HADOOP_HOME环境变量就会报这个错

这里直接给出解决方案
情况1: ⽆hadoop环境
先准备好winutils
下载地址:
链接:https://pan.baidu.com/s/17Oy_CHoHBFYGk3-fCo8bJw
提取码:jco5

将这个路径配置成HADOOP_HOME的环境变量

重启idea,再次运行代码,即可解决上述问题
情况2: 有hadoop环境
确认HADOOP_HOME环境变量已正确配置
把winutils.exe复制到HADOOP_HOME⽬录内bin⽬录下, 如下图所示:
环境变量配置:

⽬录结构:

相关推荐
武子康4 小时前
大数据-239 离线数仓 - 广告业务实战:Flume 导入日志到 HDFS,并完成 Hive ODS/DWD 分层加载
大数据·后端·apache hive
字节跳动数据平台1 天前
代码量减少 70%、GPU 利用率达 95%:火山引擎多模态数据湖如何释放模思智能的算法生产力
大数据
得物技术1 天前
深入剖析Spark UI界面:参数与界面详解|得物技术
大数据·后端·spark
武子康1 天前
大数据-238 离线数仓 - 广告业务 Hive分析实战:ADS 点击率、购买率与 Top100 排名避坑
大数据·后端·apache hive
武子康2 天前
大数据-237 离线数仓 - Hive 广告业务实战:ODS→DWD 事件解析、广告明细与转化分析落地
大数据·后端·apache hive
大大大大晴天2 天前
Flink生产问题排障-Kryo serializer scala extensions are not available
大数据·flink
武子康4 天前
大数据-236 离线数仓 - 会员指标验证、DataX 导出与广告业务 ODS/DWD/ADS 全流程
大数据·后端·apache hive
肌肉娃子5 天前
20260227.spark.Spark 性能刺客:千万别在 for 循环里写 withColumn
spark
初次攀爬者5 天前
ZooKeeper 实现分布式锁的两种方式
分布式·后端·zookeeper
武子康5 天前
大数据-235 离线数仓 - 实战:Flume+HDFS+Hive 搭建 ODS/DWD/DWS/ADS 会员分析链路
大数据·后端·apache hive