Intermittent org.apache.spark.SparkUpgradeException on datetime-formatted data after Spark 3.x

尘世壹俗人 2024-11-29 15:12

From 3.x onward, if you process datetime strings that were generated by 2.x, it is easy to run into the following problem:


Error operating ExecuteStatement: 
org.apache.spark.SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to parse 
'20200725__cb90fcc3_8006_46b8_8f78_781aaff2e7f3' in the new parser. 
You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0, or set to CORRECTED and treat it as an invalid datetime string.

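The cause is that 2.x formatted and parsed datetime data with the SimpleDateFormat class, which is very tolerant of malformed input, so values occasionally ended up carrying a trailing suffix after the date. 3.x no longer uses that class, and when it meets such data it throws the error above. The spark.sql.legacy.timeParserPolicy=LEGACY setting mentioned in the message is only an attempted workaround and is not guaranteed to work in every case. The better fix is to trim time strings with substr whenever you process them, so that only the date portion reaches the parser.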
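The difference in parser behavior can be reproduced outside Spark with plain JDK classes. The sketch below is only an illustration under a few assumptions: the date pattern is taken to be yyyyMMdd, the date is taken to occupy the first 8 characters of the value, and the class and method names (TimeParseDemo, legacyParses, strictParses, datePrefix) are hypothetical. It uses java.time's DateTimeFormatter as a stand-in for the stricter parsing in 3.x and is not Spark's actual internal code.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

final class TimeParseDemo {
    // Assumed pattern for illustration; the original post does not state it.
    private static final String PATTERN = "yyyyMMdd";

    /** Old-style parsing: SimpleDateFormat.parse(String) may stop before the end of the text. */
    static boolean legacyParses(String value) {
        try {
            new SimpleDateFormat(PATTERN).parse(value);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    /** Strict parsing: java.time rejects any text left over after the pattern. */
    static boolean strictParses(String value) {
        try {
            LocalDate.parse(value, DateTimeFormatter.ofPattern(PATTERN));
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    /** The substr-style fix: keep only the leading 8 characters (assumed to be the date). */
    static String datePrefix(String value) {
        return value.length() >= 8 ? value.substring(0, 8) : value;
    }

    public static void main(String[] args) {
        String raw = "20200725__cb90fcc3_8006_46b8_8f78_781aaff2e7f3";
        System.out.println("legacy parser accepts raw value: " + legacyParses(raw));
        System.out.println("strict parser accepts raw value: " + strictParses(raw));
        System.out.println("strict parser accepts trimmed value: " + strictParses(datePrefix(raw)));
    }
}
```

In Spark SQL the same trimming would look roughly like to_date(substr(dt_col, 1, 8), 'yyyyMMdd'), where dt_col is a placeholder column name. Trimming the value in the query keeps the result independent of the timeParserPolicy setting.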
