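Ecosystem Extension: Spark Doris Connector
Check the official Doris website for the Spark version that matches the connector you plan to use.
Installing Spark: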
tar -zxvf spark-3.1.2-bin-hadoop3.2.tgz
mv spark-3.1.2-bin-hadoop3.2 /opt/spark
Spark environment configuration: open /etc/profile (vim /etc/profile) and add:
export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin
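After re-reading /etc/profile, it can help to confirm that both exports took effect. The following is a minimal sketch; the function name check_spark_env is hypothetical and not part of Spark.

```shell
#!/bin/sh
# Hypothetical helper: report whether SPARK_HOME is set and
# whether its bin directory appears as an entry on PATH.
check_spark_env() {
    if [ -z "${SPARK_HOME:-}" ]; then
        echo "SPARK_HOME is not set"
        return 1
    fi
    # Wrap PATH in colons so the first and last entries match too.
    case ":$PATH:" in
        *":$SPARK_HOME/bin:"*)
            echo "OK: $SPARK_HOME/bin is on PATH"
            ;;
        *)
            echo "$SPARK_HOME/bin is missing from PATH"
            return 1
            ;;
    esac
}
```

Typical use would be: . /etc/profile && check_spark_env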
Copy the compiled spark-doris-connector-3.2_2.12-1.3.0-SNAPSHOT.jar into Spark's jars directory:
cp spark-doris-connector-3.2_2.12-1.3.0-SNAPSHOT.jar /opt/spark/jars/
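Because the connector has to match the Spark and Scala versions in use, it can be handy to read those versions back out of the jar name. The sketch below assumes the name follows the pattern spark-doris-connector-<spark>_<scala>-<connector>.jar, as the file above does; describe_connector_jar is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: split a connector jar name of the assumed form
#   spark-doris-connector-<spark>_<scala>-<connector>.jar
# into its three version components.
describe_connector_jar() {
    name=$(basename "$1" .jar)            # strip directory and .jar suffix
    rest=${name#spark-doris-connector-}   # e.g. 3.2_2.12-1.3.0-SNAPSHOT
    spark_ver=${rest%%_*}                 # text before the first underscore
    tail=${rest#*_}                       # text after the first underscore
    scala_ver=${tail%%-*}                 # text before the first hyphen
    connector_ver=${tail#*-}              # everything after that hyphen
    echo "spark=$spark_ver scala=$scala_ver connector=$connector_ver"
}
```

The reported Spark version can then be compared with the output of spark-submit --version. In these notes the jar targets Spark 3.2 while the archive unpacked earlier is Spark 3.1.2, so it is worth checking that pairing against the compatibility information on the Doris website.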
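Source repository: https://github.com/apache/doris-spark-connector
Build and installation
Preparation
Edit the custom_env.sh.tpl file and rename it to custom_env.sh.
In the source directory, run sh build.sh and, when prompted, select Scala 2.12 and Spark 3.2.3 to build.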
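The preparation step can be scripted so that it is safe to repeat. This is a sketch under two assumptions: the template sits in the directory passed as the first argument, and copying it (rather than renaming it) is acceptable so the original template stays in place. prepare_custom_env is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: create custom_env.sh from custom_env.sh.tpl
# in the given directory (default: current directory) without
# overwriting an existing, possibly edited, custom_env.sh.
prepare_custom_env() {
    dir=${1:-.}
    if [ ! -f "$dir/custom_env.sh.tpl" ]; then
        echo "custom_env.sh.tpl not found in $dir"
        return 1
    fi
    if [ -f "$dir/custom_env.sh" ]; then
        echo "custom_env.sh already exists, leaving it untouched"
        return 0
    fi
    cp "$dir/custom_env.sh.tpl" "$dir/custom_env.sh"
    echo "created $dir/custom_env.sh"
}
```

After this, edit custom_env.sh as needed and run sh build.sh as described above.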
Verification (from spark-shell):
scala> import org.apache.doris.spark._
import org.apache.doris.spark._
scala>
scala> val doris = spark.sql(
| s"""
| |CREATE TEMPORARY VIEW spark_doris
| |USING doris
| |OPTIONS(
| | "table.identifier"="demo.example_tbl",
| | "fenodes"="10.63.0.181:8030",
| | "user"="root",
| | "password"=""
| |);
| |""".stripMargin)
doris: org.apache.spark.sql.DataFrame = []
scala>
scala> spark.sql("SELECT * FROM spark_doris;").show
+-------+----------+----+---+---+-------------------+----+--------------+--------------+
|user_id|      date|city|age|sex|    last_visit_date|cost|max_dwell_time|min_dwell_time|
+-------+----------+----+---+---+-------------------+----+--------------+--------------+
|  10000|2017-10-01|北京| 20|  0|2017-10-01 07:00:00|  35|            10|             2|
|  10001|2017-10-01|北京| 30|  1|2017-10-01 17:05:45|   2|            22|            22|
|  10002|2017-10-02|上海| 20|  1|2017-10-02 12:59:12| 200|             5|             5|
|  10003|2017-10-02|广州| 32|  0|2017-10-02 11:20:00|  30|            11|            11|
|  10004|2017-10-01|深圳| 35|  0|2017-10-01 10:00:15| 100|             3|             3|
|  10004|2017-10-03|深圳| 35|  0|2017-10-03 10:20:22|  11|             6|             6|
+-------+----------+----+---+---+-------------------+----+--------------+--------------+
scala>
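The same check can be repeated without typing into the REPL by writing the statements to a script file first. The sketch below only generates that file; write_doris_check is a hypothetical helper, the user and password values are copied from the session above, and the table identifier and FE address are passed in as arguments.

```shell
#!/bin/sh
# Hypothetical helper: write the verification statements to a Scala script.
# Arguments: <output file> <database.table> <fe_host:http_port>
write_doris_check() {
    cat > "$1" <<EOF
spark.sql("""
  CREATE TEMPORARY VIEW spark_doris
  USING doris
  OPTIONS(
    "table.identifier"="$2",
    "fenodes"="$3",
    "user"="root",
    "password"=""
  )
""")
spark.sql("SELECT * FROM spark_doris").show()
System.exit(0)
EOF
}
```

For example, write_doris_check /tmp/doris_check.scala demo.example_tbl 10.63.0.181:8030 produces a file that can then be passed to the shell with spark-shell -i /tmp/doris_check.scala, which is one common way to run a script non-interactively. This assumes the connector jar is already in /opt/spark/jars and the Doris FE is reachable.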