Spark读取MySQL数据库表

官方地址:JDBC To Other Databases - Spark 4.0.0 Documentation

官方案例:

复制代码
// Note: JDBC loading and saving can be achieved via either the load/save or jdbc methods
// Loading data from a JDBC source
val jdbcDF = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql:dbserver")
  .option("dbtable", "schema.tablename")
  .option("user", "username")
  .option("password", "password")
  .load()

val connectionProperties = new Properties()
connectionProperties.put("user", "username")
connectionProperties.put("password", "password")
val jdbcDF2 = spark.read
  .jdbc("jdbc:postgresql:dbserver", "schema.tablename", connectionProperties)
// Specifying the custom data types of the read schema
connectionProperties.put("customSchema", "id DECIMAL(38, 0), name STRING")
val jdbcDF3 = spark.read
  .jdbc("jdbc:postgresql:dbserver", "schema.tablename", connectionProperties)

// Saving data to a JDBC source
jdbcDF.write
  .format("jdbc")
  .option("url", "jdbc:postgresql:dbserver")
  .option("dbtable", "schema.tablename")
  .option("user", "username")
  .option("password", "password")
  .save()

jdbcDF2.write
  .jdbc("jdbc:postgresql:dbserver", "schema.tablename", connectionProperties)

// Specifying create table column data types on write
jdbcDF.write
  .option("createTableColumnTypes", "name CHAR(64), comments VARCHAR(1024)")
  .jdbc("jdbc:postgresql:dbserver", "schema.tablename", connectionProperties)

验证:

复制代码
<dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>2.13.16</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.13</artifactId>
      <version>4.0.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.13</artifactId>
      <version>4.0.0</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>

    <dependency>
      <groupId>mysql</groupId>
      <artifactId>mysql-connector-java</artifactId>
      <version>8.0.15</version>
    </dependency>

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_2.13</artifactId>
      <version>4.0.0</version>
    </dependency>

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming-kafka-0-10_2.13</artifactId>
      <version>4.0.0</version>
    </dependency>
  </dependencies>

import org.apache.spark.SparkConf

object SparkCase04 {


  import org.apache.spark.sql.SparkSession


  def main(args: Array[String]): Unit = {

    val conf = new SparkConf().setMaster("local[1]").setAppName("WC")

    val spark = SparkSession.builder().config(conf).getOrCreate()


   var df = spark.read.format("jdbc")
    .option("url", "jdbc:mysql://node11:3306/wjobs?useSSL=false")
    .option("dbtable", "user")
    .option("user", "root")
    .option("password", "root123")
    .load()

    df.show()
  }


}
相关推荐
Nturmoils5 小时前
订单列表慢查询,先看 WHERE、ORDER BY 和 LIMIT
数据库
渣波9 小时前
拒绝 SQL 焦虑!手把手带你用 NestJS + Prisma + DTO 写出“防弹”级后端代码
javascript·数据库·后端
Jim6001 天前
【吃透 MySQL InnoDB连载】第 1 章・解密线上数据库高频故障
mysql
GreatSQL1 天前
gt-checksum v4.0.0 新功能解读系列文章(4):SSL 加密连接——数据校验传输安全再升级
mysql
倔强的石头_1 天前
KingbaseES 新版MySQL 兼容版体验:旧版迁移 + 功能实测
数据库
倔强的石头_4 天前
《Kingbase护城河》——数据库存储空间全景探测与精细化瘦身实战
数据库
云技纵横4 天前
唯一索引 INSERT 死锁实战:5 秒复现交叉插入的 S 锁循环等待
sql·mysql
沉默王二4 天前
面试官:RAG 不用向量数据库,用 MySQL 硬扛?我:100 万向量不是很轻松?
mysql·面试·ai编程
冬奇Lab5 天前
每日一个开源项目(第134篇):Zvec - 阿里开源的嵌入式向量数据库,向量搜索界的 SQLite
数据库·人工智能·llm