Spark读取MySQL数据库表

官方地址:JDBC To Other Databases - Spark 4.0.0 Documentation

官方案例:

复制代码
// Note: JDBC loading and saving can be achieved via either the load/save or jdbc methods
// Loading data from a JDBC source
val jdbcDF = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql:dbserver")
  .option("dbtable", "schema.tablename")
  .option("user", "username")
  .option("password", "password")
  .load()

val connectionProperties = new Properties()
connectionProperties.put("user", "username")
connectionProperties.put("password", "password")
val jdbcDF2 = spark.read
  .jdbc("jdbc:postgresql:dbserver", "schema.tablename", connectionProperties)
// Specifying the custom data types of the read schema
connectionProperties.put("customSchema", "id DECIMAL(38, 0), name STRING")
val jdbcDF3 = spark.read
  .jdbc("jdbc:postgresql:dbserver", "schema.tablename", connectionProperties)

// Saving data to a JDBC source
jdbcDF.write
  .format("jdbc")
  .option("url", "jdbc:postgresql:dbserver")
  .option("dbtable", "schema.tablename")
  .option("user", "username")
  .option("password", "password")
  .save()

jdbcDF2.write
  .jdbc("jdbc:postgresql:dbserver", "schema.tablename", connectionProperties)

// Specifying create table column data types on write
jdbcDF.write
  .option("createTableColumnTypes", "name CHAR(64), comments VARCHAR(1024)")
  .jdbc("jdbc:postgresql:dbserver", "schema.tablename", connectionProperties)

验证:

复制代码
<dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>2.13.16</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.13</artifactId>
      <version>4.0.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.13</artifactId>
      <version>4.0.0</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>

    <dependency>
      <groupId>mysql</groupId>
      <artifactId>mysql-connector-java</artifactId>
      <version>8.0.15</version>
    </dependency>

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_2.13</artifactId>
      <version>4.0.0</version>
    </dependency>

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming-kafka-0-10_2.13</artifactId>
      <version>4.0.0</version>
    </dependency>
  </dependencies>

import org.apache.spark.SparkConf

object SparkCase04 {


  import org.apache.spark.sql.SparkSession


  def main(args: Array[String]): Unit = {

    val conf = new SparkConf().setMaster("local[1]").setAppName("WC")

    val spark = SparkSession.builder().config(conf).getOrCreate()


   var df = spark.read.format("jdbc")
    .option("url", "jdbc:mysql://node11:3306/wjobs?useSSL=false")
    .option("dbtable", "user")
    .option("user", "root")
    .option("password", "root123")
    .load()

    df.show()
  }


}
相关推荐
张铁铁是个小胖子3 小时前
redis执行lua脚本的原子性和数据库原子性的区别
数据库·redis·lua
xiucai_cs5 小时前
MySQL深分页慢问题及性能优化
数据库·mysql·性能优化·深分页
当牛作馬5 小时前
ES常用查询命令
数据库·mysql·elasticsearch
chenglin0167 小时前
ES_索引的操作
大数据·数据库·elasticsearch
hzp6667 小时前
阿里云的centos8 服务器安装MySQL 8.0
mysql·阿里云·centos8
共享家95278 小时前
MYSQL库及表的操作
数据库
想回家的一天10 小时前
Go1.25的源码分析-src/runtime/runtime1.go(GMP)
数据库·redis·缓存
阿里云大数据AI技术11 小时前
鹰角网络基于阿里云EMR Serverless StarRocks的实时分析工程实践
数据库·数据分析