Spark Catalog

iceberg catalog

https://iceberg.apache.org/docs/latest/spark-configuration/
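The page above describes how Iceberg catalogs are registered through Spark configuration properties. A minimal sketch of that setup is below; the catalog name "demo" and the warehouse path are placeholders, and the iceberg-spark-runtime jar is assumed to be on the classpath.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("iceberg-catalog-example")
  // Register a Spark catalog named "demo", backed by Iceberg's SparkCatalog.
  .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
  // A "hadoop" catalog tracks tables under a warehouse path;
  // "hive" would track them in the Hive metastore instead.
  .config("spark.sql.catalog.demo.type", "hadoop")
  .config("spark.sql.catalog.demo.warehouse", "/tmp/iceberg/warehouse")
  .getOrCreate()

// Tables are then addressed through the catalog name.
spark.sql("CREATE TABLE demo.db.events (id BIGINT, data STRING) USING iceberg")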

Related interfaces

  /**
   * (Scala-specific)
   * Create a table from the given path based on a data source, a schema and a set of options.
   * Then, returns the corresponding DataFrame.
   *
   * @param tableName is either a qualified or unqualified name that designates a table.
   *                  If no database identifier is provided, it refers to a table in
   *                  the current database.
   * @since 2.0.0
   */
  @deprecated("use createTable instead.", "2.2.0")
  def createExternalTable(
      tableName: String,
      source: String,
      schema: StructType,
      options: Map[String, String]): DataFrame = {
    createTable(tableName, source, schema, options)
  }

  /**
   * (Scala-specific)
   * Create a table based on the dataset in a data source, a schema and a set of options.
   * Then, returns the corresponding DataFrame.
   *
   * @param tableName is either a qualified or unqualified name that designates a table.
   *                  If no database identifier is provided, it refers to a table in
   *                  the current database.
   * @since 2.2.0
   */
  def createTable(
      tableName: String,
      source: String,
      schema: StructType,
      options: Map[String, String]): DataFrame
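A short usage sketch of the createTable overload shown above, assuming a live SparkSession named spark; the table name, source files, and path are hypothetical.

import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

val schema = StructType(Seq(
  StructField("id", LongType, nullable = false),
  StructField("name", StringType)
))

// Passing a "path" option creates an external table over existing files;
// the unqualified name "people" resolves to the current database.
val people = spark.catalog.createTable(
  "people",
  "csv",
  schema,
  Map("path" -> "/data/people", "header" -> "true")
)

Note that the deprecated createExternalTable simply delegates to this method, so the call above covers both entry points.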

Hive metastore

The default Hive metastore implementation in Apache Spark uses Apache Derby for its database persistence. It works with no configuration required, but it supports only one Spark session at a time for metadata storage. This makes it unsuitable for multi-user environments, such as a shared development team or a production deployment.
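For shared use, Spark is instead pointed at an external metastore service. A minimal sketch, assuming a standalone Hive metastore is already running; the thrift URI is a placeholder for your environment.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("shared-metastore-example")
  // Connect to an external Hive metastore service instead of embedded Derby.
  .config("hive.metastore.uris", "thrift://metastore-host:9083")
  .enableHiveSupport()  // requires a Spark build packaged with Hive support
  .getOrCreate()

With this in place, multiple Spark sessions and users share one consistent view of databases and tables.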
