Spark Catalog

#iceberg catalog

https://iceberg.apache.org/docs/latest/spark-configuration/

相关接口

复制代码
  /**
   * (Scala-specific)
   * Create a table from the given path based on a data source, a schema and a set of options.
   * Then, returns the corresponding DataFrame.
   *
   * @param tableName is either a qualified or unqualified name that designates a table.
   *                  If no database identifier is provided, it refers to a table in
   *                  the current database.
   * @since 2.0.0
   */
  @deprecated("use createTable instead.", "2.2.0")
  def createExternalTable(
      tableName: String,
      source: String,
      schema: StructType,
      options: Map[String, String]): DataFrame = {
    createTable(tableName, source, schema, options)
  }

  /**
   * (Scala-specific)
   * Create a table based on the dataset in a data source, a schema and a set of options.
   * Then, returns the corresponding DataFrame.
   *
   * @param tableName is either a qualified or unqualified name that designates a table.
   *                  If no database identifier is provided, it refers to a table in
   *                  the current database.
   * @since 2.2.0
   */
  def createTable(
      tableName: String,
      source: String,
      schema: StructType,
      options: Map[String, String]): DataFrame

hive metastore

The default implementation of the Hive metastore in Apache Spark uses Apache Derby for its database persistence. This is available with no configuration required but is limited to only one Spark session at any time for the purposes of metadata storage. This obviously makes it unsuitable for use in multi-user environments, such as when shared on a development team or used in Production.

相关推荐
isNotNullX11 小时前
数据中台架构解析:湖仓一体的实战设计
java·大数据·数据库·架构·spark
暗影八度3 天前
Spark流水线数据质量检查组件
大数据·分布式·spark
涤生大数据4 天前
Apache Spark 4.0:将大数据分析提升到新的水平
数据分析·spark·apache·数据开发
xufwind4 天前
spark standlone 集群离线安装
大数据·分布式·spark
大数据CLUB4 天前
基于spark的奥运会奖牌变化数据分析
大数据·hadoop·数据分析·spark
华子w9089258595 天前
基于 Python Django 和 Spark 的电力能耗数据分析系统设计与实现7000字论文实现
python·spark·django
小新学习屋5 天前
Spark从入门到熟悉(篇三)
大数据·分布式·spark
Aurora_NeAr6 天前
Spark SQL架构及高级用法
大数据·后端·spark
百度Geek说7 天前
搜索数据建设系列之数据架构重构
数据仓库·重构·架构·spark·dubbo