Spark Catalog

Iceberg catalog

https://iceberg.apache.org/docs/latest/spark-configuration/
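A minimal sketch of what the page above describes: an Iceberg catalog is registered by setting spark.sql.catalog.<name> to org.apache.iceberg.spark.SparkCatalog and adding catalog-specific properties under that prefix. The catalog name my_catalog, the namespace db, the table events and the local warehouse path are hypothetical. The sketch assumes the iceberg-spark-runtime jar matching your Spark and Scala versions is on the classpath, and uses a "hadoop" catalog so no metastore is needed.

```scala
import org.apache.spark.sql.SparkSession

object IcebergCatalogExample {
  // Hypothetical catalog name and local warehouse directory; adjust to your environment.
  val catalogName = "my_catalog"
  val warehouse = "file:///tmp/iceberg-warehouse"

  // A catalog is registered by setting spark.sql.catalog.<name> to a catalog
  // implementation class, plus implementation-specific properties under that prefix.
  val icebergConf: Map[String, String] = Map(
    // Optional: enables Iceberg's SQL extensions (procedures, extra ALTER TABLE syntax).
    "spark.sql.extensions" -> "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    s"spark.sql.catalog.$catalogName" -> "org.apache.iceberg.spark.SparkCatalog",
    // "hadoop" keeps table metadata in the warehouse directory;
    // "hive" would use a Hive metastore instead (then also set the .uri property).
    s"spark.sql.catalog.$catalogName.type" -> "hadoop",
    s"spark.sql.catalog.$catalogName.warehouse" -> warehouse
  )

  // Applies every property to the builder, then creates (or reuses) the session.
  def buildSession(): SparkSession =
    icebergConf
      .foldLeft(SparkSession.builder().appName("iceberg-catalog").master("local[*]")) {
        case (builder, (key, value)) => builder.config(key, value)
      }
      .getOrCreate()

  def main(args: Array[String]): Unit = {
    val spark = buildSession()
    try {
      // Tables in a non-session catalog are addressed as <catalog>.<namespace>.<table>.
      spark.sql(s"CREATE NAMESPACE IF NOT EXISTS $catalogName.db")
      spark.sql(s"CREATE TABLE IF NOT EXISTS $catalogName.db.events (id BIGINT, data STRING) USING iceberg")
      spark.sql(s"INSERT INTO $catalogName.db.events VALUES (1, 'a')")
      spark.table(s"$catalogName.db.events").show()
    } finally spark.stop()
  }
}
```

The same keys can also be passed on the command line with --conf instead of in code; building them as a Map just keeps the catalog definition in one place.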

Related interfaces

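  // Excerpt from Spark's abstract class org.apache.spark.sql.catalog.Catalog
  // (imports: org.apache.spark.sql.DataFrame, org.apache.spark.sql.types.StructType)
  /**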
   * (Scala-specific)
   * Create a table from the given path based on a data source, a schema and a set of options.
   * Then, returns the corresponding DataFrame.
   *
   * @param tableName is either a qualified or unqualified name that designates a table.
   *                  If no database identifier is provided, it refers to a table in
   *                  the current database.
   * @since 2.0.0
   */
  @deprecated("use createTable instead.", "2.2.0")
  def createExternalTable(
      tableName: String,
      source: String,
      schema: StructType,
      options: Map[String, String]): DataFrame = {
    createTable(tableName, source, schema, options)
  }

  /**
   * (Scala-specific)
   * Create a table based on the dataset in a data source, a schema and a set of options.
   * Then, returns the corresponding DataFrame.
   *
   * @param tableName is either a qualified or unqualified name that designates a table.
   *                  If no database identifier is provided, it refers to a table in
   *                  the current database.
   * @since 2.2.0
   */
  def createTable(
      tableName: String,
      source: String,
      schema: StructType,
      options: Map[String, String]): DataFrame
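A small usage sketch of createTable, the non-deprecated replacement for createExternalTable. It writes a few Parquet rows to a temporary directory, then registers an external table over them by passing a "path" option. The table name ids and the temporary directory are hypothetical; it assumes a local Spark 2.2+ session with the default (in-memory) catalog.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

object CreateTableExample {
  def run(spark: SparkSession): DataFrame = {
    // Write some Parquet files first so the external table has data to point at.
    val path = Files.createTempDirectory("catalog-demo").resolve("ids").toString
    spark.range(3).write.parquet(path) // one column "id" of type bigint

    // Schema supplied explicitly; it must match the files on disk.
    val schema = StructType(Seq(StructField("id", LongType, nullable = true)))

    // With a "path" option the table is external (unmanaged):
    // dropping it removes only the catalog entry, not the files.
    spark.catalog.createTable("ids", "parquet", schema, Map("path" -> path))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("create-table").getOrCreate()
    try {
      val df = run(spark)
      df.show()
      println(spark.catalog.tableExists("ids")) // true
    } finally spark.stop()
  }
}
```

Omitting the "path" option instead creates a managed table under spark.sql.warehouse.dir.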

Hive metastore

By default, Spark's Hive metastore support persists metadata in an embedded Apache Derby database. This works with no configuration, but embedded Derby allows only one Spark session to access the metadata store at a time. That makes it unsuitable for multi-user environments, such as a cluster shared by a development team or a production deployment.
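A sketch of the usual way around the single-session Derby limit: point every Spark application at a shared, standalone Hive Metastore service (or at a shared metastore database). The host name and warehouse path are hypothetical; 9083 is the metastore's default Thrift port. It assumes spark-hive is on the classpath (required by enableHiveSupport) and relies on Spark copying spark.hadoop.* entries into the Hadoop/Hive configuration.

```scala
import org.apache.spark.sql.SparkSession

object SharedMetastoreExample {
  // Hypothetical address of a standalone Hive Metastore service.
  val metastoreUri = "thrift://metastore.example.com:9083"

  // With hive.metastore.uris unset, Spark falls back to embedded Derby (metastore_db/).
  // Setting it makes every session talk to the same shared metastore instead.
  val metastoreConf: Map[String, String] = Map(
    "spark.hadoop.hive.metastore.uris" -> metastoreUri,
    // Hypothetical shared location for managed tables.
    "spark.sql.warehouse.dir" -> "hdfs:///user/hive/warehouse"
  )

  // Alternative (not used below): let Spark connect to a shared metastore database
  // directly through the standard javax.jdo.option.* properties, e.g.
  //   spark.hadoop.javax.jdo.option.ConnectionURL        = jdbc:postgresql://db:5432/metastore
  //   spark.hadoop.javax.jdo.option.ConnectionDriverName = org.postgresql.Driver
  // (the JDBC driver jar must then be on the classpath).

  def buildSession(): SparkSession =
    metastoreConf
      .foldLeft(SparkSession.builder().appName("shared-metastore")) {
        case (builder, (key, value)) => builder.config(key, value)
      }
      .enableHiveSupport() // throws if Hive classes are not on the classpath
      .getOrCreate()
}
```

Every application built this way sees the same databases and tables, which is what a shared development or production environment needs.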
