Spark Catalog

#iceberg catalog

https://iceberg.apache.org/docs/latest/spark-configuration/
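As a rough illustration of what the page linked above describes, an Iceberg catalog is registered with Spark through `spark.sql.catalog.<name>` properties. The sketch below assumes the Iceberg Spark runtime jar is on the classpath. The catalog name `local` and the warehouse path `/tmp/iceberg-warehouse` are placeholders chosen for this example, not values from the original notes.

```scala
import org.apache.spark.sql.SparkSession

object IcebergCatalogSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("iceberg-catalog-sketch")
      .master("local[*]")
      // Register a catalog named "local" (hypothetical name) backed by Iceberg.
      .config("spark.sql.catalog.local", "org.apache.iceberg.spark.SparkCatalog")
      // "hadoop" keeps table metadata in a directory; "hive" would use a Hive metastore instead.
      .config("spark.sql.catalog.local.type", "hadoop")
      // Placeholder warehouse location; adjust for the actual environment.
      .config("spark.sql.catalog.local.warehouse", "/tmp/iceberg-warehouse")
      .getOrCreate()

    // Tables in this catalog are then addressed as local.<namespace>.<table>.
    spark.sql("CREATE NAMESPACE IF NOT EXISTS local.db")
    spark.stop()
  }
}
```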

Related interfaces

  /**
   * (Scala-specific)
   * Create a table from the given path based on a data source, a schema and a set of options.
   * Then, returns the corresponding DataFrame.
   *
   * @param tableName is either a qualified or unqualified name that designates a table.
   *                  If no database identifier is provided, it refers to a table in
   *                  the current database.
   * @since 2.0.0
   */
  @deprecated("use createTable instead.", "2.2.0")
  def createExternalTable(
      tableName: String,
      source: String,
      schema: StructType,
      options: Map[String, String]): DataFrame = {
    createTable(tableName, source, schema, options)
  }

  /**
   * (Scala-specific)
   * Create a table based on the dataset in a data source, a schema and a set of options.
   * Then, returns the corresponding DataFrame.
   *
   * @param tableName is either a qualified or unqualified name that designates a table.
   *                  If no database identifier is provided, it refers to a table in
   *                  the current database.
   * @since 2.2.0
   */
  def createTable(
      tableName: String,
      source: String,
      schema: StructType,
      options: Map[String, String]): DataFrame
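A minimal sketch of calling the `createTable` overload quoted above through `spark.catalog`. The table name `people_sketch`, the column names, and the path `/tmp/people_sketch` are hypothetical values chosen for illustration. The example assumes a local Spark session with the default in-memory catalog.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object CreateTableSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("create-table-sketch")
      .master("local[*]")
      .getOrCreate()

    // Schema passed explicitly, matching the `schema: StructType` parameter.
    val schema = StructType(Seq(
      StructField("id", IntegerType),
      StructField("name", StringType)
    ))

    // `source` names the data source format; `options` carries source-specific
    // settings. Supplying "path" makes the table point at that location.
    val df = spark.catalog.createTable(
      "people_sketch",                       // unqualified: resolved in the current database
      "parquet",
      schema,
      Map("path" -> "/tmp/people_sketch")
    )

    df.printSchema()
    println(spark.catalog.tableExists("people_sketch"))

    spark.stop()
  }
}
```

The deprecated `createExternalTable` simply delegates to this method, so new code can call `createTable` directly.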

Hive metastore

By default, the Hive metastore in Apache Spark uses an embedded Apache Derby database for persistence. It works with no configuration, but only one Spark session at a time can use it to store metadata. That makes it unsuitable for multi-user environments, such as a metastore shared across a development team or used in production.
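One common way around the single-session limit is to point Spark at a standalone Hive metastore service instead of the embedded Derby database. The sketch below assumes such a service is already running and that the `spark-hive` module is on the classpath. The URI `thrift://metastore-host:9083` is a placeholder, not an address from the original notes.

```scala
import org.apache.spark.sql.SparkSession

object ExternalMetastoreSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("external-metastore-sketch")
      // Placeholder URI of a shared metastore service; replace with a real host and port.
      .config("spark.hadoop.hive.metastore.uris", "thrift://metastore-host:9083")
      // Switches the session catalog from the in-memory implementation to Hive.
      .enableHiveSupport()
      .getOrCreate()

    // Databases listed here come from the shared metastore, so several
    // sessions can read and write the same metadata concurrently.
    spark.catalog.listDatabases().show(truncate = false)

    spark.stop()
  }
}
```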
