Spark Catalog

#iceberg catalog

https://iceberg.apache.org/docs/latest/spark-configuration/

相关接口

复制代码
  /**
   * (Scala-specific)
   * Create a table from the given path based on a data source, a schema and a set of options.
   * Then, returns the corresponding DataFrame.
   *
   * @param tableName is either a qualified or unqualified name that designates a table.
   *                  If no database identifier is provided, it refers to a table in
   *                  the current database.
   * @since 2.0.0
   */
  @deprecated("use createTable instead.", "2.2.0")
  def createExternalTable(
      tableName: String,
      source: String,
      schema: StructType,
      options: Map[String, String]): DataFrame = {
    createTable(tableName, source, schema, options)
  }

  /**
   * (Scala-specific)
   * Create a table based on the dataset in a data source, a schema and a set of options.
   * Then, returns the corresponding DataFrame.
   *
   * @param tableName is either a qualified or unqualified name that designates a table.
   *                  If no database identifier is provided, it refers to a table in
   *                  the current database.
   * @since 2.2.0
   */
  def createTable(
      tableName: String,
      source: String,
      schema: StructType,
      options: Map[String, String]): DataFrame

hive metastore

The default implementation of the Hive metastore in Apache Spark uses Apache Derby for its database persistence. This is available with no configuration required but is limited to only one Spark session at any time for the purposes of metadata storage. This obviously makes it unsuitable for use in multi-user environments, such as when shared on a development team or used in Production.

相关推荐
乌恩大侠5 小时前
AI-RAN 在 Spark上部署 Sionna-RK
大数据·分布式·spark
pale_moonlight1 天前
九、Spark基础环境实战((上)虚拟机安装Scala与windows端安装Scala)
大数据·分布式·spark
青云交3 天前
Java 大视界 -- Java 大数据在智能物流无人配送车路径规划与协同调度中的应用
java·spark·路径规划·大数据分析·智能物流·无人配送车·协同调度
yumgpkpm4 天前
腾讯云TBDS与CDH迁移常见问题有哪些?建议由CDH迁移到CMP 7.13 平台(类Cloudera CDP,如华为鲲鹏 ARM 版)
hive·hadoop·zookeeper·flink·spark·kafka·hbase
bigdata-rookie5 天前
Spark 部署模式
大数据·分布式·spark
sheji34165 天前
【开题答辩全过程】以 基于Spark的药品库存可视化分析系统为例,包含答辩的问题和答案
大数据·分布式·spark
larance5 天前
spark-submit 常用方式
大数据·spark
A尘埃5 天前
Spark基于内存计算的数据处理
大数据·分布式·spark
bigdata-rookie6 天前
Flink Checkpoint 和 Spark Checkpoint 的区别
大数据·flink·spark
灯下夜无眠6 天前
conda打包环境上传spark集群
大数据·spark·conda