Spark Catalog

#iceberg catalog

https://iceberg.apache.org/docs/latest/spark-configuration/

相关接口

复制代码
  /**
   * (Scala-specific)
   * Create a table from the given path based on a data source, a schema and a set of options.
   * Then, returns the corresponding DataFrame.
   *
   * @param tableName is either a qualified or unqualified name that designates a table.
   *                  If no database identifier is provided, it refers to a table in
   *                  the current database.
   * @since 2.0.0
   */
  @deprecated("use createTable instead.", "2.2.0")
  def createExternalTable(
      tableName: String,
      source: String,
      schema: StructType,
      options: Map[String, String]): DataFrame = {
    createTable(tableName, source, schema, options)
  }

  /**
   * (Scala-specific)
   * Create a table based on the dataset in a data source, a schema and a set of options.
   * Then, returns the corresponding DataFrame.
   *
   * @param tableName is either a qualified or unqualified name that designates a table.
   *                  If no database identifier is provided, it refers to a table in
   *                  the current database.
   * @since 2.2.0
   */
  def createTable(
      tableName: String,
      source: String,
      schema: StructType,
      options: Map[String, String]): DataFrame

hive metastore

The default implementation of the Hive metastore in Apache Spark uses Apache Derby for its database persistence. This is available with no configuration required but is limited to only one Spark session at any time for the purposes of metadata storage. This obviously makes it unsuitable for use in multi-user environments, such as when shared on a development team or used in Production.

相关推荐
阳爱铭1 天前
ClickHouse 中至关重要的两类复制表引擎——ReplicatedMergeTree和 ReplicatedReplacingMergeTree
大数据·hive·hadoop·sql·clickhouse·spark·hbase
2501_941089191 天前
5G技术与物联网的融合:智能城市与工业革命的加速器
spark
while(努力):进步2 天前
探索未来的技术变革:如何通过云计算与人工智能重塑数字化世界
zookeeper·spark
源码之家3 天前
机器学习:基于大数据二手房房价预测与分析系统 可视化 线性回归预测算法 Django框架 链家网站 二手房 计算机毕业设计✅
大数据·算法·机器学习·数据分析·spark·线性回归·推荐算法
Lansonli4 天前
大数据Spark(七十三):Transformation转换算子glom和foldByKey使用案例
大数据·分布式·spark
keep__go5 天前
spark 单机安装
大数据·运维·分布式·spark
蒙特卡洛的随机游走5 天前
Spark的persist和cache
大数据·分布式·spark
蒙特卡洛的随机游走5 天前
Spark 中 distribute by、sort by、cluster by 深度解析
大数据·分布式·spark
梦里不知身是客115 天前
Spark中的宽窄依赖-宽窄巷子
大数据·分布式·spark
闲人编程5 天前
Python与大数据:使用PySpark处理海量数据
大数据·开发语言·分布式·python·spark·codecapsule·大规模