I recently came across an article about DuckLake and decided to dig into this new lakehouse format specification.
Source: https://mp.weixin.qq.com/s/RFbW-ChAbUBSP-9ETqCcDA
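The basic paradigm of data engineering in the cloud is the separation of compute and storage. Data sits on S3 or GCS as Parquet files, which are cheap, durable, and free of vendor lock-in. You spin up compute when you need to query and release it when you are done. For read-only workloads this is close to perfect. The problems start with writes.
The trouble with bare Parquet shows up when two pipelines write to the same table (one directory on S3) at the same time. If both are trying to overwrite yesterday's data, whose write counts: whichever finishes first? S3 offers no locks and no multi-file transactions, so there is no answer. Changing a column's data type means rewriting every historical file, and a bad batch cannot be rolled back.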

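What bare Parquet cannot do:
- No coordination of concurrent writers
- No atomic commits
- No schema evolution
- No time travel
- No incremental queries
Iceberg and DuckLake make different architectural trade-offs to solve the same problems. The core disagreement is where the metadata lives:
- Iceberg chooses not to depend on any external system: if you can read the files, you can read the data.
- DuckLake accepts a dependency on a SQL database and pays that price for efficient metadata operations. The two trade-offs suit different scenarios.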
[Figure: where the metadata lives. DuckLake keeps metadata in a SQL database: query planning goes to PostgreSQL (28 metadata tables), which points directly to Parquet files 1..N. Iceberg keeps all metadata on object storage as a chain: snapshot JSON → manifest list (Avro) → manifests 1..N → Parquet files.]
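| Dimension | Iceberg | DuckLake |
|---|---|---|
| Metadata storage | JSON + Avro files on object storage | 28 tables in a SQL database |
| Metadata cost per write | 3 to 4 new files on S3 | One SQL transaction with a few INSERTs |
| External dependency | Catalog service (REST/Hive/Glue) | A SQL database (PostgreSQL/MySQL) |
| Concurrency control | Optimistic concurrency + catalog compare-and-swap (CAS) | Database row locks + transaction isolation |
| Design goal | Data readable without any external system | Trade a database dependency for efficient metadata operations |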
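The concurrency-control row of the table (Iceberg's optimistic concurrency with a catalog CAS) can be illustrated with a toy model. It assumes a catalog that stores nothing but a current-snapshot pointer per table. `CasCatalog` and `compare_and_swap` are hypothetical names, not Iceberg's API. Each writer remembers the snapshot it started from, and its commit succeeds only if the pointer has not moved in the meantime. Otherwise it must re-read and retry. DuckLake gets the equivalent guarantee from the SQL database's own transactions.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class CasCatalog:
    """Toy catalog (hypothetical): one current-snapshot pointer for one table."""
    current: int = 0
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def compare_and_swap(self, expected: int, new: int) -> bool:
        # Atomic check-and-set: succeed only if nobody committed since `expected`.
        with self._lock:
            if self.current != expected:
                return False
            self.current = new
            return True


catalog = CasCatalog()
base_a = catalog.current  # writer A starts from snapshot 0
base_b = catalog.current  # writer B also starts from snapshot 0

ok_b = catalog.compare_and_swap(base_b, base_b + 1)  # B commits first: 0 -> 1
ok_a = catalog.compare_and_swap(base_a, base_a + 1)  # A is rejected: pointer is already 1

# A re-reads the new base, re-checks its changes against it, and retries: 1 -> 2
base_a = catalog.current
ok_a_retry = catalog.compare_and_swap(base_a, base_a + 1)

print(ok_b, ok_a, ok_a_retry, catalog.current)  # True False True 2
```

A real Iceberg writer also has to re-validate its data changes against the new base before retrying. This sketch skips that step and only shows the pointer swap.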
Environment setup
Start the database services
```bash
# Create a network
docker network create ducklake-net
# Start PostgreSQL
docker run -d --name ducklake-postgres --network ducklake-net \
-e POSTGRES_USER=ducklake -e POSTGRES_PASSWORD=ducklake \
-e POSTGRES_DB=ducklake_catalog \
-p 15432:5432 postgres:16
# Fix ownership of the RustFS data directory (the container runs as UID 10001)
docker run --rm --network ducklake-net \
-v ducklake_rustfs_data:/data alpine chown -R 10001:10001 /data
# Start RustFS
docker run -d --name ducklake-rustfs --network ducklake-net \
-p 19000:9000 -p 19001:9001 \
-e RUSTFS_VOLUMES="/data/rustfs{0..3}" \
-e RUSTFS_ADDRESS="0.0.0.0:9000" \
-e RUSTFS_CONSOLE_ADDRESS="0.0.0.0:9001" \
-e RUSTFS_CONSOLE_ENABLE=true \
-e RUSTFS_ACCESS_KEY=ducklakeadmin \
-e RUSTFS_SECRET_KEY=ducklakeadmin \
-v ducklake_rustfs_data:/data \
037047667284.dkr.ecr.cn-north-1.amazonaws.com.cn/rustfs:1.0.0-alpha.83 /data
```
Verify service status
```text
$ docker ps --format "table {{.Names}}\t{{.Status}}"
NAMES STATUS
ducklake-rustfs Up
ducklake-postgres Up
```
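Note: the RustFS container runs as UID 10001, so its data directory must be chown'ed first. If you skip the permission step, RustFS fails with permission denied and keeps restarting. The RustFS startup log should show started successfully at 0.0.0.0:9000.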
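Create the S3 buckets
RustFS only accepts bucket-creation requests signed with AWS Signature V4, so we send them with Python's boto3:
```python
import boto3
from botocore.client import Config
from botocore.exceptions import ClientError
s3 = boto3.client('s3',
    endpoint_url='http://localhost:19000',
    aws_access_key_id='ducklakeadmin',
    aws_secret_access_key='ducklakeadmin',
    # SigV4 signing; path-style URLs match s3_url_style='path' in the DuckDB settings below
    config=Config(signature_version='s3v4', s3={'addressing_style': 'path'}),
    region_name='us-east-1')
# Create both buckets; ignore "already exists" so the script can be re-run
for name in ['ducklake-data', 'iceberg-data']:
    try:
        s3.create_bucket(Bucket=name)
    except ClientError as e:
        if e.response['Error']['Code'] != 'BucketAlreadyOwnedByYou':
            raise
```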
Initialize the DuckDB connection
```bash
uv venv .venv && source .venv/bin/activate
uv pip install duckdb psycopg2-binary pytz boto3
```
The connection script below is reused in every later step:
```python
import duckdb
con = duckdb.connect()  # in-memory DuckDB session; the lake is attached below
# ducklake = the table format, postgres = catalog backend, httpfs = S3 access
con.execute("INSTALL ducklake")
con.execute("INSTALL postgres")
con.execute("INSTALL httpfs")
con.execute("LOAD ducklake")
con.execute("LOAD postgres")
con.execute("LOAD httpfs")
# Point S3 access at the local RustFS instance (plain HTTP, path-style URLs)
con.execute("SET s3_endpoint='localhost:19000'")
con.execute("SET s3_access_key_id='ducklakeadmin'")
con.execute("SET s3_secret_access_key='ducklakeadmin'")
con.execute("SET s3_use_ssl=false")
con.execute("SET s3_url_style='path'")
```
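In the ATTACH statement, the ducklake:postgres prefix tells DuckDB to use PostgreSQL as the catalog backend, and DATA_PATH sets where the Parquet files are stored on S3. All metadata (schemas, snapshots, file pointers, column statistics) lives in PostgreSQL. Apart from small inlined writes, only the actual data files live on RustFS.
```python
con.execute("""
ATTACH 'ducklake:postgres:dbname=ducklake_catalog host=localhost port=15432 user=ducklake password=ducklake' AS lake
(DATA_PATH 's3://ducklake-data/lake/')
""")
```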
Note: when you call functions such as ducklake_snapshots from Python, DuckDB needs the pytz package internally. Without it you get ModuleNotFoundError: No module named 'pytz', so install it up front.
The overall environment looks like this:
[Figure: environment architecture. Host: Python 3.12 + DuckDB v1.5.3. Docker containers: PostgreSQL 16 (port 15432) holding the DuckLake catalog (ducklake_snapshot, ducklake_data_file, ... 28 tables in total), and RustFS 1.0.0-alpha.83 (S3-compatible; port 19000 for the S3 API, 19001 for the console) with buckets s3://ducklake-data/ and s3://iceberg-data/. DuckDB reads the DuckLake catalog from PostgreSQL and reads/writes Parquet on RustFS.]
Basic operations
Create a table, insert, and query
```sql
CREATE TABLE lake.events (
    event_id INTEGER,
    event_type VARCHAR,
    user_id INTEGER,
    amount DOUBLE,
    created_at TIMESTAMP
);
INSERT INTO lake.events VALUES
    (1, 'click', 100, 0.0, '2026-01-15 10:00:00'),
    (2, 'purchase', 100, 99.9, '2026-01-15 10:05:00'),
    (3, 'click', 200, 0.0, '2026-01-15 10:10:00'),
    (4, 'purchase', 200, 49.9, '2026-01-15 10:15:00'),
    (5, 'refund', 100, -99.9, '2026-01-15 11:00:00');
SELECT * FROM lake.events ORDER BY event_id;
```
```text
[OK] SELECT: 5 rows
(1, 'click', 100, 0.0, datetime.datetime(2026, 1, 15, 10, 0))
(2, 'purchase', 100, 99.9, datetime.datetime(2026, 1, 15, 10, 5))
(3, 'click', 200, 0.0, datetime.datetime(2026, 1, 15, 10, 10))
(4, 'purchase', 200, 49.9, datetime.datetime(2026, 1, 15, 10, 15))
(5, 'refund', 100, -99.9, datetime.datetime(2026, 1, 15, 11, 0))
```
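Now check the metadata. How many files has DuckLake written to S3?
```sql
SELECT count(*) AS parquet_file_count FROM ducklake_list_files('lake', 'events');
-- DuckDB's FROM-first syntax, equivalent to SELECT * FROM ducklake_snapshots('lake')
FROM ducklake_snapshots('lake');
```
```text
[OBSERVE] Parquet files on S3: 0
```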
Insert another batch and watch the file count:
```sql
INSERT INTO lake.events VALUES
    (6, 'click', 300, 0.0, '2026-01-16 09:00:00'),
    (7, 'purchase', 300, 29.9, '2026-01-16 09:05:00'),
    (8, 'click', 100, 0.0, '2026-01-16 10:00:00');
SELECT count(*) AS parquet_file_count FROM ducklake_list_files('lake', 'events');
```
```text
[OBSERVE] Parquet files after 2nd insert: 0
```
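Why 0 files? Both INSERTs (5 rows and 3 rows) are below the default data_inlining_row_limit of 10, so the rows are inlined into the PostgreSQL catalog and no Parquet file is written to RustFS. This is a core DuckLake design choice: small writes go to the catalog and never touch object storage.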
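The same INSERT triggers very different work behind the scenes in Iceberg and DuckLake. In DuckLake the metadata update is a handful of SQL statements, and when the insert has 10 rows or fewer, nothing is written to S3 at all. In Iceberg the metadata update means 3 to 4 new files on S3.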
[Figure: write paths. Iceberg: write Parquet data file → write manifest file (Avro) → write manifest list (Avro) → write snapshot JSON. DuckLake: BEGIN TRANSACTION → INSERT INTO ducklake_data_file → INSERT INTO ducklake_snapshot → COMMIT.]
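The DuckLake side of this comparison comes down to ordinary database atomicity. The rough sketch below uses Python's built-in sqlite3 as a stand-in for PostgreSQL. The `snapshot` and `data_file` tables are heavily simplified, hypothetical versions of ducklake_snapshot and ducklake_data_file, and the file paths are made up. The point is that registering a file and creating a snapshot either both commit or both roll back, so there is no half-written metadata chain to clean up. A real commit also has to detect conflicts with concurrent writers, which this sketch ignores.

```python
import sqlite3

# Toy catalog in in-memory SQLite, standing in for PostgreSQL. Table and column
# names are simplified, hypothetical versions of ducklake_snapshot / ducklake_data_file.
catalog_db = sqlite3.connect(":memory:", isolation_level=None)  # we issue BEGIN/COMMIT ourselves
catalog_db.executescript("""
CREATE TABLE snapshot (snapshot_id INTEGER PRIMARY KEY, created_at TEXT);
CREATE TABLE data_file (
    file_id INTEGER PRIMARY KEY,
    table_name TEXT NOT NULL,
    path TEXT NOT NULL,
    begin_snapshot INTEGER NOT NULL
);
""")


def commit_file(table_name: str, path: str, fail: bool = False) -> None:
    """Register one data file and one snapshot in a single transaction."""
    catalog_db.execute("BEGIN")
    try:
        cur = catalog_db.execute("INSERT INTO snapshot (created_at) VALUES (datetime('now'))")
        catalog_db.execute(
            "INSERT INTO data_file (table_name, path, begin_snapshot) VALUES (?, ?, ?)",
            (table_name, path, cur.lastrowid),
        )
        if fail:
            raise RuntimeError("simulated crash before COMMIT")
        catalog_db.execute("COMMIT")
    except Exception:
        catalog_db.execute("ROLLBACK")  # undo both INSERTs
        raise


commit_file("events", "s3://ducklake-data/lake/events/a.parquet")
try:
    commit_file("events", "s3://ducklake-data/lake/events/b.parquet", fail=True)
except RuntimeError:
    pass  # the failed commit leaves neither a snapshot nor a file entry behind

snapshots = catalog_db.execute("SELECT count(*) FROM snapshot").fetchone()[0]
files = catalog_db.execute("SELECT count(*) FROM data_file").fetchone()[0]
print(snapshots, files)  # 1 1
```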
High-frequency small-batch writes
This is where DuckLake's advantage is greatest. We simulate event-stream ingestion with 100 batches of 10 rows each.
```sql
CREATE TABLE lake.small_batch_events (
    event_id INTEGER,
    sensor_id INTEGER,
    value DOUBLE,
    ts TIMESTAMP
);
-- 100 INSERT statements, 10 rows each
INSERT INTO lake.small_batch_events
SELECT range AS event_id, (range % 5) + 1 AS sensor_id,
       random() * 100 AS value, now() + (range || ' seconds')::INTERVAL AS ts
FROM range(0, 10);
INSERT INTO lake.small_batch_events
SELECT range AS event_id, (range % 5) + 1 AS sensor_id,
       random() * 100 AS value, now() + (range || ' seconds')::INTERVAL AS ts
FROM range(10, 20);
-- ... 100 statements in total, from range(0,10) to range(990,1000) ...
SELECT count(*) AS total_rows FROM lake.small_batch_events;
SELECT count(*) AS parquet_file_count FROM ducklake_list_files('lake', 'small_batch_events');
FROM ducklake_snapshots('lake');
```
```text
[OK] 100 batches inserted in 1.80s
[OBSERVE] Total rows: 1000
[OBSERVE] Parquet files on S3: 0 (expected 0 with inlining!)
```
1,000 rows in 1.80 s, and 0 Parquet files on RustFS. Each INSERT has exactly 10 rows, which equals the default data_inlining_row_limit, so every batch is inlined into PostgreSQL.
[Figure: DuckLake, 100 batches × 10 rows → PostgreSQL: 100 snapshots, 0 S3 files. Iceberg, same workload → on S3: 100 Parquet files + 100 manifests + 100 manifest lists + 100 snapshot JSON files = ~400 files.]
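Run the same workload on Iceberg and every batch produces 3 to 4 files on S3, so 100 batches means 300 to 400 files. This is the root cause of the small-file explosion problem.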
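The 300 to 400 range follows from simple counting. Under the simplified assumption that each Iceberg append commit writes one data file plus three metadata files (manifest, manifest list, snapshot/metadata JSON), the totals for 100 batches are:

```python
# Simplified model (assumption): every Iceberg append commit writes its data
# file(s) plus exactly three metadata files: manifest, manifest list, metadata JSON.
METADATA_FILES_PER_COMMIT = 3


def iceberg_files_after(batches: int, data_files_per_batch: int = 1) -> dict:
    """Count the files a sequence of small append commits leaves on object storage."""
    data = batches * data_files_per_batch
    metadata = batches * METADATA_FILES_PER_COMMIT
    return {"data": data, "metadata": metadata, "total": data + metadata}


result = iceberg_files_after(100)
print(result)  # {'data': 100, 'metadata': 300, 'total': 400}
```

The 300 figure counts only metadata files and 400 includes the data files. Real tables can differ (a commit may write several manifests, and maintenance such as snapshot expiration removes files later), so treat this as an order-of-magnitude estimate.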
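Data inlining threshold: 10 rows by default. Adjust it with CALL ducklake_set_option('lake', 'data_inlining_row_limit', 'N', table_name => '<table>'). A higher limit means more writes are inlined into the catalog and fewer small files are created, at the cost of more storage load on PostgreSQL.
Data inlining is DuckLake's flagship feature. Small writes (INSERT/DELETE/UPDATE) operate directly on tables in the catalog database and produce no Parquet files.
The rule: a write with at most data_inlining_row_limit rows is inlined, and a write with more rows is written as Parquet. With the default limit of 10, an 11-row insert immediately produces a Parquet file.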
```sql
CREATE TABLE lake.sensors (
    sensor_id INTEGER, temperature DOUBLE, humidity DOUBLE, ts TIMESTAMP
);
```
```text
[OK] CREATE TABLE lake.sensors
```
```sql
INSERT INTO lake.sensors VALUES (1, 21.5, 45.0, '2026-05-01 10:00:00');
INSERT INTO lake.sensors VALUES (2, 22.1, 48.0, '2026-05-01 10:00:10');
INSERT INTO lake.sensors VALUES (1, 21.8, 46.0, '2026-05-01 10:00:20');
SELECT count(*) AS parquet_files FROM ducklake_list_files('lake', 'sensors');
```
```text
[OK] 3 single-row inserts (inlined)
[OBSERVE] Parquet files: 0
```
The default is data_inlining_row_limit = 10, and the rule is that a write of 10 rows or fewer is inlined. Testing the boundary:
```sql
-- 9 rows: below the limit
CREATE TABLE lake.boundary_9 (id INTEGER, v DOUBLE);
INSERT INTO lake.boundary_9 SELECT range, random()*100 FROM range(0, 9);
SELECT count(*) AS parquet_files FROM ducklake_list_files('lake', 'boundary_9');
-- 10 rows: exactly at the limit
CREATE TABLE lake.boundary_10 (id INTEGER, v DOUBLE);
INSERT INTO lake.boundary_10 SELECT range, random()*100 FROM range(0, 10);
SELECT count(*) AS parquet_files FROM ducklake_list_files('lake', 'boundary_10');
-- 11 rows: just over the limit
CREATE TABLE lake.boundary_11 (id INTEGER, v DOUBLE);
INSERT INTO lake.boundary_11 SELECT range, random()*100 FROM range(0, 11);
SELECT count(*) AS parquet_files FROM ducklake_list_files('lake', 'boundary_11');
-- 13 rows: clearly over the limit
CREATE TABLE lake.boundary_13 (id INTEGER, v DOUBLE);
INSERT INTO lake.boundary_13 SELECT range, random()*100 FROM range(0, 13);
SELECT count(*) AS parquet_files FROM ducklake_list_files('lake', 'boundary_13');
```
```text
[OBSERVE] 9 rows → 0 Parquet files (inline)
[OBSERVE] 10 rows → 0 Parquet files (inline)
[OBSERVE] 11 rows → 1 Parquet file (writes Parquet, 439 bytes)
[OBSERVE] 13 rows → 1 Parquet file (writes Parquet, 647 bytes)
```
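These boundary results are consistent with a plain "at most the limit" comparison. Below is a minimal sketch of that observed rule. It assumes the row count of a single statement is the only factor, though the real implementation may take more into account. `write_target` is a hypothetical helper name.

```python
DEFAULT_INLINING_ROW_LIMIT = 10  # default reported in this article


def write_target(row_count: int, limit: int = DEFAULT_INLINING_ROW_LIMIT) -> str:
    """Where a single INSERT lands under the observed rule: <= limit stays inline."""
    return "inline" if row_count <= limit else "parquet"


for n in (9, 10, 11, 13):
    print(n, write_target(n))
# 9 inline
# 10 inline
# 11 parquet
# 13 parquet

print(80, write_target(80, limit=100))  # 80 inline (after raising the limit to 100)
```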
Large writes produce Parquet
```sql
INSERT INTO lake.sensors
SELECT (range % 3) + 1, 20 + random() * 5, 40 + random() * 15,
       '2026-05-01 10:01:00'::TIMESTAMP + (range || ' seconds')::INTERVAL
FROM range(0, 50);
SELECT count(*) AS parquet_files FROM ducklake_list_files('lake', 'sensors');
```
```text
[OK] 50-row insert (exceeds threshold)
[OBSERVE] Parquet files: 1
```
Adjusting the threshold
```sql
CALL ducklake_set_option('lake', 'data_inlining_row_limit', '100', table_name => 'sensors');
INSERT INTO lake.sensors
SELECT (range % 3) + 1, 20 + random() * 5, 40 + random() * 15,
       '2026-05-01 10:05:00'::TIMESTAMP + (range || ' seconds')::INTERVAL
FROM range(0, 80);
SELECT count(*) AS parquet_files_after_80row_insert FROM ducklake_list_files('lake', 'sensors');
```
```text
[OBSERVE] After 80-row insert (threshold=100): 2 Parquet files (80 rows inlined)
```
After the threshold is raised to 100, the 80-row INSERT is inlined instead of being written out as a new Parquet file.
DELETE and UPDATE can be inlined too
```sql
DELETE FROM lake.sensors WHERE sensor_id = 2 AND temperature = 22.1;
SELECT count(*) AS parquet_files_after_delete FROM ducklake_list_files('lake', 'sensors');
```
```text
[OK] DELETE 1 row (inlined)
[OBSERVE] Parquet files after delete: 1
```
```sql
UPDATE lake.sensors SET temperature = 25.0 WHERE sensor_id = 1 AND ts = '2026-05-01 10:00:00';
SELECT count(*) AS parquet_files_after_update FROM ducklake_list_files('lake', 'sensors');
```
```text
[OK] UPDATE 1 row (inlined)
[OBSERVE] Parquet files after update: 1
```
Flush: writing inlined data out to S3
```sql
CALL ducklake_flush_inlined_data('lake');
SELECT count(*) AS parquet_files_after_flush FROM ducklake_list_files('lake', 'sensors');
SELECT count(*) AS total_rows_after_flush FROM lake.sensors;
```
```text
[OK] FLUSH
[OBSERVE] After flush: 2 Parquet files, 52 total rows
```
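Two paths for DELETE: if the deleted row is still in the inlined data table, DuckLake simply sets that row's end_snapshot to mark it deleted. If the row already lives in a Parquet file, DuckLake creates an inlined deletion table in the catalog that records which row of which file was deleted. Neither path writes a new Parquet file.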
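The two delete paths can be modelled with a toy in-memory table. Everything here (`ToyTable`, `InlinedRow`, the file and row ids) is hypothetical and only mirrors the description above. Inlined rows are closed by setting `end_snapshot`. Rows already in Parquet get a (file, row) deletion record, and the Parquet data itself is never rewritten.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InlinedRow:
    row_id: int
    values: tuple
    begin_snapshot: int
    end_snapshot: Optional[int] = None  # None means the row is still visible


@dataclass
class ToyTable:
    """Hypothetical stand-in for one table's catalog state."""
    inlined: list = field(default_factory=list)       # rows stored in the catalog
    parquet_rows: dict = field(default_factory=dict)  # (file_id, row_id) -> values
    deletions: list = field(default_factory=list)     # (file_id, row_id, snapshot)

    def _is_deleted(self, file_id: int, row_id: int) -> bool:
        return any(f == file_id and r == row_id for f, r, _ in self.deletions)

    def delete(self, predicate, snapshot: int) -> int:
        deleted = 0
        # Path 1: a row still in the inlined table is closed by setting end_snapshot.
        for row in self.inlined:
            if row.end_snapshot is None and predicate(row.values):
                row.end_snapshot = snapshot
                deleted += 1
        # Path 2: a row in a Parquet file gets a deletion record; the file is untouched.
        for (file_id, row_id), values in self.parquet_rows.items():
            if predicate(values) and not self._is_deleted(file_id, row_id):
                self.deletions.append((file_id, row_id, snapshot))
                deleted += 1
        return deleted


table = ToyTable(
    inlined=[InlinedRow(row_id=0, values=(2, 22.1), begin_snapshot=5)],
    parquet_rows={(1, 0): (1, 21.5), (1, 1): (2, 22.1)},
)
n = table.delete(lambda v: v[0] == 2 and v[1] == 22.1, snapshot=6)
print(n, table.inlined[0].end_snapshot, table.deletions)  # 2 6 [(1, 1, 6)]
```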
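Flushing is not a one-off task. If you never flush, inlined data piles up in PostgreSQL and query performance gradually degrades, because the engine has to scan both the catalog and S3. Flushing is routine maintenance for keeping queries fast. For Kafka/CDC streaming ingestion, flush periodically (for example every 5 minutes or every N rows).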
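A periodic-flush policy for streaming ingestion could look like the sketch below. `FlushPolicy`, its default row count, and the injectable clock are hypothetical. The 300-second default mirrors the "every 5 minutes" suggestion. The only DuckLake call involved is `CALL ducklake_flush_inlined_data('lake')`, shown as a comment.

```python
import time


class FlushPolicy:
    """Hypothetical helper: flush after `max_rows` pending rows or `max_age_s` seconds.

    Assumes every recorded write was small enough to be inlined.
    """

    def __init__(self, max_rows: int = 10_000, max_age_s: float = 300.0, clock=time.monotonic):
        self.max_rows = max_rows
        self.max_age_s = max_age_s
        self.clock = clock
        self.pending_rows = 0
        self.last_flush = clock()

    def record_write(self, rows: int) -> None:
        self.pending_rows += rows

    def should_flush(self) -> bool:
        if self.pending_rows == 0:
            return False  # nothing inlined since the last flush
        age = self.clock() - self.last_flush
        return self.pending_rows >= self.max_rows or age >= self.max_age_s

    def mark_flushed(self) -> None:
        self.pending_rows = 0
        self.last_flush = self.clock()


# In an ingestion loop, with `con` from the connection script:
#     policy.record_write(len(batch))
#     if policy.should_flush():
#         con.execute("CALL ducklake_flush_inlined_data('lake')")
#         policy.mark_flushed()

now = [0.0]  # fake clock so the example is deterministic
policy = FlushPolicy(max_rows=100, max_age_s=300, clock=lambda: now[0])
policy.record_write(10)
first = policy.should_flush()   # 10 rows, 0 s old -> False
now[0] = 301.0
second = policy.should_flush()  # older than 300 s -> True
policy.mark_flushed()
policy.record_write(100)
third = policy.should_flush()   # row limit reached -> True
print(first, second, third)  # False True True
```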
In testing, DuckLake v1.0 does not support setting the inlining threshold at the table level with ALTER TABLE; the attempt fails with Binder Error: Unsupported ALTER TABLE type in DuckLake. Use CALL ducklake_set_option('lake', 'data_inlining_row_limit', '100', table_name => 'sensors') instead.
Time Travel
DuckLake's time travel is built on the snapshot table in the catalog. Every write (INSERT/UPDATE/DELETE) creates a new snapshot.
[Figure: sequence diagram (User, DuckDB, PostgreSQL catalog). Operation sequence: INSERT 2 rows (Beijing, Shanghai) → snapshot_id=352; INSERT 1 row (Shenzhen) → snapshot_id=353; UPDATE Beijing → 'Beijing (Updated)' → snapshot_id=354; DELETE Shanghai → snapshot_id=355. Time Travel query: SELECT ... AT (VERSION => 353) → one SQL query for the files valid at snapshot_id=353 → file list + inlined rows. A single SQL query with a condition; its cost does not grow with the number of historical versions.]
Operations and snapshot tracking
```sql
CREATE TABLE lake.travel_log (
    id INTEGER, city VARCHAR, visited_at TIMESTAMP
);
```
```
[OK] CREATE TABLE lake.travel_log
```
```sql
-- each statement auto-commits and creates its own snapshot
INSERT INTO lake.travel_log VALUES (1, 'Beijing', '2026-01-01 08:00:00');
INSERT INTO lake.travel_log VALUES (2, 'Shanghai', '2026-02-01 08:00:00');
-- the most recent snapshot, i.e. the state after both inserts
SELECT snapshot_id, snapshot_time
FROM ducklake_snapshots('lake')
ORDER BY snapshot_id DESC LIMIT 1;
```
```
[OK] Initial insert, snapshot_id=352
```
Note the snapshot_id in the output. The queries below need it.
Continue with more changes:
```sql
INSERT INTO lake.travel_log VALUES (3, 'Shenzhen', '2026-03-01 08:00:00');
UPDATE lake.travel_log SET city = 'Beijing (Updated)' WHERE id = 1;
DELETE FROM lake.travel_log WHERE id = 2;
-- current contents of the table
SELECT * FROM lake.travel_log ORDER BY id;
-- list all snapshots (DuckDB's FROM-first shorthand for SELECT * FROM ...)
FROM ducklake_snapshots('lake');
```
```
[OK] Added Shenzhen, snapshot_id=353
[OK] Updated Beijing, snapshot_id=354
[OK] Deleted Shanghai, snapshot_id=355
[OK] Current state: [(1, 'Beijing (Updated)', ...), (3, 'Shenzhen', ...)]
```
Querying by version
Replace V1, V2 and V3 in the queries below with the actual snapshot_id values from the ducklake_snapshots output (352, 353 and 354 in this run):
```sql
SELECT * FROM lake.travel_log AT (VERSION => V1) ORDER BY id;
SELECT * FROM lake.travel_log AT (VERSION => V2) ORDER BY id;
SELECT * FROM lake.travel_log AT (VERSION => V3) ORDER BY id;
```
```
[OK] Time travel to v352 (after initial): [(1, 'Beijing', ...), (2, 'Shanghai', ...)]
[OK] Time travel to v353 (after Shenzhen): [(1, 'Beijing', ...), (2, 'Shanghai', ...), (3, 'Shenzhen', ...)]
[OK] Time travel to v354 (after update): [(1, 'Beijing (Updated)', ...), (2, 'Shanghai', ...), (3, 'Shenzhen', ...)]
```
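Why is a time-travel read "just one SQL query"? The sketch below illustrates the idea under stated assumptions. Python's built-in sqlite3 stands in for the PostgreSQL catalog. The `data_file` table is hypothetical and simplified, with `begin_snapshot`/`end_snapshot` columns modeled loosely on the DuckLake spec. The file names and the helpers `add_file`, `retire_file` and `files_at` are illustrative only. Reading "as of" a snapshot becomes one filtered, indexed query rather than a walk over metadata files.

```python
import sqlite3

# In-memory stand-in for the catalog database (PostgreSQL in the article).
con = sqlite3.connect(":memory:")
# Hypothetical, simplified file table: a file is visible from begin_snapshot
# up to (but not including) end_snapshot; NULL end_snapshot means still live.
con.execute("""CREATE TABLE data_file (
    path TEXT PRIMARY KEY,
    begin_snapshot INTEGER NOT NULL,
    end_snapshot INTEGER
)""")
con.execute("CREATE INDEX idx_visibility ON data_file (begin_snapshot, end_snapshot)")


def add_file(path, snapshot):
    con.execute("INSERT INTO data_file VALUES (?, ?, NULL)", (path, snapshot))


def retire_file(path, snapshot):
    con.execute("UPDATE data_file SET end_snapshot = ? WHERE path = ?", (snapshot, path))


def files_at(snapshot):
    """Time travel: one filtered query, no matter how many snapshots exist."""
    rows = con.execute(
        """SELECT path FROM data_file
           WHERE begin_snapshot <= ?
             AND (end_snapshot IS NULL OR end_snapshot > ?)
           ORDER BY path""",
        (snapshot, snapshot),
    )
    return [r[0] for r in rows]


# Replay the article's sequence. Simplification: an UPDATE/DELETE rewrites a
# whole file here instead of tracking deleted rows separately.
add_file("f1_beijing_shanghai.parquet", 352)             # INSERT Beijing, Shanghai
add_file("f2_shenzhen.parquet", 353)                     # INSERT Shenzhen
retire_file("f1_beijing_shanghai.parquet", 354)          # UPDATE Beijing
add_file("f3_beijing_updated_shanghai.parquet", 354)
retire_file("f3_beijing_updated_shanghai.parquet", 355)  # DELETE Shanghai
add_file("f4_beijing_updated.parquet", 355)

for s in (352, 353, 354, 355):
    print(s, files_at(s))
```

Adding more snapshots adds rows to the table, but each lookup is still a single query on the index.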
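Prerequisite for time travel: the snapshot must not have been removed by ducklake_expire_snapshots. Once a snapshot has been expired, you can no longer time travel to it. Difference from Iceberg: to time travel, Iceberg first has to locate the snapshot for the target point in time in the table metadata, whose snapshot history grows with every commit. It then reads that snapshot's manifest list and manifest files, so the more history there is, the more metadata has to be traversed. DuckLake's AT (TIMESTAMP => ...) is just one SQL query with a time condition, and its cost basically does not grow with the number of historical versions.
Incremental changes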
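```sql
-- rows inserted between snapshots V1 and V2
FROM ducklake_table_insertions('lake', 'travel_log', start_snapshot => V1, end_snapshot => V2);
-- rows deleted between snapshots V1 and V4 (V4: the snapshot after the DELETE, 355 in this run)
FROM ducklake_table_deletions('lake', 'travel_log', start_snapshot => V1, end_snapshot => V4);
```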
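Conceptually, a change feed falls out of the same begin/end snapshot bookkeeping. Rows that became visible inside the range are insertions, and rows that stopped being visible are deletions. The minimal sketch below assumes row-level bookkeeping (as for inlined rows) and uses sqlite3 as a stand-in catalog. The `row_version` table, its columns, and the half-open range (start, end] are assumptions for illustration, not DuckLake's exact semantics. In this sketch an UPDATE shows up as one deletion plus one insertion.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical row-level bookkeeping: each row version records the snapshot
# that created it and the snapshot that removed it (NULL = still live).
con.execute("""CREATE TABLE row_version (
    id INTEGER, city TEXT,
    begin_snapshot INTEGER NOT NULL, end_snapshot INTEGER
)""")


def insert(row_id, city, snap):
    con.execute("INSERT INTO row_version VALUES (?, ?, ?, NULL)", (row_id, city, snap))


def delete(row_id, snap):
    con.execute("UPDATE row_version SET end_snapshot = ? "
                "WHERE id = ? AND end_snapshot IS NULL", (snap, row_id))


def insertions(start, end):
    """Row versions created in the half-open snapshot range (start, end]."""
    return con.execute("SELECT id, city FROM row_version "
                       "WHERE begin_snapshot > ? AND begin_snapshot <= ? ORDER BY id",
                       (start, end)).fetchall()


def deletions(start, end):
    """Row versions removed in the half-open snapshot range (start, end]."""
    return con.execute("SELECT id, city FROM row_version "
                       "WHERE end_snapshot > ? AND end_snapshot <= ? ORDER BY id",
                       (start, end)).fetchall()


insert(1, "Beijing", 352)            # snapshot 352
insert(2, "Shanghai", 352)
insert(3, "Shenzhen", 353)           # snapshot 353
delete(1, 354)                       # UPDATE at snapshot 354 = delete old version ...
insert(1, "Beijing (Updated)", 354)  # ... plus insert new version
delete(2, 355)                       # DELETE at snapshot 355

print(insertions(352, 353))  # [(3, 'Shenzhen')]
print(deletions(352, 355))   # [(1, 'Beijing'), (2, 'Shanghai')]
```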
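Cold-start query planning
During query planning, the engine has to work out which Parquet files to read. The two formats take very different cold-start paths.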
[Figure: cold-start planning paths. DuckLake: one indexed SQL query → file list. Iceberg: ① GET snapshot.json → ② GET manifest-list.avro → ③ GET manifest-1.avro → ④ GET manifest-2.avro → ⑤ GET manifest-N.avro → file list.]
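The difference in the figure comes down mostly to sequential round trips. The toy model below counts them. A dict stands in for object storage and holds JSON (not Avro) metadata in a simplified three-level Iceberg-like layout, and sqlite3 stands in for the catalog database. Key names, layouts and helper functions are illustrative assumptions. Real engines can fetch manifests in parallel, so the dependency depth (metadata → manifest list → manifests) matters more than the raw GET count.

```python
import json
import sqlite3


def make_iceberg_store(n_manifests, files_per_manifest):
    """Toy object store (key -> JSON) in a simplified three-level Iceberg-like layout."""
    store, manifest_keys = {}, []
    for m in range(n_manifests):
        key = f"metadata/manifest-{m}.json"
        start = m * files_per_manifest
        files = [f"data/file-{start + i}.parquet" for i in range(files_per_manifest)]
        store[key] = json.dumps({"data_files": files})
        manifest_keys.append(key)
    store["metadata/manifest-list.json"] = json.dumps({"manifests": manifest_keys})
    store["metadata/v1.metadata.json"] = json.dumps(
        {"manifest_list": "metadata/manifest-list.json"})
    return store


def iceberg_plan(store):
    """Each level must be fetched before the next is known: metadata -> list -> manifests."""
    gets = 0

    def get(key):
        nonlocal gets
        gets += 1  # one object-store GET, i.e. one network round trip on S3/GCS
        return json.loads(store[key])

    meta = get("metadata/v1.metadata.json")
    manifest_list = get(meta["manifest_list"])
    files = [f for m in manifest_list["manifests"] for f in get(m)["data_files"]]
    return files, gets


def build_catalog(files):
    """Catalog database holding the same file list in an indexed table."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE data_file (table_id INTEGER, path TEXT)")
    con.execute("CREATE INDEX idx_table ON data_file (table_id)")
    con.executemany("INSERT INTO data_file VALUES (1, ?)", [(f,) for f in files])
    return con


def ducklake_plan(con):
    """Catalog-style planning: the whole file list comes from one indexed query."""
    rows = con.execute("SELECT path FROM data_file WHERE table_id = 1 ORDER BY rowid")
    return [r[0] for r in rows]


store = make_iceberg_store(n_manifests=3, files_per_manifest=2)
ice_files, ice_gets = iceberg_plan(store)
dl_files = ducklake_plan(build_catalog(ice_files))
print(ice_gets)               # 5: metadata, manifest list, 3 manifests
print(dl_files == ice_files)  # True
```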
Preparing the data
```sql
CREATE TABLE lake.cold_start_data (
    id INTEGER, category VARCHAR, value DOUBLE, created_at TIMESTAMP
);
-- three separate INSERTs = three write batches of 10,000 rows each
INSERT INTO lake.cold_start_data
SELECT range, 'cat_' || (range % 10), random()*1000,
       now() - (range || ' seconds')::INTERVAL FROM range(0, 10000);
INSERT INTO lake.cold_start_data
SELECT range + 10000, 'cat_' || (range % 10), random()*1000,
       now() - (range || ' seconds')::INTERVAL FROM range(0, 10000);
INSERT INTO lake.cold_start_data
SELECT range + 20000, 'cat_' || (range % 10), random()*1000,
       now() - (range || ' seconds')::INTERVAL FROM range(0, 10000);
-- write any rows still inlined in the catalog out to Parquet files
CALL ducklake_flush_inlined_data('lake');
SELECT count(*) AS total_files FROM ducklake_list_files('lake', 'cold_start_data');
```
```
[OK] Inserted 30000 rows in 3 batches
[OBSERVE] Data files: 3
```
```sql
SELECT category, count(*), avg(value)
FROM lake.cold_start_data GROUP BY category ORDER BY category;
```
```
[OK] Aggregation query: 10 groups in 31.8ms
```
```sql
SELECT count(*) FROM lake.cold_start_data WHERE category = 'cat_5' AND value > 500;
```
```
[OK] Filtered query: 1476 rows in 22.2ms
```
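Concurrent writes
DuckLake delegates concurrency coordination to the transaction machinery of the catalog database. Multiple DuckDB instances coordinate their writes through PostgreSQL transactions.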
[Figure: sequence diagram of Writer 1-4 and PostgreSQL. In parallel (4 processes), each writer runs BEGIN → INSERT 250 rows → COMMIT; row locks + transaction isolation keep the data consistent; 1000 rows, none lost.]
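Running it
Python multiprocessing simulates 4 independent writer processes, each writing 25 batches × 10 rows = 250 rows:
```python
import multiprocessing
import time
import duckdb

def get_con():
    # Assumption: PostgreSQL catalog + S3-compatible DATA_PATH; both are placeholders (S3 secret setup omitted)
    con = duckdb.connect()
    con.execute("INSTALL ducklake; LOAD ducklake;")
    con.execute("ATTACH 'ducklake:postgres:dbname=ducklake_catalog host=localhost' "
                "AS lake (DATA_PATH 's3://my-bucket/lake/')")
    return con

def writer(worker_id, start_batch, num_batches):
    con = get_con()  # each process has its own independent DuckDB connection
    t0 = time.perf_counter()
    for i in range(num_batches):
        offset = start_batch + i * 10
        # each auto-committed INSERT of 10 rows is its own catalog transaction
        con.execute(f"""INSERT INTO lake.concurrent_test
            SELECT {worker_id}, range, random()*100, now()
            FROM range({offset}, {offset + 10})""")
    print(f"Writer {worker_id}: {num_batches} batches in {time.perf_counter() - t0:.2f}s")
    con.close()

if __name__ == "__main__":  # guard needed where multiprocessing uses "spawn"
    # lake.concurrent_test is assumed to exist already (its DDL is not shown)
    processes = [multiprocessing.Process(target=writer, args=(w + 1, w * 250, 25))
                 for w in range(4)]
    for p in processes: p.start()
    for p in processes: p.join()
```
```
[Step 1] Launching 4 concurrent writers, 25 batches each...
Writer 2: 25 batches in 0.71s
Writer 4: 25 batches in 1.21s
Writer 3: 25 batches in 1.56s
Writer 1: 25 batches in 1.98s
[Step 2] All 4 writers completed in 2.66s
[Step 3] Verifying results...
Writer 1: 250 rows
Writer 2: 250 rows
Writer 3: 250 rows
Writer 4: 250 rows
Total rows: 1000 (expected 1000)
Data integrity: PASS
Parquet files: 0 (inlined)
```
The 4 processes wrote concurrently: 1000 rows in total, 0 rows lost, data integrity PASS. All rows were inlined, so no Parquet files were written to RustFS.
Native DuckDB is single-writer: only one process at a time can open a database file for writing. Once DuckLake hands concurrency coordination to the catalog database, multiple DuckDB instances can write to the same table at the same time. This is one of the key capabilities DuckLake unlocks.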
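Comparison with Iceberg: Iceberg uses optimistic concurrency control (OCC). When two writers commit at the same time, the catalog performs a compare-and-swap (CAS) and checks whether the current metadata pointer is still the version the writer read. If it is not, the commit is rejected, and the writer re-reads the latest state and retries. Under high concurrency the probability of conflicts rises and throughput drops. DuckLake relies directly on database transactions: row locks and transaction isolation arbitrate, and conflicting transactions are rolled back and retried automatically.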
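The optimistic scheme described for Iceberg can be sketched as a compare-and-swap loop on a single metadata pointer. This is a toy in-process model. `MetadataPointer` and its `compare_and_swap` method are hypothetical stand-ins for a catalog's atomic pointer swap, and the lock only makes the CAS itself atomic. Each writer reads the current version, prepares its commit, and retries whenever another writer got there first.

```python
import threading
import time


class MetadataPointer:
    """Hypothetical catalog pointer: current metadata version plus committed batches."""

    def __init__(self):
        self._lock = threading.Lock()  # makes the CAS itself atomic, nothing more
        self.version = 0
        self.commits = []

    def read(self):
        with self._lock:
            return self.version

    def compare_and_swap(self, expected, batch):
        with self._lock:
            if self.version != expected:
                return False  # someone else committed first: reject
            self.version += 1
            self.commits.append(batch)
            return True


def writer(ptr, worker_id, num_batches, retries):
    for i in range(num_batches):
        while True:
            base = ptr.read()                      # 1. read the current metadata version
            batch = (worker_id, i)                 # 2. "write data files + new metadata"
            time.sleep(0.0001)                     # simulated work widens the conflict window
            if ptr.compare_and_swap(base, batch):  # 3. atomic swap of the pointer
                break
            retries[worker_id] += 1                # conflict: re-read and retry


ptr = MetadataPointer()
retries = {w: 0 for w in range(1, 5)}
threads = [threading.Thread(target=writer, args=(ptr, w, 25, retries)) for w in range(1, 5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(ptr.version, sum(retries.values()))  # 100 commits; the retry count varies run to run
```

Every batch eventually lands, but each conflict costs a full re-read and retry, which is where throughput drops as the number of concurrent writers grows.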
Compaction
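Iceberg compaction has two dimensions: data-file compaction (merging small Parquet files) and metadata compaction (cleaning up manifest files and expired snapshots). DuckLake only needs data-file compaction; metadata cleanup is a single DELETE statement.
Creating small files
```sql
CREATE TABLE lake.compaction_demo (id INTEGER, value DOUBLE, ts TIMESTAMP);
-- each INSERT is one batch of 100 rows
INSERT INTO lake.compaction_demo SELECT range, random()*100, now() FROM range(0, 100);
INSERT INTO lake.compaction_demo SELECT range + 100, random()*100, now() FROM range(0, 100);
-- ... 10 batches in total ...
-- flush any rows still inlined in the catalog out to Parquet
CALL ducklake_flush_inlined_data('lake');
```
```
[OK] 10 batches inserted
```
Checking before compaction
```sql
SELECT count(*) AS file_count_before,
       sum(df.record_count) AS total_rows
FROM ducklake_list_files('lake', 'compaction_demo') df;
```
```
[OBSERVE] Files before compaction: 10
```
Running compaction
```sql
-- target size for merged files
CALL ducklake_set_option('lake', 'target_file_size', '1MB');
-- merge adjacent small files into larger ones
CALL ducklake_merge_adjacent_files('lake');
```
```
[OK] Compaction executed
[OBSERVE] Files after compaction: 1, rows: 1000
```
Cleaning up old snapshots
```sql
-- expire snapshots older than one hour; note that snapshots created moments
-- ago in a fresh demo are not yet older than one hour
CALL ducklake_expire_snapshots('lake', older_than => now() - INTERVAL '1 hour');
-- remove the data files that were scheduled for deletion
CALL ducklake_cleanup_old_files('lake');
```
```
[OK] Snapshots expired + old files cleaned
```
The 10 files were merged into 1, and the row count is still 1000.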
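The idea behind merging adjacent small files can be illustrated with a greedy pass that packs consecutive files into groups up to a target size. This is an illustrative assumption, not the actual algorithm behind `ducklake_merge_adjacent_files`. The `DataFile` type and the file sizes are made up. Ten small files of 100 rows each collapse into one file, and the row count is preserved, which mirrors the result above.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    path: str
    size_bytes: int
    record_count: int


def plan_merges(files, target_size):
    """Pack adjacent files into groups, starting a new group whenever adding the
    next file would push the group past target_size."""
    groups, current, current_size = [], [], 0
    for f in files:
        if current and current_size + f.size_bytes > target_size:
            groups.append(current)
            current, current_size = [], 0
        current.append(f)
        current_size += f.size_bytes
    if current:
        groups.append(current)
    return groups


def merge(groups):
    """Rewrite each multi-file group as one file; row counts are preserved."""
    merged = []
    for i, group in enumerate(groups):
        if len(group) == 1:
            merged.append(group[0])  # nothing to merge
            continue
        merged.append(DataFile(
            path=f"merged-{i}.parquet",
            size_bytes=sum(f.size_bytes for f in group),
            record_count=sum(f.record_count for f in group),
        ))
    return merged


# Ten small batch files of 100 rows each (sizes are made up for illustration).
small = [DataFile(f"batch-{i}.parquet", 4_000, 100) for i in range(10)]
merged = merge(plan_merges(small, target_size=1_000_000))  # roughly the '1MB' target
print(len(small), "->", len(merged), sum(f.record_count for f in merged))  # 10 -> 1 1000
```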
[Figure: Iceberg compaction (4+ steps): rewrite data files, rewrite manifest files, remove orphan files, expire snapshots. DuckLake compaction (3 steps): merge_adjacent_files, expire_snapshots, cleanup_old_files.]
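What DuckLake metadata cleanup really is: old snapshots are just old rows in PostgreSQL. ducklake_expire_snapshots essentially amounts to DELETE FROM ducklake_snapshot WHERE .... Iceberg has to rewrite manifest files; DuckLake's metadata cleanup, by contrast, is a pure database operation and completes almost instantly.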
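A minimal sketch of that claim follows. It uses sqlite3 as a stand-in catalog with hypothetical, simplified `snapshot` and `data_file` tables whose column names are modeled loosely on the DuckLake spec. Expiring snapshots is a DELETE on catalog rows. A second query then finds the files that no retained snapshot can see, so a separate cleanup step (the role `ducklake_cleanup_old_files` plays in the article) can remove them from object storage.

```python
import sqlite3
from datetime import datetime, timedelta

con = sqlite3.connect(":memory:")
# Hypothetical, simplified catalog tables.
con.execute("CREATE TABLE snapshot (snapshot_id INTEGER PRIMARY KEY, snapshot_time TEXT)")
con.execute("CREATE TABLE data_file (path TEXT, begin_snapshot INTEGER, end_snapshot INTEGER)")

now = datetime(2026, 3, 1, 12, 0, 0)
old = (now - timedelta(hours=2)).isoformat()
# Snapshots 1-10: ten small batch files, written two hours ago.
for sid in range(1, 11):
    con.execute("INSERT INTO snapshot VALUES (?, ?)", (sid, old))
    con.execute("INSERT INTO data_file VALUES (?, ?, NULL)", (f"batch-{sid}.parquet", sid))
# Snapshot 11 (now): compaction retires the ten files and adds one merged file.
con.execute("INSERT INTO snapshot VALUES (11, ?)", (now.isoformat(),))
con.execute("UPDATE data_file SET end_snapshot = 11 WHERE end_snapshot IS NULL")
con.execute("INSERT INTO data_file VALUES ('merged-0.parquet', 11, NULL)")

# Expiring snapshots older than one hour is a plain DELETE on catalog rows.
cutoff = (now - timedelta(hours=1)).isoformat()
expired = con.execute("DELETE FROM snapshot WHERE snapshot_time < ?", (cutoff,)).rowcount

# Files that no retained snapshot can see: removed at or before the oldest retained one.
orphaned = [r[0] for r in con.execute(
    """SELECT path FROM data_file
       WHERE end_snapshot IS NOT NULL
         AND end_snapshot <= (SELECT MIN(snapshot_id) FROM snapshot)
       ORDER BY begin_snapshot""")]
# A cleanup step would delete these objects from storage, then drop their rows.
con.executemany("DELETE FROM data_file WHERE path = ?", [(p,) for p in orphaned])
print(expired, len(orphaned))  # 10 10
```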
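Choosing between them
When to choose DuckLake
- You are in the DuckDB ecosystem, with small-to-medium data volumes and frequent small-batch writes (Kafka consumers, CDC sync)
- You need Multiplayer DuckDB (several users writing to the same table at the same time)
When to choose Iceberg
- Your core pipelines run on Spark or Flink, and you need multi-engine interoperability (Snowflake, BigQuery and Trino accessing the same tables)
- Your data is at PB scale
- You cannot accept the catalog database becoming a single point of failure (with Iceberg, even if the catalog goes down, the files on S3 are still there)
Risks you cannot ignore
- DuckLake's single point of failure: once the PostgreSQL catalog database is unreachable, the data cannot be read. From then on, your availability requirements for PostgreSQL are tied to your availability requirements for the data itself. With Iceberg, even if the catalog service goes down, the data files are still on S3, and in principle the metadata can be rebuilt by scanning the files. DuckLake cannot do that.
- Ecosystem gap: Iceberg is backed by years of sustained investment from a large community, and Spark, Flink, Trino, Snowflake and BigQuery all support it. DuckLake v1.0 still has to catch up on multi-engine support. That reality does not change just because the design is good.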