This extension lets PostgreSQL access a DuckLake data lake.

Repository: https://github.com/relytcloud/pg_ducklake

  1. Pull the Docker image

    sudo docker pull docker.1ms.run/pgducklake/pgducklake:18-main
    (enter the sudo password)
    18-main: Pulling from pgducklake/pgducklake
    d997cc310c98: Pull complete
    b5ed69009603: Pull complete
    cff374c7356c: Pull complete
    cf8420628f40: Pull complete
    159f3aaadd71: Pull complete
    ec50f454fdf0: Pull complete
    c5204b920d75: Pull complete
    b3de9652abb2: Pull complete
    f3779ed79afa: Pull complete
    eaad140a72db: Pull complete
    5be10d048583: Pull complete
    1a1f20eb8102: Pull complete
    c89a454fbafa: Pull complete
    d45fcd79ff10: Pull complete
    e80c313dbf75: Pull complete
    9a5e596e136b: Pull complete
    5bfc9ec5cce9: Pull complete
    d72962811c59: Download complete
    Digest: sha256:c7dcfa1bafa8e262fa4a6328f0e936f5b5eb3495d707df3defb9b8231d8b42fc
    Status: Downloaded newer image for docker.1ms.run/pgducklake/pgducklake:18-main
    docker.1ms.run/pgducklake/pgducklake:18-main

  2. Run the container

    sudo docker run -d -e POSTGRES_PASSWORD=duckdb -v /home/aaa/par:/par --network host --name pgducklake docker.1ms.run/pgducklake/pgducklake:18-main
    b9dfe45ca98849164babacbf644877c7f8692265a991c16c1d56bf484c3b462f
    sudo docker exec -it pgducklake psql
    psql (18.3 (Debian 18.3-1.pgdg12+1))
    Type "help" for help.

  3. Test a column-store table in PostgreSQL

    postgres=# CREATE TABLE row_store_table AS
    SELECT i AS id, 'hello pg_ducklake' AS msg
    FROM generate_series(1, 10000) AS i;
    SELECT 10000
    postgres=# CREATE TABLE col_store_table USING ducklake AS
    SELECT *
    FROM row_store_table;
    SELECT 10000
    postgres=# SELECT max(id) FROM col_store_table;
      max
    -------
     10000
    (1 row)

    postgres=# CREATE TABLE titanic USING ducklake AS
    SELECT * FROM read_csv('/par/lineitem.csv');
    SELECT 59986052
    postgres=# \timing on
    Timing is on.
    postgres=# select sum(L_QUANTITY) from titanic;
        sum
    ------------
     1529738036
    (1 row)

    Time: 87.255 ms
    postgres=#

Inspect the physical storage files

postgres=# select * FROM ducklake.list_files('public', 'titanic');
Time: 4.600 ms

                                                   data_file                                                    | data_file_size_bytes | data_file_footer_size | data_file_encryption_key | delete_file | delete_file_size_bytes | delete_file_footer_size | delete_file_encryption_key 
----------------------------------------------------------------------------------------------------------------+----------------------+-----------------------+--------------------------+-------------+------------------------+-------------------------+----------------------------
 /var/lib/postgresql/18/docker/pg_ducklake/public/titanic/ducklake-019d3c29-27a5-7b6a-a086-ed394fc5a8e9.parquet |            527428314 |                182108 |                          |             |                        |                         | 
 /var/lib/postgresql/18/docker/pg_ducklake/public/titanic/ducklake-019d3c29-3fb0-77ac-9702-0f8a51b83c8e.parquet |            524592064 |                181908 |                          |             |                        |                         | 
 /var/lib/postgresql/18/docker/pg_ducklake/public/titanic/ducklake-019d3c29-56f4-7e9d-a78f-cd792a917766.parquet |            538558467 |                186965 |                          |             |                        |                         | 
 /var/lib/postgresql/18/docker/pg_ducklake/public/titanic/ducklake-019d3c29-6fa9-73fc-9fa3-83c14927643c.parquet |            529799189 |                183423 |                          |             |                        |                         | 
 /var/lib/postgresql/18/docker/pg_ducklake/public/titanic/ducklake-019d3c29-885d-7000-be71-6bf3ecb992b1.parquet |            139023964 |                 52960 |                          |             |                        |                         | 
(5 rows)
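
As a quick sanity check on this listing, the file sizes can be totalled and divided by the row count that CREATE TABLE reported for titanic (59986052). The sketch below only does arithmetic on numbers copied from the output; the variable names are illustrative.

```python
# Sizes in bytes of the five Parquet files that ducklake.list_files
# reported for public.titanic, copied from the listing.
file_sizes = [527428314, 524592064, 538558467, 529799189, 139023964]

# Row count reported by CREATE TABLE ... AS SELECT (SELECT 59986052).
row_count = 59986052

total_bytes = sum(file_sizes)
total_gib = total_bytes / 2**30
bytes_per_row = total_bytes / row_count

print(f"total size: {total_bytes} bytes ({total_gib:.2f} GiB)")
print(f"average: {bytes_per_row:.2f} bytes per row")
```

The total comes to 2259401998 bytes, about 2.10 GiB, or roughly 37.67 bytes per row on average.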



postgres=# delete from col_store_table where id=2000;
DELETE 1
Time: 14.703 ms

postgres=# SELECT sum(id) FROM col_store_table;
   sum    
----------
 50003000
(1 row)

Time: 6.045 ms

postgres=# select * FROM ducklake.list_files('public', 'col_store_table');
Time: 6.112 ms

                                                       data_file                                                        | data_file_size_bytes | data_file_footer_size | data_file_encryption_key | delete_file | delete_file_size_bytes | delete_file_footer_size | delete_file_encryption_key 
------------------------------------------------------------------------------------------------------------------------+----------------------+-----------------------+--------------------------+-------------+------------------------+-------------------------+----------------------------
 /var/lib/postgresql/18/docker/pg_ducklake/public/col_store_table/ducklake-019d3c25-7e58-7cfd-a1ea-d83222a57b7a.parquet |                40468 |                   302 |                          |             |                        |                         | 
(1 row)

Deletion works by marking the deleted rows in a separate file, but I could not find where the delete_file is stored.

postgres=# \q
sudo docker exec -it pgducklake bash
(enter the sudo password)
postgres@kylin-pc:/$ ls /var/lib/postgresql/18/docker/pg_ducklake/public/col_store_table/
ducklake-019d3c25-7e58-7cfd-a1ea-d83222a57b7a.parquet
postgres@kylin-pc:/$ /par/duckdb
DuckDB v1.5.1 (Variegata)
Enter ".help" for usage hints.
memory D SELECT sum(id) FROM '/var/lib/postgresql/18/docker/pg_ducklake/public/col_store_table/ducklake-019d3c25-7e58-7cfd-a1ea-d83222a57b7a.parquet';
┌─────────────────┐
│     sum(id)     │
│     int128      │
├─────────────────┤
│    50005000     │
│ (50.01 million) │
└─────────────────┘

Querying the Parquet file directly still returns the data as it was before the row was deleted.
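
The two sums can be reconciled with a small model of merge-on-read deletion. This is a sketch under assumptions, not pg_ducklake's actual implementation: it assumes the data file is never rewritten and that deleted rows are recorded separately as row positions, which a reader applies at query time. The names `data_file`, `deleted_positions` and the three functions are hypothetical.

```python
# A toy model of merge-on-read deletes (assumed behavior, not pg_ducklake code).

# The immutable "data file": ids 1..10000, as in col_store_table.
data_file = list(range(1, 10001))

# Hypothetical delete markers: positions of deleted rows, kept outside the data file.
deleted_positions = set()


def delete_where_id(target_id):
    """Record positions of matching rows instead of rewriting the data file."""
    count = 0
    for pos, row_id in enumerate(data_file):
        if row_id == target_id and pos not in deleted_positions:
            deleted_positions.add(pos)
            count += 1
    return count


def table_sum():
    """What a table-level query sees: data rows minus the delete markers."""
    return sum(v for pos, v in enumerate(data_file) if pos not in deleted_positions)


def raw_file_sum():
    """What reading the data file directly sees: the markers are ignored."""
    return sum(data_file)


deleted = delete_where_id(2000)
print("rows deleted:", deleted)
print("table-level sum:", table_sum())
print("raw file sum:", raw_file_sum())
```

In this model the table-level sum is 50003000 and the raw file sum is 50005000, the same two values returned by PostgreSQL and by DuckCB reading the Parquet file directly. Where pg_ducklake actually keeps the markers is not established by the output shown here.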
