This extension lets PostgreSQL access a DuckLake data lake.

Repository: https://github.com/relytcloud/pg_ducklake

  1. Pull the Docker image

    sudo docker pull docker.1ms.run/pgducklake/pgducklake:18-main
    [enter the sudo password]
    18-main: Pulling from pgducklake/pgducklake
    d997cc310c98: Pull complete
    b5ed69009603: Pull complete
    cff374c7356c: Pull complete
    cf8420628f40: Pull complete
    159f3aaadd71: Pull complete
    ec50f454fdf0: Pull complete
    c5204b920d75: Pull complete
    b3de9652abb2: Pull complete
    f3779ed79afa: Pull complete
    eaad140a72db: Pull complete
    5be10d048583: Pull complete
    1a1f20eb8102: Pull complete
    c89a454fbafa: Pull complete
    d45fcd79ff10: Pull complete
    e80c313dbf75: Pull complete
    9a5e596e136b: Pull complete
    5bfc9ec5cce9: Pull complete
    d72962811c59: Download complete
    Digest: sha256:c7dcfa1bafa8e262fa4a6328f0e936f5b5eb3495d707df3defb9b8231d8b42fc
    Status: Downloaded newer image for docker.1ms.run/pgducklake/pgducklake:18-main
    docker.1ms.run/pgducklake/pgducklake:18-main

  2. Run the container

    sudo docker run -d -e POSTGRES_PASSWORD=duckdb -v /home/aaa/par:/par --network host --name pgducklake docker.1ms.run/pgducklake/pgducklake:18-main
    b9dfe45ca98849164babacbf644877c7f8692265a991c16c1d56bf484c3b462f
    sudo docker exec -it pgducklake psql
    psql (18.3 (Debian 18.3-1.pgdg12+1))
    Type "help" for help.

  3. Test columnar (column-store) tables in PostgreSQL

    postgres=# CREATE TABLE row_store_table AS
    SELECT i AS id, 'hello pg_ducklake' AS msg
    FROM generate_series(1, 10000) AS i;
    SELECT 10000
    postgres=# CREATE TABLE col_store_table USING ducklake AS
    SELECT *
    FROM row_store_table;
    SELECT 10000
    postgres=# SELECT max(id) FROM col_store_table;
     max
    -------
     10000
    (1 row)

    postgres=# CREATE TABLE titanic USING ducklake AS
    SELECT * FROM read_csv('/par/lineitem.csv');
    SELECT 59986052
    postgres=# \timing on
    Timing is on.
    postgres=# select sum(L_QUANTITY) from titanic;
        sum
    ------------
     1529738036
    (1 row)

    Time: 87.255 ms
    postgres=#
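The `read_csv('/par/lineitem.csv')` call works because step 2 bind-mounts the host directory `/home/aaa/par` to `/par` inside the container, so PostgreSQL, which runs in the container, sees host files under `/par`. A minimal Python sketch of that path translation follows; `to_container_path` is a hypothetical helper for illustration only, and it assumes `lineitem.csv` was placed in `/home/aaa/par` on the host, as the command implies.

```python
from pathlib import PurePosixPath

# Bind mount from the docker run command: -v /home/aaa/par:/par
HOST_DIR = PurePosixPath("/home/aaa/par")
CONTAINER_DIR = PurePosixPath("/par")


def to_container_path(host_path: str) -> str:
    """Hypothetical helper: map a host path under HOST_DIR to the path
    that processes inside the container (e.g. PostgreSQL) will see."""
    # relative_to() raises ValueError when the file is outside the mount,
    # i.e. it is not visible inside the container at all.
    rel = PurePosixPath(host_path).relative_to(HOST_DIR)
    return str(CONTAINER_DIR / rel)


print(to_container_path("/home/aaa/par/lineitem.csv"))  # /par/lineitem.csv
print(to_container_path("/home/aaa/par/duckdb"))        # /par/duckdb
```

The same mapping presumably explains why `/par/duckdb` can later be run inside the container: the DuckDB CLI binary sits in the mounted host directory.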

View the physical storage files

postgres=# select * FROM ducklake.list_files('public', 'titanic');
Time: 4.600 ms

                                                   data_file                                                    | data_file_size_bytes | data_file_footer_size | data_file_encryption_key | delete_file | delete_file_size_bytes | delete_file_footer_size | delete_file_encryption_key 
----------------------------------------------------------------------------------------------------------------+----------------------+-----------------------+--------------------------+-------------+------------------------+-------------------------+----------------------------
 /var/lib/postgresql/18/docker/pg_ducklake/public/titanic/ducklake-019d3c29-27a5-7b6a-a086-ed394fc5a8e9.parquet |            527428314 |                182108 |                          |             |                        |                         | 
 /var/lib/postgresql/18/docker/pg_ducklake/public/titanic/ducklake-019d3c29-3fb0-77ac-9702-0f8a51b83c8e.parquet |            524592064 |                181908 |                          |             |                        |                         | 
 /var/lib/postgresql/18/docker/pg_ducklake/public/titanic/ducklake-019d3c29-56f4-7e9d-a78f-cd792a917766.parquet |            538558467 |                186965 |                          |             |                        |                         | 
 /var/lib/postgresql/18/docker/pg_ducklake/public/titanic/ducklake-019d3c29-6fa9-73fc-9fa3-83c14927643c.parquet |            529799189 |                183423 |                          |             |                        |                         | 
 /var/lib/postgresql/18/docker/pg_ducklake/public/titanic/ducklake-019d3c29-885d-7000-be71-6bf3ecb992b1.parquet |            139023964 |                 52960 |                          |             |                        |                         | 
(5 rows)
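For a rough sense of the on-disk footprint, the sizes in the `list_files` output can be added up. The Python below only does that arithmetic on numbers copied from the listing and from the `SELECT 59986052` row count; nothing here queries the database, and `summarize` is a hypothetical helper name.

```python
# data_file_size_bytes values copied from
# ducklake.list_files('public', 'titanic').
SIZES = [527428314, 524592064, 538558467, 529799189, 139023964]
ROWS = 59986052  # row count reported by CREATE TABLE titanic ... AS


def summarize(sizes: list[int], rows: int) -> tuple[int, int, float]:
    """Return (file count, total bytes, average bytes per row)."""
    total = sum(sizes)
    return len(sizes), total, total / rows


count, total, per_row = summarize(SIZES, ROWS)
print(f"{count} files, {total} bytes (~{total / 1e9:.2f} GB), "
      f"~{per_row:.1f} bytes per row")
```

This prints `5 files, 2259401998 bytes (~2.26 GB), ~37.7 bytes per row`.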



postgres=# delete from col_store_table where id=2000;
DELETE 1
Time: 14.703 ms

postgres=# SELECT sum(id) FROM col_store_table;
   sum    
----------
 50003000
(1 row)

Time: 6.045 ms

postgres=# select * FROM ducklake.list_files('public', 'col_store_table');
Time: 6.112 ms

                                                       data_file                                                        | data_file_size_bytes | data_file_footer_size | data_file_encryption_key | delete_file | delete_file_size_bytes | delete_file_footer_size | delete_file_encryption_key 
------------------------------------------------------------------------------------------------------------------------+----------------------+-----------------------+--------------------------+-------------+------------------------+-------------------------+----------------------------
 /var/lib/postgresql/18/docker/pg_ducklake/public/col_store_table/ducklake-019d3c25-7e58-7cfd-a1ea-d83222a57b7a.parquet |                40468 |                   302 |                          |             |                        |                         | 
(1 row)

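Deletes are done by marking the deleted rows in a separate file, but I could not find where that delete file is: the delete_file columns in the list_files output are empty.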
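The idea of recording deletes separately instead of rewriting the immutable Parquet data file (commonly called merge-on-read) can be sketched with a toy model. This is not pg_ducklake's code: `DataFile`, `delete_where`, `scan` and `raw_scan` are hypothetical names, and the deleted positions are kept in an in-memory set rather than wherever pg_ducklake actually stores them.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DataFile:
    """Toy immutable data file plus a separate set of deleted row positions."""
    rows: list[int]
    # Stand-in for a delete file: row positions marked as deleted.
    deleted: set[int] = field(default_factory=set)

    def delete_where(self, pred: Callable[[int], bool]) -> int:
        """Mark matching visible rows as deleted; self.rows is never modified."""
        hits = [pos for pos, v in enumerate(self.rows)
                if pos not in self.deleted and pred(v)]
        self.deleted.update(hits)
        return len(hits)

    def scan(self) -> list[int]:
        """Merge-on-read: what a query through the table sees."""
        return [v for pos, v in enumerate(self.rows) if pos not in self.deleted]

    def raw_scan(self) -> list[int]:
        """Reading the data file alone, ignoring the delete markers."""
        return list(self.rows)


t = DataFile(list(range(1, 10001)))  # ids 1..10000, like col_store_table
print(t.delete_where(lambda v: v == 2000))  # 1
print(sum(t.scan()))                        # 50003000
```

`raw_scan()` corresponds to pointing DuckDB directly at the Parquet file: the row with id 2000 is still physically present.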

postgres=# \q
sudo docker exec -it pgducklake bash
[enter the sudo password]
postgres@kylin-pc:/$ ls /var/lib/postgresql/18/docker/pg_ducklake/public/col_store_table/
ducklake-019d3c25-7e58-7cfd-a1ea-d83222a57b7a.parquet
postgres@kylin-pc:/$ /par/duckdb
DuckDB v1.5.1 (Variegata)
Enter ".help" for usage hints.
memory D SELECT sum(id) FROM '/var/lib/postgresql/18/docker/pg_ducklake/public/col_store_table/ducklake-019d3c25-7e58-7cfd-a1ea-d83222a57b7a.parquet';
┌─────────────────┐
│     sum(id)     │
│     int128      │
├─────────────────┤
│    50005000     │
│ (50.01 million) │
└─────────────────┘

Querying the parquet file directly still shows the rows as they were before the delete.
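The two sums differ by exactly the deleted id, which confirms that the only difference between the raw file and the table's visible rows is the row with id 2000:

```latex
\sum_{i=1}^{10000} i = \frac{10000 \cdot 10001}{2} = 50005000,
\qquad 50005000 - 2000 = 50003000
```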
