This extension lets PostgreSQL access a DuckLake data lake.

Repository: https://github.com/relytcloud/pg_ducklake

  1. Pull the Docker image

    sudo docker pull docker.1ms.run/pgducklake/pgducklake:18-main
    (enter the sudo password)
    18-main: Pulling from pgducklake/pgducklake
    d997cc310c98: Pull complete
    b5ed69009603: Pull complete
    cff374c7356c: Pull complete
    cf8420628f40: Pull complete
    159f3aaadd71: Pull complete
    ec50f454fdf0: Pull complete
    c5204b920d75: Pull complete
    b3de9652abb2: Pull complete
    f3779ed79afa: Pull complete
    eaad140a72db: Pull complete
    5be10d048583: Pull complete
    1a1f20eb8102: Pull complete
    c89a454fbafa: Pull complete
    d45fcd79ff10: Pull complete
    e80c313dbf75: Pull complete
    9a5e596e136b: Pull complete
    5bfc9ec5cce9: Pull complete
    d72962811c59: Download complete
    Digest: sha256:c7dcfa1bafa8e262fa4a6328f0e936f5b5eb3495d707df3defb9b8231d8b42fc
    Status: Downloaded newer image for docker.1ms.run/pgducklake/pgducklake:18-main
    docker.1ms.run/pgducklake/pgducklake:18-main

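  2. Run the container. The host directory /home/aaa/par, which holds lineitem.csv and the DuckDB CLI used later, is mounted at /par inside the container: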

    sudo docker run -d -e POSTGRES_PASSWORD=duckdb -v /home/aaa/par:/par --network host --name pgducklake docker.1ms.run/pgducklake/pgducklake:18-main
    b9dfe45ca98849164babacbf644877c7f8692265a991c16c1d56bf484c3b462f
    sudo docker exec -it pgducklake psql
    psql (18.3 (Debian 18.3-1.pgdg12+1))
    Type "help" for help.

  3. Test columnar (USING ducklake) tables in PostgreSQL

    postgres=# CREATE TABLE row_store_table AS
    SELECT i AS id, 'hello pg_ducklake' AS msg
    FROM generate_series(1, 10000) AS i;
    SELECT 10000
    postgres=# CREATE TABLE col_store_table USING ducklake AS
    SELECT *
    FROM row_store_table;
    SELECT 10000
    postgres=# SELECT max(id) FROM col_store_table;
      max  
    -------
     10000
    (1 row)

    postgres=# CREATE TABLE titanic USING ducklake AS
    SELECT * FROM read_csv('/par/lineitem.csv');
    SELECT 59986052
    postgres=# \timing on
    Timing is on.
    postgres=# select sum(L_QUANTITY) from titanic;
        sum     
    ------------
     1529738036
    (1 row)

    Time: 87.255 ms
    postgres=#
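A columnar table suits aggregates such as max(id) and sum(L_QUANTITY) above because only the referenced column has to be read. The toy Python sketch below illustrates that idea with plain lists; it is not how pg_ducklake or Parquet actually store data (no compression, row groups or statistics), and all function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# A row shaped like row_store_table: (id, msg)
Row = Tuple[int, str]


def make_rows(n: int) -> List[Row]:
    """Build rows like row_store_table: ids 1..n with a constant message."""
    return [(i, "hello pg_ducklake") for i in range(1, n + 1)]


def to_columns(rows: List[Row]) -> Dict[str, list]:
    """Pivot rows into one list per column (the columnar idea)."""
    return {"id": [r[0] for r in rows], "msg": [r[1] for r in rows]}


def max_id_row_layout(rows: List[Row]) -> Tuple[Optional[int], int]:
    """Scan whole rows; returns (max id, number of field values visited)."""
    best: Optional[int] = None
    visited = 0
    for row in rows:
        visited += len(row)  # every field of the row comes along with it
        if best is None or row[0] > best:
            best = row[0]
    return best, visited


def max_id_column_layout(cols: Dict[str, list]) -> Tuple[Optional[int], int]:
    """Scan only the id column; the msg column is never touched."""
    ids = cols["id"]
    return (max(ids) if ids else None), len(ids)


if __name__ == "__main__":
    rows = make_rows(10000)
    cols = to_columns(rows)
    print(max_id_row_layout(rows))     # (10000, 20000)
    print(max_id_column_layout(cols))  # (10000, 10000)
```

Both layouts return the same maximum, 10000, as the psql query did; the columnar scan visits half as many values here, and the gap grows with the number of columns a table has.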

Inspect the physical storage files:

    postgres=# select * FROM ducklake.list_files('public', 'titanic');
    Time: 4.600 ms

                                                       data_file                                                    | data_file_size_bytes | data_file_footer_size | data_file_encryption_key | delete_file | delete_file_size_bytes | delete_file_footer_size | delete_file_encryption_key 
    ----------------------------------------------------------------------------------------------------------------+----------------------+-----------------------+--------------------------+-------------+------------------------+-------------------------+----------------------------
     /var/lib/postgresql/18/docker/pg_ducklake/public/titanic/ducklake-019d3c29-27a5-7b6a-a086-ed394fc5a8e9.parquet |            527428314 |                182108 |                          |             |                        |                         | 
     /var/lib/postgresql/18/docker/pg_ducklake/public/titanic/ducklake-019d3c29-3fb0-77ac-9702-0f8a51b83c8e.parquet |            524592064 |                181908 |                          |             |                        |                         | 
     /var/lib/postgresql/18/docker/pg_ducklake/public/titanic/ducklake-019d3c29-56f4-7e9d-a78f-cd792a917766.parquet |            538558467 |                186965 |                          |             |                        |                         | 
     /var/lib/postgresql/18/docker/pg_ducklake/public/titanic/ducklake-019d3c29-6fa9-73fc-9fa3-83c14927643c.parquet |            529799189 |                183423 |                          |             |                        |                         | 
     /var/lib/postgresql/18/docker/pg_ducklake/public/titanic/ducklake-019d3c29-885d-7000-be71-6bf3ecb992b1.parquet |            139023964 |                 52960 |                          |             |                        |                         | 
    (5 rows)

    postgres=# delete from col_store_table where id=2000;
    DELETE 1
    Time: 14.703 ms

    postgres=# SELECT sum(id) FROM col_store_table;
       sum    
    ----------
     50003000
    (1 row)

    Time: 6.045 ms

    postgres=# select * FROM ducklake.list_files('public', 'col_store_table');
    Time: 6.112 ms

                                                           data_file                                                        | data_file_size_bytes | data_file_footer_size | data_file_encryption_key | delete_file | delete_file_size_bytes | delete_file_footer_size | delete_file_encryption_key 
    ------------------------------------------------------------------------------------------------------------------------+----------------------+-----------------------+--------------------------+-------------+------------------------+-------------------------+----------------------------
     /var/lib/postgresql/18/docker/pg_ducklake/public/col_store_table/ducklake-019d3c25-7e58-7cfd-a1ea-d83222a57b7a.parquet |                40468 |                   302 |                          |             |                        |                         | 
    (1 row)
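The list_files results above are easier to reason about once summarised. A minimal sketch, assuming the psql output is pasted in as text in psql's default aligned format (columns separated by "|", a dashed separator under the header, and a "(N rows)" footer); parse_psql_table and summarize_files are hypothetical helpers, and only the column names visible in the output are used.

```python
import re
from typing import Dict, List

# Footer line printed by psql, e.g. "(5 rows)" or "(1 row)"
_ROW_COUNT = re.compile(r"\(\d+ rows?\)")


def parse_psql_table(text: str) -> List[Dict[str, str]]:
    """Parse psql aligned output into a list of {column: value} dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # The dashed separator (e.g. "----+----") sits right below the header line.
    sep_idx = next(
        (i for i, ln in enumerate(lines) if i > 0 and set(ln.strip()) <= set("-+")),
        None,
    )
    if sep_idx is None:
        raise ValueError("no psql header separator found")
    header = [h.strip() for h in lines[sep_idx - 1].split("|")]
    rows = []
    for ln in lines[sep_idx + 1:]:
        if _ROW_COUNT.fullmatch(ln.strip()):
            break  # reached the row-count footer
        cells = [c.strip() for c in ln.split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


def summarize_files(rows: List[Dict[str, str]]) -> Dict[str, int]:
    """Count data files, total data bytes, and rows that name a delete file."""
    return {
        "files": len(rows),
        "total_bytes": sum(int(r["data_file_size_bytes"]) for r in rows),
        "files_with_delete_file": sum(1 for r in rows if r.get("delete_file")),
    }
```

Applied to the titanic listing, the five data_file_size_bytes values add up to 2,259,401,998 bytes (about 2.1 GiB). No row in either listing has a delete_file, including the col_store_table row after the DELETE.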

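Deletes are implemented by marking the deleted rows in a separate file, but I could not find where that delete file is: the delete_file column returned by ducklake.list_files is empty. Next, I listed the table directory inside the container and read the Parquet file directly with the DuckDB CLI (placed in the mounted /par directory):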

    postgres=# \q
    sudo docker exec -it pgducklake bash
    (enter the sudo password)
    postgres@kylin-pc:/$ ls /var/lib/postgresql/18/docker/pg_ducklake/public/col_store_table/
    ducklake-019d3c25-7e58-7cfd-a1ea-d83222a57b7a.parquet
    postgres@kylin-pc:/$ /par/duckdb
    DuckDB v1.5.1 (Variegata)
    Enter ".help" for usage hints.
    memory D SELECT sum(id) FROM '/var/lib/postgresql/18/docker/pg_ducklake/public/col_store_table/ducklake-019d3c25-7e58-7cfd-a1ea-d83222a57b7a.parquet';
    ┌─────────────────┐
    │     sum(id)     │
    │     int128      │
    ├─────────────────┤
    │    50005000     │
    │ (50.01 million) │
    └─────────────────┘

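Querying the Parquet file directly still shows the pre-delete state: sum(id) is 50005000 (all ids 1 to 10000), whereas the same query through PostgreSQL returned 50003000. The DELETE did not rewrite the data file; the deleted row is filtered out only when the table is read through DuckLake.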
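This behaviour matches a merge-on-read design: the data file stays immutable, the deletion is recorded elsewhere, and only readers that know about that record hide the row. A toy Python model, assuming deletes are tracked as row positions within a data file; the class and method names are hypothetical, and this does not claim to mirror where or how pg_ducklake actually stores the deletion.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class DataFile:
    # Row values as written once, like the immutable .parquet data file.
    ids: List[int]
    # Deleted row positions, kept separately from the data (assumption).
    deleted_positions: Set[int] = field(default_factory=set)

    def delete_where_id(self, target: int) -> int:
        """Mark matching rows as deleted without touching self.ids."""
        hits = {
            pos
            for pos, v in enumerate(self.ids)
            if v == target and pos not in self.deleted_positions
        }
        self.deleted_positions |= hits
        return len(hits)  # like psql's "DELETE 1"

    def raw_scan(self) -> List[int]:
        """Like reading the Parquet file directly with the DuckDB CLI."""
        return list(self.ids)

    def table_scan(self) -> List[int]:
        """Like querying the table through PostgreSQL: deletes are applied."""
        return [v for pos, v in enumerate(self.ids) if pos not in self.deleted_positions]


if __name__ == "__main__":
    f = DataFile(ids=list(range(1, 10001)))
    print(f.delete_where_id(2000))  # 1
    print(sum(f.table_scan()))      # 50003000
    print(sum(f.raw_scan()))        # 50005000
```

The numbers line up with the session above: 1 + 2 + ... + 10000 = 10000 × 10001 / 2 = 50005000, and removing id 2000 leaves 50003000, while the raw scan still sees every row.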
