Create a virtual environment, then install the Python packages duckdb, polars and pandas:
python -m venv pwin313
pwin313\scripts\activate.bat
(pwin313) C:\d\pwin313>pip install duckdb polars pandas
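Benchmark timings depend on library versions, and the rusty_sheet version is recorded for this test. It may help to record the installed package versions as well. This is a minimal sketch using only the standard library's importlib.metadata; the helper name package_versions is hypothetical and not part of any of these packages:

```python
from importlib.metadata import PackageNotFoundError, version


def package_versions(names=("duckdb", "polars", "pandas")):
    """Return {distribution name: installed version string, or None if missing}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # not installed in the active environment
    return found


if __name__ == "__main__":
    for name, ver in package_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Run it inside the activated pwin313 environment so that it reports the packages installed there.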
Start Python, import duckdb, then install and load the two DuckDB extensions: excel, and rusty_sheet from the community repository:
(pwin313) C:\d\pwin313>python
Python 3.13.2 (tags/v3.13.2:4f8bb39, Feb 4 2025, 15:23:48) [MSC v.1942 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import duckdb
>>> duckdb.sql("install excel")
>>> duckdb.sql("install rusty_sheet from community")
>>> import time
>>> duckdb.sql("load excel")
>>> duckdb.sql("load rusty_sheet")
The rusty_sheet extension used here is version v0.4.2.
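Each extension is tested three ways: running the query without converting the result to a data frame, converting it to a polars DataFrame with .pl(), and converting it to a pandas DataFrame with .fetchdf(). Times are in seconds.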
>>> t=time.time();duckdb.sql("select * from read_xlsx('/d/lineitem.xlsx')");t1=time.time();print(t1-t)
0.3100736141204834
>>> t=time.time();duckdb.sql("select * from read_xlsx('/d/lineitem.xlsx')").pl();t1=time.time();print(t1-t)
4.7497313022613525
>>> t=time.time();duckdb.sql("select * from read_xlsx('/d/lineitem.xlsx')").fetchdf();t1=time.time();print(t1-t)
4.8916003704071045
>>> t=time.time();duckdb.sql("select * from read_sheet('/d/lineitem.xlsx',range='2:')");t1=time.time();print(t1-t)
4.251304626464844
>>> t=time.time();duckdb.sql("select * from read_sheet('/d/lineitem.xlsx',range='2:')").pl();t1=time.time();print(t1-t)
9.083962440490723
>>> t=time.time();duckdb.sql("select * from read_sheet('/d/lineitem.xlsx',range='2:')").fetchdf();t1=time.time();print(t1-t)
9.565066576004028
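So, unless the result is converted to a data frame, read_xlsx does not actually read the data yet (0.31 s without conversion versus roughly 4.7 to 4.9 s with it). read_sheet, by contrast, already reads it (4.25 s without conversion, roughly 9.1 to 9.6 s with it).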
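The difference can be pictured as lazy versus eager evaluation. duckdb.sql() returns a relation object, and the data-frame conversions execute the query. The pure-Python sketch below is only an illustrative model with hypothetical class names. It contrasts a scan that defers reading until it is materialized with one that reads as soon as the query is set up. EagerScan also assumes that the conversion runs the scan a second time. That assumption is consistent with read_sheet's roughly doubled converted time, but the measurements do not prove it:

```python
class CountingReader:
    """Stand-in for an expensive spreadsheet scan; counts how often it runs."""

    def __init__(self):
        self.reads = 0

    def __call__(self):
        self.reads += 1
        return [(1, "a"), (2, "b")]  # placeholder rows, not real data


class LazyScan:
    """Hypothetical lazy relation: nothing is read until materialize()."""

    def __init__(self, reader):
        self.reader = reader

    def materialize(self):
        return self.reader()


class EagerScan:
    """Hypothetical eager relation: reads once when it is constructed."""

    def __init__(self, reader):
        self.reader = reader
        self.reader()  # cost paid at setup time, before any conversion

    def materialize(self):
        # Assumption: converting runs the scan again rather than reusing it.
        return self.reader()


if __name__ == "__main__":
    lazy_reader, eager_reader = CountingReader(), CountingReader()
    lazy, eager = LazyScan(lazy_reader), EagerScan(eager_reader)
    print("after setup:", lazy_reader.reads, eager_reader.reads)  # prints: after setup: 0 1
    lazy.materialize()
    eager.materialize()
    print("after conversion:", lazy_reader.reads, eager_reader.reads)  # prints: after conversion: 1 2
```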