在python 3.14 容器中安装和使用chdb包

1.docker exec -it登录容器用pip install 命令安装

复制代码
sudo docker exec -it python3143 bash
root@DESKTOP-59T6U68:/# pip install chdb
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting chdb
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/23/28/f3aa551b4af78b8ac967c191407301eff5906dc7239ddb232d4d34bf8ad4/chdb-4.0.1-cp39-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (149.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 149.4/149.4 MB 31.3 MB/s  0:00:04
Collecting pandas<3.0.0,>=2.1.0 (from chdb)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/15/b2/0e62f78c0c5ba7e3d2c5945a82456f4fac76c480940f805e0b97fcbc2f65/pandas-2.3.3-cp314-cp314-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (12.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.3/12.3 MB 51.4 MB/s  0:00:00
Collecting pyarrow>=13.0.0 (from chdb)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/9c/86/95c61ad82236495f3c31987e85135926ba3ec7f3819296b70a68d8066b49/pyarrow-23.0.0-cp314-cp314-manylinux_2_28_x86_64.whl (47.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 47.6/47.6 MB 33.8 MB/s  0:00:01
Collecting numpy>=1.26.0 (from pandas<3.0.0,>=2.1.0->chdb)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/5d/6c/7f237821c9642fb2a04d2f1e88b4295677144ca93285fd76eff3bcba858d/numpy-2.4.2-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (16.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16.6/16.6 MB 43.4 MB/s  0:00:00
Collecting python-dateutil>=2.8.2 (from pandas<3.0.0,>=2.1.0->chdb)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/ec/57/56b9bcc3c9c6a792fcbaf139543cee77261f3651ca9da0c93f5c1221264b/python_dateutil-2.9.0.post0-py2.py3-none-any.whl (229 kB)
Collecting pytz>=2020.1 (from pandas<3.0.0,>=2.1.0->chdb)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/81/c4/34e93fe5f5429d7570ec1fa436f1986fb1f00c3e0f43a589fe2bbcd22c3f/pytz-2025.2-py2.py3-none-any.whl (509 kB)
Collecting tzdata>=2022.7 (from pandas<3.0.0,>=2.1.0->chdb)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/c7/b0/003792df09decd6849a5e39c28b513c06e84436a54440380862b5aeff25d/tzdata-2025.3-py2.py3-none-any.whl (348 kB)
Collecting six>=1.5 (from python-dateutil>=2.8.2->pandas<3.0.0,>=2.1.0->chdb)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/b7/ce/149a00dd41f10bc29e5921b496af8b574d8413afcd5e30dfa0ed46c2cc5e/six-1.17.0-py2.py3-none-any.whl (11 kB)
Installing collected packages: pytz, tzdata, six, pyarrow, numpy, python-dateutil, pandas, chdb
Successfully installed chdb-4.0.1 numpy-2.4.2 pandas-2.3.3 pyarrow-23.0.0 python-dateutil-2.9.0.post0 pytz-2025.2 six-1.17.0 tzdata-2025.3
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.

[notice] A new release of pip is available: 25.3 -> 26.0.1
[notice] To update, run: pip install --upgrade pip

2.导入后就可以用chdb.query查询

复制代码
root@DESKTOP-59T6U68:/# python3
Python 3.14.3 (main, Feb  4 2026, 20:08:31) [GCC 14.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import chdb
>>> res = chdb.query('select version()', 'Pretty'); print(res)
   ┏━━━━━━━━━━━┓
   ┃ version() ┃
   ┡━━━━━━━━━━━┩
1. │ 25.8.2.1  │
   └───────────┘

>>> res = chdb.query('select * from file("/par/duck.parquet", Parquet)'); print(res)
"13522500"


>>> res = chdb.query('select * from file("/par/duck.parquet", Parquet)','Pretty'); print(res)
   ┏━━━━━━━━━━┓
   ┃ mpz_sum  ┃
   ┡━━━━━━━━━━┩
1. │ 13522500 │
   └──────────┘

>>>

>>> res = chdb.query('select * from file("/par/duck.parquet", Parquet)', 'JSON'); print(res)
{
        "meta":
        [
                {
                        "name": "mpz_sum",
                        "type": "Nullable(String)"
                }
        ],

        "data":
        [
                {
                        "mpz_sum": "13522500"
                }
        ],

        "rows": 1,

        "statistics":
        {
                "elapsed": 0.037387861,
                "rows_read": 0,
                "bytes_read": 0
        }
}

>>> res = chdb.query('select * from file("/par/duck.parquet", Parquet)','PrettyCompact'); print(res)
   ┌─mpz_sum──┐
1. │ 13522500 │
   └──────────┘

上面查询了版本号和Parquet文件,query的第二个参数用来指定不同的格式,不加参数默认是无标题的字符串,加PrettyCompact参数才是clickhouse客户端默认的格式。

如果要查询带\r的tsv文件,需要设置参数input_format_tsv_crlf_end_of_line,clickhouse客户端中有两种方式,一种是单独用set命令设置,另一种是在查询语句中用SETTINGS子句。

复制代码
:) set input_format_tsv_crlf_end_of_line=true;

SET input_format_tsv_crlf_end_of_line = true

Query id: 60a7f1df-aad9-41fc-b100-d2c507da7949

Ok.

0 rows in set. Elapsed: 0.009 sec.


:) select * from file('/mnt/c/d/data/cxy.tsv', Tsv) ;

SELECT *
FROM file('/mnt/c/d/data/cxy.tsv', Tsv)

Query id: 5246c40d-2915-412c-8f28-97912d88e14f

   ┌─id─┬─name────┬─language───┐
1. │  1 │ Joe     │ Java       │
2. │  2 │ Alice   │ JavaScript │
3. │  3 │ Leon    │ C/C++      │
4. │  4 │ William │ Java       │
5. │  5 │ James   │ C/C++      │
6. │  6 │ Enson   │ C/C++      │
   └────┴─────────┴────────────┘

6 rows in set. Elapsed: 0.067 sec.

:) set input_format_tsv_crlf_end_of_line=false;

SET input_format_tsv_crlf_end_of_line = false

Query id: 591cf022-de92-4165-80e1-9744b8bdf93c

Ok.

0 rows in set. Elapsed: 0.000 sec.

:) select * from file('/mnt/c/d/data/cxy.tsv', Tsv) ;

SELECT *
FROM file('/mnt/c/d/data/cxy.tsv', Tsv)

Query id: d56dd6fe-b217-4c0b-b3a0-344202ed0fd9


Elapsed: 0.071 sec.

Received exception:
Code: 117. DB::Exception:
You have carriage return (\r, 0x0D, ASCII 13) at end of first row.
It's like your input data has DOS/Windows style line separators, that are illegal in TabSeparated format. You must transform your file to Unix format.
But if you really need carriage return at end of string value of last column, you need to escape it as \r
or else enable setting 'input_format_tsv_crlf_end_of_line': (while reading header): (in file/uri /mnt/c/d/data/cxy.tsv): While executing ParallelParsingBlockInputFormat: While executing File. (INCORRECT_DATA)

:)

:) select * from file('/mnt/c/d/data/cxy.tsv', Tsv) settings input_format_tsv_crlf_end_of_line =true;

SELECT *
FROM file('/mnt/c/d/data/cxy.tsv', Tsv)
SETTINGS input_format_tsv_crlf_end_of_line = true

Query id: 6bb8ec95-c597-4716-aee3-b60a001c76f2

   ┌─id─┬─name────┬─language───┐
1. │  1 │ Joe     │ Java       │
2. │  2 │ Alice   │ JavaScript │
3. │  3 │ Leon    │ C/C++      │
4. │  4 │ William │ Java       │
5. │  5 │ James   │ C/C++      │
6. │  6 │ Enson   │ C/C++      │
   └────┴─────────┴────────────┘

6 rows in set. Elapsed: 0.008 sec.

:) \q
Bye.

在python中只能用后一种

复制代码
>>> import chdb
>> res = chdb.query('select * from file("/par/data/cxy.tsv", Tsv)','Pretty'); print(res)
Traceback (most recent call last):
  File "<python-input-7>", line 1, in <module>
    res = chdb.query('select * from file("/par/data/cxy.tsv", Tsv)','Pretty'); print(res)
  File "/usr/local/lib/python3.14/site-packages/chdb/__init__.py", line 205, in query
    res = conn.query(sql, output_format, params=params)
RuntimeError: Code: 636. DB::Exception: The table structure cannot be extracted from a Tsv format file. Error:
Code: 117. DB::Exception:
You have carriage return (\r, 0x0D, ASCII 13) at end of first row.
It's like your input data has DOS/Windows style line separators, that are illegal in TabSeparated format. You must transform your file to Unix format.
But if you really need carriage return at end of string value of last column, you need to escape it as \r
or else enable setting 'input_format_tsv_crlf_end_of_line'. (INCORRECT_DATA) (version 25.8.2.1).
You can specify the structure manually: (in file/uri /par/data/cxy.tsv). (CANNOT_EXTRACT_TABLE_STRUCTURE)

>>> res = chdb.query('select * from file("/par/data/cxy.tsv", Tsv) settings input_format_tsv_crlf_end_of_line=true','Pretty'); print(res)
   ┏━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┓
   ┃ id ┃ name    ┃ language   ┃
   ┡━━━━╇━━━━━━━━━╇━━━━━━━━━━━━┩
1. │  1 │ Joe     │ Java       │
   ├────┼─────────┼────────────┤
2. │  2 │ Alice   │ JavaScript │
   ├────┼─────────┼────────────┤
3. │  3 │ Leon    │ C/C++      │
   ├────┼─────────┼────────────┤
4. │  4 │ William │ Java       │
   ├────┼─────────┼────────────┤
5. │  5 │ James   │ C/C++      │
   ├────┼─────────┼────────────┤
6. │  6 │ Enson   │ C/C++      │
   └────┴─────────┴────────────┘

>>>
>>> res = chdb.query('select * from file("/par/data/cxy.tsv", Tsv) settings input_format_tsv_crlf_end_of_line=true','PrettyCompact'); print(res)
   ┌─id─┬─name────┬─language───┐
1. │  1 │ Joe     │ Java       │
2. │  2 │ Alice   │ JavaScript │
3. │  3 │ Leon    │ C/C++      │
4. │  4 │ William │ Java       │
5. │  5 │ James   │ C/C++      │
6. │  6 │ Enson   │ C/C++      │
   └────┴─────────┴────────────┘

>>> chdb.query('select * from file("/par/data/cxy.tsv", Tsv) settings input_format_tsv_crlf_end_of_line=true','PrettyCompact')
   ┌─id─┬─name────┬─language───┐
1. │  1 │ Joe     │ Java       │
2. │  2 │ Alice   │ JavaScript │
3. │  3 │ Leon    │ C/C++      │
4. │  4 │ William │ Java       │
5. │  5 │ James   │ C/C++      │
6. │  6 │ Enson   │ C/C++      │
   └────┴─────────┴────────────┘

查询结果可存入变量,然后用print()输出,也可以执行查询时直接输出。

相关推荐
唐叔在学习1 小时前
就算没有服务器,我照样能够同步数据
后端·python·程序员
曲幽3 小时前
FastAPI流式输出实战与避坑指南:让AI像人一样“边想边说”
python·ai·fastapi·web·stream·chat·async·generator·ollama
Flittly3 小时前
【从零手写 AI Agent:learn-claude-code 项目实战笔记】(1)The Agent Loop (智能体循环)
python·agent
vivo互联网技术5 小时前
ICLR2026 | 视频虚化新突破!Any-to-Bokeh 一键生成电影感连贯效果
人工智能·python·深度学习
敏编程6 小时前
一天一个Python库:virtualenv - 隔离你的Python环境,保持项目整洁
python
喝茶与编码8 小时前
Python异步并发控制:asyncio.gather 与 Semaphore 协同设计解析
后端·python
zone77398 小时前
003:RAG 入门-LangChain 读取图片数据
后端·python·面试
用户8356290780518 小时前
在 PowerPoint 中用 Python 添加和定制形状的完整教程
后端·python
用户962377954489 小时前
🚀 docx2md-picgo:Word 文档图片一键上传图床工具
python·markdown