Installing and using the chdb package in a Python 3.14 container

1. Log in to the container with docker exec -it and install the package with pip install

sudo docker exec -it python3143 bash
root@DESKTOP-59T6U68:/# pip install chdb
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting chdb
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/23/28/f3aa551b4af78b8ac967c191407301eff5906dc7239ddb232d4d34bf8ad4/chdb-4.0.1-cp39-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (149.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 149.4/149.4 MB 31.3 MB/s  0:00:04
Collecting pandas<3.0.0,>=2.1.0 (from chdb)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/15/b2/0e62f78c0c5ba7e3d2c5945a82456f4fac76c480940f805e0b97fcbc2f65/pandas-2.3.3-cp314-cp314-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (12.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.3/12.3 MB 51.4 MB/s  0:00:00
Collecting pyarrow>=13.0.0 (from chdb)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/9c/86/95c61ad82236495f3c31987e85135926ba3ec7f3819296b70a68d8066b49/pyarrow-23.0.0-cp314-cp314-manylinux_2_28_x86_64.whl (47.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 47.6/47.6 MB 33.8 MB/s  0:00:01
Collecting numpy>=1.26.0 (from pandas<3.0.0,>=2.1.0->chdb)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/5d/6c/7f237821c9642fb2a04d2f1e88b4295677144ca93285fd76eff3bcba858d/numpy-2.4.2-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (16.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16.6/16.6 MB 43.4 MB/s  0:00:00
Collecting python-dateutil>=2.8.2 (from pandas<3.0.0,>=2.1.0->chdb)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/ec/57/56b9bcc3c9c6a792fcbaf139543cee77261f3651ca9da0c93f5c1221264b/python_dateutil-2.9.0.post0-py2.py3-none-any.whl (229 kB)
Collecting pytz>=2020.1 (from pandas<3.0.0,>=2.1.0->chdb)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/81/c4/34e93fe5f5429d7570ec1fa436f1986fb1f00c3e0f43a589fe2bbcd22c3f/pytz-2025.2-py2.py3-none-any.whl (509 kB)
Collecting tzdata>=2022.7 (from pandas<3.0.0,>=2.1.0->chdb)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/c7/b0/003792df09decd6849a5e39c28b513c06e84436a54440380862b5aeff25d/tzdata-2025.3-py2.py3-none-any.whl (348 kB)
Collecting six>=1.5 (from python-dateutil>=2.8.2->pandas<3.0.0,>=2.1.0->chdb)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/b7/ce/149a00dd41f10bc29e5921b496af8b574d8413afcd5e30dfa0ed46c2cc5e/six-1.17.0-py2.py3-none-any.whl (11 kB)
Installing collected packages: pytz, tzdata, six, pyarrow, numpy, python-dateutil, pandas, chdb
Successfully installed chdb-4.0.1 numpy-2.4.2 pandas-2.3.3 pyarrow-23.0.0 python-dateutil-2.9.0.post0 pytz-2025.2 six-1.17.0 tzdata-2025.3
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.

[notice] A new release of pip is available: 25.3 -> 26.0.1
[notice] To update, run: pip install --upgrade pip

2. After importing the package, run queries with chdb.query

root@DESKTOP-59T6U68:/# python3
Python 3.14.3 (main, Feb  4 2026, 20:08:31) [GCC 14.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import chdb
>>> res = chdb.query('select version()', 'Pretty'); print(res)
   ┏━━━━━━━━━━━┓
   ┃ version() ┃
   ┡━━━━━━━━━━━┩
1. │ 25.8.2.1  │
   └───────────┘

>>> res = chdb.query('select * from file("/par/duck.parquet", Parquet)'); print(res)
"13522500"


>>> res = chdb.query('select * from file("/par/duck.parquet", Parquet)','Pretty'); print(res)
   ┏━━━━━━━━━━┓
   ┃ mpz_sum  ┃
   ┡━━━━━━━━━━┩
1. │ 13522500 │
   └──────────┘

>>>

>>> res = chdb.query('select * from file("/par/duck.parquet", Parquet)', 'JSON'); print(res)
{
        "meta":
        [
                {
                        "name": "mpz_sum",
                        "type": "Nullable(String)"
                }
        ],

        "data":
        [
                {
                        "mpz_sum": "13522500"
                }
        ],

        "rows": 1,

        "statistics":
        {
                "elapsed": 0.037387861,
                "rows_read": 0,
                "bytes_read": 0
        }
}

>>> res = chdb.query('select * from file("/par/duck.parquet", Parquet)','PrettyCompact'); print(res)
   ┌─mpz_sum──┐
1. │ 13522500 │
   └──────────┘

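The queries above fetch the version number and read a Parquet file. The second argument of query selects the output format. If it is omitted, the result is plain text without a header row (the quoted "13522500" above); passing PrettyCompact gives the format that the ClickHouse client uses by default.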
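As a minimal sketch of the format argument described above, the snippet below runs one statement in several output formats and collects the text of each result. The helper name run_in_formats and the FORMATS list are hypothetical, introduced only for illustration. Listing "CSV" explicitly rests on the assumption that it matches what chdb.query returns when no format is given, which is consistent with the quoted value shown earlier.

```python
# Sketch: run the same SQL in several output formats with chdb.query.
try:
    import chdb  # assumed installed with: pip install chdb
except ImportError:  # lets the snippet load on machines without chdb
    chdb = None

# Format names used in this article, plus "CSV" (assumed to be the default).
FORMATS = ["CSV", "Pretty", "PrettyCompact", "JSON"]


def run_in_formats(sql, formats=FORMATS):
    """Hypothetical helper: return {format_name: result_text} for one statement."""
    if chdb is None:
        raise RuntimeError("chdb is not installed")
    # str() on the result gives the same text that print(res) shows.
    return {fmt: str(chdb.query(sql, fmt)) for fmt in formats}


if __name__ == "__main__" and chdb is not None:
    for fmt, text in run_in_formats("select version()").items():
        print(f"--- {fmt} ---")
        print(text)
```

Keeping the format names in one list makes it easy to compare how the same result looks for a human reader (Pretty, PrettyCompact) and for a program (CSV, JSON).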

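To query a TSV file whose lines end in \r\n (CRLF), the setting input_format_tsv_crlf_end_of_line must be enabled. In the ClickHouse client there are two ways to do this: issue a separate SET command, or add a SETTINGS clause to the query itself.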

:) set input_format_tsv_crlf_end_of_line=true;

SET input_format_tsv_crlf_end_of_line = true

Query id: 60a7f1df-aad9-41fc-b100-d2c507da7949

Ok.

0 rows in set. Elapsed: 0.009 sec.


:) select * from file('/mnt/c/d/data/cxy.tsv', Tsv) ;

SELECT *
FROM file('/mnt/c/d/data/cxy.tsv', Tsv)

Query id: 5246c40d-2915-412c-8f28-97912d88e14f

   ┌─id─┬─name────┬─language───┐
1. │  1 │ Joe     │ Java       │
2. │  2 │ Alice   │ JavaScript │
3. │  3 │ Leon    │ C/C++      │
4. │  4 │ William │ Java       │
5. │  5 │ James   │ C/C++      │
6. │  6 │ Enson   │ C/C++      │
   └────┴─────────┴────────────┘

6 rows in set. Elapsed: 0.067 sec.

:) set input_format_tsv_crlf_end_of_line=false;

SET input_format_tsv_crlf_end_of_line = false

Query id: 591cf022-de92-4165-80e1-9744b8bdf93c

Ok.

0 rows in set. Elapsed: 0.000 sec.

:) select * from file('/mnt/c/d/data/cxy.tsv', Tsv) ;

SELECT *
FROM file('/mnt/c/d/data/cxy.tsv', Tsv)

Query id: d56dd6fe-b217-4c0b-b3a0-344202ed0fd9


Elapsed: 0.071 sec.

Received exception:
Code: 117. DB::Exception:
You have carriage return (\r, 0x0D, ASCII 13) at end of first row.
It's like your input data has DOS/Windows style line separators, that are illegal in TabSeparated format. You must transform your file to Unix format.
But if you really need carriage return at end of string value of last column, you need to escape it as \r
or else enable setting 'input_format_tsv_crlf_end_of_line': (while reading header): (in file/uri /mnt/c/d/data/cxy.tsv): While executing ParallelParsingBlockInputFormat: While executing File. (INCORRECT_DATA)

:)

:) select * from file('/mnt/c/d/data/cxy.tsv', Tsv) settings input_format_tsv_crlf_end_of_line =true;

SELECT *
FROM file('/mnt/c/d/data/cxy.tsv', Tsv)
SETTINGS input_format_tsv_crlf_end_of_line = true

Query id: 6bb8ec95-c597-4716-aee3-b60a001c76f2

   ┌─id─┬─name────┬─language───┐
1. │  1 │ Joe     │ Java       │
2. │  2 │ Alice   │ JavaScript │
3. │  3 │ Leon    │ C/C++      │
4. │  4 │ William │ Java       │
5. │  5 │ James   │ C/C++      │
6. │  6 │ Enson   │ C/C++      │
   └────┴─────────┴────────────┘

6 rows in set. Elapsed: 0.008 sec.

:) \q
Bye.

In Python, only the second way (the SETTINGS clause) can be used:

>>> import chdb
>>> res = chdb.query('select * from file("/par/data/cxy.tsv", Tsv)','Pretty'); print(res)
Traceback (most recent call last):
  File "<python-input-7>", line 1, in <module>
    res = chdb.query('select * from file("/par/data/cxy.tsv", Tsv)','Pretty'); print(res)
  File "/usr/local/lib/python3.14/site-packages/chdb/__init__.py", line 205, in query
    res = conn.query(sql, output_format, params=params)
RuntimeError: Code: 636. DB::Exception: The table structure cannot be extracted from a Tsv format file. Error:
Code: 117. DB::Exception:
You have carriage return (\r, 0x0D, ASCII 13) at end of first row.
It's like your input data has DOS/Windows style line separators, that are illegal in TabSeparated format. You must transform your file to Unix format.
But if you really need carriage return at end of string value of last column, you need to escape it as \r
or else enable setting 'input_format_tsv_crlf_end_of_line'. (INCORRECT_DATA) (version 25.8.2.1).
You can specify the structure manually: (in file/uri /par/data/cxy.tsv). (CANNOT_EXTRACT_TABLE_STRUCTURE)

>>> res = chdb.query('select * from file("/par/data/cxy.tsv", Tsv) settings input_format_tsv_crlf_end_of_line=true','Pretty'); print(res)
   ┏━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┓
   ┃ id ┃ name    ┃ language   ┃
   ┡━━━━╇━━━━━━━━━╇━━━━━━━━━━━━┩
1. │  1 │ Joe     │ Java       │
   ├────┼─────────┼────────────┤
2. │  2 │ Alice   │ JavaScript │
   ├────┼─────────┼────────────┤
3. │  3 │ Leon    │ C/C++      │
   ├────┼─────────┼────────────┤
4. │  4 │ William │ Java       │
   ├────┼─────────┼────────────┤
5. │  5 │ James   │ C/C++      │
   ├────┼─────────┼────────────┤
6. │  6 │ Enson   │ C/C++      │
   └────┴─────────┴────────────┘

>>>
>>> res = chdb.query('select * from file("/par/data/cxy.tsv", Tsv) settings input_format_tsv_crlf_end_of_line=true','PrettyCompact'); print(res)
   ┌─id─┬─name────┬─language───┐
1. │  1 │ Joe     │ Java       │
2. │  2 │ Alice   │ JavaScript │
3. │  3 │ Leon    │ C/C++      │
4. │  4 │ William │ Java       │
5. │  5 │ James   │ C/C++      │
6. │  6 │ Enson   │ C/C++      │
   └────┴─────────┴────────────┘

>>> chdb.query('select * from file("/par/data/cxy.tsv", Tsv) settings input_format_tsv_crlf_end_of_line=true','PrettyCompact')
   ┌─id─┬─name────┬─language───┐
1. │  1 │ Joe     │ Java       │
2. │  2 │ Alice   │ JavaScript │
3. │  3 │ Leon    │ C/C++      │
4. │  4 │ William │ Java       │
5. │  5 │ James   │ C/C++      │
6. │  6 │ Enson   │ C/C++      │
   └────┴─────────┴────────────┘

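The query result can be stored in a variable and printed with print(), or the query can be entered on its own, in which case the interactive interpreter displays the result directly.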
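Since the setting has to travel with each statement, one option is a small helper that appends the SETTINGS clause. The sketch below is an illustration under stated assumptions, not part of chdb: with_settings is a hypothetical name, it handles only bool and int values, and the demo writes its own temporary CRLF file (reusing two rows from the table above) instead of the /par/data/cxy.tsv path.

```python
# Sketch: attach a SETTINGS clause to a query before passing it to chdb.query.
import os
import tempfile

try:
    import chdb  # assumed installed with: pip install chdb
except ImportError:  # lets the helper be used or tested without chdb
    chdb = None


def with_settings(sql, **settings):
    """Hypothetical helper: append a SETTINGS clause to one SQL statement.

    Only bool and int values are handled, which is enough for on/off flags.
    """
    if not settings:
        return sql
    parts = []
    for name, value in settings.items():
        if isinstance(value, bool):
            rendered = "true" if value else "false"
        elif isinstance(value, int):
            rendered = str(value)
        else:
            raise TypeError(f"unsupported setting value for {name}: {value!r}")
        parts.append(f"{name} = {rendered}")
    return f"{sql} SETTINGS {', '.join(parts)}"


if __name__ == "__main__" and chdb is not None:
    # Write a small TSV file with Windows (CRLF) line endings.
    with tempfile.NamedTemporaryFile("wb", suffix=".tsv", delete=False) as f:
        f.write(b"id\tname\tlanguage\r\n1\tJoe\tJava\r\n2\tAlice\tJavaScript\r\n")
        path = f.name
    try:
        sql = with_settings(
            f"select * from file('{path}', 'TSV')",
            input_format_tsv_crlf_end_of_line=True,
        )
        print(chdb.query(sql, "PrettyCompact"))
    finally:
        os.remove(path)
```

Building the clause in one place keeps the setting name from being retyped in every query string, and the helper leaves the statement unchanged when no settings are passed.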
