Installing and using the chdb package in a Python 3.14 container

1. Log in to the container with docker exec -it and install the package with pip install

sudo docker exec -it python3143 bash
root@DESKTOP-59T6U68:/# pip install chdb
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting chdb
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/23/28/f3aa551b4af78b8ac967c191407301eff5906dc7239ddb232d4d34bf8ad4/chdb-4.0.1-cp39-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (149.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 149.4/149.4 MB 31.3 MB/s  0:00:04
Collecting pandas<3.0.0,>=2.1.0 (from chdb)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/15/b2/0e62f78c0c5ba7e3d2c5945a82456f4fac76c480940f805e0b97fcbc2f65/pandas-2.3.3-cp314-cp314-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (12.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.3/12.3 MB 51.4 MB/s  0:00:00
Collecting pyarrow>=13.0.0 (from chdb)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/9c/86/95c61ad82236495f3c31987e85135926ba3ec7f3819296b70a68d8066b49/pyarrow-23.0.0-cp314-cp314-manylinux_2_28_x86_64.whl (47.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 47.6/47.6 MB 33.8 MB/s  0:00:01
Collecting numpy>=1.26.0 (from pandas<3.0.0,>=2.1.0->chdb)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/5d/6c/7f237821c9642fb2a04d2f1e88b4295677144ca93285fd76eff3bcba858d/numpy-2.4.2-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (16.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16.6/16.6 MB 43.4 MB/s  0:00:00
Collecting python-dateutil>=2.8.2 (from pandas<3.0.0,>=2.1.0->chdb)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/ec/57/56b9bcc3c9c6a792fcbaf139543cee77261f3651ca9da0c93f5c1221264b/python_dateutil-2.9.0.post0-py2.py3-none-any.whl (229 kB)
Collecting pytz>=2020.1 (from pandas<3.0.0,>=2.1.0->chdb)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/81/c4/34e93fe5f5429d7570ec1fa436f1986fb1f00c3e0f43a589fe2bbcd22c3f/pytz-2025.2-py2.py3-none-any.whl (509 kB)
Collecting tzdata>=2022.7 (from pandas<3.0.0,>=2.1.0->chdb)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/c7/b0/003792df09decd6849a5e39c28b513c06e84436a54440380862b5aeff25d/tzdata-2025.3-py2.py3-none-any.whl (348 kB)
Collecting six>=1.5 (from python-dateutil>=2.8.2->pandas<3.0.0,>=2.1.0->chdb)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/b7/ce/149a00dd41f10bc29e5921b496af8b574d8413afcd5e30dfa0ed46c2cc5e/six-1.17.0-py2.py3-none-any.whl (11 kB)
Installing collected packages: pytz, tzdata, six, pyarrow, numpy, python-dateutil, pandas, chdb
Successfully installed chdb-4.0.1 numpy-2.4.2 pandas-2.3.3 pyarrow-23.0.0 python-dateutil-2.9.0.post0 pytz-2025.2 six-1.17.0 tzdata-2025.3
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.

[notice] A new release of pip is available: 25.3 -> 26.0.1
[notice] To update, run: pip install --upgrade pip

2. After importing the package, run queries with chdb.query

root@DESKTOP-59T6U68:/# python3
Python 3.14.3 (main, Feb  4 2026, 20:08:31) [GCC 14.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import chdb
>>> res = chdb.query('select version()', 'Pretty'); print(res)
   ┏━━━━━━━━━━━┓
   ┃ version() ┃
   ┡━━━━━━━━━━━┩
1. │ 25.8.2.1  │
   └───────────┘

>>> res = chdb.query('select * from file("/par/duck.parquet", Parquet)'); print(res)
"13522500"


>>> res = chdb.query('select * from file("/par/duck.parquet", Parquet)','Pretty'); print(res)
   ┏━━━━━━━━━━┓
   ┃ mpz_sum  ┃
   ┡━━━━━━━━━━┩
1. │ 13522500 │
   └──────────┘

>>>

>>> res = chdb.query('select * from file("/par/duck.parquet", Parquet)', 'JSON'); print(res)
{
        "meta":
        [
                {
                        "name": "mpz_sum",
                        "type": "Nullable(String)"
                }
        ],

        "data":
        [
                {
                        "mpz_sum": "13522500"
                }
        ],

        "rows": 1,

        "statistics":
        {
                "elapsed": 0.037387861,
                "rows_read": 0,
                "bytes_read": 0
        }
}

>>> res = chdb.query('select * from file("/par/duck.parquet", Parquet)','PrettyCompact'); print(res)
   ┌─mpz_sum──┐
1. │ 13522500 │
   └──────────┘

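The queries above fetch the version number and read a Parquet file. The second argument of query selects the output format. When it is omitted, the result is a plain string with no header row (the double-quoted value above is consistent with CSV output). Passing PrettyCompact gives the format that the clickhouse client uses by default.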
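When the result is needed as Python data rather than as display text, the JSON format shown above can be parsed with the standard json module. The following is a minimal sketch: the helper names rows_from_json and column_types are hypothetical, and the sample text is copied from the JSON output above with the statistics block left out. It also assumes that str(res) yields the same text that print(res) displays.

```python
import json

# Sample text copied from the JSON-format output shown above (statistics omitted).
SAMPLE_JSON = """
{
    "meta": [{"name": "mpz_sum", "type": "Nullable(String)"}],
    "data": [{"mpz_sum": "13522500"}],
    "rows": 1
}
"""


def rows_from_json(text):
    """Return the list of row dicts from a ClickHouse JSON-format result."""
    doc = json.loads(text)
    return doc["data"]


def column_types(text):
    """Map each column name to its ClickHouse type, taken from the "meta" list."""
    doc = json.loads(text)
    return {col["name"]: col["type"] for col in doc["meta"]}


rows = rows_from_json(SAMPLE_JSON)
types = column_types(SAMPLE_JSON)
print(rows[0]["mpz_sum"])
print(types)

# With chdb installed, the same helpers should apply to a live result
# (assumption: str(res) returns the text that print(res) shows):
#   import chdb
#   res = chdb.query('select * from file("/par/duck.parquet", Parquet)', 'JSON')
#   rows = rows_from_json(str(res))
```

Note that the column type here is Nullable(String), so the value arrives as the string "13522500" and would need an explicit int() conversion before any arithmetic.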

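To query a TSV file whose lines end in \r\n (CRLF), the setting input_format_tsv_crlf_end_of_line must be enabled. The clickhouse client offers two ways to do this: issue a separate SET command, or add a SETTINGS clause to the query itself.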

:) set input_format_tsv_crlf_end_of_line=true;

SET input_format_tsv_crlf_end_of_line = true

Query id: 60a7f1df-aad9-41fc-b100-d2c507da7949

Ok.

0 rows in set. Elapsed: 0.009 sec.


:) select * from file('/mnt/c/d/data/cxy.tsv', Tsv) ;

SELECT *
FROM file('/mnt/c/d/data/cxy.tsv', Tsv)

Query id: 5246c40d-2915-412c-8f28-97912d88e14f

   ┌─id─┬─name────┬─language───┐
1. │  1 │ Joe     │ Java       │
2. │  2 │ Alice   │ JavaScript │
3. │  3 │ Leon    │ C/C++      │
4. │  4 │ William │ Java       │
5. │  5 │ James   │ C/C++      │
6. │  6 │ Enson   │ C/C++      │
   └────┴─────────┴────────────┘

6 rows in set. Elapsed: 0.067 sec.

:) set input_format_tsv_crlf_end_of_line=false;

SET input_format_tsv_crlf_end_of_line = false

Query id: 591cf022-de92-4165-80e1-9744b8bdf93c

Ok.

0 rows in set. Elapsed: 0.000 sec.

:) select * from file('/mnt/c/d/data/cxy.tsv', Tsv) ;

SELECT *
FROM file('/mnt/c/d/data/cxy.tsv', Tsv)

Query id: d56dd6fe-b217-4c0b-b3a0-344202ed0fd9


Elapsed: 0.071 sec.

Received exception:
Code: 117. DB::Exception:
You have carriage return (\r, 0x0D, ASCII 13) at end of first row.
It's like your input data has DOS/Windows style line separators, that are illegal in TabSeparated format. You must transform your file to Unix format.
But if you really need carriage return at end of string value of last column, you need to escape it as \r
or else enable setting 'input_format_tsv_crlf_end_of_line': (while reading header): (in file/uri /mnt/c/d/data/cxy.tsv): While executing ParallelParsingBlockInputFormat: While executing File. (INCORRECT_DATA)

:)

:) select * from file('/mnt/c/d/data/cxy.tsv', Tsv) settings input_format_tsv_crlf_end_of_line =true;

SELECT *
FROM file('/mnt/c/d/data/cxy.tsv', Tsv)
SETTINGS input_format_tsv_crlf_end_of_line = true

Query id: 6bb8ec95-c597-4716-aee3-b60a001c76f2

   ┌─id─┬─name────┬─language───┐
1. │  1 │ Joe     │ Java       │
2. │  2 │ Alice   │ JavaScript │
3. │  3 │ Leon    │ C/C++      │
4. │  4 │ William │ Java       │
5. │  5 │ James   │ C/C++      │
6. │  6 │ Enson   │ C/C++      │
   └────┴─────────┴────────────┘

6 rows in set. Elapsed: 0.008 sec.

:) \q
Bye.

In Python, only the second way (a SETTINGS clause in the query itself) can be used:

>>> import chdb
>>> res = chdb.query('select * from file("/par/data/cxy.tsv", Tsv)','Pretty'); print(res)
Traceback (most recent call last):
  File "<python-input-7>", line 1, in <module>
    res = chdb.query('select * from file("/par/data/cxy.tsv", Tsv)','Pretty'); print(res)
  File "/usr/local/lib/python3.14/site-packages/chdb/__init__.py", line 205, in query
    res = conn.query(sql, output_format, params=params)
RuntimeError: Code: 636. DB::Exception: The table structure cannot be extracted from a Tsv format file. Error:
Code: 117. DB::Exception:
You have carriage return (\r, 0x0D, ASCII 13) at end of first row.
It's like your input data has DOS/Windows style line separators, that are illegal in TabSeparated format. You must transform your file to Unix format.
But if you really need carriage return at end of string value of last column, you need to escape it as \r
or else enable setting 'input_format_tsv_crlf_end_of_line'. (INCORRECT_DATA) (version 25.8.2.1).
You can specify the structure manually: (in file/uri /par/data/cxy.tsv). (CANNOT_EXTRACT_TABLE_STRUCTURE)

>>> res = chdb.query('select * from file("/par/data/cxy.tsv", Tsv) settings input_format_tsv_crlf_end_of_line=true','Pretty'); print(res)
   ┏━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┓
   ┃ id ┃ name    ┃ language   ┃
   ┡━━━━╇━━━━━━━━━╇━━━━━━━━━━━━┩
1. │  1 │ Joe     │ Java       │
   ├────┼─────────┼────────────┤
2. │  2 │ Alice   │ JavaScript │
   ├────┼─────────┼────────────┤
3. │  3 │ Leon    │ C/C++      │
   ├────┼─────────┼────────────┤
4. │  4 │ William │ Java       │
   ├────┼─────────┼────────────┤
5. │  5 │ James   │ C/C++      │
   ├────┼─────────┼────────────┤
6. │  6 │ Enson   │ C/C++      │
   └────┴─────────┴────────────┘

>>>
>>> res = chdb.query('select * from file("/par/data/cxy.tsv", Tsv) settings input_format_tsv_crlf_end_of_line=true','PrettyCompact'); print(res)
   ┌─id─┬─name────┬─language───┐
1. │  1 │ Joe     │ Java       │
2. │  2 │ Alice   │ JavaScript │
3. │  3 │ Leon    │ C/C++      │
4. │  4 │ William │ Java       │
5. │  5 │ James   │ C/C++      │
6. │  6 │ Enson   │ C/C++      │
   └────┴─────────┴────────────┘

>>> chdb.query('select * from file("/par/data/cxy.tsv", Tsv) settings input_format_tsv_crlf_end_of_line=true','PrettyCompact')
   ┌─id─┬─name────┬─language───┐
1. │  1 │ Joe     │ Java       │
2. │  2 │ Alice   │ JavaScript │
3. │  3 │ Leon    │ C/C++      │
4. │  4 │ William │ Java       │
5. │  5 │ James   │ C/C++      │
6. │  6 │ Enson   │ C/C++      │
   └────┴─────────┴────────────┘

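The query result can be stored in a variable and then printed with print(), or the query can be evaluated directly at the interactive prompt, which displays the result immediately.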
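Because the SETTINGS clause has to be repeated in every chdb.query call above, a small helper can build the query text. This is only a sketch: with_settings is a hypothetical name and not part of chdb. It handles boolean and numeric values only, and it assumes the SQL passed in has no trailing semicolon and no SETTINGS clause of its own.

```python
def with_settings(sql, **settings):
    """Append a SETTINGS clause to sql; booleans are rendered as true/false."""
    if not settings:
        return sql
    parts = []
    for name, value in settings.items():
        if isinstance(value, bool):
            rendered = "true" if value else "false"
        else:
            # Numbers are written as-is; string-valued settings would need
            # SQL quoting, which this sketch does not attempt.
            rendered = str(value)
        parts.append(f"{name}={rendered}")
    return f"{sql} settings {', '.join(parts)}"


sql = with_settings(
    'select * from file("/par/data/cxy.tsv", Tsv)',
    input_format_tsv_crlf_end_of_line=True,
)
print(sql)

# With chdb installed, the generated text can be passed on unchanged:
#   import chdb
#   print(chdb.query(sql, 'PrettyCompact'))
```

The generated string is the same query text used in the transcript above, so it can be handed to chdb.query together with any output format.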
