Installing and using the chdb package in a Python 3.14 container

1. Log in to the container with docker exec -it and install chdb with pip install

sudo docker exec -it python3143 bash
root@DESKTOP-59T6U68:/# pip install chdb
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting chdb
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/23/28/f3aa551b4af78b8ac967c191407301eff5906dc7239ddb232d4d34bf8ad4/chdb-4.0.1-cp39-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (149.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 149.4/149.4 MB 31.3 MB/s  0:00:04
Collecting pandas<3.0.0,>=2.1.0 (from chdb)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/15/b2/0e62f78c0c5ba7e3d2c5945a82456f4fac76c480940f805e0b97fcbc2f65/pandas-2.3.3-cp314-cp314-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (12.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.3/12.3 MB 51.4 MB/s  0:00:00
Collecting pyarrow>=13.0.0 (from chdb)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/9c/86/95c61ad82236495f3c31987e85135926ba3ec7f3819296b70a68d8066b49/pyarrow-23.0.0-cp314-cp314-manylinux_2_28_x86_64.whl (47.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 47.6/47.6 MB 33.8 MB/s  0:00:01
Collecting numpy>=1.26.0 (from pandas<3.0.0,>=2.1.0->chdb)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/5d/6c/7f237821c9642fb2a04d2f1e88b4295677144ca93285fd76eff3bcba858d/numpy-2.4.2-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (16.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16.6/16.6 MB 43.4 MB/s  0:00:00
Collecting python-dateutil>=2.8.2 (from pandas<3.0.0,>=2.1.0->chdb)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/ec/57/56b9bcc3c9c6a792fcbaf139543cee77261f3651ca9da0c93f5c1221264b/python_dateutil-2.9.0.post0-py2.py3-none-any.whl (229 kB)
Collecting pytz>=2020.1 (from pandas<3.0.0,>=2.1.0->chdb)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/81/c4/34e93fe5f5429d7570ec1fa436f1986fb1f00c3e0f43a589fe2bbcd22c3f/pytz-2025.2-py2.py3-none-any.whl (509 kB)
Collecting tzdata>=2022.7 (from pandas<3.0.0,>=2.1.0->chdb)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/c7/b0/003792df09decd6849a5e39c28b513c06e84436a54440380862b5aeff25d/tzdata-2025.3-py2.py3-none-any.whl (348 kB)
Collecting six>=1.5 (from python-dateutil>=2.8.2->pandas<3.0.0,>=2.1.0->chdb)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/b7/ce/149a00dd41f10bc29e5921b496af8b574d8413afcd5e30dfa0ed46c2cc5e/six-1.17.0-py2.py3-none-any.whl (11 kB)
Installing collected packages: pytz, tzdata, six, pyarrow, numpy, python-dateutil, pandas, chdb
Successfully installed chdb-4.0.1 numpy-2.4.2 pandas-2.3.3 pyarrow-23.0.0 python-dateutil-2.9.0.post0 pytz-2025.2 six-1.17.0 tzdata-2025.3
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.

[notice] A new release of pip is available: 25.3 -> 26.0.1
[notice] To update, run: pip install --upgrade pip

2. After importing the package, run queries with chdb.query

root@DESKTOP-59T6U68:/# python3
Python 3.14.3 (main, Feb  4 2026, 20:08:31) [GCC 14.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import chdb
>>> res = chdb.query('select version()', 'Pretty'); print(res)
   ┏━━━━━━━━━━━┓
   ┃ version() ┃
   ┡━━━━━━━━━━━┩
1. │ 25.8.2.1  │
   └───────────┘

>>> res = chdb.query('select * from file("/par/duck.parquet", Parquet)'); print(res)
"13522500"


>>> res = chdb.query('select * from file("/par/duck.parquet", Parquet)','Pretty'); print(res)
   ┏━━━━━━━━━━┓
   ┃ mpz_sum  ┃
   ┡━━━━━━━━━━┩
1. │ 13522500 │
   └──────────┘

>>>

>>> res = chdb.query('select * from file("/par/duck.parquet", Parquet)', 'JSON'); print(res)
{
        "meta":
        [
                {
                        "name": "mpz_sum",
                        "type": "Nullable(String)"
                }
        ],

        "data":
        [
                {
                        "mpz_sum": "13522500"
                }
        ],

        "rows": 1,

        "statistics":
        {
                "elapsed": 0.037387861,
                "rows_read": 0,
                "bytes_read": 0
        }
}

>>> res = chdb.query('select * from file("/par/duck.parquet", Parquet)','PrettyCompact'); print(res)
   ┌─mpz_sum──┐
1. │ 13522500 │
   └──────────┘

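The queries above fetch the version number and read a Parquet file. The second argument of query selects the output format. When it is omitted, the default format is CSV, so the result is a string with no header row. PrettyCompact is the format the clickhouse client uses by default.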
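When the result is needed as Python data rather than as display text, the JSON format shown above can be parsed with the standard json module. The following is a minimal sketch: the helper names rows_from_json, column_types, and query_rows are hypothetical, and the sample text is a shortened copy of the JSON output above. query_rows assumes chdb is installed and relies only on chdb.query and on converting its result to a string, as print() does in the session above.

```python
import json


def rows_from_json(text):
    """Return the list of row dicts from ClickHouse 'JSON' format output."""
    return json.loads(text)["data"]


def column_types(text):
    """Map column name -> ClickHouse type, taken from the 'meta' section."""
    return {col["name"]: col["type"] for col in json.loads(text)["meta"]}


def query_rows(sql):
    """Run sql through chdb and return the rows as dicts (needs chdb)."""
    import chdb  # imported lazily so the helpers above work without chdb
    return rows_from_json(str(chdb.query(sql, "JSON")))


# Sample text shaped like the JSON output shown above (statistics omitted).
sample = """
{
    "meta": [{"name": "mpz_sum", "type": "Nullable(String)"}],
    "data": [{"mpz_sum": "13522500"}],
    "rows": 1
}
"""

print(rows_from_json(sample))  # [{'mpz_sum': '13522500'}]
print(column_types(sample))    # {'mpz_sum': 'Nullable(String)'}
```

Note that the value arrives as a string because the column type is Nullable(String). Convert it explicitly if a number is needed.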

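To query a TSV file whose lines end with \r\n (Windows-style line endings), the setting input_format_tsv_crlf_end_of_line must be enabled. The clickhouse client offers two ways to do this: issue a separate SET command, or add a SETTINGS clause to the query itself.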

:) set input_format_tsv_crlf_end_of_line=true;

SET input_format_tsv_crlf_end_of_line = true

Query id: 60a7f1df-aad9-41fc-b100-d2c507da7949

Ok.

0 rows in set. Elapsed: 0.009 sec.


:) select * from file('/mnt/c/d/data/cxy.tsv', Tsv) ;

SELECT *
FROM file('/mnt/c/d/data/cxy.tsv', Tsv)

Query id: 5246c40d-2915-412c-8f28-97912d88e14f

   ┌─id─┬─name────┬─language───┐
1. │  1 │ Joe     │ Java       │
2. │  2 │ Alice   │ JavaScript │
3. │  3 │ Leon    │ C/C++      │
4. │  4 │ William │ Java       │
5. │  5 │ James   │ C/C++      │
6. │  6 │ Enson   │ C/C++      │
   └────┴─────────┴────────────┘

6 rows in set. Elapsed: 0.067 sec.

:) set input_format_tsv_crlf_end_of_line=false;

SET input_format_tsv_crlf_end_of_line = false

Query id: 591cf022-de92-4165-80e1-9744b8bdf93c

Ok.

0 rows in set. Elapsed: 0.000 sec.

:) select * from file('/mnt/c/d/data/cxy.tsv', Tsv) ;

SELECT *
FROM file('/mnt/c/d/data/cxy.tsv', Tsv)

Query id: d56dd6fe-b217-4c0b-b3a0-344202ed0fd9


Elapsed: 0.071 sec.

Received exception:
Code: 117. DB::Exception:
You have carriage return (\r, 0x0D, ASCII 13) at end of first row.
It's like your input data has DOS/Windows style line separators, that are illegal in TabSeparated format. You must transform your file to Unix format.
But if you really need carriage return at end of string value of last column, you need to escape it as \r
or else enable setting 'input_format_tsv_crlf_end_of_line': (while reading header): (in file/uri /mnt/c/d/data/cxy.tsv): While executing ParallelParsingBlockInputFormat: While executing File. (INCORRECT_DATA)

:)

:) select * from file('/mnt/c/d/data/cxy.tsv', Tsv) settings input_format_tsv_crlf_end_of_line =true;

SELECT *
FROM file('/mnt/c/d/data/cxy.tsv', Tsv)
SETTINGS input_format_tsv_crlf_end_of_line = true

Query id: 6bb8ec95-c597-4716-aee3-b60a001c76f2

   ┌─id─┬─name────┬─language───┐
1. │  1 │ Joe     │ Java       │
2. │  2 │ Alice   │ JavaScript │
3. │  3 │ Leon    │ C/C++      │
4. │  4 │ William │ Java       │
5. │  5 │ James   │ C/C++      │
6. │  6 │ Enson   │ C/C++      │
   └────┴─────────┴────────────┘

6 rows in set. Elapsed: 0.008 sec.

:) \q
Bye.
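
Whether the setting is needed depends on the file's line endings, and this can be checked before querying. The sketch below uses only the Python standard library. The function name first_line_ends_with_crlf is hypothetical and the two temporary files contain made-up data. Like the error message above, it inspects only the first row.

```python
import os
import tempfile


def first_line_ends_with_crlf(path):
    """Return True if the first line of the file ends with CR LF."""
    with open(path, "rb") as f:  # binary mode: no newline translation
        first = f.readline()
    return first.endswith(b"\r\n")


# Demonstration on two throwaway files (contents are made up).
with tempfile.TemporaryDirectory() as tmp:
    dos_path = os.path.join(tmp, "dos.tsv")
    unix_path = os.path.join(tmp, "unix.tsv")
    with open(dos_path, "wb") as f:
        f.write(b"id\tname\r\n1\tJoe\r\n")
    with open(unix_path, "wb") as f:
        f.write(b"id\tname\n1\tJoe\n")
    dos_result = first_line_ends_with_crlf(dos_path)
    unix_result = first_line_ends_with_crlf(unix_path)

print(dos_result, unix_result)  # True False
```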

In Python, with chdb.query, only the second approach (the SETTINGS clause) can be used:

>>> import chdb
>>> res = chdb.query('select * from file("/par/data/cxy.tsv", Tsv)','Pretty'); print(res)
Traceback (most recent call last):
  File "<python-input-7>", line 1, in <module>
    res = chdb.query('select * from file("/par/data/cxy.tsv", Tsv)','Pretty'); print(res)
  File "/usr/local/lib/python3.14/site-packages/chdb/__init__.py", line 205, in query
    res = conn.query(sql, output_format, params=params)
RuntimeError: Code: 636. DB::Exception: The table structure cannot be extracted from a Tsv format file. Error:
Code: 117. DB::Exception:
You have carriage return (\r, 0x0D, ASCII 13) at end of first row.
It's like your input data has DOS/Windows style line separators, that are illegal in TabSeparated format. You must transform your file to Unix format.
But if you really need carriage return at end of string value of last column, you need to escape it as \r
or else enable setting 'input_format_tsv_crlf_end_of_line'. (INCORRECT_DATA) (version 25.8.2.1).
You can specify the structure manually: (in file/uri /par/data/cxy.tsv). (CANNOT_EXTRACT_TABLE_STRUCTURE)

>>> res = chdb.query('select * from file("/par/data/cxy.tsv", Tsv) settings input_format_tsv_crlf_end_of_line=true','Pretty'); print(res)
   ┏━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┓
   ┃ id ┃ name    ┃ language   ┃
   ┡━━━━╇━━━━━━━━━╇━━━━━━━━━━━━┩
1. │  1 │ Joe     │ Java       │
   ├────┼─────────┼────────────┤
2. │  2 │ Alice   │ JavaScript │
   ├────┼─────────┼────────────┤
3. │  3 │ Leon    │ C/C++      │
   ├────┼─────────┼────────────┤
4. │  4 │ William │ Java       │
   ├────┼─────────┼────────────┤
5. │  5 │ James   │ C/C++      │
   ├────┼─────────┼────────────┤
6. │  6 │ Enson   │ C/C++      │
   └────┴─────────┴────────────┘

>>>
>>> res = chdb.query('select * from file("/par/data/cxy.tsv", Tsv) settings input_format_tsv_crlf_end_of_line=true','PrettyCompact'); print(res)
   ┌─id─┬─name────┬─language───┐
1. │  1 │ Joe     │ Java       │
2. │  2 │ Alice   │ JavaScript │
3. │  3 │ Leon    │ C/C++      │
4. │  4 │ William │ Java       │
5. │  5 │ James   │ C/C++      │
6. │  6 │ Enson   │ C/C++      │
   └────┴─────────┴────────────┘

>>> chdb.query('select * from file("/par/data/cxy.tsv", Tsv) settings input_format_tsv_crlf_end_of_line=true','PrettyCompact')
   ┌─id─┬─name────┬─language───┐
1. │  1 │ Joe     │ Java       │
2. │  2 │ Alice   │ JavaScript │
3. │  3 │ Leon    │ C/C++      │
4. │  4 │ William │ Java       │
5. │  5 │ James   │ C/C++      │
6. │  6 │ Enson   │ C/C++      │
   └────┴─────────┴────────────┘

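The query result can be stored in a variable and then printed with print(), or the query can be run on its own at the interactive prompt, which displays the result directly.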
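
Since the SETTINGS clause has to be part of every query text passed to chdb.query, a small helper can append it. This is a sketch under stated assumptions, not part of chdb: the names with_settings and query_crlf_tsv are hypothetical, the helper assumes the query does not already contain a SETTINGS clause, and the path is assumed to be trusted because it is interpolated into the SQL without escaping.

```python
def with_settings(sql, **settings):
    """Append a SETTINGS clause to a query that does not already have one."""
    if not settings:
        return sql
    parts = []
    for name, value in settings.items():
        if isinstance(value, bool):
            rendered = "true" if value else "false"
        elif isinstance(value, (int, float)):
            rendered = str(value)
        else:
            # Quote strings; escape backslashes and single quotes.
            escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
            rendered = "'" + escaped + "'"
        parts.append(f"{name} = {rendered}")
    # Drop a trailing semicolon so the clause stays inside the statement.
    return f"{sql.rstrip().rstrip(';')} SETTINGS {', '.join(parts)}"


def query_crlf_tsv(path, output_format="PrettyCompact"):
    """Query a CRLF-terminated TSV file through chdb (needs chdb installed)."""
    import chdb  # imported lazily so with_settings works without chdb
    sql = with_settings(
        f"select * from file('{path}', Tsv)",  # path is assumed trusted
        input_format_tsv_crlf_end_of_line=True,
    )
    return chdb.query(sql, output_format)


built = with_settings(
    "select * from file('/par/data/cxy.tsv', Tsv)",
    input_format_tsv_crlf_end_of_line=True,
)
print(built)
# select * from file('/par/data/cxy.tsv', Tsv) SETTINGS input_format_tsv_crlf_end_of_line = true
```

With chdb installed, print(query_crlf_tsv("/par/data/cxy.tsv")) would run the same query as the PrettyCompact example above.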
