Installing and using the chdb package in a Python 3.14 container

1. Log in to the container with docker exec -it and install the package with pip install

sudo docker exec -it python3143 bash
root@DESKTOP-59T6U68:/# pip install chdb
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting chdb
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/23/28/f3aa551b4af78b8ac967c191407301eff5906dc7239ddb232d4d34bf8ad4/chdb-4.0.1-cp39-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (149.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 149.4/149.4 MB 31.3 MB/s  0:00:04
Collecting pandas<3.0.0,>=2.1.0 (from chdb)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/15/b2/0e62f78c0c5ba7e3d2c5945a82456f4fac76c480940f805e0b97fcbc2f65/pandas-2.3.3-cp314-cp314-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (12.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.3/12.3 MB 51.4 MB/s  0:00:00
Collecting pyarrow>=13.0.0 (from chdb)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/9c/86/95c61ad82236495f3c31987e85135926ba3ec7f3819296b70a68d8066b49/pyarrow-23.0.0-cp314-cp314-manylinux_2_28_x86_64.whl (47.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 47.6/47.6 MB 33.8 MB/s  0:00:01
Collecting numpy>=1.26.0 (from pandas<3.0.0,>=2.1.0->chdb)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/5d/6c/7f237821c9642fb2a04d2f1e88b4295677144ca93285fd76eff3bcba858d/numpy-2.4.2-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (16.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16.6/16.6 MB 43.4 MB/s  0:00:00
Collecting python-dateutil>=2.8.2 (from pandas<3.0.0,>=2.1.0->chdb)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/ec/57/56b9bcc3c9c6a792fcbaf139543cee77261f3651ca9da0c93f5c1221264b/python_dateutil-2.9.0.post0-py2.py3-none-any.whl (229 kB)
Collecting pytz>=2020.1 (from pandas<3.0.0,>=2.1.0->chdb)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/81/c4/34e93fe5f5429d7570ec1fa436f1986fb1f00c3e0f43a589fe2bbcd22c3f/pytz-2025.2-py2.py3-none-any.whl (509 kB)
Collecting tzdata>=2022.7 (from pandas<3.0.0,>=2.1.0->chdb)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/c7/b0/003792df09decd6849a5e39c28b513c06e84436a54440380862b5aeff25d/tzdata-2025.3-py2.py3-none-any.whl (348 kB)
Collecting six>=1.5 (from python-dateutil>=2.8.2->pandas<3.0.0,>=2.1.0->chdb)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/b7/ce/149a00dd41f10bc29e5921b496af8b574d8413afcd5e30dfa0ed46c2cc5e/six-1.17.0-py2.py3-none-any.whl (11 kB)
Installing collected packages: pytz, tzdata, six, pyarrow, numpy, python-dateutil, pandas, chdb
Successfully installed chdb-4.0.1 numpy-2.4.2 pandas-2.3.3 pyarrow-23.0.0 python-dateutil-2.9.0.post0 pytz-2025.2 six-1.17.0 tzdata-2025.3
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.

[notice] A new release of pip is available: 25.3 -> 26.0.1
[notice] To update, run: pip install --upgrade pip
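
To confirm what ended up in the container's interpreter without rereading the pip log, Python can be asked directly. The following is a minimal sketch using the standard library's importlib.metadata. The package list is copied from the "Successfully installed" line above, and a package that is missing is reported rather than raising an error.

```python
from importlib.metadata import PackageNotFoundError, version

# Package names taken from the "Successfully installed" line in the pip log.
packages = ["chdb", "pandas", "pyarrow", "numpy"]

installed = {}
for name in packages:
    try:
        installed[name] = version(name)
    except PackageNotFoundError:
        # The package is not visible to this interpreter.
        installed[name] = None

for name, ver in installed.items():
    print(f"{name}: {ver or 'not installed'}")
```

Run inside the container, this should list the same versions pip reported; run elsewhere, it shows which of the packages are absent.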

2. After importing the package, run queries with chdb.query

root@DESKTOP-59T6U68:/# python3
Python 3.14.3 (main, Feb  4 2026, 20:08:31) [GCC 14.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import chdb
>>> res = chdb.query('select version()', 'Pretty'); print(res)
   ┏━━━━━━━━━━━┓
   ┃ version() ┃
   ┡━━━━━━━━━━━┩
1. │ 25.8.2.1  │
   └───────────┘

>>> res = chdb.query('select * from file("/par/duck.parquet", Parquet)'); print(res)
"13522500"


>>> res = chdb.query('select * from file("/par/duck.parquet", Parquet)','Pretty'); print(res)
   ┏━━━━━━━━━━┓
   ┃ mpz_sum  ┃
   ┡━━━━━━━━━━┩
1. │ 13522500 │
   └──────────┘

>>>

>>> res = chdb.query('select * from file("/par/duck.parquet", Parquet)', 'JSON'); print(res)
{
        "meta":
        [
                {
                        "name": "mpz_sum",
                        "type": "Nullable(String)"
                }
        ],

        "data":
        [
                {
                        "mpz_sum": "13522500"
                }
        ],

        "rows": 1,

        "statistics":
        {
                "elapsed": 0.037387861,
                "rows_read": 0,
                "bytes_read": 0
        }
}

>>> res = chdb.query('select * from file("/par/duck.parquet", Parquet)','PrettyCompact'); print(res)
   ┌─mpz_sum──┐
1. │ 13522500 │
   └──────────┘

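The queries above read the version number and a Parquet file. The second argument of query selects the output format. When it is omitted, the result is plain text without a header row (the quoted "13522500" above, which looks like CSV output). Passing PrettyCompact produces the format that the clickhouse client displays by default.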
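
Because the JSON format returns structured text, it can be parsed with the standard json module. The sketch below uses a condensed copy of the JSON output shown above as sample text, so it runs without chdb. In a live session the text would presumably come from str(res), which is what print(res) displays.

```python
import json

# Sample text: a condensed copy of the JSON-format output shown above.
json_text = """
{
    "meta": [{"name": "mpz_sum", "type": "Nullable(String)"}],
    "data": [{"mpz_sum": "13522500"}],
    "rows": 1,
    "statistics": {"elapsed": 0.037387861, "rows_read": 0, "bytes_read": 0}
}
"""

doc = json.loads(json_text)

# "meta" describes the columns, "data" holds one dict per row.
columns = [(col["name"], col["type"]) for col in doc["meta"]]
rows = doc["data"]

# The column type is Nullable(String), so the value arrives as a string.
total = int(rows[0]["mpz_sum"])

print(columns)  # [('mpz_sum', 'Nullable(String)')]
print(rows)     # [{'mpz_sum': '13522500'}]
print(total)    # 13522500
```

The Pretty formats are meant for reading on a terminal; a machine-readable format such as JSON is the easier one to process further in Python.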

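To query a TSV file whose lines end with \r\n (carriage return plus line feed, the DOS/Windows convention), the setting input_format_tsv_crlf_end_of_line must be enabled. The clickhouse client offers two ways to do this: issue a separate SET statement, or add a SETTINGS clause to the query itself.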

:) set input_format_tsv_crlf_end_of_line=true;

SET input_format_tsv_crlf_end_of_line = true

Query id: 60a7f1df-aad9-41fc-b100-d2c507da7949

Ok.

0 rows in set. Elapsed: 0.009 sec.


:) select * from file('/mnt/c/d/data/cxy.tsv', Tsv) ;

SELECT *
FROM file('/mnt/c/d/data/cxy.tsv', Tsv)

Query id: 5246c40d-2915-412c-8f28-97912d88e14f

   ┌─id─┬─name────┬─language───┐
1. │  1 │ Joe     │ Java       │
2. │  2 │ Alice   │ JavaScript │
3. │  3 │ Leon    │ C/C++      │
4. │  4 │ William │ Java       │
5. │  5 │ James   │ C/C++      │
6. │  6 │ Enson   │ C/C++      │
   └────┴─────────┴────────────┘

6 rows in set. Elapsed: 0.067 sec.

:) set input_format_tsv_crlf_end_of_line=false;

SET input_format_tsv_crlf_end_of_line = false

Query id: 591cf022-de92-4165-80e1-9744b8bdf93c

Ok.

0 rows in set. Elapsed: 0.000 sec.

:) select * from file('/mnt/c/d/data/cxy.tsv', Tsv) ;

SELECT *
FROM file('/mnt/c/d/data/cxy.tsv', Tsv)

Query id: d56dd6fe-b217-4c0b-b3a0-344202ed0fd9


Elapsed: 0.071 sec.

Received exception:
Code: 117. DB::Exception:
You have carriage return (\r, 0x0D, ASCII 13) at end of first row.
It's like your input data has DOS/Windows style line separators, that are illegal in TabSeparated format. You must transform your file to Unix format.
But if you really need carriage return at end of string value of last column, you need to escape it as \r
or else enable setting 'input_format_tsv_crlf_end_of_line': (while reading header): (in file/uri /mnt/c/d/data/cxy.tsv): While executing ParallelParsingBlockInputFormat: While executing File. (INCORRECT_DATA)

:)

:) select * from file('/mnt/c/d/data/cxy.tsv', Tsv) settings input_format_tsv_crlf_end_of_line =true;

SELECT *
FROM file('/mnt/c/d/data/cxy.tsv', Tsv)
SETTINGS input_format_tsv_crlf_end_of_line = true

Query id: 6bb8ec95-c597-4716-aee3-b60a001c76f2

   ┌─id─┬─name────┬─language───┐
1. │  1 │ Joe     │ Java       │
2. │  2 │ Alice   │ JavaScript │
3. │  3 │ Leon    │ C/C++      │
4. │  4 │ William │ Java       │
5. │  5 │ James   │ C/C++      │
6. │  6 │ Enson   │ C/C++      │
   └────┴─────────┴────────────┘

6 rows in set. Elapsed: 0.008 sec.

:) \q
Bye.
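
The exception text above also names another way out: convert the file to Unix line endings before querying it. The following is a minimal sketch of that conversion with the standard library. The function name crlf_to_lf and the temporary file names are made up for illustration, and the sample rows are a shortened copy of the table above. It reads the whole file into memory, so it only suits files of modest size.

```python
import tempfile
from pathlib import Path


def crlf_to_lf(src, dst):
    """Copy src to dst with every CRLF replaced by LF; return the number replaced."""
    data = Path(src).read_bytes()
    count = data.count(b"\r\n")
    Path(dst).write_bytes(data.replace(b"\r\n", b"\n"))
    return count


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "cxy_dos.tsv"    # hypothetical input with DOS line endings
    dst = Path(tmp) / "cxy_unix.tsv"   # hypothetical converted output
    src.write_bytes(
        b"id\tname\tlanguage\r\n"
        b"1\tJoe\tJava\r\n"
        b"2\tAlice\tJavaScript\r\n"
    )
    replaced = crlf_to_lf(src, dst)
    converted = dst.read_bytes()

print(replaced)            # 3
print(b"\r" in converted)  # False
```

A file converted this way no longer contains the carriage returns the exception complains about. The setting is the less intrusive choice when the source file must stay untouched.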

In Python, chdb.query only works with the second form, the SETTINGS clause:

>>> import chdb
>>> res = chdb.query('select * from file("/par/data/cxy.tsv", Tsv)','Pretty'); print(res)
Traceback (most recent call last):
  File "<python-input-7>", line 1, in <module>
    res = chdb.query('select * from file("/par/data/cxy.tsv", Tsv)','Pretty'); print(res)
  File "/usr/local/lib/python3.14/site-packages/chdb/__init__.py", line 205, in query
    res = conn.query(sql, output_format, params=params)
RuntimeError: Code: 636. DB::Exception: The table structure cannot be extracted from a Tsv format file. Error:
Code: 117. DB::Exception:
You have carriage return (\r, 0x0D, ASCII 13) at end of first row.
It's like your input data has DOS/Windows style line separators, that are illegal in TabSeparated format. You must transform your file to Unix format.
But if you really need carriage return at end of string value of last column, you need to escape it as \r
or else enable setting 'input_format_tsv_crlf_end_of_line'. (INCORRECT_DATA) (version 25.8.2.1).
You can specify the structure manually: (in file/uri /par/data/cxy.tsv). (CANNOT_EXTRACT_TABLE_STRUCTURE)

>>> res = chdb.query('select * from file("/par/data/cxy.tsv", Tsv) settings input_format_tsv_crlf_end_of_line=true','Pretty'); print(res)
   ┏━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┓
   ┃ id ┃ name    ┃ language   ┃
   ┡━━━━╇━━━━━━━━━╇━━━━━━━━━━━━┩
1. │  1 │ Joe     │ Java       │
   ├────┼─────────┼────────────┤
2. │  2 │ Alice   │ JavaScript │
   ├────┼─────────┼────────────┤
3. │  3 │ Leon    │ C/C++      │
   ├────┼─────────┼────────────┤
4. │  4 │ William │ Java       │
   ├────┼─────────┼────────────┤
5. │  5 │ James   │ C/C++      │
   ├────┼─────────┼────────────┤
6. │  6 │ Enson   │ C/C++      │
   └────┴─────────┴────────────┘

>>>
>>> res = chdb.query('select * from file("/par/data/cxy.tsv", Tsv) settings input_format_tsv_crlf_end_of_line=true','PrettyCompact'); print(res)
   ┌─id─┬─name────┬─language───┐
1. │  1 │ Joe     │ Java       │
2. │  2 │ Alice   │ JavaScript │
3. │  3 │ Leon    │ C/C++      │
4. │  4 │ William │ Java       │
5. │  5 │ James   │ C/C++      │
6. │  6 │ Enson   │ C/C++      │
   └────┴─────────┴────────────┘

>>> chdb.query('select * from file("/par/data/cxy.tsv", Tsv) settings input_format_tsv_crlf_end_of_line=true','PrettyCompact')
   ┌─id─┬─name────┬─language───┐
1. │  1 │ Joe     │ Java       │
2. │  2 │ Alice   │ JavaScript │
3. │  3 │ Leon    │ C/C++      │
4. │  4 │ William │ Java       │
5. │  5 │ James   │ C/C++      │
6. │  6 │ Enson   │ C/C++      │
   └────┴─────────┴────────────┘

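The query result can be stored in a variable and printed with print(). In the interactive interpreter, the query can also be run on its own, and the result is displayed directly.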
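
Since the SETTINGS clause has to be written into every query string, a small helper can append it. The following is a minimal sketch under stated assumptions: with_settings is a hypothetical name, not part of chdb, and it only handles a plain SELECT that has no SETTINGS or FORMAT clause of its own yet. It builds the string only, so it runs without chdb installed.

```python
def with_settings(sql, **settings):
    """Return sql with a ClickHouse SETTINGS clause appended (hypothetical helper)."""
    if not settings:
        return sql

    def render(value):
        # bool must be checked before int, because bool is a subclass of int.
        if isinstance(value, bool):
            return "true" if value else "false"
        if isinstance(value, (int, float)):
            return str(value)
        # Anything else is rendered as a quoted string literal.
        escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
        return f"'{escaped}'"

    clause = ", ".join(f"{name} = {render(value)}" for name, value in settings.items())
    # Drop a trailing semicolon so the clause stays inside the statement.
    return f"{sql.rstrip().rstrip(';')} SETTINGS {clause}"


sql = with_settings(
    'select * from file("/par/data/cxy.tsv", Tsv)',
    input_format_tsv_crlf_end_of_line=True,
)
print(sql)
# select * from file("/par/data/cxy.tsv", Tsv) SETTINGS input_format_tsv_crlf_end_of_line = true
```

The resulting string can be passed to chdb.query(sql, 'PrettyCompact') in the same way as the hand-written queries in the session above.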
