Comparing the efficiency of different ways to call a Python function

Method 1:

Register the function as a DuckDB user-defined function. I am using DuckDB 1.3.2, and create_function kept failing with this error:

  File "C:\d\pyduck.txt", line 32, in <module>
    duckdb.create_function("count_composition_ways_optimized", count_composition_ways_optimized, [VARCHAR, [VARCHAR]], BIGINT)
    ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Unable to cast Python instance of type <class 'list'> to C++ type '?' (#define PYBIND11_DETAILED_ERROR_MESSAGES or compile in debug mode for details)

Later I came across this in the documentation:
Type Annotation

When the function has type annotation it's often possible to leave out all of the optional parameters. Using DuckDBPyType we can implicitly convert many known types to DuckDBs type system.

Leaving the types unspecified and relying on the annotations turned out to fix it.

def count_composition_ways_optimized(a: str, b: set) -> int:
    """
    Optimized version: cap the maximum substring length to improve efficiency.
    """
    n = len(a)
    if n == 0:
        return 1
    
    # Length of the longest string in the collection
    max_len = max(len(word) for word in b) if b else 0
    
    dp = [0] * (n + 1)
    dp[0] = 1
    
    for i in range(1, n + 1):
        # Only try substring lengths that can match, skipping needless checks
        for length in range(1, min(i, max_len) + 1):
            start = i - length
            if a[start:i] in b:
                dp[i] += dp[start]
            #print(i, length,start,dp)
        #print(i, dp)
    return dp[n]

a,b='bill', ['bi', 'l','ike']

print(count_composition_ways_optimized(a, b))

import duckdb
from duckdb.typing import VARCHAR, BIGINT
from duckdb import list_type, struct_type
duckdb.create_function("count_composition_ways_optimized", count_composition_ways_optimized)  #, [VARCHAR, [[VARCHAR]], BIGINT)
s="""
with recursive co as(
select trim(unnest(string_split(c, ',')))c from (from read_csv('2419-input.txt', header=0, delim='-')t(c) limit 1)), 
c2 as(select list(c) c from co), 
w as(select w from read_csv('2419-input.txt', header=0, delim='-', skip=2) t(w))
select sum(count_composition_ways_optimized(w, c)) sum_of_way from c2,w 
-- select w, count_composition_ways_optimized(w, c) sum_of_way from c2,w limit 3
"""
import time
t=time.time();print(duckdb.sql(s), time.time()-t)

Output of the program above:

C:\d>python pyduck.txt
1
┌─────────────────┐
│   sum_of_way    │
│     int128      │
├─────────────────┤
│ 950763269786650 │
└─────────────────┘
 0.0038411617279052734

Method 2:

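Split the data by hand into two parts, a and b: a holds the strings to decompose, and b is the list of substrings, read as a single line with the csv module.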

import sys
import time
import csv
with open('2419-input.txt', mode='r') as file:
    csvFile = csv.reader(file)
    for lines in csvFile:
        b=[i.strip() for i in lines]
        #print(lines)
        break

def count_composition_ways_optimized(a: str, b: set) -> int:
    """
    Optimized version: cap the maximum substring length to improve efficiency.
    """
    n = len(a)
    if n == 0:
        return 1
    
    # Length of the longest string in the collection
    max_len = max(len(word) for word in b) if b else 0
    
    dp = [0] * (n + 1)
    dp[0] = 1
    
    for i in range(1, n + 1):
        # Only try substring lengths that can match, skipping needless checks
        for length in range(1, min(i, max_len) + 1):
            start = i - length
            if a[start:i] in b:
                dp[i] += dp[start]
    
    return dp[n]

def main():
    if len(sys.argv) != 2:
        print("Usage: python script.py <filename>")
        return
    t = time.time()
    filename = sys.argv[1]
    cnt=0
    try:
        with open(filename, 'r', encoding='utf-8') as file:
            while True:
                line = file.readline().strip()
                if line == '':
                    break
                
                cnt+=count_composition_ways_optimized(line, b)

    except FileNotFoundError:
        print(f"Error: file '{filename}' does not exist")
    except Exception as e:
        print(f"Error: {e}")
    print(cnt)
    print(f"Elapsed: {round(time.time()-t, 4)} s")


if __name__ == "__main__":
    main()

The result:

950763269786650
Elapsed: 0.3117 s
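
As a side note on the manual split, here is a minimal sketch that reads both parts from one text and keeps the substrings in a set, so each `in` test is a hash lookup rather than a scan over a list. The sample text, the `split_input` helper and the shortened name `count_ways` are illustrative assumptions, not the post's data or code. The layout (substrings on the first line, a blank line, then one word per line) is inferred from the queries in method 1.

```python
import csv
import io

# Hypothetical sample data in the assumed layout.
SAMPLE = "bi, l, ike\n\nbill\nlike\n"


def split_input(text):
    """Return (set of substrings, list of words) parsed from `text`."""
    lines = io.StringIO(text)
    first_row = next(csv.reader(lines))          # consumes only the first line
    pieces = {cell.strip() for cell in first_row}
    words = [line.strip() for line in lines if line.strip()]
    return pieces, words


def count_ways(word, pieces):
    """Count the ways to build `word` by concatenating entries of `pieces`."""
    max_len = max((len(p) for p in pieces), default=0)
    dp = [1] + [0] * len(word)                   # dp[i]: ways to build word[:i]
    for i in range(1, len(word) + 1):
        for length in range(1, min(i, max_len) + 1):
            if word[i - length:i] in pieces:     # average O(1) with a set
                dp[i] += dp[i - length]
    return dp[len(word)]


pieces, words = split_input(SAMPLE)
total = sum(count_ways(w, pieces) for w in words)
print(total)
```

Whether switching from a list to a set changes the timings reported in this post was not measured here.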

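Both methods compute the same result, yet calling the Python function through DuckDB appears to be faster. My guess is that this is partly due to DuckDB's parallel execution and partly because Python really is slow. Running the same program under PyPy gives the following: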

950763269786650
Elapsed: 0.0885 s

Postscript:

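I got this wrong. In method 1 the timing should not be printed in the same print call as the query result. Python evaluates time.time()-t before print renders the result, so if the query only runs when the result is rendered, the measured time presumably leaves out the query itself. With the two printed separately, the number looks normal: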

t=time.time();print(duckdb.sql(s));print(time.time()-t)
0.6274137496948242
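
The pitfall can be reproduced without DuckDB. The sketch below uses a hypothetical `LazyResult` class as a stand-in for a query result that does its work only when converted to text. That DuckDB's result object behaves this way is an assumption made for illustration.

```python
import time


class LazyResult:
    """Hypothetical stand-in for a result that is computed only when rendered."""

    def __init__(self, delay):
        self.delay = delay

    def __str__(self):
        time.sleep(self.delay)                   # the "query" runs here
        return "<rows>"


def same_print(delay):
    start = time.perf_counter()
    result = LazyResult(delay)
    # All arguments are evaluated before print() runs, so the elapsed time
    # is taken before str(result) has done any work.
    print(result, (elapsed := time.perf_counter() - start))
    return elapsed


def separate_prints(delay):
    start = time.perf_counter()
    print(LazyResult(delay))                     # rendering happens first
    elapsed = time.perf_counter() - start
    print(elapsed)
    return elapsed


too_small = same_print(0.2)
realistic = separate_prints(0.2)
```

Here `too_small` comes out far below the 0.2 s delay, while `realistic` includes it, which mirrors the gap between the two method 1 timings.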