Preface
Recently I needed to call the OpenAI API over a large amount of data, so sending the GPT API requests one at a time was not practical. This was a good opportunity to learn Python's concurrency tool ThreadPoolExecutor: it starts n threads and, given m tasks, the n threads process those m tasks concurrently until all of them are done.
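Before the OpenAI-specific code, here is a minimal, self-contained sketch of that idea; fake_task and the sleep call are placeholders I made up to stand in for an API request:

```python
import concurrent.futures
import time

def fake_task(task_id):
    """Placeholder task standing in for one API call."""
    time.sleep(1)  # simulate network latency
    return f"result of task {task_id}"

# 3 worker threads share 10 tasks; each thread picks up the next
# pending task as soon as it finishes its current one.
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(fake_task, i) for i in range(10)]
    for future in concurrent.futures.as_completed(futures):
        print(future.result())
```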
Code
```python
import concurrent.futures

from openai import OpenAI

# Assumed setup: `client` and `model` are configured elsewhere in the
# original script; the values below are placeholders.
client = OpenAI()          # reads OPENAI_API_KEY from the environment
model = "gpt-3.5-turbo"    # replace with the model you actually use

def request_openai(system_prompt):
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": system_prompt}]
    )
    # Token usage information
    prompt_tokens = response.usage.prompt_tokens
    completion_tokens = response.usage.completion_tokens
    total_tokens = response.usage.total_tokens
    return response.choices[0].message.content, total_tokens

# Use ThreadPoolExecutor to handle multiple requests concurrently
def handle_multiple_requests(prompts):
    # Create a ThreadPoolExecutor to process multiple tasks concurrently
    with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
        # Submit each task for processing
        futures = [executor.submit(request_openai, prompt) for prompt in prompts]
        # Process the results as they complete
        for future in concurrent.futures.as_completed(futures):
            try:
                response, tokens = future.result()
            except Exception as e:
                print(f"Error processing: {e}")
```
Explanation: max_workers=3 sets the number of worker threads, request_openai is the task function, and prompts is the task data.
How it runs: suppose there are 1000 prompts. The 3 threads process these 1000 prompts concurrently; as soon as a thread finishes one task it picks up the next pending one, until every task has been handled.
LLM output: the format of each result is whatever request_openai returns, and future.result() retrieves the output of each individual LLM call. One future corresponds to one call to the GPT API.
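To make that return format concrete, here is a hedged variant (the result collection and token sum are my own additions, not part of the original post) that reuses request_openai and the concurrent.futures import from the block above:

```python
def handle_multiple_requests_collect(prompts):
    """Like handle_multiple_requests, but also collects the results."""
    results = []
    total_usage = 0
    with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
        futures = [executor.submit(request_openai, prompt) for prompt in prompts]
        for future in concurrent.futures.as_completed(futures):
            try:
                # Each result has the shape defined by request_openai:
                # (message content, total tokens used by that call)
                content, tokens = future.result()
                results.append(content)
                total_usage += tokens
            except Exception as e:
                print(f"Error processing: {e}")
    return results, total_usage
```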
Improved version
```python
import concurrent.futures
import json

from openai import OpenAI
from tqdm import tqdm

# Assumed setup, as in the first version: `client`, `model`, and the
# prompt-building helper `get_prompts` are defined elsewhere in the script.
client = OpenAI()
model = "gpt-3.5-turbo"

def request_openai(item):
    prompt = get_prompts(item)
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}]
    )
    total_tokens = response.usage.total_tokens
    return response.choices[0].message.content, total_tokens

open_path = './data/chinese1/testData.json'
with open(open_path, 'r', encoding='utf-8') as file:
    data = json.load(file)

with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    # Store the futures in a dict: the key is the future object,
    # the value is the item it was created from
    futures = {executor.submit(request_openai, item): item for item in data[:2]}
    total_tasks = len(futures)  # number of submitted tasks, used by the progress bar
    with tqdm(total=total_tasks, desc="Processing") as pbar:
        for idx, future in enumerate(concurrent.futures.as_completed(futures)):
            item = futures[future]
            try:
                response_dict, total_tokens = future.result()
                # Advance the progress bar
                pbar.update(1)
            except Exception as e:
                print(f"Error processing id {item['id']}: {e}")
                # Advance the progress bar even when an error occurs
                pbar.update(1)
```
What is different: so that the original data is still available inside the for future loop, each item from data is stored in the futures dictionary, keyed by the future created from it. On top of that, tqdm is added to display a progress bar.
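The future-to-item dictionary pattern is handy beyond the OpenAI case. Below is a minimal, API-free sketch of the same pattern; slow_square and its input data are made up purely for illustration:

```python
import concurrent.futures
import time

from tqdm import tqdm

def slow_square(item):
    """Dummy task standing in for one API call."""
    time.sleep(0.5)
    return item["value"] ** 2

data = [{"id": i, "value": i} for i in range(10)]

results = {}
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    # Key each future by the item it was created from, so the input
    # is still available when the result (or an error) comes back.
    futures = {executor.submit(slow_square, item): item for item in data}
    with tqdm(total=len(futures), desc="Processing") as pbar:
        for future in concurrent.futures.as_completed(futures):
            item = futures[future]
            try:
                results[item["id"]] = future.result()
            except Exception as e:
                print(f"Error processing id {item['id']}: {e}")
            finally:
                pbar.update(1)

print(results)
```

Putting pbar.update(1) in a finally block is a small variation on the code above: the progress bar advances exactly once per task whether it succeeded or failed, without repeating the call in both branches.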