I have a spreadsheet with 50 rows of English text that needs to be translated into Chinese.
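Enter the following prompt in ChatGPT:
You are a Python programming expert who develops AI large-model applications. Write a Python script that completes the following task:
Open the Excel file: "F:\AI自媒体内容\AI行业数据分析\poetop50bots.xlsx"
Read the content of every cell in the range A2 to B51,
Call the deepseek-chat model (context length 32K, maximum output length 4K) to translate each cell's content into Chinese;
The model's base_url is: https://api.deepseek.com
The model's api_key is: XXX
Set the temperature parameter to 1.1
The prompt sent to the model is: 把英文内容翻译为中文 ("Translate the English content into Chinese")
For calling the deepseek-chat model API, follow the example inside 【】: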
【# Please install OpenAI SDK first: `pip3 install openai`
from openai import OpenAI

client = OpenAI(api_key="<deepseek api key>", base_url="https://api.deepseek.com")

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Hello"},
    ],
    stream=False
)

print(response.choices[0].message.content)
Example of the JSON data returned by the model:
{
    "id": "65c327b06948c8d635c8316c6885d95e",
    "choices": [
        {
            "index": 0,
            "message": {
                "content": "Hello! How can I assist you with your programming or computer science questions today?",
                "role": "assistant"
            },
            "finish_reason": "stop",
            "logprobs": null
        }
    ],
    "created": 1717069572,
    "model": "deepseek-coder",
    "system_fingerprint": "fp_ded2115e5a",
    "object": "chat.completion",
    "usage": {
        "prompt_tokens": 18,
        "completion_tokens": 16,
        "total_tokens": 34
    }
}
】
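Parse the returned JSON data, extract the translated text, and write it to a new spreadsheet file;
Notes:
Print progress information to the screen at every step
If the length of a cell's text, or of the translation returned for it, exceeds the model's limits, split the text into parts, translate each part, and join the results;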
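Before looking at the generated code, here is a minimal sketch of the parsing step the prompt asks for. It uses only the standard library and a trimmed copy of the sample response from the prompt. The helper name extract_translation is hypothetical. It also reads finish_reason, on the assumption that the value "length" means the reply was cut off at the model's output limit, which would be the signal to split the input into smaller parts.

```python
import json

# Trimmed copy of the sample response shown in the prompt.
SAMPLE_RESPONSE = """
{
  "choices": [
    {
      "index": 0,
      "message": {
        "content": "Hello! How can I assist you with your programming or computer science questions today?",
        "role": "assistant"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 18, "completion_tokens": 16, "total_tokens": 34}
}
"""


def extract_translation(raw_json):
    """Return (text, truncated) from a chat-completion response body.

    Hypothetical helper: `truncated` is True when finish_reason is "length",
    assumed here to mean the reply hit the model's output limit.
    """
    data = json.loads(raw_json)
    choice = data["choices"][0]
    text = choice["message"]["content"]
    truncated = choice.get("finish_reason") == "length"
    return text, truncated


text, truncated = extract_translation(SAMPLE_RESPONSE)
print(text)
print("truncated:", truncated)
```

If truncated comes back True for a cell, one option is to split that cell's text into smaller parts and translate them separately.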
Code generated by ChatGPT:
import pandas as pd
import requests

# Read the Excel file: skip the header row, then take 50 rows of columns A and B (A2:B51)
file_path = "F:\\AI自媒体内容\\AI行业数据分析\\poetop50bots.xlsx"
df = pd.read_excel(file_path, header=None, usecols="A:B", skiprows=1, nrows=50)

# DeepSeek API configuration
base_url = "https://api.deepseek.com"
api_key = "XXX"
model = "deepseek-chat"
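

def translate_text(text):
    """Send one piece of text to deepseek-chat and return the Chinese translation."""
    headers = {
        'Authorization': f'Bearer {api_key}',
        'Content-Type': 'application/json'
    }
    data = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant"},
            # The user prompt means "Translate the English content into Chinese"
            {"role": "user", "content": f"把英文内容翻译为中文:{text}"}
        ],
        "temperature": 1.1,  # value requested in the prompt
        "stream": False
    }
    response = requests.post(f"{base_url}/v1/chat/completions", headers=headers, json=data)
    response.raise_for_status()  # raise an exception if the request failed
    translated_text = response.json()['choices'][0]['message']['content']
    return translated_text


def split_text(text, max_length):
    """Split text into consecutive chunks of at most max_length characters."""
    return [text[i:i + max_length] for i in range(0, len(text), max_length)]
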
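
# Print progress information at every step
print(f"Reading file: {file_path}")

# Character counts used as a rough stand-in for the token limits (32K context, 4K output).
# Chunks are capped at the smaller value so that each translation is likely to fit in the output limit.
max_input_length = 32000
max_output_length = 4000
max_chunk_length = min(max_input_length, max_output_length)

for index, row in df.iterrows():
    for col in [0, 1]:  # iterate over columns A and B
        text = row[col]
        if pd.isna(text):
            continue
        text = str(text)
        print(f"Translating cell ({index+2}, {chr(col+65)}): {text}")
        if len(text) > max_chunk_length:
            # Translate long text piece by piece, then join the pieces
            parts = split_text(text, max_chunk_length)
            translated_parts = [translate_text(part) for part in parts]
            translated_text = ''.join(translated_parts)
        else:
            translated_text = translate_text(text)
        print(f"Translation: {translated_text}")
        # Write the translated text back into the DataFrame
        df.iat[index, col] = translated_text

# Write the results to a new Excel file
new_file_path = "F:\\AI自媒体内容\\AI行业数据分析\\translated_poetop50bots.xlsx"
df.to_excel(new_file_path, index=False, header=False, startrow=1)
print(f"Translation finished; results written to: {new_file_path}")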
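
The fixed-width split_text above can cut a sentence, or even a word, in half, which tends to hurt translation quality. As a possible alternative, here is a minimal sketch that packs whole sentences into chunks of at most max_length characters and falls back to a hard cut only when a single sentence is longer than the limit. The function name split_on_sentences and the sentence-boundary regex are my own assumptions, not part of the generated script, and the limit is still counted in characters rather than tokens.

```python
import re


def split_on_sentences(text, max_length):
    """Split text into chunks of at most max_length characters,
    preferring to break after sentence-ending punctuation."""
    if max_length <= 0:
        raise ValueError("max_length must be positive")

    # Each match is one sentence plus its trailing whitespace, so that
    # ''.join(sentences) reproduces the original text exactly.
    sentences = re.findall(r'.+?(?:[.!?]\s+|\Z)', text, flags=re.S)

    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at fixed width.
        while len(sentence) > max_length:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_length])
            sentence = sentence[max_length:]
        # Start a new chunk when the next sentence would not fit.
        if len(current) + len(sentence) > max_length:
            chunks.append(current)
            current = ""
        current += sentence
    if current:
        chunks.append(current)
    return chunks


sample = ("Poe is a platform for chatting with AI bots. "
          "It hosts many models! Which one is best? Try them.")
chunks = split_on_sentences(sample, 50)
for chunk in chunks:
    print(repr(chunk))
```

Each chunk can then be passed to translate_text and the results joined, in the same way the script handles the output of split_text.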