Table of Contents
- Introduction to OpenAI fine-tuning
- OpenAI fine-tuning URL
- Preparing the dataset in JSONL format
- Click to upload the file
Introduction to OpenAI fine-tuning
OpenAI fine-tuning URL
URL: https://platform.openai.com/finetune
Preparing the dataset in JSONL format
- Download the Chinese-medical-dialogue-data dataset with git clone:
git clone https://github.com/Toyhom/Chinese-medical-dialogue-data
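- Select part of the cardiovascular department (心血管科) data for fine-tuning. Fine-tuning is a paid service: the more tokens in the training data, the higher the cost, and gpt-3.5-turbo has a token limit of 4096.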
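Because cost grows with the token count, it can help to get a rough idea of the size of the data before uploading. The following is only a minimal sketch: `placeholder_token_count` is a hypothetical stand-in that counts characters, not real tokens, and it assumes the 4096 limit applies to each training example. For real numbers, a proper tokenizer should be passed in as `count_tokens`.

```python
MAX_TOKENS_PER_EXAMPLE = 4096  # limit quoted in the text for gpt-3.5-turbo


def placeholder_token_count(text):
    # Assumption for illustration only: one character counts as one token.
    # A real tokenizer will give different numbers.
    return len(text)


def summarize_token_budget(examples, count_tokens=placeholder_token_count,
                           limit=MAX_TOKENS_PER_EXAMPLE):
    """Return (total token estimate, indices of examples above the limit)."""
    total = 0
    too_long = []
    for i, texts in enumerate(examples):
        n = sum(count_tokens(t) for t in texts)
        total += n
        if n > limit:
            too_long.append(i)
    return total, too_long


# Made-up examples: each one is [system prompt, question, answer]
examples = [
    ["system prompt", "question", "answer"],
    ["system prompt", "x" * 5000, "answer"],
]
total, too_long = summarize_token_budget(examples)
print(total, too_long)
```

Examples reported in `too_long` would need to be shortened or dropped before training.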
- Load the CSV file into a DataFrame
python
import pandas as pd

# Read the sample CSV using the GBK encoding
df = pd.read_csv('Chinese-medical-dialogue-data/样例_内科5000-6000.csv', encoding='gbk')
df
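- Extract the samples
python
# Keep only the cardiovascular department (心血管科) rows
cardio = df[df['department'] == '心血管科']
# First 50 rows for training, next 20 rows for validation
train_data = cardio.iloc[0:50, :]
valid_data = cardio.iloc[50:70, :]
train_data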
- Build the data for the JSONL format
python
lis1 = []  # training conversations
lis2 = []  # validation conversations
sys_content = "You are a specialist in cardiovascular disease and you will apply your expertise to give your specialized answers to patients."
# Each row becomes one conversation: system prompt, patient question, doctor answer
for index, row in train_data.iterrows():
    each = []
    each.append({"role": "system", "content": sys_content})
    each.append({"role": "user", "content": row['ask']})
    each.append({"role": "assistant", "content": row['answer']})
    #print(each)
    lis1.append(each)
for index, row in valid_data.iterrows():
    each = []
    each.append({"role": "system", "content": sys_content})
    each.append({"role": "user", "content": row['ask']})
    each.append({"role": "assistant", "content": row['answer']})
    #print(each)
    lis2.append(each)
lis1
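Before exporting, it may be worth checking that every conversation really has the system / user / assistant shape built above, since rows with a missing question or answer would otherwise slip into the training file. This is just a sketch; `find_invalid_conversations` is a hypothetical helper name, and the sample data is made up.

```python
EXPECTED_ROLES = ["system", "user", "assistant"]


def find_invalid_conversations(conversations):
    """Return the indices of conversations that do not have the expected shape."""
    bad = []
    for i, conv in enumerate(conversations):
        roles = [m.get("role") for m in conv]
        # Content must be a non-empty string (a missing CSV cell would be NaN, not str)
        contents_ok = all(
            isinstance(m.get("content"), str) and m["content"].strip()
            for m in conv
        )
        if roles != EXPECTED_ROLES or not contents_ok:
            bad.append(i)
    return bad


# Made-up sample: the second conversation has an empty answer
sample = [
    [
        {"role": "system", "content": "You are a specialist."},
        {"role": "user", "content": "question"},
        {"role": "assistant", "content": "answer"},
    ],
    [
        {"role": "system", "content": "You are a specialist."},
        {"role": "user", "content": "question"},
        {"role": "assistant", "content": ""},
    ],
]
invalid = find_invalid_conversations(sample)
print(invalid)
```

With the lists from the step above, the same check would be called as `find_invalid_conversations(lis1)` and `find_invalid_conversations(lis2)`.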
- Export the JSONL data
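The export step itself could look roughly like the sketch below. It assumes that each line of the file should be one JSON object with a `messages` key holding the conversation, which is my understanding of the chat fine-tuning format. `write_jsonl` and the file names are hypothetical, and the demo conversation is made up.

```python
import json
import os
import tempfile


def write_jsonl(conversations, path):
    """Write one {"messages": [...]} JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for conv in conversations:
            # ensure_ascii=False keeps Chinese text readable instead of \u escapes
            f.write(json.dumps({"messages": conv}, ensure_ascii=False) + "\n")


# Made-up conversation in the same shape as the entries of lis1 / lis2
demo_conversations = [
    [
        {"role": "system", "content": "You are a specialist."},
        {"role": "user", "content": "胸闷怎么办?"},
        {"role": "assistant", "content": "建议尽快就医检查。"},
    ]
]

# Written to a temporary directory here so the demo leaves no files behind
demo_path = os.path.join(tempfile.mkdtemp(), "train_demo.jsonl")
write_jsonl(demo_conversations, demo_path)

with open(demo_path, encoding="utf-8") as f:
    demo_lines = f.read().splitlines()
print(len(demo_lines))
```

With the lists built earlier, this would be called once for the training data and once for the validation data, for example `write_jsonl(lis1, "train.jsonl")` and `write_jsonl(lis2, "valid.jsonl")`.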
Click to upload the file
- Upload the file (I did not have enough credit left to go further)