OpenAI opens up gpt-3.5-turbo fine-tuning: a hands-on tutorial

Contents

- Introduction to OpenAI fine-tuning
    - OpenAI fine-tuning page
    - Preparing the dataset in JSONL format
    - Uploading the file

Introduction to OpenAI fine-tuning

OpenAI fine-tuning page

URL: https://platform.openai.com/finetune

Preparing the dataset in JSONL format

We use the Chinese-medical-dialogue-data dataset and download it with git clone:

git clone https://github.com/Toyhom/Chinese-medical-dialogue-data

For fine-tuning, we use only a subset of the records from its cardiovascular department (心血管科).

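Fine-tuning is a paid service: the more tokens you train on, the more it costs. In addition, a single training example for gpt-3.5-turbo is limited to 4,096 tokens.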
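
Because both the bill and the 4,096-token limit depend on token counts, it can help to check the examples before uploading. The sketch below is a hypothetical helper (`check_and_estimate` is not part of any library) that flags examples over the limit and estimates the training cost. It assumes that every token in the file is billed once per epoch, and its defaults of 3 epochs and USD 0.008 per 1K training tokens are assumptions for illustration only, so check OpenAI's current pricing before relying on the number. The token counter is passed in as a function.

```python
from typing import Callable, Dict, List, Tuple


def check_and_estimate(
    examples: List[Dict],
    count_tokens: Callable[[str], int],
    max_tokens: int = 4096,
    n_epochs: int = 3,                # assumed epoch count, for illustration
    usd_per_1k_tokens: float = 0.008,  # assumed price, check current pricing
) -> Tuple[List[int], int, float]:
    """Return (indices of over-long examples, total tokens, estimated cost in USD).

    Only message contents are counted; the chat format adds a few formatting
    tokens per message, so the totals are a slight underestimate.
    """
    too_long: List[int] = []
    total = 0
    for i, example in enumerate(examples):
        n = sum(count_tokens(m["content"]) for m in example["messages"])
        if n > max_tokens:
            too_long.append(i)
        total += n
    # Assumed pricing model: every token in the file is billed once per epoch
    cost = total * n_epochs / 1000 * usd_per_1k_tokens
    return too_long, total, cost
```

With these assumptions, a file of 100,000 tokens trained for 3 epochs comes to about USD 2.40. Passing `len` as `count_tokens` only counts characters, which is a crude stand-in. For real counts, one option is the `tiktoken` package: `enc = tiktoken.encoding_for_model("gpt-3.5-turbo")` and then `count_tokens=lambda s: len(enc.encode(s))`.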

Loading the CSV file into a DataFrame

import pandas as pd

# The CSV files in this repository are GBK-encoded, hence encoding='gbk'
df = pd.read_csv('Chinese-medical-dialogue-data/样例_内科5000-6000.csv', encoding='gbk')

df

Selecting the samples

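# Keep only the cardiovascular department (心血管科), filtering once
cardio = df[df['department'] == '心血管科']
train_data = cardio.iloc[0:50, :]   # first 50 rows for training
valid_data = cardio.iloc[50:70, :]  # next 20 rows for validation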

train_data

Building the JSONL records

sys_content = "You are a specialist in cardiovascular disease and you will apply your expertise to give your specialized answers to patients."

def build_examples(data):
    # Turn each row into one chat fine-tuning example: the fine-tuning format
    # expects a dict with a "messages" list (system prompt, patient question, doctor answer)
    examples = []
    for _, row in data.iterrows():
        examples.append({"messages": [
            {"role": "system", "content": sys_content},
            {"role": "user", "content": row['ask']},
            {"role": "assistant", "content": row['answer']},
        ]})
    return examples

lis1 = build_examples(train_data)  # training examples
lis2 = build_examples(valid_data)  # validation examples

lis1
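
For chat models, each line of the JSONL file has to be one JSON object with a `messages` key holding the list of role/content messages; a bare list of messages does not match that format. The sketch below is a hypothetical checker (`validate_example` and `ALLOWED_ROLES` are illustrative names, not an OpenAI API) for a few common problems, such as a missing CSV cell that pandas turns into a float `NaN`. The record at the end is a made-up placeholder, not taken from the dataset.

```python
import json
from typing import List

ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_example(example) -> List[str]:
    """Return the problems found in one chat fine-tuning example (empty list if none)."""
    if not isinstance(example, dict) or not isinstance(example.get("messages"), list):
        return ["example must be a dict with a 'messages' list"]
    messages = example["messages"]
    problems = []
    for i, m in enumerate(messages):
        if m.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i}: unexpected role {m.get('role')!r}")
        content = m.get("content")
        # A missing CSV cell arrives from pandas as float NaN, which fails this check
        if not isinstance(content, str) or not content.strip():
            problems.append(f"message {i}: empty or non-string content")
    if not any(m.get("role") == "assistant" for m in messages):
        problems.append("no assistant message")
    return problems


# Placeholder record, serialized the way it would appear on one JSONL line
record = {"messages": [
    {"role": "system", "content": "You are a specialist in cardiovascular disease."},
    {"role": "user", "content": "示例问题"},
    {"role": "assistant", "content": "示例回答"},
]}
line = json.dumps(record, ensure_ascii=False)
```

Running `validate_example` over `lis1` and `lis2` before export catches bad rows early, before any upload or billing.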

Exporting the JSONL data
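
The export step writes one JSON object per line. The sketch below is a hypothetical `write_jsonl` helper that does this in UTF-8 with `ensure_ascii=False`, so the Chinese text stays readable instead of becoming `\uXXXX` escapes. As an assumption made for robustness, it also wraps an item that is still a bare list of messages as `{"messages": [...]}`.

```python
import json
from pathlib import Path


def write_jsonl(path, examples) -> int:
    """Write one JSON object per line (UTF-8) and return the number of lines written."""
    count = 0
    with Path(path).open("w", encoding="utf-8") as f:
        for example in examples:
            # Wrap a bare message list into the {"messages": [...]} shape
            if isinstance(example, list):
                example = {"messages": example}
            # ensure_ascii=False keeps Chinese characters as-is in the file
            f.write(json.dumps(example, ensure_ascii=False) + "\n")
            count += 1
    return count
```

For example, `write_jsonl("train.jsonl", lis1)` and `write_jsonl("valid.jsonl", lis2)` produce the two files to upload (the file names are only illustrative).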

Uploading the file

Upload the file (my account balance was not enough to go further, so the walkthrough stops here).
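
The post uploads through the web UI. The same step can also be scripted. The sketch below assumes the v1.x `openai` Python SDK, where `client.files.create(file=..., purpose="fine-tune")` uploads a file and `client.fine_tuning.jobs.create(...)` starts a job. Older 0.x SDK versions use different call names. `start_fine_tune` is a hypothetical helper, the file names are illustrative, and creating the job is billed just like the UI route.

```python
def start_fine_tune(client, train_path, valid_path, model="gpt-3.5-turbo"):
    """Upload the training/validation JSONL files and create a fine-tuning job.

    `client` is expected to behave like `openai.OpenAI()` from the v1.x SDK
    (an assumption). Returns the ID of the created job.
    """
    uploaded = {}
    for name, path in (("train", train_path), ("valid", valid_path)):
        with open(path, "rb") as f:
            # purpose="fine-tune" marks the uploaded file as fine-tuning data
            uploaded[name] = client.files.create(file=f, purpose="fine-tune")
    job = client.fine_tuning.jobs.create(
        training_file=uploaded["train"].id,
        validation_file=uploaded["valid"].id,
        model=model,
    )
    return job.id


# Usage (needs the openai package, network access and OPENAI_API_KEY; not run here):
#   from openai import OpenAI
#   job_id = start_fine_tune(OpenAI(), "train.jsonl", "valid.jsonl")
```

Taking `client` as a parameter keeps the helper easy to test with a stand-in object and avoids hard-coding credentials.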