Empowering Bioinformatics Beginners with ChatGPT

Paper: Empowering Beginners in Bioinformatics with ChatGPT. 2023

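For beginners in bioinformatics, the biggest difficulty is having no experienced person nearby to offer guidance. The arrival of ChatGPT may change this: students can act as the mentor themselves and direct ChatGPT to carry out the data analysis.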

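It is well known that the more precise the instructions given to ChatGPT, the more accurate its answers. The paper proposes a model for interacting with ChatGPT called OPTICAL. Its basic idea is to refine the instructions given to ChatGPT through repeated iteration.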

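The model's workflow is as follows:

  1. Give an initial prompt.

  2. The chatbot generates analysis code.

  3. Run the code.

    If an error occurs, go back and refine the prompt.

    If the code runs correctly, continue to the next step.

  4. Evaluate the results.

    If the results do not match expectations, go back and refine the prompt.

    If the results match expectations, continue to the next step.

  5. Review the code, obtain the final prompt, and archive the method.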
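Step 3 of the loop, running the generated code and turning any error into the next prompt, can be sketched in bash as follows. This is an illustrative sketch, not code from the paper: the function name `run_generated` is hypothetical, and the wording of the follow-up prompt is an assumption modeled on the follow-up prompts in the case studies.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the paper): run a chatbot-generated script
# and, if it fails, print a follow-up prompt that quotes the error message.
run_generated() {
    local script="$1"
    local err
    # Capture stderr only; the script's stdout is discarded here.
    if err=$(bash "$script" 2>&1 >/dev/null); then
        echo "OK: code ran; evaluate the results next."
    else
        # The text to paste back into the chat as the refined prompt.
        echo "I got an error message: \"${err}\". Please fix it."
        return 1
    fi
}
```

Calling `run_generated analysis.sh` (where `analysis.sh` is whatever script the chatbot produced) either confirms that the code ran or prints a message ready to send back to the chatbot. Step 4, judging whether the results are as expected, still needs a human.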

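The model itself is unremarkable. It matches how people already use ChatGPT: keep refining the prompt until the answer is correct. The two case studies below illustrate this process well.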

**Case 1: Short-read alignment and visual inspection for next-generation sequencing**

**Defining the chatbot's behavior:**

Act as an experienced bioinformatician proficient in ChIP-Seq data analysis, you will assist me by writing code with number of lines as minimal as possible. Rest the thread if asked to. Reply "YES" if understand.

**Iteration 0**

I have two fastq files in current folder from single-end sequencing of a ChIP-Seq library: ENCFF000AVS_1m.fastq.gz, and ENCFF000AVS_10m.fastq.gz. For each fastq file, align reads to the human reference genome, save to bam file, and then covert it to bigwig file. Tools to use: bowtie2, samtools, and deepTools. The index for bowtie2 is in the folder "../data/indx/bowtie2_whole_genome/" with "hg38" as the prefix. Use 24 CPU for the alignment. Please draft the code in bash.
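For orientation, the workflow this prompt describes might look like the bash sketch below. It is not the code produced in the paper. The output file names (`.bam`, `.bw`) and the function name `pipeline_cmds` are assumptions, and the script only prints the commands instead of running them. The sort and index steps and the `--normalizeUsing CPM` option reflect details that the later iterations and the final prompt add.

```shell
#!/usr/bin/env bash
# Sketch only: prints the commands for each sample instead of running them.
INDEX="../data/indx/bowtie2_whole_genome/hg38"   # bowtie2 index prefix from the prompt
THREADS=24                                       # CPU count from the prompt

# Print the three pipeline commands for one sample.
pipeline_cmds() {
    local s="$1"   # sample name without the .fastq.gz suffix
    # Align single-end reads and write a coordinate-sorted BAM file.
    echo "bowtie2 -p $THREADS -x $INDEX -U $s.fastq.gz | samtools sort -o $s.bam -"
    # bamCoverage needs an index for the BAM file.
    echo "samtools index $s.bam"
    # Convert to bigWig with CPM normalization.
    echo "bamCoverage -b $s.bam -o $s.bw --normalizeUsing CPM -p $THREADS"
}

for s in ENCFF000AVS_1m ENCFF000AVS_10m; do
    pipeline_cmds "$s"
done
```

Piping the printed commands into `bash` would execute them, provided bowtie2, samtools, and deepTools are installed.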

**Iteration 1**

[E::idx_find_and_load] Could not retrieve index file for 'ENCFF000AVS_1m.bam'

**Iteration 2**

Wait, I saw that you have "samtools index" before "bamcoverage". Does bamcoverage as bam to be sorted before using as input?

**Code review**

I need to insert line-by-line comments to the below code which works well to address the needs for the data analysis task. Wait for my code.

**Final prompt (details added through iteration are in bold):**

Act as an experienced bioinformatician proficient in ChIP-Seq data analysis, you will assist me by writing code with number of lines as minimal as possible. Rest the thread if asked to. Reply "YES" if understand.

I have two fastq files in current folder from single-end sequencing of a ChIP-Seq library: ENCFF000AVS_1m.fastq.gz, and ENCFF000AVS_10m.fastq.gz. For each fastq file, align reads to the human reference genome, save to bam file, **index it**, and then covert it to bigwig file with **CPM normalization**. Tools to use: bowtie2, samtools, and deepTools. The index for bowtie2 is in the folder "../data/indx/bowtie2_whole_genome/" with "hg38" as the prefix. Use 24 CPU for the alignment. Please draft the code in bash.

**Case 2: Inferring a molecular phylogenetic tree from DNA sequences**

**Defining the chatbot's behavior:**

Act as an experienced bioinformatician proficient in R, you will write code with number of lines as minimal as possible. Rest the thread if asked to. Reply "YES" if understand.

**Iteration 0**

You have a multiple alignment file named as tp53.clustal in ClustalW format. Please write R code that can load the file, calculate evolutionary distance, build a NJ tree, and visualize the phylogeny.

**Iteration 1**

I got an error message complaining "could not find function "read.alignment". Please fix it.

**Iteration 2**

I got a warning message " In dist.dna(aln) : NAs introduced by coercion". Please fix it.

**Iteration 3**

I wrote an R program to read a multiple alignment file named as tp53.clustal in ClustalW format, calculate evolutionary distance, build a NJ tree, and visualize the phylogeny. But I want to root the tree with the Zebrafish sequence as the outgroup. Can you help me revise the R code? Below is my R code.

```r
# Load the required packages
library(seqinr)
library(ape)

# Read in the alignment file
aln <- read.alignment("tp53.clustal", format="clustal")

# Calculate the evolutionary distance
dist <- dist.dna(as.DNAbin(aln))

# Build the NJ tree
tree <- nj(dist)

# Plot the phylogeny
plot(tree)
```

**Iteration 4**

I got an error message complaining " Error in nj(dist, outgroup = zebrafish_idx) unused argument (outgroup = zebrafish_idx)". Please fix it.

**Iteration 5**

I got an error message complaining "Error in if (newroot == ROOT) { : argument is of length zero". Please fix it.

**Code review**

I created the following R code. Please add inline comments.

**Final prompt**

None.
