使用 ChatGPT 为生物信息学初学者赋能

论文:Empowering Beginners in Bioinformatics with ChatGPT. 2023

对于生信初学者而言,最大的困难是身边没有经验丰富的人给予指导。而ChatGTP的出现可能改变这一现状,学生可以自己作为导师,指导ChatGPT完成数据分析工作。

众所周知,与ChatGPT互动,给予的指令越精确,那么它给出的答案越精准。这篇论文提出一个与ChatGPT互动的模型:OPTICAL。其基本思想是通过迭代不断优化给予ChatGPT的指令。

该模型的流程图如下:

  1. 给予初始提示。

  2. 机器人产生分析代码。

  3. 运行代码。

    如果出现错误,转向优化提示词。

    如果代码正确,继续下一步。

  4. 评估结果。

    如果结果不符合预期,转向优化提示词。

    如果结果符合预期,继续下一步。

  5. 审查代码,得到最终提示词并归档方法。

这个模型本身平平无奇,符合平常人们使用ChatGPT的习惯:即不断优化提示词,直至得到正确答案。下面两个案例很好地体现了这一过程。

案例一:下一代测序的短读段比对和视觉检查

定义聊天机器人的行为:

Act as an experienced bioinformatician proficient in ChIP-Seq data analysis, you will assist me by writing code with number of lines as minimal as possible. Rest the thread if asked to. Reply "YES" if understand.

迭代0

I have two fastq files in current folder from single-end sequencing of a ChIP-Seq library: ENCFF000AVS_1m.fastq.gz, and ENCFF000AVS_10m.fastq.gz. For each fastq file, align reads to the human reference genome, save to bam file, and then covert it to bigwig file. Tools to use: bowtie2, samtools, and deepTools. The index for bowtie2 is in the folder "../data/indx/bowtie2_whole_genome/" with "hg38" as the prefix. Use 24 CPU for the alignment. Please draft the code in bash.

迭代1

E::idx_find_and_load\] Could not retrieve index file for 'ENCFF000AVS_1m.bam' **迭代2** Wait, I saw that you have "samtools index" before "bamcoverage". Does bamcoverage as bam to be sorted before using as input? **审查代码** I need to insert line-by-line comments to the below code which works well to address the needs for the data analysis task. Wait for my code. **最终提示词(粗体字是经过迭代加入的提示细节):** Act as an experienced bioinformatician proficient in ChIP-Seq data analysis, you will assist me by writing code with number of lines as minimal as possible. Rest the thread if asked to. Reply "YES" if understand. I have two fastq files in current folder from single-end sequencing of a ChIP-Seq library: ENCFF000AVS_1m.fastq.gz, and ENCFF000AVS_10m.fastq.gz. For each fastq file, align reads to the human reference genome, save to bam file, **index it** , and then covert it to bigwig file with **CPM normalization**. Tools to use: bowtie2, samtools, and deepTools. The index for bowtie2 is in the folder "../data/indx/bowtie2_whole_genome/" with "hg38" as the prefix. Use 24 CPU for the alignment. Please draft the code in bash. **安全二:推断DNA序列的分子进化系统发育树** **定义聊天机器人的行为:** Act as an experienced bioinformatician proficient in R, you will write code with number of lines as minimal as possible. Rest the thread if asked to. Reply "YES" if understand. **迭代0** You have a multiple alignment file named as tp53.clustal in ClustalW format. Please write R code that can load the file, calculate evolutionary distance, build a NJ tree, and visualize the phylogeny. **迭代1** I got an error message complaining "could not find function "read.alignment". Please fix it. **迭代2** I got a warning message " In dist.dna(aln) : NAs introduced by coercion". Please fix it. **迭代3** I wrote an R program to read a multiple alignment file named as tp53.clustal in ClustalW format, calculate evolutionary distance, build a NJ tree, and visualize the phylogeny. But I want to root the tree with the Zebrafish sequence as the outgroup. Can you help me revise the R code? Below is my R code. ```php # Load the required packages library(seqinr) library(ape) # Read in the alignment file aln <- read.alignment("tp53.clustal", format="clustal") # Calculate the evolutionary distance dist <- dist.dna(as.DNAbin(aln)) # Build the NJ tree tree <- nj(dist) # Plot the phylogeny plot(tree) ``` **迭代4** I got an error message complaining " Error in nj(dist, outgroup = zebrafish_idx) unused argument (outgroup = zebrafish_idx)". Please fix it. **迭代5** I got an error message complaining "Error in if (newroot == ROOT) { : argument is of length zero". Please fix it. **审查代码** I created the following R code. Please add inline comments. **最终提示词** 无。 ### 关于简说基因 * #### 生信平台 Galaxy中国(UseGalaxy.cn)致力于打造中国人的云上生物信息基础设施。大量在线工具免费使用。无需安装,用完即走。活跃的用户社区,随时交流使用心得。 * #### 生信培训 #### 简说基因的生信培训班,荣获学员的一致好评。如果你也对生物信息学感兴趣,欢迎来跟[**简说基因,学真生信**]()。 * #### 生信分析 #### 我们能够承接所有 NGS 组学数据分析业务,包括但不限于 WGS / WES / RNA-seq 等。基因组组装、注释,以及各种重测序业务都可以与简说基因合作。

相关推荐
needn2 小时前
TRAE为什么要发布SOLO版本?
人工智能·ai编程
毅航2 小时前
自然语言处理发展史:从规则、统计到深度学习
人工智能·后端
前端付豪2 小时前
LangChain链 写一篇完美推文?用SequencialChain链接不同的组件
人工智能·python·langchain
ursazoo3 小时前
写了一份 7000字指南,让 AI 帮我消化每天的信息流
人工智能·开源·github
_志哥_6 小时前
Superpowers 技术指南:让 AI 编程助手拥有超能力
人工智能·ai编程·测试
YongGit7 小时前
OpenClaw 本地 AI 助手完全指南:飞书接入 + 远程部署实战
人工智能
程序员鱼皮8 小时前
斯坦福大学竟然开了个 AI 编程课?!我已经学上了
人工智能·ai编程
星浩AI9 小时前
Skill 的核心要素与渐进式加载架构——如何设计一个生产可用的 Skill?
人工智能·agent
树獭非懒9 小时前
告别繁琐多端开发:DivKit 带你玩转 Server-Driven UI!
android·前端·人工智能
阿尔的代码屋9 小时前
[大模型实战 07] 基于 LlamaIndex ReAct 框架手搓全自动博客监控 Agent
人工智能·python