使用 ChatGPT 为生物信息学初学者赋能

论文:Empowering Beginners in Bioinformatics with ChatGPT. 2023

对于生信初学者而言,最大的困难是身边没有经验丰富的人给予指导。而ChatGTP的出现可能改变这一现状,学生可以自己作为导师,指导ChatGPT完成数据分析工作。

众所周知,与ChatGPT互动,给予的指令越精确,那么它给出的答案越精准。这篇论文提出一个与ChatGPT互动的模型:OPTICAL。其基本思想是通过迭代不断优化给予ChatGPT的指令。

该模型的流程图如下:

  1. 给予初始提示。

  2. 机器人产生分析代码。

  3. 运行代码。

    如果出现错误,转向优化提示词。

    如果代码正确,继续下一步。

  4. 评估结果。

    如果结果不符合预期,转向优化提示词。

    如果结果符合预期,继续下一步。

  5. 审查代码,得到最终提示词并归档方法。

这个模型本身平平无奇,符合平常人们使用ChatGPT的习惯:即不断优化提示词,直至得到正确答案。下面两个案例很好地体现了这一过程。

案例一:下一代测序的短读段比对和视觉检查

定义聊天机器人的行为:

Act as an experienced bioinformatician proficient in ChIP-Seq data analysis, you will assist me by writing code with number of lines as minimal as possible. Rest the thread if asked to. Reply "YES" if understand.

迭代0

I have two fastq files in current folder from single-end sequencing of a ChIP-Seq library: ENCFF000AVS_1m.fastq.gz, and ENCFF000AVS_10m.fastq.gz. For each fastq file, align reads to the human reference genome, save to bam file, and then covert it to bigwig file. Tools to use: bowtie2, samtools, and deepTools. The index for bowtie2 is in the folder "../data/indx/bowtie2_whole_genome/" with "hg38" as the prefix. Use 24 CPU for the alignment. Please draft the code in bash.

迭代1

E::idx_find_and_load\] Could not retrieve index file for 'ENCFF000AVS_1m.bam' **迭代2** Wait, I saw that you have "samtools index" before "bamcoverage". Does bamcoverage as bam to be sorted before using as input? **审查代码** I need to insert line-by-line comments to the below code which works well to address the needs for the data analysis task. Wait for my code. **最终提示词(粗体字是经过迭代加入的提示细节):** Act as an experienced bioinformatician proficient in ChIP-Seq data analysis, you will assist me by writing code with number of lines as minimal as possible. Rest the thread if asked to. Reply "YES" if understand. I have two fastq files in current folder from single-end sequencing of a ChIP-Seq library: ENCFF000AVS_1m.fastq.gz, and ENCFF000AVS_10m.fastq.gz. For each fastq file, align reads to the human reference genome, save to bam file, **index it** , and then covert it to bigwig file with **CPM normalization**. Tools to use: bowtie2, samtools, and deepTools. The index for bowtie2 is in the folder "../data/indx/bowtie2_whole_genome/" with "hg38" as the prefix. Use 24 CPU for the alignment. Please draft the code in bash. **安全二:推断DNA序列的分子进化系统发育树** **定义聊天机器人的行为:** Act as an experienced bioinformatician proficient in R, you will write code with number of lines as minimal as possible. Rest the thread if asked to. Reply "YES" if understand. **迭代0** You have a multiple alignment file named as tp53.clustal in ClustalW format. Please write R code that can load the file, calculate evolutionary distance, build a NJ tree, and visualize the phylogeny. **迭代1** I got an error message complaining "could not find function "read.alignment". Please fix it. **迭代2** I got a warning message " In dist.dna(aln) : NAs introduced by coercion". Please fix it. **迭代3** I wrote an R program to read a multiple alignment file named as tp53.clustal in ClustalW format, calculate evolutionary distance, build a NJ tree, and visualize the phylogeny. But I want to root the tree with the Zebrafish sequence as the outgroup. Can you help me revise the R code? Below is my R code. ```php # Load the required packages library(seqinr) library(ape) # Read in the alignment file aln <- read.alignment("tp53.clustal", format="clustal") # Calculate the evolutionary distance dist <- dist.dna(as.DNAbin(aln)) # Build the NJ tree tree <- nj(dist) # Plot the phylogeny plot(tree) ``` **迭代4** I got an error message complaining " Error in nj(dist, outgroup = zebrafish_idx) unused argument (outgroup = zebrafish_idx)". Please fix it. **迭代5** I got an error message complaining "Error in if (newroot == ROOT) { : argument is of length zero". Please fix it. **审查代码** I created the following R code. Please add inline comments. **最终提示词** 无。 ### 关于简说基因 * #### 生信平台 Galaxy中国(UseGalaxy.cn)致力于打造中国人的云上生物信息基础设施。大量在线工具免费使用。无需安装,用完即走。活跃的用户社区,随时交流使用心得。 * #### 生信培训 #### 简说基因的生信培训班,荣获学员的一致好评。如果你也对生物信息学感兴趣,欢迎来跟[**简说基因,学真生信**]()。 * #### 生信分析 #### 我们能够承接所有 NGS 组学数据分析业务,包括但不限于 WGS / WES / RNA-seq 等。基因组组装、注释,以及各种重测序业务都可以与简说基因合作。

相关推荐
Tech Synapse5 分钟前
深度解析3D模型生成器:基于StyleGAN3与PyTorch3D的多风格生成工具开发实战
人工智能·pytorch·深度学习·3d
Qforepost15 分钟前
英伟达有意入股 PsiQuantum,释放战略转向量子计算的重要信号
人工智能·量子计算·量子
Blossom.11816 分钟前
量子计算在金融科技中的应用前景
大数据·人工智能·安全·机器学习·计算机视觉·金融·量子计算
旺仔溜溜没25 分钟前
Autodl训练Faster-RCNN网络(自己的数据集)
人工智能·深度学习·神经网络·cnn
广州华锐视点1 小时前
VR 航天科普,沉浸式体验宇宙奥秘
人工智能·vr
纠结哥_Shrek1 小时前
ollama+open-webui搭建可视化大模型聊天
人工智能·电商·ollama·open-webui
熊猫在哪1 小时前
野火鲁班猫(arrch64架构debian)从零实现用MobileFaceNet算法进行实时人脸识别(一)conda环境搭建
linux·人工智能·python·嵌入式硬件·神经网络·机器学习·边缘计算
技能咖2 小时前
人工智能解析:技术革命下的认知重构
人工智能·重构
腾讯云qcloud07552 小时前
腾讯位置服务重构出行行业的技术底层逻辑
人工智能·重构·智慧城市
拓端研究室TRL2 小时前
MATLAB贝叶斯超参数优化LSTM预测设备寿命应用——以航空发动机退化数据为例
开发语言·人工智能·rnn·matlab·lstm