BayesPrism, 400x Faster?! InstaPrism, an Immune Deconvolution Algorithm

生信碱移

Speeding Up BayesPrism

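Computational cell-type deconvolution is an important analysis technique for modeling the compositional heterogeneity of bulk gene expression data. We previously covered a sister-journal paper that systematically compared a range of cell-type deconvolution algorithms (Figure 1). Its results showed that BayesPrism, a new Bayesian method, came out ahead in both deconvolution accuracy and robustness to model misspecification.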

▲ Figure 1: Click the image to go to the earlier post.

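However, because BayesPrism relies on Gibbs sampling, its computational cost is several orders of magnitude higher than that of standard methods. To address this, another study introduced the InstaPrism algorithm, which re-implements BayesPrism in a derandomized framework by replacing the time-consuming Gibbs sampling step with a fixed-point algorithm. According to the authors, InstaPrism achieves the same performance while cutting the runtime by a factor of 400 and the memory footprint by a factor of 20.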

▲ InstaPrism runtime, from the GitHub description: https://github.com/humengying0907/InstaPrism
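To make the "fixed-point instead of sampling" idea a bit more concrete, here is a toy sketch in base R. It is not InstaPrism's actual implementation: it is a simplified EM-style update for the mixing fractions of known expression profiles, and the function name `fixed_point_fractions`, the simulated `phi`, and the noiseless bulk profile `x` are all assumptions made purely for illustration.

```r
# Toy illustration only (not the package's code): a deterministic
# fixed-point (EM-style) update for cell-type fractions.
# phi: cell types x genes, each row a normalized expression profile.
# x:   bulk expression per gene.
fixed_point_fractions <- function(x, phi, n_iter = 1000, tol = 1e-10) {
  K <- nrow(phi)
  theta <- rep(1 / K, K)                  # start from uniform fractions
  for (i in seq_len(n_iter)) {
    mix <- as.vector(theta %*% phi)       # expected bulk profile per gene
    # Expected expression attributed to each cell type, summed over genes
    z <- theta * as.vector(phi %*% (x / mix))
    theta_new <- z / sum(z)
    converged <- max(abs(theta_new - theta)) < tol
    theta <- theta_new
    if (converged) break
  }
  theta
}

set.seed(1)
phi <- matrix(rexp(3 * 50), nrow = 3)     # 3 hypothetical cell types, 50 genes
phi <- phi / rowSums(phi)
true_theta <- c(0.6, 0.3, 0.1)
x <- as.vector(true_theta %*% phi) * 1e4  # noiseless simulated bulk profile
est <- fixed_point_fractions(x, phi)
print(round(est, 3))
```

The point of the sketch is only that each iteration is a deterministic update with no random draws, which is the kind of step that can replace a long sampling chain.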

In this post, I'll give a quick introduction to using this R package. If you're interested, you can of course also check out the GitHub link above.

1. Installing the R package

The package can be installed with the following code:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

if (!require("Biobase", quietly = TRUE))
    BiocManager::install("Biobase")

if (!require("devtools", quietly = TRUE))
    install.packages("devtools")

devtools::install_github("humengying0907/InstaPrism")

2. Running the deconvolution

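① Load a single-cell reference object. The package ships with built-in single-cell references for several cancer types. To use one of these prebuilt references, run the code below (OV refers to ovarian cancer):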

OV_ref <- InstaPrism_reference('OV') 
str(OV_ref)
#Formal class 'refPhi_cs' [package "InstaPrism"] with 2 slots
#  ..@ phi.cs: num [1:32053, 1:40] 1.30e-08 1.00e-08 3.86e-07 5.21e-08 1.30e-08 ...
#  .. ..- attr(*, "dimnames")=List of 2
#  .. .. ..$ : chr [1:32053] "MIR1302-2HG" "OR4F5" "AL627309.1" "AL627309.3" ...
#  .. .. ..$ : chr [1:40] "Cycling.cancer.cell.2" "Cycling.cancer.cell.1" "fallopian tube secretory epithelial cell" "Cancer.cell.1" ...
#  ..@ map   :List of 9
#  .. ..$ malignant       : chr [1:11] "Cycling.cancer.cell.2" "Cycling.cancer.cell.1" "fallopian tube secretory epithelial cell" "Cancer.cell.1" ...
#  .. ..$ fibroblast      : chr "fibroblast"
#  .. ..$ T cell          : chr [1:11] "T cell" "CD8.T.cytotoxic" "CD4.T.reg" "CD4.T.naive" ...
#  .. ..$ endothelial cell: chr "endothelial cell"
#  .. ..$ plasma cell     : chr "plasma cell"
#  .. ..$ B cell          : chr "B cell"
#  .. ..$ dendritic cell  : chr [1:4] "pDC" "mDC" "cDC2" "cDC1"
#  .. ..$ mast cell       : chr "Mast.cell"
#  .. ..$ monocyte        : chr [1:9] "Cycling.M" "M1.S100A8" "M2.CXCL10" "M2.SELENOP" ...

Other prebuilt references are also available (listed in the GitHub repository linked above).

Of course, you can also build a custom single-cell reference object with the code below:

sc.dat <- readRDS("10.sc.dat.rds")           # preprocessed single-cell count matrix (cells x genes here)
sc.dat[1:5, 1:5]
#                 A1BG A1CF A2M A2M-AS1 A2ML1
#HCC9_COL18_ROW25    2    0   2       0     0
#HCC9_COL19_ROW39    1   28   0       0     0
#HCC9_COL12_ROW23    1    0   0       0     0
#HCC1_COL14_ROW31   26   98 264       0     3
#HCC1_COL6_ROW4    115  121   1       0     0

cell_meta <- readRDS("10.cell_meta.rds")     # preprocessed single-cell annotation table
cell_meta[1:5, c("orig.ident", "Celltype")]
# A tibble: 5 × 2
#  orig.ident Celltype   
#  <fct>      <fct>      
#1 HCC9       Hepatocytes
#2 HCC9       Hepatocytes
#3 HCC9       Hepatocytes
#4 HCC1       Hepatocytes
#5 HCC1       Hepatocytes

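# Transpose the cells x genes matrix so that genes are in rows.
# The same labels are passed as both cell types and cell states here.
HCC_ref <- refPrepare(t(sc.dat),
                     cell_meta$Celltype,
                     cell_meta$Celltype,
                     pseudo.min = 1e-08)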
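Since the matrix above is stored as cells x genes and has to be transposed, a quick sanity check before building the reference can save some head-scratching. The helper below, `check_ref_inputs`, is hypothetical (it is not part of InstaPrism) and the toy matrix and labels are made up for illustration; it only verifies that there is one label per column and that values are non-negative.

```r
# Hypothetical helper (not part of InstaPrism): sanity-check a
# genes x cells matrix and its per-cell labels before building a reference.
check_ref_inputs <- function(sc_expr, cell_types) {
  stopifnot(!is.null(dim(sc_expr)))
  if (ncol(sc_expr) != length(cell_types)) {
    stop("Expected one label per cell (column); did you forget t()?")
  }
  if (anyNA(cell_types)) stop("Cell type labels contain NA")
  if (any(sc_expr < 0, na.rm = TRUE)) stop("Expression values must be non-negative")
  table(as.character(cell_types))         # number of cells per cell type
}

# Toy cells x genes matrix, mimicking the layout of sc.dat
toy <- matrix(c(2, 0, 1, 5, 3, 0), nrow = 3,
              dimnames = list(paste0("cell", 1:3), c("A1BG", "A2M")))
toy_labels <- factor(c("Hepatocytes", "Hepatocytes", "T cell"))

counts <- check_ref_inputs(t(toy), toy_labels)
print(counts)
```

With the real objects, the analogous call would be `check_ref_inputs(t(sc.dat), cell_meta$Celltype)`.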

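② Read in the bulk expression matrix. Here we load the example data that ships with the package (rows are genes, columns are samples, and the values are gene expression levels):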

bulk_expr <- read.csv(system.file('extdata',
                                  'example_bulk.csv',
                                  package = 'InstaPrism')) 
bulk_expr[1:5, 1:5]
#           simulated1 simulated2  simulated3 simulated4 simulated5
#AL627309.1   4.083809   2.506129  0.13908015 11.0578698   3.548361
#LINC00115    1.339467   3.788035  0.04873877  0.4734699   1.364752
#SAMD11       1.310678   2.586227  0.00000000  0.6388871   1.059823
#NOC2L       51.436820  64.370597 48.12664982 25.8605285  46.726963
#HES4        18.179669  17.233774 20.57675914 13.4103796   6.820316
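Deconvolution can only use genes that appear in both the bulk matrix and the reference, so it can be worth checking the overlap first. The helper `gene_overlap` below is hypothetical (not an InstaPrism function), and the two gene vectors are toy examples assembled from gene names shown in this post.

```r
# Hypothetical helper (not part of InstaPrism): report how many bulk genes
# are also present in the reference.
gene_overlap <- function(bulk_genes, ref_genes) {
  shared <- intersect(bulk_genes, ref_genes)
  list(n_shared = length(shared),
       frac_bulk_covered = length(shared) / length(unique(bulk_genes)),
       shared = shared)
}

# Toy gene lists for illustration only
bulk_genes <- c("AL627309.1", "LINC00115", "SAMD11", "NOC2L", "HES4")
ref_genes  <- c("MIR1302-2HG", "OR4F5", "AL627309.1", "NOC2L", "HES4")

ov <- gene_overlap(bulk_genes, ref_genes)
print(ov$n_shared)
print(ov$frac_bulk_covered)
```

With the objects from this post, the analogous call would be `gene_overlap(rownames(bulk_expr), rownames(OV_ref@phi.cs))`, using the `phi.cs` slot shown in the `str()` output above.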

③ Run the InstaPrism estimation. Afterwards, deconv_res@Post.ini.ct@theta gives the estimated theta values for each cell type:

deconv_res <- InstaPrism(bulk_Expr = bulk_expr,
                         refPhi_cs = OV_ref)

# @Post.ini.ct@theta holds the estimated theta values for each cell type
estimated_frac <- t(deconv_res@Post.ini.ct@theta)
head(estimated_frac)

write.csv(estimated_frac, "theta.csv", quote = FALSE, row.names = TRUE) # save the results

# get_Z_array can also be used to obtain the deconvolved gene expression Z
get_Z_array(deconv_res)
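Once you have a samples x cell types matrix like estimated_frac, a couple of quick checks are handy before moving on. The sketch below uses a made-up toy matrix (`toy_frac`, an assumption standing in for real output) to confirm that each sample's fractions sum to roughly 1 and to pull out the most abundant cell type per sample.

```r
# Toy samples x cell types matrix standing in for estimated_frac
toy_frac <- matrix(c(0.7, 0.2, 0.1,
                     0.1, 0.3, 0.6),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("simulated1", "simulated2"),
                                   c("malignant", "fibroblast", "T cell")))

# Each sample's fractions should sum to (approximately) 1
row_totals <- rowSums(toy_frac)
stopifnot(all(abs(row_totals - 1) < 1e-6))

# Most abundant cell type per sample
dominant <- colnames(toy_frac)[max.col(toy_frac, ties.method = "first")]
names(dominant) <- rownames(toy_frac)
print(dominant)
```

Swapping `toy_frac` for `estimated_frac` should work the same way, assuming it is a numeric matrix with cell types as column names.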

400x faster?

If you're curious, give it a try.