单细胞分析|将 Seurat 与多模态数据结合使用

数据加载

同时测量同一细胞的多种数据类型的能力(称为多模态分析)代表了单细胞基因组学的一个令人兴奋的新前沿。例如,CITE-seq 能够同时测量同一细胞的转录组和细胞表面蛋白。其他令人兴奋的多模式技术,例如 10x 多组试剂盒,可以对细胞转录组和染色质可及性进行测量(即 scRNA-seq+scATAC-seq)。 Seurat 可以实现各种多模态单细胞数据集的无缝存储、分析和探索。

本教程中,我们介绍了创建多模态 Seurat 对象并执行初始分析的工作流程。例如,我们演示了如何根据测量的细胞转录组对 CITE-seq 数据集进行聚类,并随后发现每个聚类中富集的细胞表面蛋白。我们注意到,Seurat 还支持更先进的技术来分析多模态数据,特别是我们的加权最近邻 (WNN) 方法的应用,该方法能够基于两种模态的加权组合同时对细胞进行聚类。

在这里,我们分析了 8,617 个脐带血单核细胞 (CBMC) 的数据集,其中转录组测量与 11 种表面蛋白的丰度估计值配对,其水平通过 DNA 条形码抗体进行量化。首先,我们加载两个计数矩阵:一个用于 RNA 测量,另一个用于抗体衍生标签 (ADT)。您可以在此处下载数据。

R 复制代码
library(Seurat)
library(ggplot2)
library(patchwork)

# Load in the RNA UMI matrix

# Note that this dataset also contains ~5% of mouse cells, which we can use as negative
# controls for the protein measurements. For this reason, the gene expression matrix has
# HUMAN_ or MOUSE_ appended to the beginning of each gene.
cbmc.rna <- as.sparse(read.csv(file = "/brahms/shared/vignette-data/GSE100866_CBMC_8K_13AB_10X-RNA_umi.csv.gz",
    sep = ",", header = TRUE, row.names = 1))

# To make life a bit easier going forward, we're going to discard all but the top 100 most
# highly expressed mouse genes, and remove the 'HUMAN_' from the CITE-seq prefix
cbmc.rna <- CollapseSpeciesExpressionMatrix(cbmc.rna)

# Load in the ADT UMI matrix
cbmc.adt <- as.sparse(read.csv(file = "/brahms/shared/vignette-data/GSE100866_CBMC_8K_13AB_10X-ADT_umi.csv.gz",
    sep = ",", header = TRUE, row.names = 1))

# Note that since measurements were made in the same cells, the two matrices have identical
# column names
all.equal(colnames(cbmc.rna), colnames(cbmc.adt))
## [1] TRUE

构建Seurat对象

现在我们创建一个 Seurat 对象,并添加 ADT 数据作为第二个assay

R 复制代码
# creates a Seurat object based on the scRNA-seq data
cbmc <- CreateSeuratObject(counts = cbmc.rna)

# We can see that by default, the cbmc object contains an assay storing RNA measurement
Assays(cbmc)
## [1] "RNA"

# create a new assay to store ADT information
adt_assay <- CreateAssay5Object(counts = cbmc.adt)

# add this assay to the previously created Seurat object
cbmc[["ADT"]] <- adt_assay

# Validate that the object now contains multiple assays
Assays(cbmc)
## [1] "RNA" "ADT"

# Extract a list of features measured in the ADT assay
rownames(cbmc[["ADT"]])
##  [1] "CD3"    "CD4"    "CD8"    "CD45RA" "CD56"   "CD16"   "CD10"   "CD11c" 
##  [9] "CD14"   "CD19"   "CD34"   "CCR5"   "CCR7"


# Note that we can easily switch back and forth between the two assays to specify the default
# for visualization and analysis

# List the current default assay
DefaultAssay(cbmc)
## [1] "RNA"

# Switch the default to ADT
DefaultAssay(cbmc) <- "ADT"
DefaultAssay(cbmc)
## [1] "ADT"

聚类

以下步骤基于 scRNA-seq 数据的 PBMC 快速聚类

R 复制代码
# Note that all operations below are performed on the RNA assay Set and verify that the
# default assay is RNA
DefaultAssay(cbmc) <- "RNA"
DefaultAssay(cbmc)
## [1] "RNA"

# perform visualization and clustering steps
cbmc <- NormalizeData(cbmc)
cbmc <- FindVariableFeatures(cbmc)
cbmc <- ScaleData(cbmc)
cbmc <- RunPCA(cbmc, verbose = FALSE)
cbmc <- FindNeighbors(cbmc, dims = 1:30)
cbmc <- FindClusters(cbmc, resolution = 0.8, verbose = FALSE)
cbmc <- RunUMAP(cbmc, dims = 1:30)
DimPlot(cbmc, label = TRUE)

数据可视化

现在我们已经从 scRNA-seq 配置文件中获得了簇,我们可以可视化数据集中蛋白质或 RNA 分子的表达。重要的是,Seurat 提供了几种在模态之间切换的方法,并指定您有兴趣分析或可视化的模态。这一点特别重要,因为在某些情况下,相同的特征可以以多种方式出现 - 例如,该数据集包含 B 细胞标记 CD19(蛋白质和 RNA 水平)的独立测量结果。

R 复制代码
# Normalize ADT data,
DefaultAssay(cbmc) <- "ADT"
cbmc <- NormalizeData(cbmc, normalization.method = "CLR", margin = 2)
DefaultAssay(cbmc) <- "RNA"

# Note that the following command is an alternative but returns the same result
cbmc <- NormalizeData(cbmc, normalization.method = "CLR", margin = 2, assay = "ADT")

# Now, we will visualize CD14 levels for RNA and protein By setting the default assay, we can
# visualize one or the other
DefaultAssay(cbmc) <- "ADT"
p1 <- FeaturePlot(cbmc, "CD19", cols = c("lightgrey", "darkgreen")) + ggtitle("CD19 protein")
DefaultAssay(cbmc) <- "RNA"
p2 <- FeaturePlot(cbmc, "CD19") + ggtitle("CD19 RNA")

# place plots side-by-side
p1 | p2
R 复制代码
# Alternately, we can use specific assay keys to specify a specific modality Identify the key
# for the RNA and protein assays
Key(cbmc[["RNA"]])
## [1] "rna_"

Key(cbmc[["ADT"]])
## [1] "adt_"

# Now, we can include the key in the feature name, which overrides the default assay
p1 <- FeaturePlot(cbmc, "adt_CD19", cols = c("lightgrey", "darkgreen")) + ggtitle("CD19 protein")
p2 <- FeaturePlot(cbmc, "rna_CD19") + ggtitle("CD19 RNA")
p1 | p2

marker 鉴定

我们可以利用配对的 CITE-seq 测量来帮助注释来自 scRNA-seq 的簇,并识别蛋白质和 RNA 标记。

R 复制代码
# as we know that CD19 is a B cell marker, we can identify cluster 6 as expressing CD19 on the
# surface
VlnPlot(cbmc, "adt_CD19")
R 复制代码
# we can also identify alternative protein and RNA markers for this cluster through
# differential expression
adt_markers <- FindMarkers(cbmc, ident.1 = 6, assay = "ADT")
rna_markers <- FindMarkers(cbmc, ident.1 = 6, assay = "RNA")

head(adt_markers)
##                p_val avg_log2FC pct.1 pct.2     p_val_adj
## CD19   2.067533e-215  2.5741873     1     1 2.687793e-214
## CD45RA 8.108073e-109  0.5300346     1     1 1.054049e-107
## CD4    1.123162e-107 -1.6707420     1     1 1.460110e-106
## CD14   7.212876e-106 -1.0332070     1     1 9.376739e-105
## CD3     1.639633e-87 -1.5823056     1     1  2.131523e-86
## CCR5    2.552859e-63  0.3753989     1     1  3.318716e-62

head(rna_markers)
##       p_val avg_log2FC pct.1 pct.2 p_val_adj
## IGHM      0   6.660187 0.977 0.044         0
## CD79A     0   6.748356 0.965 0.045         0
## TCL1A     0   7.428099 0.904 0.028         0
## CD79B     0   5.525568 0.944 0.089         0
## IGHD      0   7.811884 0.857 0.015         0
## MS4A1     0   7.523215 0.851 0.016         0

附加可视化

R 复制代码
# Draw ADT scatter plots (like biaxial plots for FACS). Note that you can even 'gate' cells if
# desired by using HoverLocator and FeatureLocator
FeatureScatter(cbmc, feature1 = "adt_CD19", feature2 = "adt_CD3")
R 复制代码
# view relationship between protein and RNA
FeatureScatter(cbmc, feature1 = "adt_CD3", feature2 = "rna_CD3E")
R 复制代码
FeatureScatter(cbmc, feature1 = "adt_CD4", feature2 = "adt_CD8")
R 复制代码
# Let's look at the raw (non-normalized) ADT counts. You can see the values are quite high,
# particularly in comparison to RNA values. This is due to the significantly higher protein
# copy number in cells, which significantly reduces 'drop-out' in ADT data
FeatureScatter(cbmc, feature1 = "adt_CD4", feature2 = "adt_CD8", slot = "counts")

10X多模态数据加载

Seurat 还能够分析使用 CellRanger v3 处理的多模态 10X 实验的数据;例如,我们使用 7,900 个外周血单核细胞 (PBMC) 的数据集重新创建了上面的图。

R 复制代码
pbmc10k.data <- Read10X(data.dir = "/brahms/shared/vignette-data/pbmc10k/filtered_feature_bc_matrix/")
rownames(x = pbmc10k.data[["Antibody Capture"]]) <- gsub(pattern = "_[control_]*TotalSeqB", replacement = "",
    x = rownames(x = pbmc10k.data[["Antibody Capture"]]))

pbmc10k <- CreateSeuratObject(counts = pbmc10k.data[["Gene Expression"]], min.cells = 3, min.features = 200)
pbmc10k <- NormalizeData(pbmc10k)
pbmc10k[["ADT"]] <- CreateAssayObject(pbmc10k.data[["Antibody Capture"]][, colnames(x = pbmc10k)])
pbmc10k <- NormalizeData(pbmc10k, assay = "ADT", normalization.method = "CLR")

plot1 <- FeatureScatter(pbmc10k, feature1 = "adt_CD19", feature2 = "adt_CD3", pt.size = 1)
plot2 <- FeatureScatter(pbmc10k, feature1 = "adt_CD4", feature2 = "adt_CD8a", pt.size = 1)
plot3 <- FeatureScatter(pbmc10k, feature1 = "adt_CD3", feature2 = "CD3E", pt.size = 1)
(plot1 + plot2 + plot3) & NoLegend()
R 复制代码
plot <- FeatureScatter(cbmc, feature1 = "adt_CD19", feature2 = "adt_CD3") + NoLegend() + theme(axis.title = element_text(size = 18),
    legend.text = element_text(size = 18))
ggsave(filename = "../output/images/citeseq_plot.jpg", height = 7, width = 12, plot = plot, quality = 50)

未完待续,持续更新,欢迎关注!

本文由mdnice多平台发布

相关推荐
uzong4 小时前
技术故障复盘模版
后端
GetcharZp4 小时前
基于 Dify + 通义千问的多模态大模型 搭建发票识别 Agent
后端·llm·agent
桦说编程5 小时前
Java 中如何创建不可变类型
java·后端·函数式编程
IT毕设实战小研5 小时前
基于Spring Boot 4s店车辆管理系统 租车管理系统 停车位管理系统 智慧车辆管理系统
java·开发语言·spring boot·后端·spring·毕业设计·课程设计
wyiyiyi5 小时前
【Web后端】Django、flask及其场景——以构建系统原型为例
前端·数据库·后端·python·django·flask
阿华的代码王国6 小时前
【Android】RecyclerView复用CheckBox的异常状态
android·xml·java·前端·后端
Jimmy6 小时前
AI 代理是什么,其有助于我们实现更智能编程
前端·后端·ai编程
AntBlack7 小时前
不当韭菜V1.1 :增强能力 ,辅助构建自己的交易规则
后端·python·pyqt
bobz9658 小时前
pip install 已经不再安全
后端
寻月隐君8 小时前
硬核实战:从零到一,用 Rust 和 Axum 构建高性能聊天服务后端
后端·rust·github