Reading a single-cell transcriptome gene expression matrix from a loom file in R

This walkthrough uses the GSE160756 dataset as an example: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE160756

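Download the data, upload it to the server, and decompress it to obtain the loom file. As a first attempt, try opening it with Python.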

import loompy
import numpy as np

with loompy.connect("GSM4878538_umi_hNP_1.loom") as ds:
    print(ds.shape)  # matrix dimensions
    print(ds.row_attrs["Gene"][:10])  # first 10 gene names
    print(ds.col_attrs["CellID"][:5])  # first 5 cell IDs

This raises an error:

    with loompy.connect("GSM4878538_umi_hNP_1.loom") as ds:
  File "/usr/local/lib/python3.9/site-packages/loompy/loompy.py", line 1634, in connect
    return LoomConnection(filename, mode, validate=validate)
  File "/usr/local/lib/python3.9/site-packages/loompy/loompy.py", line 86, in __init__
    raise ValueError("\n".join(lv.errors) + f"\n{filename} does not appear to be a valid Loom file according to Loom spec version '{lv.version}'")
ValueError: Row attribute 'gene_names' dtype object is not allowed
Column attribute 'cell_names' dtype object is not allowed
For help, see http://linnarssonlab.org/loompy/format/
GSM4878538_umi_hNP_1.loom does not appear to be a valid Loom file according to Loom spec version '0.0.0'

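I am not sure of the underlying cause, but judging from the message, loompy's validation rejects the gene_names and cell_names attributes because of their object dtype, so the file does not open in Python this way.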
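A loom file is an HDF5 container, and the error comes from loompy's spec validation rather than from reading the data, so reading the datasets directly may work. The following is a minimal sketch of that workaround, not a verified fix for this file. It assumes the h5py package is installed and that the paths matrix, row_attrs/gene_names and col_attrs/cell_names exist in the file; the two attribute names are taken from the error message. read_loom_raw and decode_names are hypothetical helper names, not part of loompy. The traceback also shows that loompy.connect forwards a validate argument, so passing validate=False may be worth trying, although attribute access could still fail afterwards.

```python
def decode_names(values):
    """Return plain Python strings; HDF5 string data often arrives as bytes."""
    return [v.decode("utf-8") if isinstance(v, bytes) else str(v) for v in values]


def read_loom_raw(path):
    """Read the matrix and both name vectors with h5py, without loompy's validation."""
    import h5py  # third-party; imported here so decode_names works without it

    with h5py.File(path, "r") as f:
        matrix = f["matrix"][()]  # loads the full dense matrix into memory
        genes = decode_names(f["row_attrs/gene_names"][()])
        cells = decode_names(f["col_attrs/cell_names"][()])

    # Sanity check (assumption): rows should be genes and columns cells.
    if matrix.shape != (len(genes), len(cells)):
        raise ValueError(
            f"unexpected shape {matrix.shape} for "
            f"{len(genes)} genes and {len(cells)} cells"
        )
    return matrix, genes, cells


# Usage (not run here, since it needs the downloaded file):
# matrix, genes, cells = read_loom_raw("GSM4878538_umi_hNP_1.loom")
# print(matrix.shape, genes[:10], cells[:5])
```

The shape check is there because the row/column orientation seen from Python can differ from what an R HDF5 reader reports, so it is safer to confirm it against the lengths of the two name vectors than to assume it.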

Switching to R:

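Install the required R packages (the session below also loads hdf5r and loomR, which need to be installed separately if they are missing):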

BiocManager::install("LoomExperiment")

remotes::install_github("aertslab/SCopeLoomR")

Then run the following in R:

library(hdf5r)           # HDF5 backend used by loomR
library(loomR)           # provides connect()
library(LoomExperiment)
library(SCopeLoomR)

# Open the loom file (read-only by default)
conn <- connect("GSM4878538_umi_hNP_1.loom")

Inspect conn:

> conn
Class: loom
Filename: /xxx/GSM4878538_umi_hNP_1.loom
Access type: H5F_ACC_RDONLY
Attributes: version, chunks
Listing:
       name    obj_type  dataset.dims dataset.type_class
  col_attrs   H5I_GROUP          <NA>               <NA>
 col_graphs   H5I_GROUP          <NA>               <NA>
     layers   H5I_GROUP          <NA>               <NA>
     matrix H5I_DATASET 12385 x 26418          H5T_FLOAT
  row_attrs   H5I_GROUP          <NA>               <NA>
 row_graphs   H5I_GROUP          <NA>               <NA>

Read out the matrix, then attach the gene names as column names and the cell names as row names:

> GCMat <- as.data.frame(conn[["matrix"]][,])
> GCMat[1:4,1:4]
  V1 V2 V3 V4
1  0  0  0  0
2  0  0  0  0
3  0  0  0  0
4  0  0  0  0
> colnames(GCMat) <- conn[["row_attrs/gene_names"]][]
> rownames(GCMat) <- conn[["col_attrs/cell_names"]][]
> GCMat[1:4,1:5]
                           AL627309.1 AL627309.6 AL627309.5 AL627309.4
hNP_1_AAACCCACAAAGACTA-1-1          0          0          0          0
hNP_1_AAACCCACACTTCTCG-1-1          0          0          0          0
hNP_1_AAACCCACAGCGTATT-1-1          0          0          0          0
hNP_1_AAACCCACATCTGGGC-1-1          0          0          0          0
                           FO538757.1
hNP_1_AAACCCACAAAGACTA-1-1          0
hNP_1_AAACCCACACTTCTCG-1-1          0
hNP_1_AAACCCACAGCGTATT-1-1          0
hNP_1_AAACCCACATCTGGGC-1-1          0

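Note that at this point the rows are cells and the columns are genes, so the matrix must be transposed to get genes as rows and cells as columns: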

> GCMat <- t(GCMat)

Convert it to a sparse matrix (the dgCMatrix class comes from the Matrix package):

> GCMat <- as(GCMat,"dgCMatrix")
> GCMat[1:4,1:4]
4 x 4 sparse Matrix of class "dgCMatrix"
           hNP_1_AAACCCACAAAGACTA-1-1 hNP_1_AAACCCACACTTCTCG-1-1
AL627309.1                          .                          .
AL627309.6                          .                          .
AL627309.5                          .                          .
AL627309.4                          .                          .
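To give a rough sense of why the sparse conversion is worthwhile, the back-of-the-envelope sketch below compares the memory of a dense matrix of doubles with a compressed sparse column layout like the one dgCMatrix uses (an 8-byte value and a 4-byte row index per non-zero entry, plus 4-byte column pointers). The dimensions come from the listing above; the 5% density is purely a hypothetical figure for illustration, not a measurement of this dataset, and dense_bytes and csc_bytes are illustrative helper names.

```python
def dense_bytes(n_rows, n_cols, bytes_per_value=8):
    """Memory needed for a dense matrix of doubles."""
    return n_rows * n_cols * bytes_per_value


def csc_bytes(n_cols, n_nonzero):
    """Approximate memory for a CSC layout: 8-byte value and 4-byte row
    index per non-zero entry, plus (n_cols + 1) 4-byte column pointers."""
    return n_nonzero * (8 + 4) + (n_cols + 1) * 4


n_genes, n_cells = 26418, 12385  # rows and columns after transposing
assumed_density = 0.05           # hypothetical fraction of non-zero entries

dense = dense_bytes(n_genes, n_cells)
sparse = csc_bytes(n_cells, int(n_genes * n_cells * assumed_density))

print(f"dense:  {dense / 1024**3:.2f} GiB")
print(f"sparse: {sparse / 1024**3:.2f} GiB (at {assumed_density:.0%} density)")
```

The actual saving depends on how many entries are non-zero in the real data, which can be checked in R after the conversion.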

Replace the sample label at the front of each cell name to make the cells easier to manage:

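AllCells <- colnames(GCMat)

# gsub() is vectorized, so the whole vector can be renamed without a loop.
# The pattern is kept as in the original; adjust it to the actual prefix of your cell names.
NewCells <- gsub("hNP_1zhong", "hNP-3", AllCells)

colnames(GCMat) <- NewCells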
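For readers who end up with the cell names in Python instead, the same renaming step could be sketched as below. Unlike a plain substitution, it only touches names that start with the old label, so an identical substring elsewhere in a name is left alone. replace_prefix is a hypothetical helper, and "hNP_1" is an assumed old prefix chosen to match the cell names printed earlier; substitute whatever prefix your names really carry.

```python
def replace_prefix(names, old_prefix, new_prefix):
    """Swap old_prefix for new_prefix only where a name starts with it."""
    return [
        new_prefix + name[len(old_prefix):] if name.startswith(old_prefix) else name
        for name in names
    ]


cells = ["hNP_1_AAACCCACAAAGACTA-1-1", "hNP_1_AAACCCACACTTCTCG-1-1"]
# "hNP_1" is an assumed old prefix; "hNP-3" is the new label used in the text
renamed = replace_prefix(cells, "hNP_1", "hNP-3")
print(renamed)
```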
