Reading a single-cell transcriptome gene expression matrix from a loom file in R

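This walkthrough uses the GSE160756 dataset as an example: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE160756

Download the data, upload it to the server, and decompress it to obtain the loom file. First, try opening it with Python.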

import loompy

with loompy.connect("GSM4878538_umi_hNP_1.loom") as ds:
    print(ds.shape)  # matrix dimensions
    print(ds.row_attrs["Gene"][:10])  # first 10 gene names
    print(ds.col_attrs["CellID"][:5])  # first 5 cell IDs

This raises an error:

    with loompy.connect("GSM4878538_umi_hNP_1.loom") as ds:
  File "/usr/local/lib/python3.9/site-packages/loompy/loompy.py", line 1634, in connect
    return LoomConnection(filename, mode, validate=validate)
  File "/usr/local/lib/python3.9/site-packages/loompy/loompy.py", line 86, in __init__
    raise ValueError("\n".join(lv.errors) + f"\n{filename} does not appear to be a valid Loom file according to Loom spec version '{lv.version}'")
ValueError: Row attribute 'gene_names' dtype object is not allowed
Column attribute 'cell_names' dtype object is not allowed
For help, see http://linnarssonlab.org/loompy/format/
GSM4878538_umi_hNP_1.loom does not appear to be a valid Loom file according to Loom spec version '0.0.0'

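It is not obvious why Python cannot open the file. According to the error message, loompy's validation rejects the row attribute 'gene_names' and the column attribute 'cell_names' because both are stored with dtype object.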
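A loom file is an HDF5 file, so one possible workaround on the Python side is to skip loompy's validation and read the datasets directly. The following is only a minimal sketch and was not part of the original workflow: it assumes the third-party h5py package is installed, and the function names `decode_names` and `read_loom_raw` are hypothetical helpers introduced here. The dataset paths `matrix`, `row_attrs/gene_names` and `col_attrs/cell_names` are the ones that appear in the error message and in the R session of this post.

```python
def decode_names(values):
    """Turn HDF5 string values (bytes or str) into plain Python strings."""
    return [v.decode("utf-8") if isinstance(v, bytes) else str(v) for v in values]


def read_loom_raw(path):
    """Read the matrix shape and the name vectors without loompy validation.

    Assumption: h5py is installed and the file uses the dataset paths
    seen in this post (matrix, row_attrs/gene_names, col_attrs/cell_names).
    """
    import h5py  # third-party; imported here so the helper above works without it

    with h5py.File(path, "r") as f:
        shape = f["matrix"].shape
        genes = decode_names(f["row_attrs/gene_names"][:])
        cells = decode_names(f["col_attrs/cell_names"][:])
    return shape, genes, cells


# The decoding helper can be tried without any file:
print(decode_names([b"AL627309.1", "AL627309.6"]))

# Hypothetical usage once the loom file is in the working directory:
# shape, genes, cells = read_loom_raw("GSM4878538_umi_hNP_1.loom")
# print(shape, genes[:10], cells[:5])
```

Whether the name vectors come back as bytes or str can depend on the h5py version and on how the file was written, which is why the helper accepts both.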

R:

Install the required R packages:

BiocManager::install("LoomExperiment")

remotes::install_github("aertslab/SCopeLoomR")

The R code that finally worked:

library(hdf5r)
library(loomR)
library(LoomExperiment)
library(SCopeLoomR)

# connect() comes from loomR
conn <- connect("GSM4878538_umi_hNP_1.loom")

Then inspect conn:

> conn
Class: loom
Filename: /xxx/GSM4878538_umi_hNP_1.loom
Access type: H5F_ACC_RDONLY
Attributes: version, chunks
Listing:
name        obj_type     dataset.dims   dataset.type_class
col_attrs   H5I_GROUP    <NA>           <NA>
col_graphs  H5I_GROUP    <NA>           <NA>
layers      H5I_GROUP    <NA>           <NA>
matrix      H5I_DATASET  12385 x 26418  H5T_FLOAT
row_attrs   H5I_GROUP    <NA>           <NA>
row_graphs  H5I_GROUP    <NA>           <NA>

Read out the matrix:

> GCMat <- as.data.frame(conn[["matrix"]][, ])

> GCMat[1:4, 1:4]

V1 V2 V3 V4

1 0 0 0 0

2 0 0 0 0

3 0 0 0 0

4 0 0 0 0

> colnames(GCMat) <- conn[["row_attrs/gene_names"]][]

> rownames(GCMat) <- conn[["col_attrs/cell_names"]][]

> GCMat[1:4, 1:5]

AL627309.1 AL627309.6 AL627309.5 AL627309.4

hNP_1_AAACCCACAAAGACTA-1-1 0 0 0 0

hNP_1_AAACCCACACTTCTCG-1-1 0 0 0 0

hNP_1_AAACCCACAGCGTATT-1-1 0 0 0 0

hNP_1_AAACCCACATCTGGGC-1-1 0 0 0 0

FO538757.1

hNP_1_AAACCCACAAAGACTA-1-1 0

hNP_1_AAACCCACACTTCTCG-1-1 0

hNP_1_AAACCCACAGCGTATT-1-1 0

hNP_1_AAACCCACATCTGGGC-1-1 0

Note that the matrix must be transposed so that genes are rows and cells are columns:

> GCMat <- t(GCMat)

Convert it to a sparse matrix:

> GCMat <- as(GCMat,"dgCMatrix")

> GCMat[1:4, 1:4]

4 x 4 sparse Matrix of class "dgCMatrix"

hNP_1_AAACCCACAAAGACTA-1-1 hNP_1_AAACCCACACTTCTCG-1-1

AL627309.1 . .

AL627309.6 . .

AL627309.5 . .

AL627309.4 . .
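To make concrete why a sparse format suits a count matrix that is mostly zeros, here is a small illustrative sketch in plain Python of the compressed sparse column (CSC) layout that a "dgCMatrix" uses: zero-based row indices, column pointers and the non-zero values, which correspond to the `i`, `p` and `x` slots of the R object. The function name `dense_to_csc` and the toy matrix are made up for this example; this is not how the Matrix package is implemented internally.

```python
def dense_to_csc(rows):
    """Convert a dense matrix (list of equal-length rows) to CSC arrays.

    Returns (i, p, x):
      i - zero-based row index of every non-zero value, column by column
      p - column pointers; column j owns the entries p[j]:p[j + 1]
      x - the non-zero values themselves
    """
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    i, p, x = [], [0], []
    for col in range(n_cols):
        for row in range(n_rows):
            value = rows[row][col]
            if value != 0:
                i.append(row)
                x.append(value)
        p.append(len(x))  # end of this column's entries
    return i, p, x


# Toy 3 x 3 matrix with three non-zero counts (made-up values).
dense = [
    [0, 0, 3],
    [0, 0, 0],
    [5, 0, 1],
]
i, p, x = dense_to_csc(dense)
print(i)  # [2, 0, 2]
print(p)  # [0, 1, 1, 3]
print(x)  # [5, 3, 1]
```

Only the three non-zero values are stored, plus one pointer per column, so the memory cost grows with the number of non-zero counts rather than with the full genes x cells size.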

Replace the sample identifier at the front of each cell name to make the cells easier to manage:

# gsub() is vectorized, so the whole vector of cell names is handled in one call
AllCells <- colnames(GCMat)
NewCells <- gsub("hNP_1zhong", "hNP-3", AllCells)
colnames(GCMat) <- NewCells
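If the cell names are handled on the Python side instead, the renaming step above could be sketched as follows. This is an illustration under stated assumptions: the helper `replace_prefix` is hypothetical, and the prefixes "hNP_1_" and "hNP-3_" are chosen here only to match the cell names printed earlier in this post, not taken from the original code. Unlike a plain substitution, it only touches names that actually start with the old prefix.

```python
def replace_prefix(cell_names, old_prefix, new_prefix):
    """Swap old_prefix for new_prefix at the start of each name.

    Names that do not start with old_prefix are returned unchanged.
    """
    renamed = []
    for name in cell_names:
        if name.startswith(old_prefix):
            renamed.append(new_prefix + name[len(old_prefix):])
        else:
            renamed.append(name)
    return renamed


# Two cell names as printed earlier in this post; the prefixes are assumptions.
cells = ["hNP_1_AAACCCACAAAGACTA-1-1", "hNP_1_AAACCCACACTTCTCG-1-1"]
renamed = replace_prefix(cells, "hNP_1_", "hNP-3_")
print(renamed)
```

Comparing the number of changed names with the number of cells afterwards is a cheap way to notice a pattern that silently matched nothing.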
