Reading a single-cell transcriptome gene expression matrix from a loom file in R

Using the GSE160756 dataset as an example: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE160756

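After downloading the data, uploading it to the server and decompressing it into a loom file, I first tried to open it with Python:

import loompy
import numpy as np

with loompy.connect("GSM4878538_umi_hNP_1.loom") as ds:
    print(ds.shape)                    # matrix dimensions
    print(ds.row_attrs["Gene"][:10])   # first 10 gene names
    print(ds.col_attrs["CellID"][:5])  # first 5 cell IDs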

This fails with the following error:

    with loompy.connect("GSM4878538_umi_hNP_1.loom") as ds:
  File "/usr/local/lib/python3.9/site-packages/loompy/loompy.py", line 1634, in connect
    return LoomConnection(filename, mode, validate=validate)
  File "/usr/local/lib/python3.9/site-packages/loompy/loompy.py", line 86, in __init__
    raise ValueError("\n".join(lv.errors) + f"\n{filename} does not appear to be a valid Loom file according to Loom spec version '{lv.version}'")
ValueError: Row attribute 'gene_names' dtype object is not allowed
Column attribute 'cell_names' dtype object is not allowed
For help, see http://linnarssonlab.org/loompy/format/
GSM4878538_umi_hNP_1.loom does not appear to be a valid Loom file according to Loom spec version '0.0.0'

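So the file will not open in Python. The message points at the cause: loompy's validator rejects the row attribute gene_names and the column attribute cell_names because they are stored with dtype object (variable-length strings), and it refuses to open a file that fails validation. I therefore switched to R.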
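A loom file is an ordinary HDF5 file, so one way to stay in Python is to skip loompy and read the datasets directly with h5py. The traceback also shows that loompy.connect takes a validate argument, so validate=False might be enough, but that has not been tested on this file. The sketch below assumes the dataset paths that the R listing further down reveals ("matrix", "row_attrs/gene_names", "col_attrs/cell_names"). The function names read_loom and list_contents are hypothetical helpers, not part of any library.

```python
# Sketch: read a loom file with plain h5py, bypassing loompy's validator.
# Assumption: dataset paths "matrix", "row_attrs/gene_names" and
# "col_attrs/cell_names", as seen in this particular file.
import h5py


def _to_str(values):
    """Decode HDF5 string values (bytes in h5py 3, str in older h5py) to str."""
    return [v.decode("utf-8") if isinstance(v, bytes) else str(v) for v in values]


def read_loom(path, row_key="row_attrs/gene_names", col_key="col_attrs/cell_names"):
    """Return (matrix, gene_names, cell_names) from a loom/HDF5 file.

    h5py keeps HDF5's row-major shape, so under the loom convention the
    matrix comes back as genes x cells and needs no transpose in Python.
    """
    with h5py.File(path, "r") as f:
        matrix = f["matrix"][()]           # loads the whole dense matrix
        genes = _to_str(f[row_key][()])
        cells = _to_str(f[col_key][()])
    if matrix.shape != (len(genes), len(cells)):
        raise ValueError(f"matrix shape {matrix.shape} does not match "
                         f"{len(genes)} genes x {len(cells)} cells")
    return matrix, genes, cells


def list_contents(path):
    """Print every group/dataset path, similar to the R `conn` listing."""
    with h5py.File(path, "r") as f:
        f.visititems(lambda name, obj: print(name, getattr(obj, "shape", "")))


# Usage (file name from this post):
# m, genes, cells = read_loom("GSM4878538_umi_hNP_1.loom")
```

The full dense matrix is read into memory in one go. For a quick look, slicing the dataset, e.g. f["matrix"][:10, :5], reads only part of it.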

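In R:

Install the required R packages (loomR, which provides connect(), is installed from GitHub):

BiocManager::install("LoomExperiment")
remotes::install_github("aertslab/SCopeLoomR")
remotes::install_github("mojaveazure/loomR", ref = "develop")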

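The R code that finally worked:

library(hdf5r)
library(loomR)          # provides connect() and the "loom" class printed below
library(LoomExperiment)
library(SCopeLoomR)
library(Matrix)         # needed later for as(..., "dgCMatrix")

conn <- connect("GSM4878538_umi_hNP_1.loom")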

Then inspect conn:

> conn
Class: loom
Filename: /xxx/GSM4878538_umi_hNP_1.loom
Access type: H5F_ACC_RDONLY
Attributes: version, chunks
Listing:
       name    obj_type  dataset.dims dataset.type_class
  col_attrs   H5I_GROUP          <NA>               <NA>
 col_graphs   H5I_GROUP          <NA>               <NA>
     layers   H5I_GROUP          <NA>               <NA>
     matrix H5I_DATASET 12385 x 26418          H5T_FLOAT
  row_attrs   H5I_GROUP          <NA>               <NA>
 row_graphs   H5I_GROUP          <NA>               <NA>
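The listing reports matrix as 12385 x 26418. The loom convention stores genes as rows and cells as columns. HDF5 is row-major while R is column-major, and hdf5r (which loomR builds on) reports dimensions in R's order, i.e. reversed. As far as I understand, that is why the matrix arrives in R as cells x genes (12385 cells, 26418 genes), and why gene_names end up as the column names in the next step. The toy Python sketch below shows the effect: one flat buffer read in the two orders gives a matrix and its transpose. The 2 x 3 data and the helper names are made up for illustration.

```python
# Sketch: the same flat buffer read in row-major (HDF5/C) order and in
# column-major (R/Fortran) order yields a matrix and its transpose.

def read_row_major(buf, nrow, ncol):
    """Interpret a flat buffer as nrow x ncol in C (row-major) order."""
    return [[buf[r * ncol + c] for c in range(ncol)] for r in range(nrow)]


def read_col_major(buf, nrow, ncol):
    """Interpret a flat buffer as nrow x ncol in R (column-major) order."""
    return [[buf[c * nrow + r] for c in range(ncol)] for r in range(nrow)]


def transpose(m):
    """Swap rows and columns of a list-of-rows matrix."""
    return [list(col) for col in zip(*m)]


# Hypothetical toy data: 2 genes x 3 cells, written row by row as in HDF5.
buf = [1, 2, 3, 4, 5, 6]
genes_by_cells = read_row_major(buf, 2, 3)   # [[1, 2, 3], [4, 5, 6]]
# R walks the buffer column-first and reports the dims reversed (3 x 2).
r_view = read_col_major(buf, 3, 2)           # [[1, 4], [2, 5], [3, 6]]
assert r_view == transpose(genes_by_cells)   # cells x genes in R
```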

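Read the matrix into R (rows are cells and columns are genes, so the gene names go to colnames and the cell names to rownames):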

> GCMat <- as.data.frame(conn[["matrix"]][,])
> GCMat[1:4,1:4]
  V1 V2 V3 V4
1  0  0  0  0
2  0  0  0  0
3  0  0  0  0
4  0  0  0  0
> colnames(GCMat) <- conn[["row_attrs/gene_names"]][]
> rownames(GCMat) <- conn[["col_attrs/cell_names"]][]
> GCMat[1:4,1:5]
                           AL627309.1 AL627309.6 AL627309.5 AL627309.4
hNP_1_AAACCCACAAAGACTA-1-1          0          0          0          0
hNP_1_AAACCCACACTTCTCG-1-1          0          0          0          0
hNP_1_AAACCCACAGCGTATT-1-1          0          0          0          0
hNP_1_AAACCCACATCTGGGC-1-1          0          0          0          0
                           FO538757.1
hNP_1_AAACCCACAAAGACTA-1-1          0
hNP_1_AAACCCACACTTCTCG-1-1          0
hNP_1_AAACCCACAGCGTATT-1-1          0
hNP_1_AAACCCACATCTGGGC-1-1          0

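Note that the matrix must be transposed so that rows are genes and columns are cells, the layout that tools such as Seurat expect (t() on a data.frame returns a matrix):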

> GCMat <- t(GCMat)

Convert it to a sparse matrix:

> GCMat <- as(GCMat,"dgCMatrix")
> GCMat[1:4,1:4]
4 x 4 sparse Matrix of class "dgCMatrix"
           hNP_1_AAACCCACAAAGACTA-1-1 hNP_1_AAACCCACACTTCTCG-1-1
AL627309.1                          .                          .
AL627309.6                          .                          .
AL627309.5                          .                          .
AL627309.4                          .                          .
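Converting to a sparse matrix pays off because count matrices are mostly zeros. As a dense matrix of doubles, 12385 x 26418 = 327,186,930 entries take about 2.6 GB. The dots in the printout above are zeros that are simply not stored. The Matrix package's dgCMatrix uses compressed sparse column (CSC) storage: slot x holds the non-zero values, i their 0-based row indices, and p the column pointers (length ncol + 1). The pure-Python sketch below builds those three arrays for a made-up 3 x 3 matrix. It is only an illustration of the layout, not how Matrix converts internally.

```python
# Sketch of the CSC layout behind dgCMatrix: x = non-zero values,
# i = 0-based row indices, p = column pointers (length ncol + 1).

def dense_to_csc(dense):
    """Convert a list-of-rows dense matrix to CSC arrays (x, i, p)."""
    nrow = len(dense)
    ncol = len(dense[0]) if nrow else 0
    x, i, p = [], [], [0]
    for c in range(ncol):            # walk column by column
        for r in range(nrow):
            v = dense[r][c]
            if v != 0:               # keep only non-zero entries
                x.append(v)
                i.append(r)
        p.append(len(x))             # column c occupies x[p[c]:p[c+1]]
    return x, i, p


def csc_get(x, i, p, r, c):
    """Look up element (r, c); anything not stored is zero."""
    for k in range(p[c], p[c + 1]):
        if i[k] == r:
            return x[k]
    return 0


# Hypothetical genes x cells counts.
counts = [[0, 3, 0],
          [1, 0, 0],
          [0, 0, 2]]
x, i, p = dense_to_csc(counts)
print(x, i, p)   # [1, 3, 2] [1, 0, 2] [0, 1, 2, 3]
```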

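Replace the prefix in front of the cell names to make them easier to manage:

# gsub() is vectorised, so no loop is needed.
# Note: the pattern "hNP_1zhong" does not occur in the cell names printed above
# (they start with "hNP_1_"); adjust it to the prefix you actually want to replace.
colnames(GCMat) <- gsub("hNP_1zhong", "hNP-3", colnames(GCMat))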
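R's gsub() already works on a whole vector of names, and Python's re.sub() can be applied the same way with a list comprehension. The sketch below mirrors the rename step. The pattern ^hNP_1_ and the replacement hNP-3_ are hypothetical, chosen to match the names printed above. Anchoring with ^ changes only the sample prefix, never a match inside the barcode. In R the equivalent would be sub("^hNP_1_", "hNP-3_", colnames(GCMat)). The uniqueness check is an extra safeguard that the original code does not have.

```python
import re


def rename_cells(cells, pattern, replacement):
    """Apply one regex substitution to every cell name, like R's gsub()."""
    rx = re.compile(pattern)
    renamed = [rx.sub(replacement, name) for name in cells]
    # Cell names must stay unique, otherwise later lookups become ambiguous.
    if len(set(renamed)) != len(renamed):
        raise ValueError("renaming produced duplicate cell names")
    return renamed


# Hypothetical pattern/replacement, anchored at the start of the name.
cells = ["hNP_1_AAACCCACAAAGACTA-1-1", "hNP_1_AAACCCACACTTCTCG-1-1"]
print(rename_cells(cells, r"^hNP_1_", "hNP-3_"))
# ['hNP-3_AAACCCACAAAGACTA-1-1', 'hNP-3_AAACCCACACTTCTCG-1-1']
```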
