1. Downloading the example data:
Download URL: https://www.10xgenomics.com/datasets/xenium-human-lung-cancer-post-xenium-technote

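After unpacking the archive, the file hierarchy looks as shown below. Typically one FOV corresponds to one sample, and each sample gets its own output folder. Up to 8 FOVs can be selected on a single slide. If more than 8 samples are placed on one slide, several samples end up merged into one FOV. To separate them in downstream analysis, select each sample manually in the Xenium browser (Xenium Explorer) to obtain that sample's barcodes, then split the samples by barcode (TMA samples generally require this manual selection). The data downloaded here contain a single FOV, that is, a single sample, so what is shown below is the data for that one sample.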
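The splitting step described above comes down to a lookup from cell ID (barcode) to the manually selected region. The sketch below uses only the Python standard library and toy data. The `Cell ID` column name, the `#`-prefixed comment line, and the sample names `core_A` and `core_B` are assumptions made for illustration, so check them against the files your own selection export produces.

```python
import csv
import io


def read_selected_cell_ids(handle, id_column="Cell ID"):
    """Return the cell IDs listed in one exported selection CSV."""
    # Skip comment lines (assumed to start with '#') before the header row.
    rows = (line for line in handle if not line.startswith("#"))
    return [row[id_column] for row in csv.DictReader(rows)]


def assign_samples(all_cell_ids, selections):
    """Map every cell ID to the sample whose selection contains it."""
    cell_to_sample = {}
    for sample_name, ids in selections.items():
        for cell_id in ids:
            cell_to_sample[cell_id] = sample_name
    # Cells outside every selection are labelled 'unassigned'.
    return {cid: cell_to_sample.get(cid, "unassigned") for cid in all_cell_ids}


# Toy exports standing in for two manual selections (hypothetical content).
export_a = io.StringIO("#Selection name: core_A\nCell ID,Transcripts\naaaa-1,120\naaab-1,80\n")
export_b = io.StringIO("Cell ID,Transcripts\naaac-1,45\n")

selections = {
    "core_A": read_selected_cell_ids(export_a),
    "core_B": read_selected_cell_ids(export_b),
}
labels = assign_samples(["aaaa-1", "aaab-1", "aaac-1", "aaad-1"], selections)
print(labels)
```

The resulting mapping can then be joined onto the cell table as a per-cell sample column, which is what makes a per-sample split possible.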

File type | File and description |
Experiment file | experiment.xenium : Experiment manifest file. |
Interactive summary | analysis_summary.html : Summary metrics, graphs, and images to QC your run data in HTML format. |
Image files | morphology.ome.tif : The 3D nuclei-stained (DAPI) morphology image in OME-TIFF format. |
Image files | morphology_focus/ : A directory containing the multi-focus projection of morphology image(s) in a multi-file OME-TIFF format (2D). The directory will contain the nuclei DAPI stain image, as well as three additional stain images for Xenium outputs generated with the multimodal cell segmentation assay workflow. |
Cell summary | cells.csv.gz : Cell summary file. |
Cell summary | cells.parquet : Cell summary file in Parquet format. |
Cell segmentation masks and polygons | cells.zarr.zip : Cell summary file in zipped Zarr format, only file that contains the nucleus and cell segmentation masks and boundaries used for transcript assignment. |
Cell boundary polygons | cell_boundaries.csv.gz : Cell boundary file. |
Cell boundary polygons | cell_boundaries.parquet : Cell boundary file in Parquet format. |
Nucleus boundary polygons | nucleus_boundaries.csv.gz : Nucleus boundary file. |
Nucleus boundary polygons | nucleus_boundaries.parquet : Nucleus boundary file in Parquet format. |
Transcript data | transcripts.parquet : Transcripts data in Parquet format. |
Transcript data | transcripts.zarr.zip : Transcript data in zipped Zarr format. |
Cell-feature matrix | cell_feature_matrix/ : Directory of the cell-feature matrix files in Market Exchange format. |
Cell-feature matrix | cell_feature_matrix.h5 : Cell-feature matrix file in HDF5 format. |
Cell-feature matrix | cell_feature_matrix.zarr.zip : Cell-feature matrix file in zipped Zarr format. |
Metric summary | metrics_summary.csv : Summary of key metrics. |
Secondary analysis | analysis/ : Directory of secondary analysis results. |
Secondary analysis | analysis.zarr.zip : Secondary analysis outputs in zipped Zarr format. |
Gene panel | gene_panel.json : Copy of input gene panel file. |
Auxiliary data (aux_outputs/) | morphology_fov_locations.json : Field of view (FOV) name and position information (in microns). |
Auxiliary data (aux_outputs/) | overview_scan_fov_locations.json : FOV name and position information (in pixels). |
Auxiliary data (aux_outputs/) | per_cycle_channel_images/ : Directory of downsampled RNA image files in TIFF format from each cycle and channel. |
Auxiliary data (aux_outputs/) | overview_scan.png : Full resolution image of entire slide sample. |
Auxiliary data (aux_outputs/) | background_qc_images/ : Directory of autofluorescence images (downsampled, TIFF format) that are subtracted from the raw stain images to produce the morphology_focus/ images if the Cell Segmentation Staining protocol was used. |
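
Before loading anything, it can help to confirm that the unpacked folder actually contains the files listed above. The following is a minimal sketch using only the standard library. `EXPECTED_OUTPUTS` is a hand-picked subset of the table, and the helper name `missing_outputs` is made up for this example.

```python
import tempfile
from pathlib import Path

# A hand-picked subset of the entries listed in the table above.
EXPECTED_OUTPUTS = [
    "experiment.xenium",
    "cells.parquet",
    "cell_boundaries.parquet",
    "nucleus_boundaries.parquet",
    "transcripts.parquet",
    "cell_feature_matrix.h5",
    "morphology_focus",
]


def missing_outputs(bundle_dir, expected=EXPECTED_OUTPUTS):
    """Return the expected files or directories that are absent from bundle_dir."""
    bundle = Path(bundle_dir)
    return [name for name in expected if not (bundle / name).exists()]


# Demo on a throwaway directory that mimics a partially unpacked bundle.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("experiment.xenium", "cells.parquet", "transcripts.parquet"):
        (Path(tmp) / name).touch()
    (Path(tmp) / "morphology_focus").mkdir()
    demo_missing = missing_outputs(tmp)

print(demo_missing)
```

On a real download, pass the path of the unpacked sample folder instead of the temporary directory.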


The morphology_focus folder contains the 4 ome.tif files shown below, one per channel: 0000 is DAPI, 0001 is green, 0002 is yellow, and 0003 is red.
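
The channel numbering above can be captured in a small lookup table. This sketch assumes the file names end in a four-digit channel index followed by `.ome.tif` (for example `morphology_focus_0000.ome.tif`); that pattern and the helper name `channel_of` are assumptions, so adjust the regular expression if your files are named differently.

```python
import re

# Channel index -> stain, as described in the text above.
CHANNEL_STAINS = {0: "DAPI", 1: "green", 2: "yellow", 3: "red"}


def channel_of(filename):
    """Return (channel index, stain label) parsed from a morphology_focus file name."""
    match = re.search(r"_(\d{4})\.ome\.tif$", filename)
    if match is None:
        raise ValueError(f"no channel index found in {filename!r}")
    index = int(match.group(1))
    return index, CHANNEL_STAINS.get(index, "unknown")


for name in ("morphology_focus_0000.ome.tif", "morphology_focus_0003.ome.tif"):
    print(name, channel_of(name))
```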

2. Installing the dependencies
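
The code in the next section imports `spatialdata` and `spatialdata_io`, and its plotting calls rely on `matplotlib` and on the `.pl` accessor, which is provided by `spatialdata_plot`. These are typically installed from PyPI as `spatialdata`, `spatialdata-io`, `spatialdata-plot`, and `matplotlib`. As a rough sketch, the check below reports which of these modules cannot be found in the current environment; the helper name `missing_modules` is made up for this example, and no specific versions are implied.

```python
from importlib.util import find_spec

# Top-level modules used by the code in this post.
REQUIRED_MODULES = ["spatialdata", "spatialdata_io", "spatialdata_plot", "matplotlib"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be located in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are available.")
```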

3. Reading the data
import os
import threading

import matplotlib.pyplot as plt
import spatialdata as sd
import spatialdata_plot  # noqa: F401  (registers the .pl accessor used for plotting)
from spatialdata_io import xenium


# Read the Xenium output of several samples using multiple threads
def xenium_data_load_multithreaded(data_dir, sample_info):
    def sd_read_xenium(sample_data, sample_name, sdata_dict):
        sdata = xenium(path=sample_data, cells_boundaries=True, n_jobs=6)
        sdata_dict[sample_name] = sdata

    threads = []
    sdata_dict = {}
    sample_2_group = {}
    with open(sample_info, 'r') as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            # Adjust this to the column layout of your own sample sheet
            raw_name, sample_name, group_name = line.strip().split('\t')[:3]
            sample_2_group[sample_name] = group_name
            thread = threading.Thread(
                target=sd_read_xenium,
                args=(os.path.join(data_dir, raw_name), sample_name, sdata_dict),
            )
            threads.append(thread)
            thread.start()
    for thread in threads:
        thread.join()

    sdata = sd.concatenate(
        sdata_dict,
        concatenate_tables=True,  # merge the single-cell data of all samples into one table
        obs_names_make_unique=True,
    )
    obs = sdata.tables['table'].obs
    obs["sample"] = obs["region"].str.replace('cell_circles-', '', regex=False)
    obs["group"] = obs["sample"].map(sample_2_group)
    obs["cell_boundaries"] = obs["region"].str.replace('cell_circles', 'cell_boundaries', regex=False)
    # Make the table annotate the cell boundary polygons instead of the cell circles
    sdata.set_table_annotates_spatialelement(
        table_name='table',
        region=[i for i in sdata.shapes.keys() if i.startswith('cell_boundaries-')],
        region_key='cell_boundaries',
    )
    return sdata
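
The loader above expects `sample_info` to be a tab-separated file with three columns: the raw output folder name, the sample name, and the group name. The sketch below shows that layout together with an alternative way to run the reads in parallel, using `concurrent.futures` from the standard library. The helper names `parse_sample_info` and `load_all`, the toy folder names, and the stand-in `reader` are all illustrative; in real use `reader` would be a function that calls `xenium(path=...)`.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def parse_sample_info(lines):
    """Parse tab-separated lines of raw_name, sample_name, group_name."""
    samples = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        raw_name, sample_name, group_name = line.rstrip("\n").split("\t")[:3]
        samples.append((raw_name, sample_name, group_name))
    return samples


def load_all(data_dir, samples, reader, max_workers=4):
    """Call reader(path) for every sample in parallel; return {sample_name: result}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            sample_name: pool.submit(reader, os.path.join(data_dir, raw_name))
            for raw_name, sample_name, _ in samples
        }
        # Results are collected in sample-sheet order, not in completion order.
        return {name: future.result() for name, future in futures.items()}


# Toy sample sheet (hypothetical names) and a stand-in reader.
sheet = ["run_a\tS1\ttumor\n", "\n", "run_b\tS2\tnormal\n"]
samples = parse_sample_info(sheet)
loaded = load_all("data", samples, reader=os.path.basename)
print(samples)
print(loaded)
```

Compared with writing into a shared dictionary from each thread, collecting `future.result()` keeps the sample order deterministic and re-raises any exception from a failed read in the calling thread.
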
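# sdata is the object returned by xenium_data_load_multithreaded(); "S1" is the sample name from the sample sheet
fig, ax = plt.subplots(figsize=(10, 10))
sdata.pl.render_images("morphology_focus-S1").pl.show(
    ax=ax, title="Morphology plot", coordinate_systems="global"
)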

from spatialdata import bounding_box_query

fig, ax = plt.subplots(figsize=(10, 10))


def crop0(x):
    # Crop to a 5000 x 5000 window in the "global" coordinate system
    return bounding_box_query(
        x,
        min_coordinate=[10000, 20000],
        max_coordinate=[15000, 25000],
        axes=("x", "y"),
        target_coordinate_system="global",
    )


crop0(sdata).pl.render_shapes(
    "cell_boundaries-S1",
    color='EPCAM',
    outline_width=0.3,
    outline_alpha=0.9,
    outline_color='grey',
).pl.show(ax=ax, title="EPCAM gene expression", coordinate_systems="global")
ax.grid(False)
ax.axis('off')
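
To make the cropping step more concrete, here is a plain-Python sketch of the idea behind a bounding-box query, applied to toy cell centroids. It is not how `spatialdata` implements `bounding_box_query`: the library also handles polygons, images, and coordinate transformations, and its treatment of shapes that straddle the box edge may differ. The centroid values and the helper name `in_bounding_box` are made up for this example.

```python
def in_bounding_box(point, min_coordinate, max_coordinate):
    """Return True if every coordinate of point lies within [min, max] on its axis."""
    return all(
        low <= value <= high
        for value, low, high in zip(point, min_coordinate, max_coordinate)
    )


# Toy centroids as (x, y) in the same units as the crop window used above.
centroids = {
    "cell_1": (9500.0, 21000.0),   # x is left of the window
    "cell_2": (12000.0, 22500.0),  # inside the window
    "cell_3": (14999.0, 25001.0),  # y is just above the window
}

kept = [
    name
    for name, xy in centroids.items()
    if in_bounding_box(xy, [10000, 20000], [15000, 25000])
]
print(kept)
```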