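Multi-omics studies (transcriptomics, proteomics, metabolomics, methylomics, and so on) have become a core approach for dissecting complex biological regulation, but data visualization remains a key bottleneck in putting multi-omics results to use. Traditional static figures (for example heatmaps and volcano plots drawn with ggplot2) do not support interactive exploration by people without bioinformatics training. Integrated multi-omics data easily reaches thousands or tens of thousands of dimensions, and static figures struggle to show the relationships across the whole dataset. In manuscripts and project reports, a static report does not let reviewers or team members select the molecules they care about or adjust plotting parameters, so the value of the work is not fully conveyed.
OmicsDashboard (an interactive multi-omics dashboard) builds a visualization platform with R Shiny or Python Dash and turns multi-omics data into a dynamic report that is interactive, explorable, and exportable. It targets three pain points: static figures cannot be linked, non-specialists cannot operate them, and multi-omics data are hard to integrate. Starting from the core requirements of multi-omics visualization, this article builds an OmicsDashboard with R Shiny and with Python Dash, covering data integration, interactive filtering, multiple plot types, report generation, and deployment and sharing, and provides reusable code templates to help bioinformaticians get interactive multi-omics analysis running quickly.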
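I. Core pain points of multi-omics visualization and the value of OmicsDashboard
1. Core characteristics of multi-omics data
Multi-omics data have four defining characteristics: high dimensionality, multiple data types, strong associations, and heterogeneity.
- High dimensionality: transcriptome and proteome datasets contain anywhere from a few thousand to tens of thousands of molecules;
- Multiple data types: count data (RNA-seq counts), continuous data (protein abundance), categorical data (sample groups), and annotation data (GO/KEGG pathways);
- Strong associations: the same molecule is linked by regulation across omics layers (for example lncRNA → mRNA → protein);
- Heterogeneity: units and distributions differ widely between layers (for example counts in the transcriptome versus peak areas in the metabolome).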
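Of the four characteristics, heterogeneity affects plotting most directly, because raw counts and peak areas cannot share one axis or color scale. The sketch below is one possible way to bring two layers onto a comparable scale, assuming a log2(x + 1) transform followed by a per-molecule z-score is acceptable for the data at hand. The function names are hypothetical, the gene counts reuse the TP53 row of the example table later in this article, and the metabolite peak areas are made-up toy values.

```python
import math
import statistics


def log2p1(values):
    """log2(x + 1) transform, often applied to count-like data before plotting."""
    return [math.log2(v + 1) for v in values]


def zscore(values):
    """Center to mean 0 and scale by the sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        # A constant row carries no variation; map it to zeros instead of dividing by 0
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


# Counts for one gene across four samples (TP53 row of the example table)
gene_counts = [1200, 1350, 800, 750]
# Hypothetical peak areas for one metabolite across the same four samples
metabolite_areas = [2.1e6, 2.4e6, 9.0e5, 8.5e5]

gene_scaled = zscore(log2p1(gene_counts))
metab_scaled = zscore(log2p1(metabolite_areas))
print(gene_scaled)
print(metab_scaled)
```

After this step both vectors have mean 0 and unit standard deviation, so they can be drawn in the same heatmap or linked plot. Other normalizations (for example variance-stabilizing transforms for counts) may suit real data better.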
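2. Pain points of traditional static visualization
| Pain point | What it looks like | OmicsDashboard approach |
|---|---|---|
| No interactivity | Cannot filter specific molecules or samples, cannot adjust plotting parameters (such as the heatmap clustering method) | Interactive filter widgets (dropdowns, sliders, checkboxes) that update charts in real time |
| Hard to integrate data | Transcriptome and proteome figures are shown separately, so the same molecule cannot be followed across layers | Linked cross-omics lookup that shows a target molecule at every layer in one step |
| Poor reusability | Visualization code has to be rewritten for every new dataset | Modular dashboard that can be reused by swapping the data files |
| Inflexible reports | A static PDF report cannot support individual exploration | Export of interactive HTML reports, static PNG/PDF, and raw data |
| High barrier for non-specialists | A bioinformatician is needed to change analysis parameters | Point-and-click interface that requires no coding |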
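3. Core value of OmicsDashboard
- Interactivity: molecule and sample filtering, parameter adjustment, and linked charts for self-directed exploration;
- Integration: multi-omics data shown in one place, making regulatory relationships between molecules visible;
- Ease of use: a graphical interface that lowers the barrier for non-bioinformaticians;
- Reusability: a modular architecture that adapts to different multi-omics datasets;
- Shareability: local or cloud deployment for team collaboration and presenting results.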
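II. Technology selection and environment setup
1. R Shiny vs Python Dash (fit for bioinformatics use cases)
| Feature | R Shiny | Python Dash | Recommendation for bioinformatics |
|---|---|---|---|
| Bioinformatics ecosystem | Rich (ggplot2, pheatmap, clusterProfiler) | Thinner (relies on Plotly, scikit-learn) | Prefer Shiny for transcriptome / epigenome analysis |
| Plotting syntax | Familiar to bioinformaticians (ggplot2) | Plotly-based, with stronger interactivity | Prefer Dash when highly interactive charts are needed |
| Learning curve | Low for R users | Low for Python users | Follow the team's existing stack |
| Deployment effort | Simple (Shiny Server / Shinyapps.io) | Somewhat more involved (Gunicorn + Nginx) | Shiny for quick deployment, Dash for large-scale deployment |
| Performance | Works well on smaller data (< 100,000 rows) | More efficient on larger data (> 100,000 rows) | Choose Dash above roughly 100,000 rows |
2. Environment setup
(1) R Shiny environment setup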
R
# Install core packages
install.packages(c("shiny", "shinydashboard", "ggplot2", "pheatmap", "dplyr", "tidyr", "plotly", "DT", "shinyWidgets", "rmarkdown"))
# Bioinformatics packages (installed through Bioconductor)
if (!require("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install(c("clusterProfiler", "org.Hs.eg.db", "ComplexHeatmap"))
(2) Python Dash environment setup
bash
# Create a virtual environment
conda create -n dash_omics python=3.9 -y
conda activate dash_omics
# Install core packages
pip install dash dash-bootstrap-components plotly pandas numpy scipy scikit-learn seaborn openpyxl reportlab
# Bioinformatics packages (optional)
pip install pysam pybedtools biom-format
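Before moving on, it can help to confirm that the Python packages installed above are importable in the active environment. The following is a small sketch using only the standard library. The helper name `missing_packages` is hypothetical, and the list uses import names, which differ from the pip names in two cases (`dash_bootstrap_components` for dash-bootstrap-components, `sklearn` for scikit-learn).

```python
import importlib.util

# Import names of the core packages installed with pip above
REQUIRED = [
    "dash",
    "dash_bootstrap_components",
    "plotly",
    "pandas",
    "numpy",
    "scipy",
    "sklearn",
    "seaborn",
    "openpyxl",
    "reportlab",
]


def missing_packages(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All core packages are importable.")
```

Run it inside the `dash_omics` environment. Anything reported as missing can be installed with pip as shown above.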
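III. Hands-on 1: building a multi-omics dashboard with R Shiny (transcriptome + metabolome)
Using an integrated lung cancer transcriptome and metabolome analysis as the example, this section builds a dashboard with data overview, interactive filtering, multi-omics visualization, functional enrichment, and report export.
1. Dashboard architecture
The classic shinydashboard layout is used:
- Sidebar: navigation (data overview, gene filtering, metabolomics analysis, enrichment analysis, report export) plus filter widgets (gene name, sample group, p-value threshold);
- Body: charts and tables rendered dynamically and updated in real time;
- Header: project information, data version, help documentation.
2. Core code (full listing)
Step 1: prepare multi-omics test data
Create a data/ directory containing the following files:
- rna_seq_counts.csv: transcriptome count matrix (rows: genes, columns: samples, last column: gene annotation);
- metabolomics_data.csv: metabolome data (rows: metabolites, columns: samples, last column: metabolic pathway);
- sample_info.csv: sample grouping information (columns: sample name, group (Tumor/Normal), sex, age).
Example data format (rna_seq_counts.csv):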
| GeneID | Sample1 | Sample2 | Sample3 | Sample4 | Annotation |
|---|---|---|---|---|---|
| TP53 | 1200 | 1350 | 800 | 750 | Tumor suppressor |
| EGFR | 850 | 920 | 1500 | 1600 | Oncogene |
| GAPDH | 5000 | 5200 | 4800 | 4900 | Housekeeping |
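To try the dashboard before real data is available, the three files can be generated with placeholder content. The sketch below writes them in the layout described above. All numbers are random synthetic values meant only for checking that the app loads, not for any biological interpretation. The function name `write_toy_dataset`, the metabolite names and pathway labels, and the `Sex` and `Age` column headers are assumptions made for illustration. The `Annotation`, `Pathway`, and `Group` column names follow what the app code reads.

```python
import csv
import os
import random


def write_toy_dataset(out_dir, n_tumor=3, n_normal=3, seed=0):
    """Write three small CSV files with synthetic values in the expected layout."""
    rng = random.Random(seed)
    os.makedirs(out_dir, exist_ok=True)
    samples = [f"Sample{i + 1}" for i in range(n_tumor + n_normal)]
    groups = ["Tumor"] * n_tumor + ["Normal"] * n_normal

    # Transcriptome: rows = genes, columns = samples, last column = annotation
    genes = [("TP53", "Tumor suppressor"), ("EGFR", "Oncogene"), ("GAPDH", "Housekeeping")]
    with open(os.path.join(out_dir, "rna_seq_counts.csv"), "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["GeneID"] + samples + ["Annotation"])
        for gene, annotation in genes:
            counts = [rng.randint(100, 5000) for _ in samples]
            writer.writerow([gene] + counts + [annotation])

    # Metabolome: rows = metabolites, columns = samples, last column = pathway
    metabolites = [("Glucose", "Glycolysis"), ("Lactate", "Glycolysis")]
    with open(os.path.join(out_dir, "metabolomics_data.csv"), "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["Metabolite"] + samples + ["Pathway"])
        for metabolite, pathway in metabolites:
            areas = [round(rng.uniform(1e4, 1e6), 2) for _ in samples]
            writer.writerow([metabolite] + areas + [pathway])

    # Sample information: one row per sample
    with open(os.path.join(out_dir, "sample_info.csv"), "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["Sample", "Group", "Sex", "Age"])
        for sample, group in zip(samples, groups):
            writer.writerow([sample, group, rng.choice(["F", "M"]), rng.randint(40, 80)])

    return samples
```

Calling `write_toy_dataset("data")` from the project folder creates the data/ directory with three samples per group, enough for the per-gene t-test used in the volcano plot to run.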
Step 2: write the Shiny app (app.R)
R
# Load packages
library(shiny)
library(shinydashboard)
library(ggplot2)
library(pheatmap)
library(dplyr)
library(tidyr)
library(tibble)  # provides rownames_to_column(), used by the plotting code below
library(plotly)
library(DT)
library(shinyWidgets)
library(clusterProfiler)
library(org.Hs.eg.db)
# ====================== 1. Load data ======================
# Transcriptome data
rna_data <- read.csv("data/rna_seq_counts.csv", row.names = 1, stringsAsFactors = FALSE)
# Metabolome data
metab_data <- read.csv("data/metabolomics_data.csv", row.names = 1, stringsAsFactors = FALSE)
# Sample information
sample_info <- read.csv("data/sample_info.csv", row.names = 1, stringsAsFactors = FALSE)
# Expression matrices (annotation columns removed)
rna_expr <- rna_data[, !colnames(rna_data) %in% "Annotation"]
metab_expr <- metab_data[, !colnames(metab_data) %in% "Pathway"]
# ====================== 2. Define the UI ======================
ui <- dashboardPage(
# 顶部导航栏
dashboardHeader(title = "肺癌多组学分析Dashboard", titleWidth = 300),
# 侧边栏
dashboardSidebar(
width = 300,
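sidebarMenu(
id = "tabName",  # exposes the selected tab as input$tabName, which the conditionalPanel conditions below rely on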
menuItem("数据概览", tabName = "overview", icon = icon("dashboard")),
menuItem("基因筛选与可视化", tabName = "gene_vis", icon = icon("dna")),
menuItem("代谢组分析", tabName = "metab_vis", icon = icon("flask")),
menuItem("功能富集分析", tabName = "enrich", icon = icon("chart-pie")),
menuItem("报告导出", tabName = "report", icon = icon("file-export"))
),
# 通用筛选组件(样本分组)
hr(),
h4("样本筛选"),
pickerInput(
inputId = "sample_group",
label = "选择样本分组",
choices = unique(sample_info$Group),
selected = unique(sample_info$Group),
multiple = TRUE,
options = list(`actions-box` = TRUE)
),
# 基因筛选组件(仅在基因可视化页面显示)
conditionalPanel(
condition = "input.tabName == 'gene_vis'",
hr(),
h4("基因筛选"),
textInput("gene_name", "输入基因名(支持模糊匹配)", placeholder = "TP53/EGFR"),
sliderInput("expr_threshold", "基因表达量阈值", min = 0, max = max(rna_expr), value = 500),
pickerInput(
inputId = "plot_type",
label = "选择可视化类型",
choices = c("箱线图", "热图", "火山图"),
selected = "箱线图"
)
),
# 代谢组筛选组件
conditionalPanel(
condition = "input.tabName == 'metab_vis'",
hr(),
h4("代谢物筛选"),
textInput("metab_name", "输入代谢物名(支持模糊匹配)", placeholder = "Glucose/Lactate"),
pickerInput(
inputId = "metab_plot",
label = "选择可视化类型",
choices = c("散点图", "热图", "通路富集图"),
selected = "散点图"
)
)
),
# 主面板
dashboardBody(
tabItems(
# ====================== 标签页1:数据概览 ======================
tabItem(
tabName = "overview",
fluidRow(
# 数据基本信息卡片
box(title = "数据概览", width = 12,
column(6,
h4("转录组数据"),
p(paste("基因数量:", nrow(rna_expr))),
p(paste("样本数量:", ncol(rna_expr))),
p(paste("肿瘤样本数:", sum(sample_info$Group == "Tumor"))),
p(paste("正常样本数:", sum(sample_info$Group == "Normal")))
),
column(6,
h4("代谢组数据"),
p(paste("代谢物数量:", nrow(metab_expr))),
p(paste("样本数量:", ncol(metab_expr))),
p(paste("代谢通路数:", length(unique(metab_data$Pathway)))),
p(paste("数据量纲:峰面积(归一化后)"))
)
),
# 样本分组分布饼图
box(title = "样本分组分布", width = 6,
plotOutput("sample_pie", height = 300)
),
# 转录组表达量分布直方图
box(title = "转录组基因表达量分布", width = 6,
plotOutput("rna_expr_dist", height = 300)
),
# 样本相关性热图
box(title = "样本相关性分析(转录组)", width = 12,
plotOutput("sample_corr_heatmap", height = 400)
)
)
),
# ====================== 标签页2:基因筛选与可视化 ======================
tabItem(
tabName = "gene_vis",
fluidRow(
# 筛选结果表格
box(title = "筛选基因列表", width = 12,
DTOutput("gene_table")
),
# 可视化图表
box(title = "基因表达可视化", width = 12,
conditionalPanel(
condition = "input.plot_type == '箱线图'",
plotlyOutput("gene_boxplot", height = 400)
),
conditionalPanel(
condition = "input.plot_type == '热图'",
plotOutput("gene_heatmap", height = 400)
),
conditionalPanel(
condition = "input.plot_type == '火山图'",
plotlyOutput("gene_volcano", height = 400)
)
)
)
),
# ====================== 标签页3:代谢组分析 ======================
tabItem(
tabName = "metab_vis",
fluidRow(
# 代谢物筛选表格
box(title = "筛选代谢物列表", width = 12,
DTOutput("metab_table")
),
# 代谢物可视化
box(title = "代谢物表达可视化", width = 12,
conditionalPanel(
condition = "input.metab_plot == '散点图'",
plotlyOutput("metab_scatter", height = 400)
),
conditionalPanel(
condition = "input.metab_plot == '热图'",
plotOutput("metab_heatmap", height = 400)
),
conditionalPanel(
condition = "input.metab_plot == '通路富集图'",
plotOutput("metab_pathway", height = 400)
)
)
)
),
# ====================== 标签页4:功能富集分析 ======================
tabItem(
tabName = "enrich",
fluidRow(
box(title = "富集分析参数", width = 4,
textAreaInput("gene_list", "输入富集基因列表(换行分隔)", rows = 10, placeholder = "TP53\nEGFR\nMYC"),
pickerInput(
inputId = "enrich_type",
label = "富集类型",
choices = c("GO-BP", "GO-CC", "GO-MF", "KEGG"),
selected = "GO-BP"
),
actionButton("run_enrich", "运行富集分析", class = "btn-primary")
),
box(title = "富集结果表格", width = 8,
DTOutput("enrich_table")
),
box(title = "富集气泡图", width = 12,
plotOutput("enrich_bubble", height = 400)
)
)
),
# ====================== 标签页5:报告导出 ======================
tabItem(
tabName = "report",
fluidRow(
box(title = "报告导出设置", width = 6,
textInput("report_title", "报告标题", value = "肺癌多组学分析报告"),
selectInput("report_format", "报告格式", choices = c("HTML", "PDF", "Word"), selected = "HTML"),
checkboxGroupInput("include_plots", "包含图表",
choices = c("数据概览", "基因表达图", "代谢物图", "富集分析图"),
selected = c("数据概览", "基因表达图")),
actionButton("generate_report", "生成报告", class = "btn-success")
),
box(title = "导出状态", width = 6,
verbatimTextOutput("report_status"),
downloadButton("download_report", "下载报告", class = "btn-info")
)
)
)
)
)
)
# ====================== 3. 定义Server ======================
server <- function(input, output, session) {
# ---------------------- 数据概览页面 ----------------------
# 样本分组饼图
output$sample_pie <- renderPlot({
group_counts <- table(sample_info$Group)
ggplot(data.frame(Group = names(group_counts), Count = as.vector(group_counts)),
aes(x = "", y = Count, fill = Group)) +
geom_bar(stat = "identity", width = 1) +
coord_polar("y", start = 0) +
theme_void() +
labs(title = "样本分组分布", fill = "分组") +
scale_fill_manual(values = c("Tumor" = "red", "Normal" = "green"))
})
# 转录组表达量分布
output$rna_expr_dist <- renderPlot({
rna_expr_long <- pivot_longer(as.data.frame(rna_expr), cols = everything(), names_to = "Sample", values_to = "Expression")
ggplot(rna_expr_long, aes(x = Expression)) +
geom_histogram(bins = 50, fill = "steelblue", alpha = 0.7) +
theme_bw() +
labs(title = "基因表达量分布", x = "表达量(count)", y = "基因数量") +
scale_x_log10() # 对数转换,更易观察分布
})
# 样本相关性热图
output$sample_corr_heatmap <- renderPlot({
# Sample-by-sample correlation: samples are the columns, so cor() is applied to the matrix as-is
sample_corr <- cor(rna_expr)
# 绘制热图
pheatmap(sample_corr,
annotation_col = sample_info[, "Group", drop = FALSE],
color = colorRampPalette(c("blue", "white", "red"))(100),
main = "样本相关性热图(转录组)",
show_rownames = TRUE,
show_colnames = TRUE)
})
# ---------------------- 基因筛选与可视化页面 ----------------------
# 筛选基因数据(响应式)
filtered_genes <- reactive({
# 筛选样本
selected_samples <- rownames(sample_info)[sample_info$Group %in% input$sample_group]
rna_filtered <- rna_expr[, selected_samples, drop = FALSE]
# 筛选基因(表达量阈值 + 基因名模糊匹配)
if (input$gene_name != "") {
gene_match <- grepl(input$gene_name, rownames(rna_filtered), ignore.case = TRUE)
rna_filtered <- rna_filtered[gene_match, , drop = FALSE]
}
rna_filtered <- rna_filtered[rowMeans(rna_filtered) >= input$expr_threshold, , drop = FALSE]
# 合并注释信息
rna_filtered <- cbind(rna_filtered, Annotation = rna_data[rownames(rna_filtered), "Annotation"])
return(rna_filtered)
})
# 筛选基因表格
output$gene_table <- renderDT({
datatable(filtered_genes(),
options = list(pageLength = 10, scrollX = TRUE),
caption = "筛选后的基因表达数据") %>%
formatRound(columns = 1:(ncol(filtered_genes())-1), digits = 0)
})
# 基因箱线图
output$gene_boxplot <- renderPlotly({
if (nrow(filtered_genes()) == 0) {
return(plotly_empty() %>% layout(title = "无符合条件的基因"))
}
# 转换为长格式
gene_long <- filtered_genes() %>%
rownames_to_column("GeneID") %>%
pivot_longer(cols = -c(GeneID, Annotation), names_to = "Sample", values_to = "Expression") %>%
left_join(sample_info %>% rownames_to_column("Sample"), by = "Sample")
# 绘制箱线图
p <- ggplot(gene_long, aes(x = Group, y = Expression, fill = Group)) +
geom_boxplot(alpha = 0.7) +
facet_wrap(~GeneID, scales = "free_y") +
theme_bw() +
scale_fill_manual(values = c("Tumor" = "red", "Normal" = "green")) +
labs(x = "样本分组", y = "基因表达量(count)", title = "基因表达箱线图")
ggplotly(p)
})
# 基因热图
output$gene_heatmap <- renderPlot({
if (nrow(filtered_genes()) == 0) {
return(ggplot() + geom_text(aes(x = 1, y = 1, label = "无符合条件的基因")) + theme_void())
}
# 提取表达矩阵(去除注释列)
expr_mat <- filtered_genes()[, !colnames(filtered_genes()) %in% "Annotation"]
# 标准化(行Z-score)
expr_mat <- t(scale(t(expr_mat)))
pheatmap(expr_mat,
annotation_col = sample_info[colnames(expr_mat), "Group", drop = FALSE],
color = colorRampPalette(c("blue", "white", "red"))(100),
main = "基因表达热图(Z-score标准化)",
show_rownames = TRUE,
show_colnames = TRUE)
})
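# Volcano plot (simplified: Tumor vs Normal mean ratio with a per-gene t-test)
output$gene_volcano <- renderPlotly({
if (nrow(filtered_genes()) == 0) {
return(plotly_empty() %>% layout(title = "无符合条件的基因"))
}
# Expression columns only (drop the annotation column)
expr <- filtered_genes()[, !colnames(filtered_genes()) %in% "Annotation", drop = FALSE]
# Keep only samples that passed the sidebar group filter
tumor_samples <- intersect(rownames(sample_info)[sample_info$Group == "Tumor"], colnames(expr))
normal_samples <- intersect(rownames(sample_info)[sample_info$Group == "Normal"], colnames(expr))
# A t-test needs at least two samples in each group
if (length(tumor_samples) < 2 || length(normal_samples) < 2) {
return(plotly_empty() %>% layout(title = "Need at least two Tumor and two Normal samples"))
}
# Group means
tumor_mean <- rowMeans(expr[, tumor_samples, drop = FALSE])
normal_mean <- rowMeans(expr[, normal_samples, drop = FALSE])
# log2 fold change and p-value (simplified: t-test; NA when the test cannot be computed)
volcano_data <- data.frame(
GeneID = names(tumor_mean),
log2FC = log2((tumor_mean + 1)/(normal_mean + 1)),
pvalue = sapply(names(tumor_mean), function(g) {
tryCatch(
t.test(as.numeric(expr[g, tumor_samples]), as.numeric(expr[g, normal_samples]))$p.value,
error = function(e) NA_real_)
})
) %>%
# `-log10p` is not a valid column name in mutate(), so a syntactic name is used instead
mutate(neg_log10p = -log10(pvalue),
Significance = ifelse(pvalue < 0.05 & abs(log2FC) > 1, "Significant", "Not significant"))
# Draw the volcano plot
p <- ggplot(volcano_data, aes(x = log2FC, y = neg_log10p, color = Significance)) +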
geom_point(alpha = 0.7, size = 2) +
theme_bw() +
scale_color_manual(values = c("Significant" = "red", "Not significant" = "gray")) +
geom_vline(xintercept = c(-1, 1), linetype = "dashed", color = "black") +
geom_hline(yintercept = -log10(0.05), linetype = "dashed", color = "black") +
labs(x = "log2(Fold Change)", y = "-log10(P-value)", title = "基因差异表达火山图")
ggplotly(p) %>% layout(hovermode = "closest")
})
# ---------------------- 代谢组分析页面 ----------------------
# 筛选代谢物数据
filtered_metabs <- reactive({
selected_samples <- rownames(sample_info)[sample_info$Group %in% input$sample_group]
metab_filtered <- metab_expr[, selected_samples, drop = FALSE]
if (input$metab_name != "") {
metab_match <- grepl(input$metab_name, rownames(metab_filtered), ignore.case = TRUE)
metab_filtered <- metab_filtered[metab_match, , drop = FALSE]
}
metab_filtered <- cbind(metab_filtered, Pathway = metab_data[rownames(metab_filtered), "Pathway"])
return(metab_filtered)
})
# 代谢物表格
output$metab_table <- renderDT({
datatable(filtered_metabs(),
options = list(pageLength = 10, scrollX = TRUE),
caption = "筛选后的代谢物表达数据") %>%
formatRound(columns = 1:(ncol(filtered_metabs())-1), digits = 2)
})
# 代谢物散点图
output$metab_scatter <- renderPlotly({
if (nrow(filtered_metabs()) == 0) {
return(plotly_empty() %>% layout(title = "无符合条件的代谢物"))
}
metab_long <- filtered_metabs() %>%
rownames_to_column("Metabolite") %>%
pivot_longer(cols = -c(Metabolite, Pathway), names_to = "Sample", values_to = "Expression") %>%
left_join(sample_info %>% rownames_to_column("Sample"), by = "Sample")
p <- ggplot(metab_long, aes(x = Sample, y = Expression, color = Group)) +
geom_point(size = 3, alpha = 0.7) +
facet_wrap(~Metabolite, scales = "free_y") +
theme_bw() +
scale_color_manual(values = c("Tumor" = "red", "Normal" = "green")) +
labs(x = "样本", y = "代谢物表达量(峰面积)", title = "代谢物表达散点图") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggplotly(p)
})
# 代谢物热图
output$metab_heatmap <- renderPlot({
if (nrow(filtered_metabs()) == 0) {
return(ggplot() + geom_text(aes(x = 1, y = 1, label = "无符合条件的代谢物")) + theme_void())
}
expr_mat <- filtered_metabs()[, !colnames(filtered_metabs()) %in% "Pathway"]
expr_mat <- t(scale(t(expr_mat)))
pheatmap(expr_mat,
annotation_col = sample_info[colnames(expr_mat), "Group", drop = FALSE],
color = colorRampPalette(c("blue", "white", "red"))(100),
main = "代谢物表达热图(Z-score标准化)",
show_rownames = TRUE,
show_colnames = TRUE)
})
# 代谢物通路富集图(简化版)
output$metab_pathway <- renderPlot({
if (nrow(filtered_metabs()) == 0) {
return(ggplot() + geom_text(aes(x = 1, y = 1, label = "无符合条件的代谢物")) + theme_void())
}
# 统计各通路的代谢物数量
pathway_counts <- table(filtered_metabs()$Pathway)
pathway_df <- data.frame(Pathway = names(pathway_counts), Count = as.vector(pathway_counts)) %>%
arrange(desc(Count)) %>%
slice_head(n = 10) # 展示前10个通路
ggplot(pathway_df, aes(x = reorder(Pathway, Count), y = Count)) +
geom_bar(stat = "identity", fill = "orange", alpha = 0.7) +
theme_bw() +
labs(x = "代谢通路", y = "代谢物数量", title = "代谢通路富集TOP10") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
})
# ---------------------- 功能富集分析页面 ----------------------
# 富集分析结果(响应式)
enrich_result <- eventReactive(input$run_enrich, {
if (input$gene_list == "") {
return(NULL)
}
# 提取基因列表
genes <- unlist(strsplit(input$gene_list, "\n"))
genes <- genes[genes != ""]
# 转换为Entrez ID(富集分析需要)
gene_entrez <- bitr(genes, fromType = "SYMBOL", toType = "ENTREZID", OrgDb = org.Hs.eg.db)$ENTREZID
if (length(gene_entrez) == 0) {
return(NULL)
}
# 运行富集分析
if (input$enrich_type == "GO-BP") {
enrich <- enrichGO(gene = gene_entrez, OrgDb = org.Hs.eg.db, ont = "BP", pAdjustMethod = "fdr", qvalueCutoff = 0.05)
} else if (input$enrich_type == "GO-CC") {
enrich <- enrichGO(gene = gene_entrez, OrgDb = org.Hs.eg.db, ont = "CC", pAdjustMethod = "fdr", qvalueCutoff = 0.05)
} else if (input$enrich_type == "GO-MF") {
enrich <- enrichGO(gene = gene_entrez, OrgDb = org.Hs.eg.db, ont = "MF", pAdjustMethod = "fdr", qvalueCutoff = 0.05)
} else if (input$enrich_type == "KEGG") {
enrich <- enrichKEGG(gene = gene_entrez, organism = "hsa", pvalueCutoff = 0.05)
}
return(enrich)
})
# 富集结果表格
output$enrich_table <- renderDT({
if (is.null(enrich_result())) {
return(datatable(data.frame(Message = "请输入有效基因列表并运行富集分析")))
}
datatable(as.data.frame(enrich_result()),
options = list(pageLength = 10, scrollX = TRUE),
caption = paste(input$enrich_type, "富集分析结果"))
})
# 富集气泡图
output$enrich_bubble <- renderPlot({
if (is.null(enrich_result())) {
return(ggplot() + geom_text(aes(x = 1, y = 1, label = "无富集结果")) + theme_void())
}
dotplot(enrich_result(), showCategory = 15, title = paste(input$enrich_type, "富集气泡图")) +
theme(axis.text.y = element_text(size = 10))
})
# ---------------------- 报告导出页面 ----------------------
# 生成报告
report_path <- reactiveVal(NULL)
# 辅助函数:保存图表为图片
save_plot <- function(plot_obj, filename, width = 10, height = 6, dpi = 300) {
if (is.null(plot_obj)) return(NULL)
plot_path <- file.path(tempdir(), filename)
# 区分ggplot/plotly对象
if (inherits(plot_obj, "ggplot")) {
ggsave(plot_path, plot_obj, width = width, height = height, dpi = dpi)
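} else if (inherits(plot_obj, "plotly")) {
# plotly::export() is deprecated; save_image() needs the kaleido Python package (via reticulate)
plotly::save_image(plot_obj, file = plot_path)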
} else {
png(plot_path, width = width, height = height, units = "in", res = dpi)
print(plot_obj)
dev.off()
}
return(plot_path)
}
observeEvent(input$generate_report, {
# 1. 生成并保存核心图表(临时目录)
## 1.1 样本分布饼图
sample_pie_plot <- ggplot(data.frame(Group = names(table(sample_info$Group)),
Count = as.vector(table(sample_info$Group))),
aes(x = "", y = Count, fill = Group)) +
geom_bar(stat = "identity", width = 1) +
coord_polar("y", start = 0) +
theme_void() +
labs(title = "样本分组分布", fill = "分组") +
scale_fill_manual(values = c("Tumor" = "red", "Normal" = "green"))
sample_pie_path <- save_plot(sample_pie_plot, "sample_pie.png")
## 1.2 基因表达图表(根据选择的可视化类型)
gene_plot <- if (input$plot_type == "箱线图") {
# 重新生成箱线图(非交互式版本,适配Rmd)
selected_samples <- rownames(sample_info)[sample_info$Group %in% input$sample_group]
rna_filtered <- rna_expr[, selected_samples, drop = FALSE]
if (input$gene_name != "") {
gene_match <- grepl(input$gene_name, rownames(rna_filtered), ignore.case = TRUE)
rna_filtered <- rna_filtered[gene_match, , drop = FALSE]
}
rna_filtered <- rna_filtered[rowMeans(rna_filtered) >= input$expr_threshold, , drop = FALSE]
if (nrow(rna_filtered) == 0) {
ggplot() + geom_text(aes(x = 1, y = 1, label = "无符合条件的基因")) + theme_void()
} else {
gene_long <- rna_filtered %>%
rownames_to_column("GeneID") %>%
pivot_longer(cols = -GeneID, names_to = "Sample", values_to = "Expression") %>%
left_join(sample_info %>% rownames_to_column("Sample"), by = "Sample")
ggplot(gene_long, aes(x = Group, y = Expression, fill = Group)) +
geom_boxplot(alpha = 0.7) +
facet_wrap(~GeneID, scales = "free_y") +
theme_bw() +
scale_fill_manual(values = c("Tumor" = "red", "Normal" = "green")) +
labs(x = "样本分组", y = "基因表达量(count)", title = "基因表达箱线图")
}
} else if (input$plot_type == "热图") {
# 重新生成热图
selected_samples <- rownames(sample_info)[sample_info$Group %in% input$sample_group]
rna_filtered <- rna_expr[, selected_samples, drop = FALSE]
if (input$gene_name != "") {
gene_match <- grepl(input$gene_name, rownames(rna_filtered), ignore.case = TRUE)
rna_filtered <- rna_filtered[gene_match, , drop = FALSE]
}
rna_filtered <- rna_filtered[rowMeans(rna_filtered) >= input$expr_threshold, , drop = FALSE]
if (nrow(rna_filtered) == 0) {
ggplot() + geom_text(aes(x = 1, y = 1, label = "无符合条件的基因")) + theme_void()
} else {
expr_mat <- t(scale(t(rna_filtered)))
pheatmap(expr_mat,
annotation_col = sample_info[colnames(expr_mat), "Group", drop = FALSE],
color = colorRampPalette(c("blue", "white", "red"))(100),
main = "基因表达热图(Z-score标准化)",
show_rownames = TRUE,
show_colnames = TRUE)
}
} else {
# 火山图
selected_samples <- rownames(sample_info)[sample_info$Group %in% input$sample_group]
rna_filtered <- rna_expr[, selected_samples, drop = FALSE]
if (input$gene_name != "") {
gene_match <- grepl(input$gene_name, rownames(rna_filtered), ignore.case = TRUE)
rna_filtered <- rna_filtered[gene_match, , drop = FALSE]
}
rna_filtered <- rna_filtered[rowMeans(rna_filtered) >= input$expr_threshold, , drop = FALSE]
if (nrow(rna_filtered) == 0) {
ggplot() + geom_text(aes(x = 1, y = 1, label = "无符合条件的基因")) + theme_void()
} else {
tumor_samples <- rownames(sample_info)[sample_info$Group == "Tumor"]
normal_samples <- rownames(sample_info)[sample_info$Group == "Normal"]
tumor_mean <- rowMeans(rna_filtered[, tumor_samples, drop = FALSE])
normal_mean <- rowMeans(rna_filtered[, normal_samples, drop = FALSE])
volcano_data <- data.frame(
GeneID = names(tumor_mean),
log2FC = log2((tumor_mean + 1)/(normal_mean + 1)),
pvalue = sapply(names(tumor_mean), function(g) {
t.test(rna_filtered[g, tumor_samples], rna_filtered[g, normal_samples])$p.value
})
) %>%
mutate(neg_log10p = -log10(pvalue),  # `-log10p` is not a valid column name
Significance = ifelse(pvalue < 0.05 & abs(log2FC) > 1, "Significant", "Not significant"))
ggplot(volcano_data, aes(x = log2FC, y = neg_log10p, color = Significance)) +
geom_point(alpha = 0.7, size = 2) +
theme_bw() +
scale_color_manual(values = c("Significant" = "red", "Not significant" = "gray")) +
geom_vline(xintercept = c(-1, 1), linetype = "dashed", color = "black") +
geom_hline(yintercept = -log10(0.05), linetype = "dashed", color = "black") +
labs(x = "log2(Fold Change)", y = "-log10(P-value)", title = "基因差异表达火山图")
}
}
gene_plot_path <- save_plot(gene_plot, "gene_expression.png")
## 1.3 代谢组分析图表
metab_plot <- if (input$metab_plot == "散点图") {
selected_samples <- rownames(sample_info)[sample_info$Group %in% input$sample_group]
metab_filtered <- metab_expr[, selected_samples, drop = FALSE]
if (input$metab_name != "") {
metab_match <- grepl(input$metab_name, rownames(metab_filtered), ignore.case = TRUE)
metab_filtered <- metab_filtered[metab_match, , drop = FALSE]
}
if (nrow(metab_filtered) == 0) {
ggplot() + geom_text(aes(x = 1, y = 1, label = "无符合条件的代谢物")) + theme_void()
} else {
metab_long <- metab_filtered %>%
rownames_to_column("Metabolite") %>%
pivot_longer(cols = -Metabolite, names_to = "Sample", values_to = "Expression") %>%
left_join(sample_info %>% rownames_to_column("Sample"), by = "Sample")
ggplot(metab_long, aes(x = Sample, y = Expression, color = Group)) +
geom_point(size = 3, alpha = 0.7) +
facet_wrap(~Metabolite, scales = "free_y") +
theme_bw() +
scale_color_manual(values = c("Tumor" = "red", "Normal" = "green")) +
labs(x = "样本", y = "代谢物表达量(峰面积)", title = "代谢物表达散点图") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
}
} else if (input$metab_plot == "热图") {
selected_samples <- rownames(sample_info)[sample_info$Group %in% input$sample_group]
metab_filtered <- metab_expr[, selected_samples, drop = FALSE]
if (input$metab_name != "") {
metab_match <- grepl(input$metab_name, rownames(metab_filtered), ignore.case = TRUE)
metab_filtered <- metab_filtered[metab_match, , drop = FALSE]
}
if (nrow(metab_filtered) == 0) {
ggplot() + geom_text(aes(x = 1, y = 1, label = "无符合条件的代谢物")) + theme_void()
} else {
expr_mat <- t(scale(t(metab_filtered)))
pheatmap(expr_mat,
annotation_col = sample_info[colnames(expr_mat), "Group", drop = FALSE],
color = colorRampPalette(c("blue", "white", "red"))(100),
main = "代谢物表达热图(Z-score标准化)",
show_rownames = TRUE,
show_colnames = TRUE)
}
} else {
# 通路富集图
selected_samples <- rownames(sample_info)[sample_info$Group %in% input$sample_group]
metab_filtered <- metab_expr[, selected_samples, drop = FALSE]
if (input$metab_name != "") {
metab_match <- grepl(input$metab_name, rownames(metab_filtered), ignore.case = TRUE)
metab_filtered <- metab_filtered[metab_match, , drop = FALSE]
}
metab_filtered <- cbind(metab_filtered, Pathway = metab_data[rownames(metab_filtered), "Pathway"])
if (nrow(metab_filtered) == 0) {
ggplot() + geom_text(aes(x = 1, y = 1, label = "无符合条件的代谢物")) + theme_void()
} else {
pathway_counts <- table(metab_filtered$Pathway)
pathway_df <- data.frame(Pathway = names(pathway_counts), Count = as.vector(pathway_counts)) %>%
arrange(desc(Count)) %>%
slice_head(n = 10)
ggplot(pathway_df, aes(x = reorder(Pathway, Count), y = Count)) +
geom_bar(stat = "identity", fill = "orange", alpha = 0.7) +
theme_bw() +
labs(x = "代谢通路", y = "代谢物数量", title = "代谢通路富集TOP10") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
}
}
metab_plot_path <- save_plot(metab_plot, "metabolomics_analysis.png")
## 1.4 功能富集分析图表
enrich_plot <- if (!is.null(enrich_result())) {
dotplot(enrich_result(), showCategory = 15, title = paste(input$enrich_type, "富集气泡图")) +
theme(axis.text.y = element_text(size = 10))
} else {
ggplot() + geom_text(aes(x = 1, y = 1, label = "未运行富集分析或无富集结果")) + theme_void()
}
enrich_plot_path <- save_plot(enrich_plot, "enrichment_analysis.png")
# 2. Assemble the R Markdown report
# Format keyword for the YAML header and the matching file extension
report_format <- switch(input$report_format, HTML = "html", PDF = "pdf", Word = "word")
report_ext <- switch(input$report_format, HTML = "html", PDF = "pdf", Word = "docx")
# Helper: build an R Markdown chunk that embeds a saved image
img_chunk <- function(path) {
  sprintf("```{r echo=FALSE, out.width='90%%'}\nknitr::include_graphics('%s')\n```",
          normalizePath(path, winslash = "/"))
}
# YAML header (no blank lines inside the block)
yaml_header <- paste(
  "---",
  sprintf("title: '%s'", input$report_title),
  "output:",
  sprintf("  %s_document:", report_format),
  "    toc: true",
  "    toc_depth: 3",
  "    number_sections: true",
  "---",
  sep = "\n"
)
# Report body: blocks are separated by blank lines so that headings, lists, and chunks parse correctly
report_body <- paste(
  "# Multi-omics analysis report",
  "## Data overview",
  "### Basic information",
  sprintf("- Genes (transcriptome): %d", nrow(rna_expr)),
  sprintf("- Metabolites (metabolome): %d", nrow(metab_expr)),
  sprintf("- Samples: %d (Tumor: %d, Normal: %d)", nrow(sample_info),
          sum(sample_info$Group == "Tumor"), sum(sample_info$Group == "Normal")),
  "### Sample distribution",
  img_chunk(sample_pie_path),
  "## Gene expression",
  img_chunk(gene_plot_path),
  "## Metabolomics",
  img_chunk(metab_plot_path),
  "## Functional enrichment",
  img_chunk(enrich_plot_path),
  "## Appendix: data files",
  "- Transcriptome data: rna_seq_counts.csv",
  "- Metabolome data: metabolomics_data.csv",
  "- Sample information: sample_info.csv",
  sep = "\n\n"
)
report_content <- paste(yaml_header, report_body, sep = "\n\n")
# Save the report source
temp_report <- tempfile(fileext = ".Rmd")
writeLines(report_content, temp_report)
# Render the report (PDF output needs a LaTeX installation); render() returns the output path
output_name <- paste0("OmicsReport_", Sys.Date(), ".", report_ext)
rendered <- rmarkdown::render(temp_report, output_file = output_name, envir = new.env(), quiet = TRUE)
# Remember the path for the download handler
report_path(rendered)
output$report_status <- renderPrint({
  cat("Report generated:", rendered)
})
})
# Download the report
output$download_report <- downloadHandler(
  filename = function() {
    req(report_path())
    basename(report_path())
  },
  content = function(file) {
    file.copy(report_path(), file, overwrite = TRUE)
  }
)
}
# ====================== 4. Run the app ======================
shinyApp(ui, server)
3. Local run and deployment
(1) Local run
Put app.R and the data/ directory in the same folder, open app.R in RStudio, and click "Run App" to start the dashboard.
(2) Server deployment (Shiny Server)
- Install Shiny Server (Ubuntu example):
bash
sudo apt-get install r-base
sudo R -e "install.packages('shiny')"
wget https://download3.rstudio.org/ubuntu-18.04/x86_64/shiny-server-1.5.20.1002-amd64.deb
sudo dpkg -i shiny-server-1.5.20.1002-amd64.deb
- Copy the project folder to /srv/shiny-server/omics_dashboard/;
- Start Shiny Server: sudo systemctl start shiny-server;
- Open the dashboard at http://<server-IP>:3838/omics_dashboard/.
(3) Cloud deployment (Shinyapps.io)
- Install the rsconnect package: install.packages("rsconnect");
- Configure the account: rsconnect::setAccountInfo(name='your-account', token='your-token', secret='your-secret');
- Deploy the app:
R
library(rsconnect)
deployApp(appDir = "path/to/omics_dashboard", appName = "lung_cancer_omics")
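IV. Hands-on 2: building a multi-omics dashboard with Python Dash (proteome + methylome)
Using an integrated liver cancer proteome and methylome analysis as the example, this section builds a highly interactive Dash dashboard, focusing on cross-omics linking, dynamic updates, and handling larger data volumes.
1. Dashboard architecture
Dash is built around a "components + callbacks" architecture:
- Layout components: dash-bootstrap-components builds the responsive layout (sidebar + main panel);
- Callbacks: wire up component interactions (filter → update chart);
- Visualization: plotly draws interactive charts with zoom, hover, and download;
- Data processing: pandas integrates the multi-omics data, and scipy handles the statistics.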
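The statistics step behind the cross-omics view comes down to correlating one protein's abundance with one CpG site's methylation level across the same samples, using either Pearson or Spearman correlation. The sketch below spells out both in plain Python so the logic is visible without any dependencies. In the app itself, `scipy.stats.pearsonr` and `scipy.stats.spearmanr` would normally do this work. The function names and the toy values are hypothetical and are not taken from any dataset.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def cross_omics_correlation(protein, methylation, method="pearson"):
    """Correlate one protein with one CpG site across matched samples."""
    if len(protein) != len(methylation):
        raise ValueError("Both layers must be measured on the same samples")
    return spearman(protein, methylation) if method == "spearman" else pearson(protein, methylation)


# Hypothetical toy values: protein abundance and beta values for five matched samples
protein_abundance = [120.0, 180.0, 260.0, 310.0, 450.0]
cpg_beta = [0.90, 0.70, 0.60, 0.30, 0.10]
print(cross_omics_correlation(protein_abundance, cpg_beta, method="pearson"))
print(cross_omics_correlation(protein_abundance, cpg_beta, method="spearman"))
```

Spearman works on ranks, so it is less sensitive to outliers and to the bounded 0 to 1 range of beta values, which is one reason to offer both options in the interface.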
2. Core code (full listing)
Step 1: prepare test data
Create a data/ directory containing proteomics_data.csv (proteome), methylation_data.csv (methylome), and sample_info.csv (sample information).
Step 2: write the Dash app (app.py)
python
import dash
from dash import dcc, html, Input, Output, State, callback
import dash_bootstrap_components as dbc
import plotly.express as px
import plotly.graph_objects as go
import pandas as pd
import numpy as np
from scipy import stats
import os
# ====================== 1. Load data ======================
# Data directory
DATA_DIR = "data"
proteomics_df = pd.read_csv(os.path.join(DATA_DIR, "proteomics_data.csv"), index_col=0)
methylation_df = pd.read_csv(os.path.join(DATA_DIR, "methylation_data.csv"), index_col=0)
sample_info = pd.read_csv(os.path.join(DATA_DIR, "sample_info.csv"), index_col=0)
# Expression matrices (annotation columns removed)
proteomics_expr = proteomics_df.drop(columns=["Annotation"], errors="ignore")
methylation_expr = methylation_df.drop(columns=["CpG_Island"], errors="ignore")
# ====================== 2. Initialize the app ======================
app = dash.Dash(__name__, external_stylesheets=[dbc.themes.BOOTSTRAP], suppress_callback_exceptions=True)
server = app.server  # WSGI server object, exposed for deployment
# ====================== 3. Define the layout ======================
app.layout = dbc.Container([
# 标题
dbc.Row([
dbc.Col(html.H1("肝癌多组学分析Dashboard", className="text-center mb-4"), width=12)
]),
# 主体布局(侧边栏+主面板)
dbc.Row([
# 侧边栏(筛选组件)
dbc.Col([
html.Div([
html.H4("筛选条件", className="mb-3"),
# 样本分组筛选
html.Label("样本分组"),
dcc.Dropdown(
id="sample_group",
options=[{"label": g, "value": g} for g in sample_info["Group"].unique()],
value=sample_info["Group"].unique().tolist(),
multi=True,
className="mb-3"
),
# 分子类型选择
html.Label("分子类型"),
dcc.RadioItems(
id="omics_type",
options=[
{"label": "蛋白组", "value": "proteomics"},
{"label": "甲基化组", "value": "methylation"},
{"label": "跨组学联动", "value": "cross_omics"}
],
value="proteomics",
className="mb-3"
),
# 蛋白组筛选(条件显示)
html.Div(id="proteomics_filters", children=[
html.Label("蛋白名(模糊匹配)"),
dcc.Input(id="protein_name", type="text", placeholder="p53/AKT", className="mb-3 form-control"),
html.Label("表达量阈值"),
dcc.Slider(
id="protein_expr_thresh",
min=0, max=proteomics_expr.values.max(),
value=100, step=10,
marks={i: str(i) for i in range(0, int(proteomics_expr.values.max())+1, 500)},
className="mb-3"
)
], className="mb-4"),
# 甲基化组筛选(条件显示)
html.Div(id="methylation_filters", children=[
html.Label("CpG位点(模糊匹配)"),
dcc.Input(id="cpg_name", type="text", placeholder="cg0001/cg1234", className="mb-3 form-control"),
html.Label("甲基化水平阈值(β值)"),
dcc.Slider(
id="methylation_thresh",
min=0, max=1, value=0.5, step=0.1,
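# Whole-number marks need integer keys (0 and 1, not 0.0 and 1.0), otherwise Dash may not render them
marks={(i // 10 if i % 10 == 0 else i / 10): str(i / 10) for i in range(0, 11, 2)},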
className="mb-3"
)
], className="mb-4"),
# 跨组学筛选(条件显示)
html.Div(id="cross_omics_filters", children=[
html.Label("目标基因/蛋白名"),
dcc.Input(id="cross_gene_name", type="text", placeholder="TP53", className="mb-3 form-control"),
html.Label("关联分析类型"),
dcc.RadioItems(
id="corr_type",
options=[
{"label": "皮尔逊相关", "value": "pearson"},
{"label": "斯皮尔曼相关", "value": "spearman"}
],
value="pearson",
className="mb-3"
)
], className="mb-4"),
# 可视化类型选择
html.Label("可视化类型"),
dcc.Dropdown(
id="plot_type",
options=[
{"label": "箱线图", "value": "box"},
{"label": "热图", "value": "heatmap"},
{"label": "散点图(相关性)", "value": "scatter"},
{"label": "火山图", "value": "volcano"}
],
value="box",
className="mb-4"
),
# 报告导出按钮
dbc.Button("导出交互式报告", id="export_report", color="success", className="mt-4")
], className="bg-light p-4 rounded-3")
], width=3),
# 主面板(图表+表格)
dbc.Col([
# 图表区域
dcc.Loading([
dcc.Graph(id="main_plot", figure={}, className="mb-4")
]),
# 数据表格区域
html.H4("筛选结果", className="mb-2"),
dash.dash_table.DataTable(
id="omics_table",
columns=[],
data=[],
page_size=10,
# DataTable has no scroll_x property; horizontal scrolling comes from style_table
style_table={"overflowX": "auto"},
style_cell={"textAlign": "center"},
style_header={"backgroundColor": "lightgray", "fontWeight": "bold"}
),
# 报告下载链接
html.Div(id="report_link", className="mt-4")
], width=9)
]),
# 隐藏组件(存储临时数据)
dcc.Store(id="filtered_data")
], fluid=True)
# ====================== 4. Callbacks ======================
# Callback 1: show or hide the filter panels according to the selected omics type
@callback(
    Output("proteomics_filters", "style"),
    Output("methylation_filters", "style"),
    Output("cross_omics_filters", "style"),
    Input("omics_type", "value")
)
def toggle_filters(omics_type):
    hidden = {"display": "none"}
    visible = {"display": "block"}
    if omics_type == "proteomics":
        return visible, hidden, hidden
    elif omics_type == "methylation":
        return hidden, visible, hidden
    elif omics_type == "cross_omics":
        return hidden, hidden, visible
    else:
        # Unknown type: hide every filter panel
        return hidden, hidden, hidden
# Callback 2: filter the data and store the result
@callback(
    Output("filtered_data", "data"),
    Input("omics_type", "value"),
    Input("sample_group", "value"),
    # Proteomics filter parameters
    Input("protein_name", "value"),
    Input("protein_expr_thresh", "value"),
    # Methylation filter parameters
    Input("cpg_name", "value"),
    Input("methylation_thresh", "value"),
    # Cross-omics filter parameters
    Input("cross_gene_name", "value"),
    Input("corr_type", "value")
)
def filter_data(omics_type, sample_group, protein_name, protein_thresh, cpg_name, methylation_thresh, cross_gene_name, corr_type):
    # Samples that belong to the selected groups (no selection means no samples)
    selected_samples = sample_info[sample_info["Group"].isin(sample_group or [])].index.tolist()
    if omics_type == "proteomics":
        df = proteomics_expr[selected_samples].copy()
        # Case-insensitive literal substring match on the protein name
        if protein_name:
            df = df[df.index.str.contains(protein_name, case=False, regex=False)]
        # Keep proteins whose mean expression reaches the threshold
        df = df[df.mean(axis=1) >= protein_thresh]
        # Attach the annotation column if present
        if "Annotation" in proteomics_df.columns:
            df["Annotation"] = proteomics_df.loc[df.index, "Annotation"]
        # to_dict("records") drops the row index, so keep the molecule IDs in an "index" column
        return df.rename_axis("index").reset_index().to_dict("records")
    elif omics_type == "methylation":
        df = methylation_expr[selected_samples].copy()
        # Case-insensitive literal substring match on the CpG site ID
        if cpg_name:
            df = df[df.index.str.contains(cpg_name, case=False, regex=False)]
        # Keep sites whose mean β value reaches the threshold
        df = df[df.mean(axis=1) >= methylation_thresh]
        # Attach the annotation column if present
        if "CpG_Island" in methylation_df.columns:
            df["CpG_Island"] = methylation_df.loc[df.index, "CpG_Island"]
        return df.rename_axis("index").reset_index().to_dict("records")
    elif omics_type == "cross_omics":
        # Cross-omics linkage needs a target gene and at least three samples for a usable correlation
        if not cross_gene_name or len(selected_samples) < 3:
            return []
        protein_data = proteomics_expr[selected_samples].copy()
        methylation_data = methylation_expr[selected_samples].copy()
        # Match the target gene in both layers
        protein_match = protein_data.index.str.contains(cross_gene_name, case=False, regex=False)
        methylation_match = methylation_data.index.str.contains(cross_gene_name, case=False, regex=False)
        if not protein_match.any() or not methylation_match.any():
            return []
        # Average all matching rows per sample, then correlate the two layers
        protein_vals = protein_data[protein_match].mean(axis=0)
        methylation_vals = methylation_data[methylation_match].mean(axis=0)
        corr_func = stats.pearsonr if corr_type == "pearson" else stats.spearmanr
        corr, pval = corr_func(protein_vals, methylation_vals)
        # One row per sample; the coefficient and p-value are repeated on every row
        cross_df = pd.DataFrame({
            "Sample": selected_samples,
            "Protein_Expression": protein_vals.values,
            "Methylation_Level": methylation_vals.values,
            "Group": sample_info.loc[selected_samples, "Group"].values,
            "Correlation": float(corr),
            "P_Value": float(pval)
        })
        return cross_df.to_dict("records")
    else:
        return []
# Callback 3: update the table
@callback(
    Output("omics_table", "columns"),
    Output("omics_table", "data"),
    Input("filtered_data", "data"),
    Input("omics_type", "value")
)
def update_table(filtered_data, omics_type):
    if not filtered_data:
        return [], []
    df = pd.DataFrame(filtered_data)
    # Define the table columns
    columns = [{"name": col, "id": col} for col in df.columns]
    # Round numeric columns to two decimals for display
    for col in df.select_dtypes(include=[np.number]).columns:
        df[col] = df[col].round(2)
    return columns, df.to_dict("records")
# Callback 4: update the main figure
@callback(
    Output("main_plot", "figure"),
    Input("filtered_data", "data"),
    Input("omics_type", "value"),
    Input("plot_type", "value"),
    Input("sample_group", "value"),
    Input("corr_type", "value")
)
def update_plot(filtered_data, omics_type, plot_type, sample_group, corr_type):
    if not filtered_data:
        return go.Figure().update_layout(title="No data match the current filters")
    df = pd.DataFrame(filtered_data)
    sample_group = list(sample_group) if sample_group is not None else []
    selected_samples = sample_info[sample_info["Group"].isin(sample_group)].index.tolist()
    # Molecule IDs are expected in an "index" column; fall back to the row labels
    if "index" not in df.columns:
        df.insert(0, "index", df.index.astype(str))
    # Proteomics / methylation figures
    if omics_type in ["proteomics", "methylation"]:
        value_cols = [col for col in df.columns if col in selected_samples]
        if not value_cols:
            return go.Figure().update_layout(title="No samples selected")
        # Convert to long format and attach the sample group
        df_long = df.melt(
            id_vars=[col for col in df.columns if col not in value_cols],
            value_vars=value_cols,
            var_name="Sample",
            value_name="Value"
        )
        df_long["Group"] = df_long["Sample"].map(sample_info["Group"])
        if plot_type == "box":
            # One facet per molecule; show at most 12 molecules so the facet grid stays readable
            shown_ids = df["index"].unique()[:12]
            fig = px.box(
                df_long[df_long["index"].isin(shown_ids)],
                x="Group",
                y="Value",
                color="Group",
                facet_col="index",
                facet_col_wrap=3,
                title=f"{omics_type.capitalize()} expression/methylation box plot",
                labels={"Value": "Protein expression" if omics_type == "proteomics" else "Methylation level (β value)"},
                color_discrete_map={"Tumor": "red", "Normal": "green"}
            )
            fig.update_layout(height=600)
        elif plot_type == "heatmap":
            # Rows are samples, columns are molecules
            expr_mat = df.set_index("index")[value_cols].T
            fig = px.imshow(
                expr_mat.to_numpy(dtype=float),
                labels=dict(x="Molecule", y="Sample", color="Value"),
                x=expr_mat.columns.astype(str).tolist(),
                y=expr_mat.index.astype(str).tolist(),
                color_continuous_scale="RdBu_r",
                aspect="auto",
                title=f"{omics_type.capitalize()} expression/methylation heatmap"
            )
            # Annotate the number of samples per group
            fig.add_annotation(
                text="Sample groups: " + ", ".join([f"{g}: {int((sample_info['Group'] == g).sum())}" for g in sample_group]),
                x=0.5, y=-0.1, showarrow=False, xref="paper", yref="paper"
            )
        elif plot_type == "volcano":
            # Volcano plot (Tumor vs Normal), restricted to the samples present in the filtered data
            tumor_samples = [s for s in value_cols if sample_info.loc[s, "Group"] == "Tumor"]
            normal_samples = [s for s in value_cols if sample_info.loc[s, "Group"] == "Normal"]
            if len(tumor_samples) < 2 or len(normal_samples) < 2:
                return go.Figure().update_layout(title="Select at least two Tumor and two Normal samples")
            tumor = df[tumor_samples].to_numpy(dtype=float)
            normal = df[normal_samples].to_numpy(dtype=float)
            # log2 fold change of group means, with a pseudocount of 1
            log2fc = np.log2((tumor.mean(axis=1) + 1) / (normal.mean(axis=1) + 1))
            # Row-wise two-sample t-test; p-values are not corrected for multiple testing
            _, pvals = stats.ttest_ind(tumor, normal, axis=1)
            volcano_df = pd.DataFrame({
                "ID": df["index"].to_numpy(),
                "log2FC": log2fc,
                "-log10p": -np.log10(pvals),
                "Significant": np.where((pvals < 0.05) & (np.abs(log2fc) > 1), "Significant", "Not significant")
            })
            fig = px.scatter(
                volcano_df,
                x="log2FC",
                y="-log10p",
                color="Significant",
                hover_name="ID",
                title=f"{omics_type.capitalize()} differential volcano plot",
                color_discrete_map={"Significant": "red", "Not significant": "gray"}
            )
            fig.add_vline(x=-1, line_dash="dash", line_color="black")
            fig.add_vline(x=1, line_dash="dash", line_color="black")
            fig.add_hline(y=-np.log10(0.05), line_dash="dash", line_color="black")
        else:
            fig = go.Figure().update_layout(title="Unsupported visualization type")
    # Cross-omics figure
    elif omics_type == "cross_omics":
        if plot_type == "scatter":
            # Correlation scatter plot; trendline="ols" requires the statsmodels package
            fig = px.scatter(
                df,
                x="Protein_Expression",
                y="Methylation_Level",
                color="Group",
                hover_name="Sample",
                title=f"Protein expression vs methylation level ({corr_type}, r={df['Correlation'].iloc[0]:.2f}, p={df['P_Value'].iloc[0]:.4f})",
                trendline="ols",
                color_discrete_map={"Tumor": "red", "Normal": "green"}
            )
        else:
            fig = go.Figure().update_layout(title="Cross-omics mode supports only the scatter plot")
    else:
        fig = go.Figure().update_layout(title="Invalid omics type")
    return fig
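# Callback 5: export an interactive report
@callback(
    Output("report_link", "children"),
    Input("export_report", "n_clicks"),
    State("filtered_data", "data"),
    State("omics_type", "value"),
    State("plot_type", "value"),
    State("sample_group", "value"),
    State("corr_type", "value")
)
def export_report(n_clicks, filtered_data, omics_type, plot_type, sample_group, corr_type):
    # Local imports keep this snippet self-contained; move them to the top of app.py if preferred
    import os
    from urllib.parse import quote

    if not n_clicks or not filtered_data:
        return ""
    now = pd.Timestamp.now()
    report_filename = f"Omics_Dashboard_Report_{now.strftime('%Y%m%d_%H%M%S')}.html"
    # Rebuild the current figure and serialize it with Plotly's Figure.to_html();
    # a dcc.Graph component has no to_html() method. Calling update_plot directly
    # assumes that @callback returns the undecorated function, as Dash 2.x does.
    fig = update_plot(filtered_data, omics_type, plot_type, sample_group, corr_type)
    plot_html = fig.to_html(full_html=False, include_plotlyjs="cdn")
    table_html = pd.DataFrame(filtered_data).to_html(classes="table table-striped", index=False)
    # Simple HTML template
    html_content = f"""<!DOCTYPE html>
<html>
<head>
<title>Multi-Omics Analysis Report</title>
<meta charset="utf-8">
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/bootstrap@5.2.3/dist/css/bootstrap.min.css">
</head>
<body class="container mt-4">
<h1>Liver Cancer Multi-Omics Analysis Report</h1>
<p>Generated: {now.strftime('%Y-%m-%d %H:%M:%S')}</p>
<h2>1. Filters</h2>
<p>Omics type: {omics_type}</p>
<p>Visualization type: {plot_type}</p>
<h2>2. Figure</h2>
<div>{plot_html}</div>
<h2>3. Data table</h2>
<div>{table_html}</div>
</body>
</html>
"""
    # Keep a server-side copy; "reports" is an assumed, writable output directory
    os.makedirs("reports", exist_ok=True)
    with open(os.path.join("reports", report_filename), "w", encoding="utf-8") as f:
        f.write(html_content)
    # Dash does not serve arbitrary files from the working directory,
    # so the download link embeds the report as a data URI
    return html.A(
        f"Download report ({report_filename})",
        href="data:text/html;charset=utf-8," + quote(html_content),
        download=report_filename,
        className="btn btn-info"
    )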
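# ====================== 5. Run the app ======================
if __name__ == "__main__":
    # app.run replaces the deprecated app.run_server in recent Dash releases.
    # debug=True enables the interactive debugger, so bind to localhost during development.
    app.run(debug=True, host="127.0.0.1", port=8050)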
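The cross-omics callback above delegates to `scipy.stats.pearsonr` and `scipy.stats.spearmanr`. As a minimal sketch of what those two coefficients measure, the standard-library code below computes them by hand (coefficient only, no p-value). The helper names and the sample values are illustrative assumptions, not part of the Dashboard.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical per-sample values for one gene (not real measurements)
protein = [10.0, 20.0, 30.0, 40.0, 100.0]
methylation = [0.90, 0.70, 0.60, 0.40, 0.10]
print(round(pearson(protein, methylation), 3), round(spearman(protein, methylation), 3))
```

Because Spearman works on ranks, the outlying protein value of 100 affects it less than it affects Pearson, which is one reason the Dashboard offers both methods.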
3. Running Locally and Deploying
(1) Running locally
bash
python app.py
Open http://localhost:8050 in a browser to use the Dashboard.
(2) Server deployment (Gunicorn + Nginx)
- Install Gunicorn:
bash
pip install gunicorn
- Start Gunicorn (this assumes app.py exposes the underlying Flask instance as `server = app.server`):
bash
gunicorn app:server -w 4 -b 0.0.0.0:8050
- Configure Nginx as a reverse proxy:
nginx
server {
    listen 80;
    server_name your_domain.com;
    location / {
        proxy_pass http://127.0.0.1:8050;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
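V. Advanced Techniques for Interactive Report Generation
1. Automated report templates
- R Shiny: combine with R Markdown templates to generate parameterized reports (for example, adjusting the content to the user's current filters);
- Python Dash: export interactive figures with `plotly.io.write_html` and assemble the full HTML report with a `jinja2` template.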
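The template idea can be sketched without third-party packages. The example below uses `string.Template` from the standard library as a stand-in for `jinja2`, and takes the figure as an already rendered HTML fragment (such as the output of Plotly's `Figure.to_html(full_html=False)`). The function name `render_report` and the template fields are assumptions made for illustration.

```python
import html
from datetime import datetime
from string import Template

# Placeholders ($title, $generated, ...) are filled in by render_report()
REPORT_TEMPLATE = Template("""<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>$title</title></head>
<body>
<h1>$title</h1>
<p>Generated: $generated</p>
<h2>Filters</h2>
<ul>
$filter_items
</ul>
<h2>Figure</h2>
$figure_html
</body>
</html>
""")

def render_report(title, filters, figure_html, now=None):
    """Fill the template; user-supplied text is HTML-escaped, the figure is inserted as-is."""
    now = now or datetime.now()
    items = "\n".join(
        f"<li>{html.escape(str(key))}: {html.escape(str(value))}</li>"
        for key, value in filters.items()
    )
    return REPORT_TEMPLATE.substitute(
        title=html.escape(title),
        generated=now.strftime("%Y-%m-%d %H:%M:%S"),
        filter_items=items,
        figure_html=figure_html,
    )
```

Escaping the filter values matters because they come from free-text inputs such as the protein name box. A real `jinja2` template would add loops, conditionals, and automatic escaping on top of this.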
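2. Custom visualization components
- Heatmap enhancements: add row/column clustering, annotation bars, and an adjustable color scale;
- Network graphs: integrate ceRNA networks with draggable nodes and zooming;
- Correlation analysis: add confidence intervals and per-group fitted lines;
- Cross-omics linkage: clicking a molecule in one omics layer automatically highlights the corresponding (same-gene) molecules in the other layers.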
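The cross-omics linkage in the last item depends on a lookup from a gene to its features in each layer. A minimal sketch of that lookup follows, assuming each layer provides a feature-to-gene annotation; the layer names, feature IDs, and helper names are hypothetical. In a Dash app the result would feed a callback triggered by the figure's click data.

```python
from collections import defaultdict

def build_gene_index(annotations):
    """annotations: {layer_name: {feature_id: gene_symbol}} -> {GENE: {layer: [feature_ids]}}."""
    index = defaultdict(lambda: defaultdict(list))
    for layer, features in annotations.items():
        for feature_id, gene in features.items():
            # Upper-case the symbol so that "tp53" and "TP53" land in the same bucket
            index[gene.upper()][layer].append(feature_id)
    return index

def linked_features(index, gene, source_layer):
    """Features of `gene` in every layer except the one that was clicked."""
    layers = index.get(gene.upper(), {})
    return {layer: sorted(ids) for layer, ids in layers.items() if layer != source_layer}

# Illustrative annotations (IDs chosen for the example only)
annotations = {
    "proteomics": {"P04637": "TP53", "P31749": "AKT1"},
    "methylation": {"cg0001": "TP53", "cg0002": "tp53", "cg1234": "AKT1"},
}
gene_index = build_gene_index(annotations)
print(linked_features(gene_index, "TP53", "proteomics"))
```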
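3. Performance optimization (for large datasets)
- Data caching: use `shiny::bindCache()` (R) or `dcc.Store` (Python) to cache filtered data and avoid recomputation;
- Lazy loading: load only the data the current page needs, and page or chunk large datasets;
- Asynchronous processing: use the `future` and `promises` packages (R) or `celery` (Python) for slow operations such as enrichment analysis;
- Figure optimization: reduce the number of data points per figure (for example, show only the top 100 molecules in a heatmap) and use WebGL rendering.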
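The caching item can be illustrated on the server side with `functools.lru_cache`, used here as a simple stand-in for whichever cache the app adopts. The expression table, the call counter, and the function name are assumptions made for the sketch.

```python
from functools import lru_cache

# Hypothetical expression table: molecule -> per-sample values
EXPRESSION = {
    "TP53": (120.0, 80.0, 95.0),
    "AKT1": (300.0, 310.0, 290.0),
    "MYC": (15.0, 5.0, 10.0),
}
# Counts how often the filter body actually runs (for demonstration only)
CALLS = {"count": 0}

@lru_cache(maxsize=128)
def filter_by_mean(threshold):
    """Molecules whose mean value is >= threshold; the result is cached per threshold."""
    CALLS["count"] += 1
    return tuple(
        name
        for name, values in sorted(EXPRESSION.items())
        if sum(values) / len(values) >= threshold
    )
```

Calling `filter_by_mean(50)` twice runs the filter once; the second call is served from the cache. Arguments must be hashable, which is why the function returns a tuple and takes a plain number rather than a DataFrame.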
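4. Data export and sharing
- Supported formats: CSV (raw data), PNG/PDF (static figures), HTML (interactive reports), and JSON (data for APIs);
- Cloud sharing: integrate OneDrive or Google Drive to upload reports in one click;
- Access control: add a user login module (Shiny Server Pro / Dash Enterprise) to restrict data access.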
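For the CSV and JSON formats, the records already held in `dcc.Store` (a list of dicts) can be serialized with the standard library alone. This is a minimal sketch; the helper names are assumptions, and the resulting strings could be handed to a download component or written to disk.

```python
import csv
import io
import json

def records_to_csv(records):
    """Serialize a list of dicts to CSV text, using the first record's keys as the header."""
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

def records_to_json(records):
    """Serialize to JSON; ensure_ascii=False keeps non-ASCII labels readable."""
    return json.dumps(records, ensure_ascii=False, indent=2)
```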
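VI. Common Problems and Solutions
1. Slow data loading
- Cause: the multi-omics dataset is too large and is loaded in full at once;
- Solution: load the data in chunks, store it in `feather` or `parquet` format (typically much faster to read than CSV), and enable compression.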
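Chunked loading can be sketched with the standard `csv` module: the generator below yields a fixed number of rows at a time, so only one chunk is in memory at once. The function name and chunk size are illustrative; with pandas, `read_csv(..., chunksize=...)` serves the same purpose.

```python
import csv
import io
from itertools import islice

def iter_chunks(handle, chunk_size):
    """Yield lists of row dicts, each with at most chunk_size rows."""
    reader = csv.DictReader(handle)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk

# Small in-memory example; a real app would pass an open file handle instead
data = "ID,S1,S2\nA,1,2\nB,3,4\nC,5,6\n"
for chunk in iter_chunks(io.StringIO(data), 2):
    print(len(chunk), [row["ID"] for row in chunk])
```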
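2. Sluggish figure rendering
- Cause: too many data points in the figure (for example, a heatmap with tens of thousands of molecules);
- Solution: limit the number of molecules shown (for example, the top 200), simplify the data with dimensionality reduction (PCA/UMAP), and turn off interactive features that are not needed.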
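One common way to pick the molecules to display is to rank them by variance across samples and keep the top N. The sketch below assumes that criterion (the text does not say how the top molecules are chosen) and uses a hypothetical `top_variable` helper.

```python
from statistics import pvariance

def top_variable(expression, n):
    """Names of the n molecules with the highest variance; ties are broken by name."""
    ranked = sorted(expression, key=lambda name: (-pvariance(expression[name]), name))
    return ranked[:n]

# Hypothetical values: molecule -> per-sample measurements
expression = {"A": [1, 1, 1], "B": [0, 10, 20], "C": [5, 6, 7]}
print(top_variable(expression, 2))
```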
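3. Conflicts when integrating cross-omics data
- Cause: sample IDs do not match across omics layers, and measurement scales differ widely;
- Solution: adopt one sample ID naming convention, normalize or standardize the data, and add a sample ID mapping table.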
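Both fixes can be sketched in a few lines: normalize sample IDs before intersecting the layers, and z-score each molecule so layers with different scales become comparable. The normalization rule below (strip separators, upper-case) is an assumption and should be checked against the real naming scheme, since it could merge IDs that are meant to be distinct.

```python
import re
from statistics import mean, pstdev

def normalize_sample_id(raw):
    """Remove whitespace, '_', '-' and '.', then upper-case: 'Tumor_01' -> 'TUMOR01'."""
    return re.sub(r"[\s_\-.]+", "", raw).upper()

def shared_samples(*layers):
    """Sorted sample IDs present in every layer after normalization."""
    sets = [{normalize_sample_id(sample) for sample in layer} for layer in layers]
    return sorted(set.intersection(*sets))

def zscore(values):
    """Standardize to mean 0 and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(value - mu) / sigma for value in values]
```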
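4. Dashboard unreachable after deployment
- Cause: the port is not open, permissions are insufficient, or dependencies are missing;
- Solution: open the firewall port (for example 8050 or 3838), run the app as a non-root user, and generate a dependency manifest (`requirements.txt` / `renv.lock`).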
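A quick way to tell a closed port from an application error is to attempt a TCP connection from another machine. The helper below is a minimal sketch with an assumed name; it reports only whether something accepts connections on the port, not whether the Dashboard itself is healthy.

```python
import socket

def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `is_port_open("your_domain.com", 8050)` returning False while the app runs on the server points to the firewall or the bind address rather than to the app.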
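5. Interactive report export fails
- Cause: insufficient permissions on the output path, or problems encoding Chinese text;
- Solution: write to a path that is known to be writable, use UTF-8 encoding, and add exception handling.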
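The three fixes combine naturally into one helper: create the output directory, write with an explicit UTF-8 encoding, and turn file-system errors into a message the UI can display. The name `save_report` and its return convention are assumptions for this sketch.

```python
import os
import tempfile

def save_report(html_text, directory, filename):
    """Write html_text as UTF-8. Returns (path, None) on success or (None, error message)."""
    try:
        os.makedirs(directory, exist_ok=True)
        path = os.path.join(directory, filename)
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(html_text)
        return path, None
    except OSError as exc:
        # Covers permission errors, missing drives, invalid paths, and similar failures
        return None, f"Report export failed: {exc}"

# Example: write into a temporary directory that is guaranteed to be writable
with tempfile.TemporaryDirectory() as tmp:
    path, error = save_report("<h1>多组学报告</h1>", os.path.join(tmp, "reports"), "report.html")
    print(error is None)
```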
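VII. Summary and Outlook
OmicsDashboard changes how multi-omics data is presented, moving from static figures to an interactive exploration platform. This improves the efficiency of bioinformatics analysis and lowers the barrier for non-specialists. R Shiny suits quickly building a Dashboard around the bioinformatics ecosystem, while Python Dash suits scenarios with high interactivity and large data volumes; choose according to the team's technology stack and the project's needs.
Future directions for multi-omics visualization include:
- AI-assisted exploration: integrate large language models to generate figures from natural-language queries;
- 3D visualization: use WebGL for 3D networks of multi-omics data and for spatial transcriptomics;
- Mobile support: optimize the Dashboard's responsive layout for phones and tablets;
- Platform integration: connect to public databases (TCGA/GTEx) to import public multi-omics data in one click.
Knowing how to build an OmicsDashboard makes multi-omics results more intuitive, easier to reuse, and more persuasive. Whether for journal submission, project reporting, or team collaboration, interactive visualization brings the data to life and shows the full value of the research.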