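Advanced Multi-Omics Visualization: Building an OmicsDashboard and Generating Interactive Reports (R Shiny / Python Dash in Practice)

Multi-omics research (transcriptomics + proteomics + metabolomics + methylomics, and so on) has become a core approach for dissecting complex biological regulation, but visualization remains a key bottleneck in putting multi-omics results to use. Traditional static charts (such as heatmaps and volcano plots drawn with ggplot2) do not let non-bioinformaticians explore the data interactively. After integration, multi-omics data often spans thousands or even tens of thousands of features, and static charts struggle to show global relationships. In manuscript submissions and project reviews, a static report does not let reviewers or team members filter for the molecules they care about or adjust plot parameters, so the value of the work is not fully conveyed.

An OmicsDashboard (an interactive multi-omics dashboard) is a visualization platform built with R Shiny or Python Dash that turns integrated multi-omics data into a dynamic report that can be explored, filtered, and exported. It addresses three pain points: static charts cannot be linked, non-specialists cannot operate the analysis, and multi-omics data is hard to integrate. Starting from the core requirements of multi-omics visualization, this article builds an OmicsDashboard with both R Shiny and Python Dash, covering data integration, interactive filtering, multiple chart types, report generation, and deployment, and provides reusable code templates to help bioinformaticians get interactive multi-omics analysis running quickly.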

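I. Core pain points of multi-omics visualization and the value of an OmicsDashboard

1. Core characteristics of multi-omics data

Multi-omics data has four defining characteristics: high dimensionality, multiple data types, strong cross-layer association, and heterogeneity.

  • High dimensionality: a transcriptome or proteome dataset contains anywhere from a few thousand to tens of thousands of molecules;
  • Multiple data types: count data (RNA-seq counts), continuous data (protein abundance), categorical data (sample groups), and annotation data (GO/KEGG pathways);
  • Strong association: the same molecule is linked by regulation across omics layers (for example lncRNA→mRNA→protein);
  • Heterogeneity: units and distributions differ widely between omics layers (for example, transcriptomics yields counts while metabolomics yields peak areas).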
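The heterogeneity point can be made concrete with a small sketch. It assumes a common (but not the only) way to put counts and peak areas on a comparable scale: a log2 transform with a pseudocount followed by a per-molecule z-score. The function names and the toy values are illustrative, not part of the dashboard code.

```python
import math
import statistics


def log2_transform(values, pseudocount=1.0):
    """log2(x + pseudocount): compresses the dynamic range of counts or peak areas."""
    return [math.log2(v + pseudocount) for v in values]


def zscore(values):
    """Center to mean 0 and scale by the sample SD (n - 1 denominator)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        # A constant row has no spread to scale by; this sketch returns zeros.
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


def normalize_row(values):
    """Log-transform one molecule's values, then z-score them across samples."""
    return zscore(log2_transform(values))


# Toy values (assumed for illustration): one gene in counts, one metabolite in peak area.
rna_counts = [1200, 1350, 800, 750]
peak_areas = [2.1e6, 1.8e6, 3.5e6, 3.9e6]

rna_z = normalize_row(rna_counts)
metab_z = normalize_row(peak_areas)
print([round(z, 2) for z in rna_z])
print([round(z, 2) for z in metab_z])
```

After this step both rows are unitless and centered on zero, so they can share one color scale in a heatmap even though the raw values differ by three orders of magnitude.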

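2. Pain points of traditional static visualization

| Pain point | How it shows up | OmicsDashboard approach |
| No interactivity | Cannot filter specific molecules or samples, or adjust plot parameters (such as the heatmap clustering method) | Interactive filter widgets (dropdowns, sliders, checkboxes) that update charts in real time |
| Hard to integrate data | Transcriptome and proteome charts are shown separately, so one molecule cannot be followed across omics layers | Cross-omics lookup that shows a target molecule at every layer in one step |
| Poor reusability | Visualization code has to be rewritten for every new dataset | Modular dashboard that is reused by swapping the data files |
| Inflexible reports | A static PDF report cannot support individual exploration | Export of interactive HTML reports, static PNG/PDF, and raw data |
| High barrier for non-specialists | A bioinformatician is needed to adjust analysis parameters | Point-and-click interface that requires no coding |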

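3. Core value of an OmicsDashboard

  • Interactivity: molecule and sample filtering, parameter adjustment, and linked charts let users explore the data on their own;
  • Integration: multi-omics data is shown in one place, making regulatory relationships between molecules visible;
  • Ease of use: a graphical interface lowers the barrier for non-bioinformaticians;
  • Reusability: a modular architecture adapts to different multi-omics datasets;
  • Shareability: local or cloud deployment supports team collaboration and presentation of results.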

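II. Technology choice and environment setup

1. R Shiny vs Python Dash (fit for bioinformatics use)

| Feature | R Shiny | Python Dash | Recommendation |
| Bioinformatics ecosystem | Rich (ggplot2, pheatmap, clusterProfiler) | Thinner (needs Plotly, scikit-learn) | Prefer Shiny for transcriptome / epigenome analysis |
| Plotting syntax | Matches what bioinformaticians already use (ggplot2) | Built on Plotly, with stronger interactivity | Prefer Dash when highly interactive charts are needed |
| Learning curve | Low for R users | Low for Python users | Choose by the team's technology stack |
| Deployment effort | Simple (Shiny Server / shinyapps.io) | Somewhat more involved (Gunicorn + Nginx) | Shiny for quick deployment, Dash for large-scale deployment |
| Performance (rule of thumb) | Works well on smaller data (< 100,000 rows) | More efficient on larger data (> 100,000 rows) | Choose Dash above roughly 100,000 rows |

2. Environment setup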

(1) R Shiny environment setup
Code (R):
# Install the core packages
install.packages(c("shiny", "shinydashboard", "ggplot2", "pheatmap", "dplyr", "tidyr", "plotly", "DT", "shinyWidgets", "rmarkdown"))
# Bioinformatics packages (from Bioconductor)
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("clusterProfiler", "org.Hs.eg.db", "ComplexHeatmap"))

(2) Python Dash environment setup
Code (bash):
# Create and activate a virtual environment
conda create -n dash_omics python=3.9 -y
conda activate dash_omics

# Install the core packages
pip install dash dash-bootstrap-components plotly pandas numpy scipy scikit-learn seaborn openpyxl reportlab
# Bioinformatics packages (optional)
pip install pysam pybedtools biom-format
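After installation it can help to confirm that the packages are actually visible from the active environment. The following is a minimal sketch using only the standard library; the package list simply repeats the core names from the pip command above, and the helper name is an assumption for illustration.

```python
from importlib import metadata

# Distribution names as passed to pip in the setup step.
CORE_PACKAGES = [
    "dash",
    "dash-bootstrap-components",
    "plotly",
    "pandas",
    "numpy",
    "scipy",
    "scikit-learn",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(CORE_PACKAGES)
for name, version in report.items():
    print(f"{name:28s} {version if version else 'NOT INSTALLED'}")
```

Running this inside the dash_omics environment should list a version for every package; any "NOT INSTALLED" line points at a failed or skipped pip step.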

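III. Hands-on 1: Building a multi-omics dashboard with R Shiny (transcriptome + metabolome)

Using an integrated lung cancer transcriptome + metabolome analysis as the example, this section builds a complete dashboard with a data overview, interactive filtering, multi-omics visualization, functional enrichment, and report export.

1. Dashboard architecture

The app uses the classic shinydashboard layout:

  • Sidebar: navigation (data overview, gene filtering, metabolome analysis, enrichment analysis, report export) plus filter widgets (gene name, sample group, expression threshold);
  • Body: charts and tables that are rendered dynamically and update in real time;
  • Header: project information, data version, help documentation.

2. Core code implementation (full listing)

Step 1: Prepare multi-omics test data

Create a data/ directory containing the following files:

  • rna_seq_counts.csv: transcriptome count matrix (rows: genes; columns: samples; last column: gene annotation);
  • metabolomics_data.csv: metabolome data (rows: metabolites; columns: samples; last column: metabolic pathway);
  • sample_info.csv: sample metadata (columns: sample name, group (Tumor/Normal), sex, age).

Example data format (rna_seq_counts.csv):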

| GeneID | Sample1 | Sample2 | Sample3 | Sample4 | Annotation |
| TP53 | 1200 | 1350 | 800 | 750 | Tumor suppressor |
| EGFR | 850 | 920 | 1500 | 1600 | Oncogene |
| GAPDH | 5000 | 5200 | 4800 | 4900 | Housekeeping |
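If no real dataset is at hand, placeholder files in this layout can be generated. The sketch below is one way to do it with the standard library. Several details are assumptions made for illustration: the header names Sample, Sex, Age, and Metabolite, the three metabolites and their pathways, and the random values, which carry no biological meaning. Only the Group, Annotation, and Pathway column names are taken from the app code.

```python
import csv
import os
import random
import tempfile


def write_test_data(out_dir, n_per_group=3, seed=0):
    """Write the three CSV files in the layout described above; returns the sample names."""
    rng = random.Random(seed)
    os.makedirs(out_dir, exist_ok=True)
    groups = ["Tumor"] * n_per_group + ["Normal"] * n_per_group
    samples = [f"Sample{i + 1}" for i in range(len(groups))]

    def write_csv(name, header, rows):
        with open(os.path.join(out_dir, name), "w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow(header)
            writer.writerows(rows)

    # Sample metadata: the first column becomes the row names when read with row.names = 1.
    write_csv(
        "sample_info.csv",
        ["Sample", "Group", "Sex", "Age"],
        [[s, g, rng.choice(["F", "M"]), rng.randint(40, 80)] for s, g in zip(samples, groups)],
    )

    # Transcriptome counts: integer values scattered around a per-gene baseline.
    genes = {"TP53": (1000, "Tumor suppressor"), "EGFR": (1200, "Oncogene"),
             "GAPDH": (5000, "Housekeeping")}
    write_csv(
        "rna_seq_counts.csv",
        ["GeneID"] + samples + ["Annotation"],
        [[g] + [max(0, int(rng.gauss(base, base * 0.1))) for _ in samples] + [note]
         for g, (base, note) in genes.items()],
    )

    # Metabolome peak areas: continuous values, with a pathway label in the last column.
    metabolites = {"Glucose": (2.0e6, "Glycolysis"), "Lactate": (1.5e6, "Glycolysis"),
                   "Glutamine": (8.0e5, "Glutamine metabolism")}
    write_csv(
        "metabolomics_data.csv",
        ["Metabolite"] + samples + ["Pathway"],
        [[m] + [round(max(0.0, rng.gauss(base, base * 0.15)), 1) for _ in samples] + [path]
         for m, (base, path) in metabolites.items()],
    )
    return samples


# Demo: write into a throwaway directory; pass "data" instead to feed the app.
demo_dir = tempfile.mkdtemp()
demo_samples = write_test_data(demo_dir, n_per_group=2)
print(sorted(os.listdir(demo_dir)))
```

The sample names in sample_info.csv match the column names of both matrices, which the app relies on when it subsets columns by group.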
Step 2: Write the Shiny app code (app.R)
Code (R):
# 加载包
library(shiny)
library(shinydashboard)
library(ggplot2)
library(pheatmap)
library(dplyr)
library(tidyr)
library(tibble)  # provides rownames_to_column(), used when reshaping to long format
library(plotly)
library(DT)
library(shinyWidgets)
library(clusterProfiler)
library(org.Hs.eg.db)

# ====================== 1. 加载数据 ======================
# 转录组数据
rna_data <- read.csv("data/rna_seq_counts.csv", row.names = 1, stringsAsFactors = FALSE)
# 代谢组数据
metab_data <- read.csv("data/metabolomics_data.csv", row.names = 1, stringsAsFactors = FALSE)
# 样本信息
sample_info <- read.csv("data/sample_info.csv", row.names = 1, stringsAsFactors = FALSE)

# 提取表达矩阵(去除注释列)
rna_expr <- rna_data[, !colnames(rna_data) %in% "Annotation"]
metab_expr <- metab_data[, !colnames(metab_data) %in% "Pathway"]

# ====================== 2. 定义UI ======================
ui <- dashboardPage(
  # 顶部导航栏
  dashboardHeader(title = "肺癌多组学分析Dashboard", titleWidth = 300),
  
  # 侧边栏
  dashboardSidebar(
    width = 300,
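    sidebarMenu(
      id = "tabName",  # exposes the active tab as input$tabName for the conditionalPanel() conditions below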
      menuItem("数据概览", tabName = "overview", icon = icon("dashboard")),
      menuItem("基因筛选与可视化", tabName = "gene_vis", icon = icon("dna")),
      menuItem("代谢组分析", tabName = "metab_vis", icon = icon("flask")),
      menuItem("功能富集分析", tabName = "enrich", icon = icon("chart-pie")),
      menuItem("报告导出", tabName = "report", icon = icon("file-export"))
    ),
    
    # 通用筛选组件(样本分组)
    hr(),
    h4("样本筛选"),
    pickerInput(
      inputId = "sample_group",
      label = "选择样本分组",
      choices = unique(sample_info$Group),
      selected = unique(sample_info$Group),
      multiple = TRUE,
      options = list(`actions-box` = TRUE)
    ),
    
    # 基因筛选组件(仅在基因可视化页面显示)
    conditionalPanel(
      condition = "input.tabName == 'gene_vis'",
      hr(),
      h4("基因筛选"),
      textInput("gene_name", "输入基因名(支持模糊匹配)", placeholder = "TP53/EGFR"),
      sliderInput("expr_threshold", "基因表达量阈值", min = 0, max = max(rna_expr), value = 500),
      pickerInput(
        inputId = "plot_type",
        label = "选择可视化类型",
        choices = c("箱线图", "热图", "火山图"),
        selected = "箱线图"
      )
    ),
    
    # 代谢组筛选组件
    conditionalPanel(
      condition = "input.tabName == 'metab_vis'",
      hr(),
      h4("代谢物筛选"),
      textInput("metab_name", "输入代谢物名(支持模糊匹配)", placeholder = "Glucose/Lactate"),
      pickerInput(
        inputId = "metab_plot",
        label = "选择可视化类型",
        choices = c("散点图", "热图", "通路富集图"),
        selected = "散点图"
      )
    )
  ),
  
  # 主面板
  dashboardBody(
    tabItems(
      # ====================== 标签页1:数据概览 ======================
      tabItem(
        tabName = "overview",
        fluidRow(
          # 数据基本信息卡片
          box(title = "数据概览", width = 12,
              column(6,
                     h4("转录组数据"),
                     p(paste("基因数量:", nrow(rna_expr))),
                     p(paste("样本数量:", ncol(rna_expr))),
                     p(paste("肿瘤样本数:", sum(sample_info$Group == "Tumor"))),
                     p(paste("正常样本数:", sum(sample_info$Group == "Normal")))
              ),
              column(6,
                     h4("代谢组数据"),
                     p(paste("代谢物数量:", nrow(metab_expr))),
                     p(paste("样本数量:", ncol(metab_expr))),
                     p(paste("代谢通路数:", length(unique(metab_data$Pathway)))),
                     p(paste("数据量纲:峰面积(归一化后)"))
              )
          ),
          
          # 样本分组分布饼图
          box(title = "样本分组分布", width = 6,
              plotOutput("sample_pie", height = 300)
          ),
          
          # 转录组表达量分布直方图
          box(title = "转录组基因表达量分布", width = 6,
              plotOutput("rna_expr_dist", height = 300)
          ),
          
          # 样本相关性热图
          box(title = "样本相关性分析(转录组)", width = 12,
              plotOutput("sample_corr_heatmap", height = 400)
          )
        )
      ),
      
      # ====================== 标签页2:基因筛选与可视化 ======================
      tabItem(
        tabName = "gene_vis",
        fluidRow(
          # 筛选结果表格
          box(title = "筛选基因列表", width = 12,
              DTOutput("gene_table")
          ),
          
          # 可视化图表
          box(title = "基因表达可视化", width = 12,
              conditionalPanel(
                condition = "input.plot_type == '箱线图'",
                plotlyOutput("gene_boxplot", height = 400)
              ),
              conditionalPanel(
                condition = "input.plot_type == '热图'",
                plotOutput("gene_heatmap", height = 400)
              ),
              conditionalPanel(
                condition = "input.plot_type == '火山图'",
                plotlyOutput("gene_volcano", height = 400)
              )
          )
        )
      ),
      
      # ====================== 标签页3:代谢组分析 ======================
      tabItem(
        tabName = "metab_vis",
        fluidRow(
          # 代谢物筛选表格
          box(title = "筛选代谢物列表", width = 12,
              DTOutput("metab_table")
          ),
          
          # 代谢物可视化
          box(title = "代谢物表达可视化", width = 12,
              conditionalPanel(
                condition = "input.metab_plot == '散点图'",
                plotlyOutput("metab_scatter", height = 400)
              ),
              conditionalPanel(
                condition = "input.metab_plot == '热图'",
                plotOutput("metab_heatmap", height = 400)
              ),
              conditionalPanel(
                condition = "input.metab_plot == '通路富集图'",
                plotOutput("metab_pathway", height = 400)
              )
          )
        )
      ),
      
      # ====================== 标签页4:功能富集分析 ======================
      tabItem(
        tabName = "enrich",
        fluidRow(
          box(title = "富集分析参数", width = 4,
              textAreaInput("gene_list", "输入富集基因列表(换行分隔)", rows = 10, placeholder = "TP53\nEGFR\nMYC"),
              pickerInput(
                inputId = "enrich_type",
                label = "富集类型",
                choices = c("GO-BP", "GO-CC", "GO-MF", "KEGG"),
                selected = "GO-BP"
              ),
              actionButton("run_enrich", "运行富集分析", class = "btn-primary")
          ),
          
          box(title = "富集结果表格", width = 8,
              DTOutput("enrich_table")
          ),
          
          box(title = "富集气泡图", width = 12,
              plotOutput("enrich_bubble", height = 400)
          )
        )
      ),
      
      # ====================== 标签页5:报告导出 ======================
      tabItem(
        tabName = "report",
        fluidRow(
          box(title = "报告导出设置", width = 6,
              textInput("report_title", "报告标题", value = "肺癌多组学分析报告"),
              selectInput("report_format", "报告格式", choices = c("HTML", "PDF", "Word"), selected = "HTML"),
              checkboxGroupInput("include_plots", "包含图表", 
                                 choices = c("数据概览", "基因表达图", "代谢物图", "富集分析图"),
                                 selected = c("数据概览", "基因表达图")),
              actionButton("generate_report", "生成报告", class = "btn-success")
          ),
          
          box(title = "导出状态", width = 6,
              verbatimTextOutput("report_status"),
              downloadButton("download_report", "下载报告", class = "btn-info")
          )
        )
      )
    )
  )
)

# ====================== 3. 定义Server ======================
server <- function(input, output, session) {
  # ---------------------- 数据概览页面 ----------------------
  # 样本分组饼图
  output$sample_pie <- renderPlot({
    group_counts <- table(sample_info$Group)
    ggplot(data.frame(Group = names(group_counts), Count = as.vector(group_counts)),
           aes(x = "", y = Count, fill = Group)) +
      geom_bar(stat = "identity", width = 1) +
      coord_polar("y", start = 0) +
      theme_void() +
      labs(title = "样本分组分布", fill = "分组") +
      scale_fill_manual(values = c("Tumor" = "red", "Normal" = "green"))
  })
  
  # 转录组表达量分布
  output$rna_expr_dist <- renderPlot({
    rna_expr_long <- pivot_longer(as.data.frame(rna_expr), cols = everything(), names_to = "Sample", values_to = "Expression")
    ggplot(rna_expr_long, aes(x = Expression)) +
      geom_histogram(bins = 50, fill = "steelblue", alpha = 0.7) +
      theme_bw() +
      labs(title = "基因表达量分布", x = "表达量(count)", y = "基因数量") +
      scale_x_log10()  # 对数转换,更易观察分布
  })
  
  # 样本相关性热图
  output$sample_corr_heatmap <- renderPlot({
    # Sample-to-sample correlation: cor() works on columns (samples), so no transpose is needed
    sample_corr <- cor(rna_expr)
    # 绘制热图
    pheatmap(sample_corr, 
             annotation_col = sample_info[, "Group", drop = FALSE],
             color = colorRampPalette(c("blue", "white", "red"))(100),
             main = "样本相关性热图(转录组)",
             show_rownames = TRUE,
             show_colnames = TRUE)
  })
  
  # ---------------------- 基因筛选与可视化页面 ----------------------
  # 筛选基因数据(响应式)
  filtered_genes <- reactive({
    # 筛选样本
    selected_samples <- rownames(sample_info)[sample_info$Group %in% input$sample_group]
    rna_filtered <- rna_expr[, selected_samples, drop = FALSE]
    
    # 筛选基因(表达量阈值 + 基因名模糊匹配)
    if (input$gene_name != "") {
      gene_match <- grepl(input$gene_name, rownames(rna_filtered), ignore.case = TRUE)
      rna_filtered <- rna_filtered[gene_match, , drop = FALSE]
    }
    rna_filtered <- rna_filtered[rowMeans(rna_filtered) >= input$expr_threshold, , drop = FALSE]
    
    # 合并注释信息
    rna_filtered <- cbind(rna_filtered, Annotation = rna_data[rownames(rna_filtered), "Annotation"])
    return(rna_filtered)
  })
  
  # 筛选基因表格
  output$gene_table <- renderDT({
    datatable(filtered_genes(), 
              options = list(pageLength = 10, scrollX = TRUE),
              caption = "筛选后的基因表达数据") %>%
      formatRound(columns = 1:(ncol(filtered_genes())-1), digits = 0)
  })
  
  # 基因箱线图
  output$gene_boxplot <- renderPlotly({
    if (nrow(filtered_genes()) == 0) {
      return(plotly_empty() %>% layout(title = "无符合条件的基因"))
    }
    
    # 转换为长格式
    gene_long <- filtered_genes() %>%
      rownames_to_column("GeneID") %>%
      pivot_longer(cols = -c(GeneID, Annotation), names_to = "Sample", values_to = "Expression") %>%
      left_join(sample_info %>% rownames_to_column("Sample"), by = "Sample")
    
    # 绘制箱线图
    p <- ggplot(gene_long, aes(x = Group, y = Expression, fill = Group)) +
      geom_boxplot(alpha = 0.7) +
      facet_wrap(~GeneID, scales = "free_y") +
      theme_bw() +
      scale_fill_manual(values = c("Tumor" = "red", "Normal" = "green")) +
      labs(x = "样本分组", y = "基因表达量(count)", title = "基因表达箱线图")
    
    ggplotly(p)
  })
  
  # 基因热图
  output$gene_heatmap <- renderPlot({
    if (nrow(filtered_genes()) == 0) {
      return(ggplot() + geom_text(aes(x = 1, y = 1, label = "无符合条件的基因")) + theme_void())
    }
    
    # 提取表达矩阵(去除注释列)
    expr_mat <- filtered_genes()[, !colnames(filtered_genes()) %in% "Annotation"]
    # 标准化(行Z-score)
    expr_mat <- t(scale(t(expr_mat)))
    
    pheatmap(expr_mat,
             annotation_col = sample_info[colnames(expr_mat), "Group", drop = FALSE],
             color = colorRampPalette(c("blue", "white", "red"))(100),
             main = "基因表达热图(Z-score标准化)",
             show_rownames = TRUE,
             show_colnames = TRUE)
  })
  
  # 基因火山图(仅展示差异基因,此处简化为Tumor vs Normal均值比)
  output$gene_volcano <- renderPlotly({
    if (nrow(filtered_genes()) == 0) {
      return(plotly_empty() %>% layout(title = "无符合条件的基因"))
    }
    
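    # Group means, restricted to samples that passed the sidebar group filter
    expr <- filtered_genes()
    tumor_samples <- intersect(rownames(sample_info)[sample_info$Group == "Tumor"], colnames(expr))
    normal_samples <- intersect(rownames(sample_info)[sample_info$Group == "Normal"], colnames(expr))
    if (length(tumor_samples) < 2 || length(normal_samples) < 2) {
      return(plotly_empty() %>% layout(title = "每组至少需要 2 个样本"))
    }
    tumor_mean <- rowMeans(expr[, tumor_samples, drop = FALSE])
    normal_mean <- rowMeans(expr[, normal_samples, drop = FALSE])
    
    # log2FC with a pseudocount of 1; p-values from a per-gene Welch t-test (simplified)
    volcano_data <- data.frame(
      GeneID = names(tumor_mean),
      log2FC = log2((tumor_mean + 1)/(normal_mean + 1)),
      pvalue = sapply(names(tumor_mean), function(g) {
        # NA when the test cannot be computed (for example, constant values)
        tryCatch(
          t.test(as.numeric(expr[g, tumor_samples]), as.numeric(expr[g, normal_samples]))$p.value,
          error = function(e) NA_real_
        )
      })
    ) %>%
      mutate(neg_log10p = -log10(pvalue),
             Significance = ifelse(!is.na(pvalue) & pvalue < 0.05 & abs(log2FC) > 1, "Significant", "Not significant"))
    
    # Draw the volcano plot
    p <- ggplot(volcano_data, aes(x = log2FC, y = neg_log10p, color = Significance)) +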
      geom_point(alpha = 0.7, size = 2) +
      theme_bw() +
      scale_color_manual(values = c("Significant" = "red", "Not significant" = "gray")) +
      geom_vline(xintercept = c(-1, 1), linetype = "dashed", color = "black") +
      geom_hline(yintercept = -log10(0.05), linetype = "dashed", color = "black") +
      labs(x = "log2(Fold Change)", y = "-log10(P-value)", title = "基因差异表达火山图")
    
    ggplotly(p) %>% layout(hovermode = "closest")
  })
  
  # ---------------------- 代谢组分析页面 ----------------------
  # 筛选代谢物数据
  filtered_metabs <- reactive({
    selected_samples <- rownames(sample_info)[sample_info$Group %in% input$sample_group]
    metab_filtered <- metab_expr[, selected_samples, drop = FALSE]
    
    if (input$metab_name != "") {
      metab_match <- grepl(input$metab_name, rownames(metab_filtered), ignore.case = TRUE)
      metab_filtered <- metab_filtered[metab_match, , drop = FALSE]
    }
    
    metab_filtered <- cbind(metab_filtered, Pathway = metab_data[rownames(metab_filtered), "Pathway"])
    return(metab_filtered)
  })
  
  # 代谢物表格
  output$metab_table <- renderDT({
    datatable(filtered_metabs(), 
              options = list(pageLength = 10, scrollX = TRUE),
              caption = "筛选后的代谢物表达数据") %>%
      formatRound(columns = 1:(ncol(filtered_metabs())-1), digits = 2)
  })
  
  # 代谢物散点图
  output$metab_scatter <- renderPlotly({
    if (nrow(filtered_metabs()) == 0) {
      return(plotly_empty() %>% layout(title = "无符合条件的代谢物"))
    }
    
    metab_long <- filtered_metabs() %>%
      rownames_to_column("Metabolite") %>%
      pivot_longer(cols = -c(Metabolite, Pathway), names_to = "Sample", values_to = "Expression") %>%
      left_join(sample_info %>% rownames_to_column("Sample"), by = "Sample")
    
    p <- ggplot(metab_long, aes(x = Sample, y = Expression, color = Group)) +
      geom_point(size = 3, alpha = 0.7) +
      facet_wrap(~Metabolite, scales = "free_y") +
      theme_bw() +
      scale_color_manual(values = c("Tumor" = "red", "Normal" = "green")) +
      labs(x = "样本", y = "代谢物表达量(峰面积)", title = "代谢物表达散点图") +
      theme(axis.text.x = element_text(angle = 45, hjust = 1))
    
    ggplotly(p)
  })
  
  # 代谢物热图
  output$metab_heatmap <- renderPlot({
    if (nrow(filtered_metabs()) == 0) {
      return(ggplot() + geom_text(aes(x = 1, y = 1, label = "无符合条件的代谢物")) + theme_void())
    }
    
    expr_mat <- filtered_metabs()[, !colnames(filtered_metabs()) %in% "Pathway"]
    expr_mat <- t(scale(t(expr_mat)))
    
    pheatmap(expr_mat,
             annotation_col = sample_info[colnames(expr_mat), "Group", drop = FALSE],
             color = colorRampPalette(c("blue", "white", "red"))(100),
             main = "代谢物表达热图(Z-score标准化)",
             show_rownames = TRUE,
             show_colnames = TRUE)
  })
  
  # 代谢物通路富集图(简化版)
  output$metab_pathway <- renderPlot({
    if (nrow(filtered_metabs()) == 0) {
      return(ggplot() + geom_text(aes(x = 1, y = 1, label = "无符合条件的代谢物")) + theme_void())
    }
    
    # 统计各通路的代谢物数量
    pathway_counts <- table(filtered_metabs()$Pathway)
    pathway_df <- data.frame(Pathway = names(pathway_counts), Count = as.vector(pathway_counts)) %>%
      arrange(desc(Count)) %>%
      slice_head(n = 10)  # 展示前10个通路
    
    ggplot(pathway_df, aes(x = reorder(Pathway, Count), y = Count)) +
      geom_bar(stat = "identity", fill = "orange", alpha = 0.7) +
      theme_bw() +
      labs(x = "代谢通路", y = "代谢物数量", title = "代谢通路富集TOP10") +
      theme(axis.text.x = element_text(angle = 45, hjust = 1))
  })
  
  # ---------------------- 功能富集分析页面 ----------------------
  # 富集分析结果(响应式)
  enrich_result <- eventReactive(input$run_enrich, {
    if (input$gene_list == "") {
      return(NULL)
    }
    
    # 提取基因列表
    genes <- unlist(strsplit(input$gene_list, "\n"))
    genes <- genes[genes != ""]
    
    # 转换为Entrez ID(富集分析需要)
    gene_entrez <- bitr(genes, fromType = "SYMBOL", toType = "ENTREZID", OrgDb = org.Hs.eg.db)$ENTREZID
    
    if (length(gene_entrez) == 0) {
      return(NULL)
    }
    
    # 运行富集分析
    if (input$enrich_type == "GO-BP") {
      enrich <- enrichGO(gene = gene_entrez, OrgDb = org.Hs.eg.db, ont = "BP", pAdjustMethod = "fdr", qvalueCutoff = 0.05)
    } else if (input$enrich_type == "GO-CC") {
      enrich <- enrichGO(gene = gene_entrez, OrgDb = org.Hs.eg.db, ont = "CC", pAdjustMethod = "fdr", qvalueCutoff = 0.05)
    } else if (input$enrich_type == "GO-MF") {
      enrich <- enrichGO(gene = gene_entrez, OrgDb = org.Hs.eg.db, ont = "MF", pAdjustMethod = "fdr", qvalueCutoff = 0.05)
    } else if (input$enrich_type == "KEGG") {
      enrich <- enrichKEGG(gene = gene_entrez, organism = "hsa", pvalueCutoff = 0.05)
    }
    
    return(enrich)
  })
  
  # 富集结果表格
  output$enrich_table <- renderDT({
    if (is.null(enrich_result())) {
      return(datatable(data.frame(Message = "请输入有效基因列表并运行富集分析")))
    }
    
    datatable(as.data.frame(enrich_result()), 
              options = list(pageLength = 10, scrollX = TRUE),
              caption = paste(input$enrich_type, "富集分析结果"))
  })
  
  # 富集气泡图
  output$enrich_bubble <- renderPlot({
    if (is.null(enrich_result())) {
      return(ggplot() + geom_text(aes(x = 1, y = 1, label = "无富集结果")) + theme_void())
    }
    
    dotplot(enrich_result(), showCategory = 15, title = paste(input$enrich_type, "富集气泡图")) +
      theme(axis.text.y = element_text(size = 10))
  })
  
  # ---------------------- 报告导出页面 ----------------------
# 生成报告
report_path <- reactiveVal(NULL)

# 辅助函数:保存图表为图片
save_plot <- function(plot_obj, filename, width = 10, height = 6, dpi = 300) {
  if (is.null(plot_obj)) return(NULL)
  plot_path <- file.path(tempdir(), filename)
  # 区分ggplot/plotly对象
  if (inherits(plot_obj, "ggplot")) {
    ggsave(plot_path, plot_obj, width = width, height = height, dpi = dpi)
  } else if (inherits(plot_obj, "plotly")) {
    plotly::export(plot_obj, file = plot_path)
  } else {
    png(plot_path, width = width, height = height, units = "in", res = dpi)
    print(plot_obj)
    dev.off()
  }
  return(plot_path)
}

observeEvent(input$generate_report, {
  # 1. 生成并保存核心图表(临时目录)
  ## 1.1 样本分布饼图
  sample_pie_plot <- ggplot(data.frame(Group = names(table(sample_info$Group)), 
                                       Count = as.vector(table(sample_info$Group))),
                            aes(x = "", y = Count, fill = Group)) +
    geom_bar(stat = "identity", width = 1) +
    coord_polar("y", start = 0) +
    theme_void() +
    labs(title = "样本分组分布", fill = "分组") +
    scale_fill_manual(values = c("Tumor" = "red", "Normal" = "green"))
  sample_pie_path <- save_plot(sample_pie_plot, "sample_pie.png")
  
  ## 1.2 基因表达图表(根据选择的可视化类型)
  gene_plot <- if (input$plot_type == "箱线图") {
    # 重新生成箱线图(非交互式版本,适配Rmd)
    selected_samples <- rownames(sample_info)[sample_info$Group %in% input$sample_group]
    rna_filtered <- rna_expr[, selected_samples, drop = FALSE]
    if (input$gene_name != "") {
      gene_match <- grepl(input$gene_name, rownames(rna_filtered), ignore.case = TRUE)
      rna_filtered <- rna_filtered[gene_match, , drop = FALSE]
    }
    rna_filtered <- rna_filtered[rowMeans(rna_filtered) >= input$expr_threshold, , drop = FALSE]
    
    if (nrow(rna_filtered) == 0) {
      ggplot() + geom_text(aes(x = 1, y = 1, label = "无符合条件的基因")) + theme_void()
    } else {
      gene_long <- rna_filtered %>%
        rownames_to_column("GeneID") %>%
        pivot_longer(cols = -GeneID, names_to = "Sample", values_to = "Expression") %>%
        left_join(sample_info %>% rownames_to_column("Sample"), by = "Sample")
      ggplot(gene_long, aes(x = Group, y = Expression, fill = Group)) +
        geom_boxplot(alpha = 0.7) +
        facet_wrap(~GeneID, scales = "free_y") +
        theme_bw() +
        scale_fill_manual(values = c("Tumor" = "red", "Normal" = "green")) +
        labs(x = "样本分组", y = "基因表达量(count)", title = "基因表达箱线图")
    }
  } else if (input$plot_type == "热图") {
    # 重新生成热图
    selected_samples <- rownames(sample_info)[sample_info$Group %in% input$sample_group]
    rna_filtered <- rna_expr[, selected_samples, drop = FALSE]
    if (input$gene_name != "") {
      gene_match <- grepl(input$gene_name, rownames(rna_filtered), ignore.case = TRUE)
      rna_filtered <- rna_filtered[gene_match, , drop = FALSE]
    }
    rna_filtered <- rna_filtered[rowMeans(rna_filtered) >= input$expr_threshold, , drop = FALSE]
    
    if (nrow(rna_filtered) == 0) {
      ggplot() + geom_text(aes(x = 1, y = 1, label = "无符合条件的基因")) + theme_void()
    } else {
      expr_mat <- t(scale(t(rna_filtered)))
      pheatmap(expr_mat,
               annotation_col = sample_info[colnames(expr_mat), "Group", drop = FALSE],
               color = colorRampPalette(c("blue", "white", "red"))(100),
               main = "基因表达热图(Z-score标准化)",
               show_rownames = TRUE,
               show_colnames = TRUE)
    }
  } else {
    # 火山图
    selected_samples <- rownames(sample_info)[sample_info$Group %in% input$sample_group]
    rna_filtered <- rna_expr[, selected_samples, drop = FALSE]
    if (input$gene_name != "") {
      gene_match <- grepl(input$gene_name, rownames(rna_filtered), ignore.case = TRUE)
      rna_filtered <- rna_filtered[gene_match, , drop = FALSE]
    }
    rna_filtered <- rna_filtered[rowMeans(rna_filtered) >= input$expr_threshold, , drop = FALSE]
    
    if (nrow(rna_filtered) == 0) {
      ggplot() + geom_text(aes(x = 1, y = 1, label = "无符合条件的基因")) + theme_void()
    } else {
      # Restrict each group to the samples that passed the sidebar filter
      tumor_samples <- intersect(rownames(sample_info)[sample_info$Group == "Tumor"], colnames(rna_filtered))
      normal_samples <- intersect(rownames(sample_info)[sample_info$Group == "Normal"], colnames(rna_filtered))
      tumor_mean <- rowMeans(rna_filtered[, tumor_samples, drop = FALSE])
      normal_mean <- rowMeans(rna_filtered[, normal_samples, drop = FALSE])
      volcano_data <- data.frame(
        GeneID = names(tumor_mean),
        log2FC = log2((tumor_mean + 1)/(normal_mean + 1)),
        pvalue = sapply(names(tumor_mean), function(g) {
          # NA when the test cannot be computed (too few samples, constant values)
          tryCatch(
            t.test(as.numeric(rna_filtered[g, tumor_samples]), as.numeric(rna_filtered[g, normal_samples]))$p.value,
            error = function(e) NA_real_
          )
        })
      ) %>%
        mutate(neg_log10p = -log10(pvalue),
               Significance = ifelse(!is.na(pvalue) & pvalue < 0.05 & abs(log2FC) > 1, "Significant", "Not significant"))
      ggplot(volcano_data, aes(x = log2FC, y = neg_log10p, color = Significance)) +
        geom_point(alpha = 0.7, size = 2) +
        theme_bw() +
        scale_color_manual(values = c("Significant" = "red", "Not significant" = "gray")) +
        geom_vline(xintercept = c(-1, 1), linetype = "dashed", color = "black") +
        geom_hline(yintercept = -log10(0.05), linetype = "dashed", color = "black") +
        labs(x = "log2(Fold Change)", y = "-log10(P-value)", title = "基因差异表达火山图")
    }
  }
  gene_plot_path <- save_plot(gene_plot, "gene_expression.png")
  
  ## 1.3 代谢组分析图表
  metab_plot <- if (input$metab_plot == "散点图") {
    selected_samples <- rownames(sample_info)[sample_info$Group %in% input$sample_group]
    metab_filtered <- metab_expr[, selected_samples, drop = FALSE]
    if (input$metab_name != "") {
      metab_match <- grepl(input$metab_name, rownames(metab_filtered), ignore.case = TRUE)
      metab_filtered <- metab_filtered[metab_match, , drop = FALSE]
    }
    
    if (nrow(metab_filtered) == 0) {
      ggplot() + geom_text(aes(x = 1, y = 1, label = "无符合条件的代谢物")) + theme_void()
    } else {
      metab_long <- metab_filtered %>%
        rownames_to_column("Metabolite") %>%
        pivot_longer(cols = -Metabolite, names_to = "Sample", values_to = "Expression") %>%
        left_join(sample_info %>% rownames_to_column("Sample"), by = "Sample")
      ggplot(metab_long, aes(x = Sample, y = Expression, color = Group)) +
        geom_point(size = 3, alpha = 0.7) +
        facet_wrap(~Metabolite, scales = "free_y") +
        theme_bw() +
        scale_color_manual(values = c("Tumor" = "red", "Normal" = "green")) +
        labs(x = "样本", y = "代谢物表达量(峰面积)", title = "代谢物表达散点图") +
        theme(axis.text.x = element_text(angle = 45, hjust = 1))
    }
  } else if (input$metab_plot == "热图") {
    selected_samples <- rownames(sample_info)[sample_info$Group %in% input$sample_group]
    metab_filtered <- metab_expr[, selected_samples, drop = FALSE]
    if (input$metab_name != "") {
      metab_match <- grepl(input$metab_name, rownames(metab_filtered), ignore.case = TRUE)
      metab_filtered <- metab_filtered[metab_match, , drop = FALSE]
    }
    
    if (nrow(metab_filtered) == 0) {
      ggplot() + geom_text(aes(x = 1, y = 1, label = "无符合条件的代谢物")) + theme_void()
    } else {
      expr_mat <- t(scale(t(metab_filtered)))
      pheatmap(expr_mat,
               annotation_col = sample_info[colnames(expr_mat), "Group", drop = FALSE],
               color = colorRampPalette(c("blue", "white", "red"))(100),
               main = "代谢物表达热图(Z-score标准化)",
               show_rownames = TRUE,
               show_colnames = TRUE)
    }
  } else {
    # 通路富集图
    selected_samples <- rownames(sample_info)[sample_info$Group %in% input$sample_group]
    metab_filtered <- metab_expr[, selected_samples, drop = FALSE]
    if (input$metab_name != "") {
      metab_match <- grepl(input$metab_name, rownames(metab_filtered), ignore.case = TRUE)
      metab_filtered <- metab_filtered[metab_match, , drop = FALSE]
    }
    metab_filtered <- cbind(metab_filtered, Pathway = metab_data[rownames(metab_filtered), "Pathway"])
    
    if (nrow(metab_filtered) == 0) {
      ggplot() + geom_text(aes(x = 1, y = 1, label = "无符合条件的代谢物")) + theme_void()
    } else {
      pathway_counts <- table(metab_filtered$Pathway)
      pathway_df <- data.frame(Pathway = names(pathway_counts), Count = as.vector(pathway_counts)) %>%
        arrange(desc(Count)) %>%
        slice_head(n = 10)
      ggplot(pathway_df, aes(x = reorder(Pathway, Count), y = Count)) +
        geom_bar(stat = "identity", fill = "orange", alpha = 0.7) +
        theme_bw() +
        labs(x = "代谢通路", y = "代谢物数量", title = "代谢通路富集TOP10") +
        theme(axis.text.x = element_text(angle = 45, hjust = 1))
    }
  }
  metab_plot_path <- save_plot(metab_plot, "metabolomics_analysis.png")
  
  ## 1.4 功能富集分析图表
  enrich_plot <- if (!is.null(enrich_result())) {
    dotplot(enrich_result(), showCategory = 15, title = paste(input$enrich_type, "富集气泡图")) +
      theme(axis.text.y = element_text(size = 10))
  } else {
    ggplot() + geom_text(aes(x = 1, y = 1, label = "未运行富集分析或无富集结果")) + theme_void()
  }
  enrich_plot_path <- save_plot(enrich_plot, "enrichment_analysis.png")

  # 2. Define the R Markdown report template.
  #    The YAML header must start on the first line, so the string opens with '---'.
  #    Sections 2-4 are reconstructed from the four plot arguments passed to sprintf() below;
  #    the template text for them was truncated in the source.
  report_template <- "---
title: '%s'
output:
  %s_document:
    toc: true
    toc_depth: 3
    number_sections: true
---

# 多组学分析报告
## 1. 数据概览
### 1.1 基本信息
- 转录组基因数量:%d
- 代谢组代谢物数量:%d
- 样本总数:%d(肿瘤:%d,正常:%d)

### 1.2 样本分布
```{r echo=FALSE, fig.width=8, fig.height=4, warning=FALSE}
knitr::include_graphics('%s')
```

## 2. 基因表达分析
```{r echo=FALSE, warning=FALSE}
knitr::include_graphics('%s')
```

## 3. 代谢组分析
```{r echo=FALSE, warning=FALSE}
knitr::include_graphics('%s')
```

## 4. 功能富集分析
```{r echo=FALSE, warning=FALSE}
knitr::include_graphics('%s')
```

## 附录:数据文件说明
- 转录组数据:rna_seq_counts.csv
- 代谢组数据:metabolomics_data.csv
- 样本信息:sample_info.csv
"

  # 3. Fill in the template parameters
  report_title <- input$report_title
  report_format <- switch(input$report_format, HTML = "html", PDF = "pdf", Word = "word")
  report_ext <- switch(input$report_format, HTML = "html", PDF = "pdf", Word = "docx")
  # Forward slashes keep Windows temp paths valid inside the quoted strings of the template
  to_slash <- function(p) gsub("\\", "/", p, fixed = TRUE)
  report_content <- sprintf(
    report_template,
    report_title, report_format,
    nrow(rna_expr), nrow(metab_expr), nrow(sample_info),
    sum(sample_info$Group == "Tumor"), sum(sample_info$Group == "Normal"),
    to_slash(sample_pie_path), to_slash(gene_plot_path),
    to_slash(metab_plot_path), to_slash(enrich_plot_path)
  )

  # 4. Write the template to a temporary .Rmd file and render it
  temp_report <- tempfile(fileext = ".Rmd")
  writeLines(report_content, temp_report)
  output_file <- paste0("OmicsReport_", Sys.Date(), ".", report_ext)
  # render() returns the full path of the file it wrote (next to the temporary .Rmd)
  rendered <- rmarkdown::render(temp_report, output_file = output_file, envir = new.env(), quiet = TRUE)

  # 5. Record the report path and show the status
  report_path(rendered)
  output$report_status <- renderPrint({
    cat(paste("报告生成成功!路径:", rendered))
  })
})

# Download the report
output$download_report <- downloadHandler(
  filename = function() {
    ext <- switch(input$report_format, HTML = "html", PDF = "pdf", Word = "docx")
    paste0("OmicsReport_", Sys.Date(), ".", ext)
  },
  content = function(file) {
    req(report_path())  # nothing to download until a report has been generated
    file.copy(report_path(), file, overwrite = TRUE)
  }
)
}

# ====================== 4. Run the app ======================
shinyApp(ui, server)
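The volcano plot in the listing above is explicitly a simplification: it flags genes by raw t-test p-values and a pseudocount-based log2 fold change. With thousands of genes, raw p-values produce many false positives, so a multiple-testing adjustment is usually applied before calling significance. The sketch below shows the fold-change formula and a Benjamini-Hochberg adjustment in plain Python. It is an illustration of the idea, not part of app.R, and the p-values are made-up inputs.

```python
import math


def log2_fold_change(tumor_mean, normal_mean, pseudocount=1.0):
    """Same formula as the dashboard: log2((tumor + 1) / (normal + 1))."""
    return math.log2((tumor_mean + pseudocount) / (normal_mean + pseudocount))


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices from smallest to largest p
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity with a running minimum.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values for four genes.
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = bh_adjust(raw_p)
for p, q in zip(raw_p, adj_p):
    print(f"raw p = {p:.3f}  ->  BH-adjusted = {q:.3f}")
print(log2_fold_change(3, 1))
```

In R the same adjustment is available as p.adjust(pvalue, method = "BH"), so the significance rule in the app could be switched to the adjusted values with a one-line change.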
3. Running locally and deploying

(1) Running locally
Put app.R and the data/ directory in the same folder, open app.R in RStudio, and click the "Run App" button to start the dashboard.

(2) Server deployment (Shiny Server)
  1. Install Shiny Server (Ubuntu example):
Code (bash):
sudo apt-get install r-base
sudo R -e "install.packages('shiny', repos='https://cran.rstudio.com/')"
wget https://download3.rstudio.org/ubuntu-18.04/x86_64/shiny-server-1.5.20.1002-amd64.deb
sudo dpkg -i shiny-server-1.5.20.1002-amd64.deb
  2. Copy the project folder to /srv/shiny-server/omics_dashboard/
  3. Start Shiny Server: sudo systemctl start shiny-server
  4. Open the dashboard at http://<server-IP>:3838/omics_dashboard/
(3) Cloud deployment (Shinyapps.io)
  1. Register a Shinyapps.io account (https://www.shinyapps.io/);

  2. Install the rsconnect package: install.packages("rsconnect")

  3. Configure the account: rsconnect::setAccountInfo(name='your-account', token='your-token', secret='your-secret')

  4. Deploy the app:

     ```r
     library(rsconnect)
     deployApp(appDir = "path/to/omics_dashboard", appName = "lung_cancer_omics")
     ```

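IV. Hands-on 2: Building a multi-omics dashboard with Python Dash (proteomics + methylome)

Using an "integrated analysis of liver cancer proteomics and methylation data" as the example, this part builds a highly interactive Dash dashboard, focusing on cross-omics linking, dynamic updates, and handling of large datasets.

1. Dashboard architecture

Dash follows a "components + callbacks" architecture:

  • Layout components: dash-bootstrap-components builds a responsive layout (sidebar + main panel);
  • Callback functions: implement component interaction (filter → update charts);
  • Visualization: plotly draws interactive charts with zooming, hover, and download support;
  • Data processing: pandas integrates the multi-omics data and scipy performs the statistical analysis.

2. Core code

Step 1: Prepare the test data

Create a data/ directory and place proteomics_data.csv (proteomics), methylation_data.csv (methylation), and sample_info.csv (sample information) in it. In each file the first column holds the row ID (molecule or sample), and the sample IDs used as column names in the two expression tables must match the row IDs of sample_info.csv.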
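If no real dataset is at hand, a small synthetic one is enough to try the app. The sketch below writes the three files using only the standard library. The `Group`, `Annotation`, and `CpG_Island` column names come from the app code; the header names of the ID columns, the sample names `S1…`, the value ranges, and the `write_demo_data` helper itself are illustrative assumptions, and every value is random.

```python
import csv
import os
import random


def write_demo_data(data_dir="data", n_samples=6, n_proteins=5, n_cpgs=5, seed=0):
    """Write three small CSV files in the layout app.py reads (all values are synthetic)."""
    rng = random.Random(seed)
    os.makedirs(data_dir, exist_ok=True)
    samples = [f"S{i + 1}" for i in range(n_samples)]
    # First half of the samples is labelled Tumor, second half Normal
    groups = ["Tumor" if i < n_samples // 2 else "Normal" for i in range(n_samples)]

    # sample_info.csv: sample ID in the first column, then the Group column
    with open(os.path.join(data_dir, "sample_info.csv"), "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Sample", "Group"])
        writer.writerows(zip(samples, groups))

    # proteomics_data.csv: protein ID, one column per sample, then Annotation
    with open(os.path.join(data_dir, "proteomics_data.csv"), "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Protein"] + samples + ["Annotation"])
        for i in range(n_proteins):
            values = [round(rng.uniform(0, 1000), 2) for _ in samples]
            writer.writerow([f"PROT{i + 1}"] + values + ["demo annotation"])

    # methylation_data.csv: CpG ID, beta values in [0, 1], then CpG_Island
    with open(os.path.join(data_dir, "methylation_data.csv"), "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["CpG"] + samples + ["CpG_Island"])
        for i in range(n_cpgs):
            values = [round(rng.random(), 3) for _ in samples]
            writer.writerow([f"cg{i + 1:07d}"] + values + [rng.choice(["Island", "Shore", "OpenSea"])])

    return samples
```

Calling `write_demo_data()` once before starting the app creates `data/` in the current directory; a fixed `seed` keeps the files reproducible between runs.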

Step 2: Write the Dash app code (app.py, Python)
import dash
from dash import dcc, html, Input, Output, State, callback
import dash_bootstrap_components as dbc
import plotly.express as px
import plotly.graph_objects as go
import pandas as pd
import numpy as np
from scipy import stats
import os

# ====================== 1. 加载数据 ======================
# 设置数据路径
DATA_DIR = "data"
proteomics_df = pd.read_csv(os.path.join(DATA_DIR, "proteomics_data.csv"), index_col=0)
methylation_df = pd.read_csv(os.path.join(DATA_DIR, "methylation_data.csv"), index_col=0)
sample_info = pd.read_csv(os.path.join(DATA_DIR, "sample_info.csv"), index_col=0)

# 提取表达矩阵(去除注释列)
proteomics_expr = proteomics_df.drop(columns=["Annotation"], errors="ignore")
methylation_expr = methylation_df.drop(columns=["CpG_Island"], errors="ignore")

# ====================== 2. 初始化App ======================
app = dash.Dash(__name__, external_stylesheets=[dbc.themes.BOOTSTRAP], suppress_callback_exceptions=True)
server = app.server  # 部署用

# ====================== 3. 定义布局 ======================
app.layout = dbc.Container([
    # 标题
    dbc.Row([
        dbc.Col(html.H1("肝癌多组学分析Dashboard", className="text-center mb-4"), width=12)
    ]),
    
    # 主体布局(侧边栏+主面板)
    dbc.Row([
        # 侧边栏(筛选组件)
        dbc.Col([
            html.Div([
                html.H4("筛选条件", className="mb-3"),
                
                # 样本分组筛选
                html.Label("样本分组"),
                dcc.Dropdown(
                    id="sample_group",
                    options=[{"label": g, "value": g} for g in sample_info["Group"].unique()],
                    value=sample_info["Group"].unique().tolist(),
                    multi=True,
                    className="mb-3"
                ),
                
                # 分子类型选择
                html.Label("分子类型"),
                dcc.RadioItems(
                    id="omics_type",
                    options=[
                        {"label": "蛋白组", "value": "proteomics"},
                        {"label": "甲基化组", "value": "methylation"},
                        {"label": "跨组学联动", "value": "cross_omics"}
                    ],
                    value="proteomics",
                    className="mb-3"
                ),
                
                # 蛋白组筛选(条件显示)
                html.Div(id="proteomics_filters", children=[
                    html.Label("蛋白名(模糊匹配)"),
                    dcc.Input(id="protein_name", type="text", placeholder="p53/AKT", className="mb-3 form-control"),
                    html.Label("表达量阈值"),
                    dcc.Slider(
                        id="protein_expr_thresh",
                        min=0, max=proteomics_expr.values.max(),
                        value=100, step=10,
                        marks={i: str(i) for i in range(0, int(proteomics_expr.values.max())+1, 500)},
                        className="mb-3"
                    )
                ], className="mb-4"),
                
                # 甲基化组筛选(条件显示)
                html.Div(id="methylation_filters", children=[
                    html.Label("CpG位点(模糊匹配)"),
                    dcc.Input(id="cpg_name", type="text", placeholder="cg0001/cg1234", className="mb-3 form-control"),
                    html.Label("甲基化水平阈值(β值)"),
                    dcc.Slider(
                        id="methylation_thresh",
                        min=0, max=1, value=0.5, step=0.1,
                        marks={i/10: str(i/10) for i in range(0, 11, 2)},
                        className="mb-3"
                    )
                ], className="mb-4"),
                
                # 跨组学筛选(条件显示)
                html.Div(id="cross_omics_filters", children=[
                    html.Label("目标基因/蛋白名"),
                    dcc.Input(id="cross_gene_name", type="text", placeholder="TP53", className="mb-3 form-control"),
                    html.Label("关联分析类型"),
                    dcc.RadioItems(
                        id="corr_type",
                        options=[
                            {"label": "皮尔逊相关", "value": "pearson"},
                            {"label": "斯皮尔曼相关", "value": "spearman"}
                        ],
                        value="pearson",
                        className="mb-3"
                    )
                ], className="mb-4"),
                
                # 可视化类型选择
                html.Label("可视化类型"),
                dcc.Dropdown(
                    id="plot_type",
                    options=[
                        {"label": "箱线图", "value": "box"},
                        {"label": "热图", "value": "heatmap"},
                        {"label": "散点图(相关性)", "value": "scatter"},
                        {"label": "火山图", "value": "volcano"}
                    ],
                    value="box",
                    className="mb-4"
                ),
                
                # 报告导出按钮
                dbc.Button("导出交互式报告", id="export_report", color="success", className="mt-4")
            ], className="bg-light p-4 rounded-3")
        ], width=3),
        
        # 主面板(图表+表格)
        dbc.Col([
            # 图表区域
            dcc.Loading([
                dcc.Graph(id="main_plot", figure={}, className="mb-4")
            ]),
            
            # 数据表格区域
            html.H4("筛选结果", className="mb-2"),
            dash.dash_table.DataTable(
                id="omics_table",
                columns=[],
                data=[],
                page_size=10,
                style_table={"overflowX": "auto"},  # horizontal scrolling (DataTable has no scroll_x property)
                style_cell={"textAlign": "center"},
                style_header={"backgroundColor": "lightgray", "fontWeight": "bold"}
            ),
            
            # 报告下载链接
            html.Div(id="report_link", className="mt-4")
        ], width=9)
    ]),
    
    # 隐藏组件(存储临时数据)
    dcc.Store(id="filtered_data")
], fluid=True)

# ====================== 4. 回调函数 ======================
# 回调1:根据分子类型显示/隐藏筛选组件
@callback(
    Output("proteomics_filters", "style"),
    Output("methylation_filters", "style"),
    Output("cross_omics_filters", "style"),
    Input("omics_type", "value")
)
def toggle_filters(omics_type):
    # 默认隐藏所有筛选组件
    hidden = {"display": "none"}
    visible = {"display": "block"}
    
    if omics_type == "proteomics":
        return visible, hidden, hidden
    elif omics_type == "methylation":
        return hidden, visible, hidden
    elif omics_type == "cross_omics":
        return hidden, hidden, visible
    else:
        return hidden, hidden, hidden

# 回调2:筛选数据并存储
@callback(
    Output("filtered_data", "data"),
    Input("omics_type", "value"),
    Input("sample_group", "value"),
    # 蛋白组筛选参数
    Input("protein_name", "value"),
    Input("protein_expr_thresh", "value"),
    # 甲基化组筛选参数
    Input("cpg_name", "value"),
    Input("methylation_thresh", "value"),
    # 跨组学筛选参数
    Input("cross_gene_name", "value"),
    Input("corr_type", "value")
)
def filter_data(omics_type, sample_group, protein_name, protein_thresh, cpg_name, methylation_thresh, cross_gene_name, corr_type):
    # Filter samples (a cleared dropdown yields None or [], so fall back to an empty list)
    selected_samples = sample_info[sample_info["Group"].isin(sample_group or [])].index.tolist()
    
    if omics_type == "proteomics":
        # 筛选蛋白组数据
        df = proteomics_expr[selected_samples].copy()
        # 模糊匹配蛋白名
        if protein_name:
            df = df[df.index.str.contains(protein_name, case=False)]
        # 表达量阈值筛选
        df = df[df.mean(axis=1) >= protein_thresh]
        # 合并注释
        if "Annotation" in proteomics_df.columns:
            df["Annotation"] = proteomics_df.loc[df.index, "Annotation"]
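        # Keep the protein ID as an "index" column; to_dict("records") would otherwise drop it
        return df.rename_axis("index").reset_index().to_dict("records")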
    
    elif omics_type == "methylation":
        # 筛选甲基化组数据
        df = methylation_expr[selected_samples].copy()
        # 模糊匹配CpG位点
        if cpg_name:
            df = df[df.index.str.contains(cpg_name, case=False)]
        # 甲基化水平阈值筛选
        df = df[df.mean(axis=1) >= methylation_thresh]
        # 合并注释
        if "CpG_Island" in methylation_df.columns:
            df["CpG_Island"] = methylation_df.loc[df.index, "CpG_Island"]
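        # Keep the CpG ID as an "index" column; to_dict("records") would otherwise drop it
        return df.rename_axis("index").reset_index().to_dict("records")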
    
    elif omics_type == "cross_omics":
        # 跨组学联动:匹配蛋白和甲基化数据中的目标基因
        if not cross_gene_name:
            return []
        
        # 提取目标蛋白和甲基化数据
        protein_data = proteomics_expr[selected_samples].copy()
        methylation_data = methylation_expr[selected_samples].copy()
        
        # 模糊匹配目标基因
        protein_match = protein_data.index.str.contains(cross_gene_name, case=False)
        methylation_match = methylation_data.index.str.contains(cross_gene_name, case=False)
        
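        # Require a match in both layers and at least 3 samples for a meaningful correlation
        if not protein_match.any() or not methylation_match.any() or len(selected_samples) < 3: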
            return []
        
        # 计算相关性
        protein_vals = protein_data[protein_match].mean(axis=0)
        methylation_vals = methylation_data[methylation_match].mean(axis=0)
        
        corr_func = stats.pearsonr if corr_type == "pearson" else stats.spearmanr
        corr, pval = corr_func(protein_vals, methylation_vals)
        
        # 构建结果数据
        cross_df = pd.DataFrame({
            "Sample": selected_samples,
            "Protein_Expression": protein_vals.values,
            "Methylation_Level": methylation_vals.values,
            "Group": sample_info.loc[selected_samples, "Group"].values,
            "Correlation": corr,
            "P_Value": pval
        })
        return cross_df.to_dict("records")
    
    else:
        return []

# 回调3:更新表格
@callback(
    Output("omics_table", "columns"),
    Output("omics_table", "data"),
    Input("filtered_data", "data"),
    Input("omics_type", "value")
)
def update_table(filtered_data, omics_type):
    if not filtered_data:
        return [], []
    
    df = pd.DataFrame(filtered_data)
    # 定义表格列
    columns = [{"name": col, "id": col} for col in df.columns]
    # 格式化数值
    for col in df.select_dtypes(include=[np.number]).columns:
        df[col] = df[col].round(2)
    
    return columns, df.to_dict("records")

# 回调4:更新主图表
@callback(
    Output("main_plot", "figure"),
    Input("filtered_data", "data"),
    Input("omics_type", "value"),
    Input("plot_type", "value"),
    Input("sample_group", "value"),
    Input("corr_type", "value")
)
def update_plot(filtered_data, omics_type, plot_type, sample_group, corr_type):
    if not filtered_data:
        return go.Figure().update_layout(title="无符合条件的数据")
    
    df = pd.DataFrame(filtered_data)
    selected_samples = sample_info[sample_info["Group"].isin(sample_group or [])].index.tolist()
    
    # 蛋白组/甲基化组可视化
    if omics_type in ["proteomics", "methylation"]:
        # 转换为长格式
        value_cols = [col for col in df.columns if col in selected_samples]
        df_long = df.melt(
            id_vars=[col for col in df.columns if col not in value_cols],
            value_vars=value_cols,
            var_name="Sample",
            value_name="Value"
        )
        # 合并样本分组
        df_long["Group"] = df_long["Sample"].map(sample_info["Group"])
        
        if plot_type == "box":
            # 箱线图
            fig = px.box(
                df_long,
                x="Group",
                y="Value",
                color="Group",
                facet_col="index" if "index" in df_long.columns else None,  # one panel per molecule
                facet_col_wrap=3,
                title=f"{omics_type.capitalize()} 表达/甲基化水平箱线图",
                labels={"Value": "蛋白表达量" if omics_type == "proteomics" else "甲基化水平(β值)"},
                color_discrete_map={"Tumor": "red", "Normal": "green"}
            )
            fig.update_layout(height=600)
        
        elif plot_type == "heatmap":
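            # Heatmap: use the molecule ID (the "index" column, when present) as the column labels
            heat_df = df.set_index("index") if "index" in df.columns else df
            expr_mat = heat_df[value_cols].T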
            fig = px.imshow(
                expr_mat,
                labels=dict(x="分子", y="样本", color="Value"),
                x=expr_mat.columns,
                y=expr_mat.index,
                color_continuous_scale="RdBu_r",
                title=f"{omics_type.capitalize()} 表达/甲基化水平热图"
            )
            # 添加样本分组注释
            fig.add_annotation(
                text="样本分组:" + ", ".join([f"{g}: {sum(sample_info['Group']==g)}" for g in sample_group]),
                x=0.5, y=-0.1, showarrow=False, xref="paper", yref="paper"
            )
        
        elif plot_type == "volcano":
            # Volcano plot (Tumor vs Normal), restricted to the currently selected samples
            groups = sample_info.loc[value_cols, "Group"]
            tumor_samples = groups[groups == "Tumor"].index.tolist()
            normal_samples = groups[groups == "Normal"].index.tolist()

            # A t-test needs at least two samples per group
            if len(tumor_samples) < 2 or len(normal_samples) < 2:
                return go.Figure().update_layout(title="Tumor 和 Normal 组各需至少 2 个样本")

            # Group means and log2 fold change (the pseudocount of 1 avoids division by zero)
            tumor_mean = df[tumor_samples].mean(axis=1)
            normal_mean = df[normal_samples].mean(axis=1)
            log2fc = np.log2((tumor_mean + 1) / (normal_mean + 1))

            # Row-wise two-sample t-test for all molecules in one vectorized call.
            # These are raw p-values; apply a multiple-testing correction for real analyses.
            _, pvals = stats.ttest_ind(
                df[tumor_samples].to_numpy(dtype=float),
                df[normal_samples].to_numpy(dtype=float),
                axis=1
            )

            volcano_df = pd.DataFrame({
                "ID": df["index"] if "index" in df.columns else df.index,
                "log2FC": log2fc,
                "-log10p": -np.log10(pvals),
                "Significant": np.where((pvals < 0.05) & (np.abs(log2fc) > 1), "Significant", "Not significant")
            })
            
            fig = px.scatter(
                volcano_df,
                x="log2FC",
                y="-log10p",
                color="Significant",
                hover_name="ID",
                title=f"{omics_type.capitalize()} 差异火山图",
                color_discrete_map={"Significant": "red", "Not significant": "gray"}
            )
            fig.add_vline(x=-1, line_dash="dash", line_color="black")
            fig.add_vline(x=1, line_dash="dash", line_color="black")
            fig.add_hline(y=-np.log10(0.05), line_dash="dash", line_color="black")
        
        else:
            fig = go.Figure().update_layout(title="不支持的可视化类型")
    
    # 跨组学可视化
    elif omics_type == "cross_omics":
        if plot_type == "scatter":
            # 相关性散点图
            fig = px.scatter(
                df,
                x="Protein_Expression",
                y="Methylation_Level",
                color="Group",
                hover_name="Sample",
                title=f"蛋白表达与甲基化水平相关性({corr_type},r={df['Correlation'].iloc[0]:.2f},p={df['P_Value'].iloc[0]:.4f})",
                trendline="ols",  # OLS fit line; requires the statsmodels package
                color_discrete_map={"Tumor": "red", "Normal": "green"}
            )
        else:
            fig = go.Figure().update_layout(title="跨组学仅支持散点图可视化")
    
    else:
        fig = go.Figure().update_layout(title="无效的分子类型")
    
    return fig

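# Callback 5: export an interactive HTML report
from flask import send_from_directory  # Flask is installed as a Dash dependency

# Reports are written to a dedicated folder and served through an explicit route,
# because Dash does not serve arbitrary files from the working directory.
REPORT_DIR = os.path.abspath("reports")
os.makedirs(REPORT_DIR, exist_ok=True)

@server.route("/reports/<path:filename>")
def serve_report(filename):
    return send_from_directory(REPORT_DIR, filename, as_attachment=True)

@callback(
    Output("report_link", "children"),
    Input("export_report", "n_clicks"),
    State("filtered_data", "data"),
    State("omics_type", "value"),
    State("plot_type", "value"),
    State("sample_group", "value"),
    State("corr_type", "value")
)
def export_report(n_clicks, filtered_data, omics_type, plot_type, sample_group, corr_type):
    if not n_clicks or not filtered_data:
        return ""
    
    now = pd.Timestamp.now()
    report_filename = f"Omics_Dashboard_Report_{now.strftime('%Y%m%d_%H%M%S')}.html"
    
    # Rebuild the current figure and convert it to an embeddable HTML fragment
    # (plotly.js is loaded from the CDN, so the report needs internet access to render)
    fig = update_plot(filtered_data, omics_type, plot_type, sample_group or [], corr_type)
    plot_html = fig.to_html(full_html=False, include_plotlyjs="cdn")
    table_html = pd.DataFrame(filtered_data).to_html(classes="table table-striped", index=False)
    
    # Simple HTML template
    html_content = f"""
    <!DOCTYPE html>
    <html>
    <head>
        <title>多组学分析报告</title>
        <meta charset="utf-8">
        <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/bootstrap@5.2.3/dist/css/bootstrap.min.css">
    </head>
    <body class="container mt-4">
        <h1>肝癌多组学分析报告</h1>
        <p>生成时间:{now.strftime('%Y-%m-%d %H:%M:%S')}</p>
        <h2>1. 筛选条件</h2>
        <p>分子类型:{omics_type}</p>
        <p>可视化类型:{plot_type}</p>
        <h2>2. 可视化结果</h2>
        <div>{plot_html}</div>
        <h2>3. 数据表格</h2>
        <div>{table_html}</div>
    </body>
    </html>
    """
    
    # Save the report as UTF-8
    with open(os.path.join(REPORT_DIR, report_filename), "w", encoding="utf-8") as f:
        f.write(html_content)
    
    # Return a download link that points at the route defined above
    return html.A(
        f"下载报告({report_filename})",
        href=f"/reports/{report_filename}",
        download=report_filename,
        className="btn btn-info"
    )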

# ====================== 5. 运行App ======================
if __name__ == "__main__":
    # app.run replaces the deprecated app.run_server; debug=True is for local development only
    app.run(debug=True, host="0.0.0.0", port=8050)

3. Running locally and deployment

(1) Running locally

```bash
python app.py
```

Once the server has started, open http://localhost:8050 in a browser to view the dashboard.

(2) Server deployment (Gunicorn + Nginx)
  1. Install Gunicorn: pip install gunicorn

  2. Start Gunicorn: gunicorn app:server -w 4 -b 0.0.0.0:8050

  3. Configure Nginx as a reverse proxy:

     ```nginx
     server {
         listen 80;
         server_name your_domain.com;

         location / {
             proxy_pass http://127.0.0.1:8050;
             proxy_set_header Host $host;
             proxy_set_header X-Real-IP $remote_addr;
         }
     }
     ```
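The Gunicorn options from step 2 can also be kept in a config file, which Gunicorn reads as a Python module. The following is a minimal sketch rather than a tuned production setup: the worker formula is a commonly used starting point, and the timeout is an assumed value. It binds to 127.0.0.1 instead of 0.0.0.0 on the assumption that all outside traffic arrives through the Nginx proxy.

```python
# gunicorn.conf.py (start with: gunicorn -c gunicorn.conf.py app:server)
import multiprocessing

# Listen on the address that the Nginx proxy_pass directive points to
bind = "127.0.0.1:8050"

# A common starting point: 2 * CPU cores + 1 worker processes
workers = multiprocessing.cpu_count() * 2 + 1

# Seconds before an unresponsive worker is restarted (assumed value; raise it for slow analyses)
timeout = 120

# "-" sends the access log to stdout
accesslog = "-"
```

Each worker process imports `app.py` and therefore loads its own copy of the CSV data, so on large datasets the worker count is limited by memory as much as by CPU.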

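V. Advanced techniques for interactive report generation

1. Automated report templates

  • R Shiny: combine with R Markdown templates to generate parameterized reports (for example, adjusting the content according to the user's filter settings);
  • Python Dash: export interactive charts with plotly.io.write_html and build the complete HTML report with a jinja2 template.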
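The template idea on the Python side can be sketched without extra dependencies. The example below swaps jinja2 for the standard library's `string.Template` so that it runs as-is; the placeholder names and the `render_report` helper are illustrative assumptions, not part of the app above.

```python
import html
from string import Template

# $-placeholders are filled in by render_report
REPORT_TEMPLATE = Template("""<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>$title</title></head>
<body>
<h1>$title</h1>
<p>Omics type: $omics_type</p>
<div id="plot">$plot_fragment</div>
</body>
</html>
""")


def render_report(title, omics_type, plot_fragment):
    """Fill the template; free text is HTML-escaped, the plot fragment is inserted as trusted HTML."""
    return REPORT_TEMPLATE.substitute(
        title=html.escape(title),
        omics_type=html.escape(omics_type),
        plot_fragment=plot_fragment,
    )
```

With plotly, the fragment can come from `fig.to_html(full_html=False, include_plotlyjs="cdn")`. A jinja2 template would add loops and conditionals, which helps once a report contains a variable number of sections.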

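2. Custom visualization components

  • Heatmap enhancements: add row/column clustering, annotation bars, and color-scale adjustment;
  • Network graphs: integrate the ceRNA network with draggable nodes and zooming;
  • Correlation analysis: add confidence intervals and per-group fitted lines;
  • Cross-omics linking: clicking a molecule in one omics layer automatically highlights the corresponding molecules in the other layers.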
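Cross-omics linking comes down to a lookup from the clicked molecule to its counterparts in the other layer. A minimal sketch, assuming a CpG-to-gene annotation is available as a plain dictionary (the annotation values and both helper functions below are hypothetical):

```python
from collections import defaultdict


def build_gene_index(cpg_to_gene):
    """Invert a CpG -> gene annotation into gene -> sorted CpG IDs (gene names compared case-insensitively)."""
    index = defaultdict(list)
    for cpg_id, gene in cpg_to_gene.items():
        index[gene.upper()].append(cpg_id)
    return {gene: sorted(cpgs) for gene, cpgs in index.items()}


def linked_cpgs(clicked_gene, gene_index):
    """Return the CpG sites to highlight when clicked_gene is selected in the proteomics view."""
    return gene_index.get(clicked_gene.upper(), [])


# Hypothetical annotation, for illustration only
annotation = {"cg0000001": "TP53", "cg0000002": "AKT1", "cg0000003": "tp53"}
gene_index = build_gene_index(annotation)
```

In a Dash app the clicked point is available through the `clickData` property of `dcc.Graph`, so a callback could take it as an `Input`, call `linked_cpgs`, and restyle the methylation figure accordingly.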

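3. Performance optimization (for large datasets)

  • Data caching: use shiny::bindCache() (R), or dcc.Store combined with server-side memoization such as Flask-Caching (Python), to cache filtered data and avoid recomputation;
  • Lazy loading: load only the data the current page needs, and page or chunk large datasets;
  • Asynchronous processing: use the future and promises packages (R) or celery (Python) for long-running operations (such as enrichment analysis);
  • Chart optimization: reduce the number of points drawn (for example, show only the top 100 molecules in a heatmap) and use WebGL rendering.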
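One way to pick the "top 100 molecules" for a heatmap is to rank molecules by their variance across samples. The source does not say which criterion it has in mind, so variance is an assumption here, and `top_variable` is an illustrative helper written with the standard library only.

```python
import heapq
from statistics import pvariance


def top_variable(matrix, n=100):
    """Return the IDs of the n molecules with the highest variance across samples.

    matrix maps a molecule ID to the list of its per-sample values.
    """
    return heapq.nlargest(n, matrix, key=lambda mol: pvariance(matrix[mol]))
```

With a pandas DataFrame that has molecules as rows, the same selection is `df.loc[df.var(axis=1).nlargest(100).index]`.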

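4. Data export and sharing

  • Supported formats: CSV (raw data), PNG/PDF (static charts), HTML (interactive report), JSON (data for APIs);
  • Cloud sharing: integrate OneDrive/Google Drive to upload reports in one click;
  • Access control: add a user login module (Shiny Server Pro/Dash Enterprise) to restrict data access.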

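VI. Common problems and solutions

1. Slow data loading

  • Cause: the multi-omics dataset is too large and is loaded all at once;
  • Solution: load the data in chunks, store it in feather/parquet format (usually much faster to read than CSV, although the speedup depends on the data), and enable compression.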
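Chunked loading can be sketched with the standard library: read a fixed number of rows at a time and keep only the rows of interest, so the full table never sits in memory. The function names and the `id_column` parameter are illustrative assumptions.

```python
import csv
from itertools import islice


def iter_chunks(path, chunk_size=1000):
    """Yield lists of at most chunk_size rows (as dicts) from a CSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        while True:
            chunk = list(islice(reader, chunk_size))
            if not chunk:
                break
            yield chunk


def load_matching(path, keyword, id_column, chunk_size=1000):
    """Keep only rows whose ID contains keyword (case-insensitive), scanning chunk by chunk."""
    kept = []
    for chunk in iter_chunks(path, chunk_size):
        kept.extend(row for row in chunk if keyword.lower() in row[id_column].lower())
    return kept
```

pandas offers the same pattern through `pd.read_csv(path, chunksize=...)`, and `DataFrame.to_parquet` / `pd.read_parquet` cover the columnar format mentioned above (both need pyarrow or fastparquet installed).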

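2. Sluggish chart rendering

  • Cause: too many data points in the chart (for example, a heatmap with tens of thousands of molecules);
  • Solution: limit the number of molecules displayed (for example, the top 200), simplify the data with dimensionality reduction (PCA/UMAP), and switch off interactive features that are not needed.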

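3. Conflicts when integrating cross-omics data

  • Cause: sample IDs do not match between omics layers, or the scales differ widely;
  • Solution: adopt one naming convention for sample IDs, normalize/standardize the data, and add a sample ID mapping table.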
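Both fixes are short to write down. The sketch below normalizes sample IDs to a single convention, intersects two layers on the normalized IDs, and z-scores one molecule's values. The naming convention (uppercase, hyphen-separated) is an illustrative assumption; real projects should follow whatever convention their sample sheet defines.

```python
import re
from statistics import mean, pstdev


def normalize_sample_id(raw):
    """Trim, unify separators, and uppercase, e.g. ' tumor_01 ' -> 'TUMOR-01'."""
    return re.sub(r"[\s_.]+", "-", raw.strip()).upper()


def shared_samples(ids_a, ids_b):
    """Return the normalized IDs present in both layers, in the order of ids_a."""
    in_b = {normalize_sample_id(s) for s in ids_b}
    return [n for n in map(normalize_sample_id, ids_a) if n in in_b]


def zscore(values):
    """Scale one molecule's values to mean 0 and standard deviation 1."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        # A constant row carries no information; return zeros instead of dividing by zero
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]
```

Z-scoring each molecule within its own layer puts proteomics intensities and methylation beta values on a comparable scale before they are plotted side by side.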

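4. Dashboard unreachable after deployment

  • Cause: the port is not open, permissions are insufficient, or dependencies are missing;
  • Solution: open the firewall port (for example 8050/3838), run the service as a non-root user, and generate a dependency manifest (requirements.txt/renv.lock).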
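Before looking at firewall rules it helps to confirm whether anything is listening on the port at all. A minimal check with the standard library (`port_is_open` is an illustrative helper):

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False
```

Running `port_is_open("127.0.0.1", 8050)` on the server itself shows whether the app is up; running it from another machine against the server address shows whether the firewall lets the traffic through.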

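5. Interactive report export fails

  • Cause: insufficient permissions on the target path, or problems encoding Chinese text;
  • Solution: write to a path that is known to be writable, use UTF-8 encoding, and add exception handling.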
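The three measures fit into one small helper. This is a sketch under the assumption that falling back to the system temporary directory is acceptable when the requested folder cannot be written; `write_report` is an illustrative name.

```python
import os
import tempfile


def write_report(html_text, directory, filename):
    """Write html_text as UTF-8; fall back to the system temp directory if directory is not writable."""
    for target_dir in (directory, tempfile.gettempdir()):
        try:
            os.makedirs(target_dir, exist_ok=True)
            path = os.path.join(target_dir, filename)
            with open(path, "w", encoding="utf-8") as f:
                f.write(html_text)
            return path
        except OSError as exc:
            # Permission problems, missing parents, read-only file systems, and similar errors
            print(f"Could not write to {target_dir}: {exc}")
    raise OSError("No writable location found for the report")
```

The function returns the path that was actually used, so the caller can show it to the user or build the download link from it.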

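VII. Summary and outlook

OmicsDashboard changes how multi-omics data is presented, moving from "static charts" to an "interactive exploration platform". This makes bioinformatics analysis more efficient and lowers the barrier for non-specialists. R Shiny suits quickly building dashboards that draw on the R bioinformatics ecosystem, while Python Dash suits highly interactive, large-data scenarios; choose according to the team's technology stack and the project's needs.

Future directions for multi-omics visualization include:

  • AI-assisted exploration: integrate large language models to generate charts from natural-language queries;
  • 3D visualization: use WebGL for 3D networks of multi-omics data and for spatial transcriptomics;
  • Mobile support: refine the responsive layout so the dashboard works on phones and tablets;
  • Platform integration: connect to public databases (TCGA/GTEx) to import public multi-omics data in one click.

Being able to build an OmicsDashboard makes multi-omics results more intuitive, easier to reuse, and more convincing. Whether for journal submission, project reporting, or team collaboration, interactive visualization lets readers explore the data themselves and see the value of the research.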
