Scientific Plotting in R: One- and Two-Factor Boxplots, from Basics to Publication Quality (with Complete Code)


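Still comparing groups with bar charts and error bars? A single boxplot shows five summary statistics: the lower whisker, Q1, the median, Q3, and the upper whisker. With ggplot2's defaults the whiskers reach the most extreme observations within 1.5 x IQR of the box, and anything beyond that is drawn as an individual outlier point, so skewness, spread, and outliers are visible at a glance. Starting from scratch, this article uses R's ggplot2 to layer on styling, statistical tests, two-factor analysis, and multi-panel export, step by step, until the figure meets journal publication standards. All code has been tested and runs.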
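
To make the five statistics concrete, here is a minimal base R sketch on a made-up vector (the numbers are illustrative, not from the article's data). It uses fivenum() and boxplot.stats(); note that base R works with hinges while ggplot2 uses quantiles, so the two can differ slightly on small samples.

```r
# Toy sample with one unusually large value (illustrative numbers only)
x <- c(2.6, 2.8, 2.9, 3.0, 3.1, 3.2, 3.3, 4.9)

five <- fivenum(x)        # min, lower hinge, median, upper hinge, max
bs   <- boxplot.stats(x)  # the values a base R boxplot actually draws

five
bs$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out    # points more than 1.5 * IQR beyond the hinges, drawn as outliers
```

The maximum (4.9) and the upper whisker (3.3) are not the same value here, which is why "maximum" is only accurate when a group has no outliers.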


1. Getting to know the raw data

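The demonstration data are simulated ecological diversity survey data: plant surveys of forest, grassland, and farmland in 7 regions during spring, summer, and autumn. The core variables are:

| Field | Description | Example |
|---|---|---|
| Plot | Plot ID | For-Sp01 |
| Season | Sampling season | Spring, Summer, Autumn |
| Habitat | Habitat type | Forest, Grassland, Farmland |
| Shannon | Shannon diversity index | 2.5 - 3.6 |
| Simpson | Simpson dominance index | 0.7 - 0.96 |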

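Read the data and set the factor order so that seasons and habitats appear in a meaningful sequence rather than alphabetically: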

df <- read.csv("ecology_diversity_data.csv", header = TRUE)
df$Season <- factor(df$Season, levels = c("Spring", "Summer", "Autumn"))
df$Habitat <- factor(df$Habitat, levels = c("Forest", "Grassland", "Farmland"))
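
If the CSV file is not at hand, a stand-in data set with the same columns can be simulated. The sketch below is only an assumption about the layout (one row per region, season, and habitat, values drawn uniformly inside the ranges in the table); the object name sim and the plot-ID scheme are hypothetical.

```r
set.seed(42)  # make the simulated values reproducible

seasons  <- c("Spring", "Summer", "Autumn")
habitats <- c("Forest", "Grassland", "Farmland")

# One row per region x season x habitat: 7 * 3 * 3 = 63 plots
sim <- expand.grid(Region = 1:7, Season = seasons, Habitat = habitats,
                   stringsAsFactors = FALSE)

# Hypothetical values drawn uniformly inside the ranges listed in the table
sim$Shannon <- round(runif(nrow(sim), min = 2.5, max = 3.6), 2)
sim$Simpson <- round(runif(nrow(sim), min = 0.70, max = 0.96), 3)

# Plot IDs in the spirit of "For-Sp01" (the exact naming scheme is a guess)
sim$Plot <- sprintf("%s-%s%02d", substr(sim$Habitat, 1, 3),
                    substr(sim$Season, 1, 2), sim$Region)

# Same factor ordering as in the article
sim$Season  <- factor(sim$Season,  levels = seasons)
sim$Habitat <- factor(sim$Habitat, levels = habitats)

str(sim)
```

Because the values are uniform noise, plots made from sim will not show the seasonal or habitat patterns discussed later; it only serves to exercise the code.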

2. Basic boxplot

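geom_boxplot() from ggplot2 is the core function for drawing boxplots; it automatically computes each group's median, quartiles (the box), and whisker limits.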

library(ggplot2)

ggplot(df, aes(x = Season, y = Shannon)) +
  geom_boxplot()

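With default parameters the boxes have a white fill and dark gray outlines on ggplot2's gray panel background. Next, add axis labels: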

p_base <- ggplot(df, aes(x = Season, y = Shannon)) +
  geom_boxplot() +
  labs(x = "Season", y = "Shannon index")
p_base
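
To see what geom_boxplot() computed, the statistics can be pulled back out of the plot with ggplot2's layer_data(). This is a small sketch on hypothetical values (the demo data frame is made up, not the article's CSV).

```r
library(ggplot2)

# Small stand-in data set (hypothetical values, not the article's CSV)
set.seed(1)
demo <- data.frame(
  Season  = factor(rep(c("Spring", "Summer", "Autumn"), each = 10),
                   levels = c("Spring", "Summer", "Autumn")),
  Shannon = c(rnorm(10, 3.0, 0.2), rnorm(10, 3.3, 0.2), rnorm(10, 2.8, 0.2))
)

p <- ggplot(demo, aes(x = Season, y = Shannon)) + geom_boxplot()

# layer_data() returns the values the boxplot stat computed for layer 1
box_stats <- layer_data(p, 1)[, c("x", "ymin", "lower", "middle", "upper", "ymax")]
box_stats

# Medians computed directly, for comparison with the 'middle' column
tapply(demo$Shannon, demo$Season, median)
```

Here ymin and ymax are the whisker ends, lower and upper are the quartiles, and middle is the median.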

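3. Styling: colors and jitter points

Color scheme

| Season | Color | Hex code |
|---|---|---|
| Spring | Teal | #00AFBB |
| Summer | Golden yellow | #E7B800 |
| Autumn | Orange-red | #FC4E07 |

Overlaying jitter points

Overlaying jittered points on the boxplot shows where each observation actually falls. ggpubr::ggboxplot keeps the code concise: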

library(ggpubr)

p_beauty <- ggboxplot(df, x = "Season", y = "Shannon",
                       color = "Season",
                       palette = c("#00AFBB", "#E7B800", "#FC4E07"),
                       xlab = "Season", ylab = "Shannon index",
                       add = "jitter", add.params = list(size = 1))
p_beauty
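
For comparison, the same idea can be written in plain ggplot2, which exposes two details worth controlling: jitter is random, so a seed keeps the figure reproducible, and vertical jitter should be zero so the plotted y values stay exact. The sketch below uses hypothetical data and is one possible setup, not the article's code.

```r
library(ggplot2)

# Hypothetical stand-in data
set.seed(1)
demo <- data.frame(
  Season  = factor(rep(c("Spring", "Summer", "Autumn"), each = 10),
                   levels = c("Spring", "Summer", "Autumn")),
  Shannon = round(rnorm(30, mean = 3, sd = 0.3), 2)
)

p_jit <- ggplot(demo, aes(x = Season, y = Shannon, color = Season)) +
  # outlier.shape = NA: the point layer already draws every observation
  geom_boxplot(outlier.shape = NA) +
  # height = 0 keeps y values exact; seed makes the layout reproducible
  geom_point(position = position_jitter(width = 0.15, height = 0, seed = 123),
             size = 1) +
  scale_color_manual(values = c("#00AFBB", "#E7B800", "#FC4E07")) +
  labs(x = "Season", y = "Shannon index")

p_jit
```

Hiding the boxplot's own outlier points avoids drawing the same observation twice, once as an outlier and once as a jittered point.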

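4. Adding statistical tests

Choosing a test

The standard approach for comparing several groups is ANOVA followed by Tukey HSD, but ecological data often violate normality and homogeneity of variance. The Kruskal-Wallis test is nonparametric and does not assume a normal distribution, so it applies more widely. If the global test is significant, pairwise t-tests are then used to locate the differences. Note that the t-test is itself parametric; where normality is doubtful, method = "wilcox.test" is the rank-based counterpart.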

# Global test across the three seasons
compare_means(Shannon ~ Season, data = df, method = "kruskal.test")
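
The assumption checks mentioned above can be run with base R before choosing a test. The following is a rough sketch on hypothetical data; the choice of Shapiro-Wilk, Bartlett, and Holm-adjusted Wilcoxon tests is one reasonable combination, not the article's prescribed workflow.

```r
# Hypothetical stand-in data: 12 observations per season
set.seed(2)
demo <- data.frame(
  Season  = factor(rep(c("Spring", "Summer", "Autumn"), each = 12),
                   levels = c("Spring", "Summer", "Autumn")),
  Shannon = c(rnorm(12, 3.0, 0.2), rnorm(12, 3.3, 0.2), rnorm(12, 2.8, 0.2))
)

# Normality within each season (Shapiro-Wilk) and homogeneity of variance
normality <- tapply(demo$Shannon, demo$Season,
                    function(v) shapiro.test(v)$p.value)
variance  <- bartlett.test(Shannon ~ Season, data = demo)$p.value

# Global nonparametric test, then rank-based pairwise follow-up
global   <- kruskal.test(Shannon ~ Season, data = demo)
pairwise <- pairwise.wilcox.test(demo$Shannon, demo$Season,
                                 p.adjust.method = "holm")

normality
variance
global$p.value
pairwise$p.value
```

Small p-values from the Shapiro-Wilk or Bartlett tests argue for the nonparametric route; large ones mean ANOVA with Tukey HSD is defensible.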

Annotating the plot

ggpubr::stat_compare_means() writes p-values directly onto the plot. Wrapping the plotting logic in a function makes it reusable later:

# Define the pairwise comparison groups
groups_season <- list(c("Spring", "Summer"),
                       c("Spring", "Autumn"),
                       c("Summer", "Autumn"))

plot_single_factor <- function(yvar, ylabname) {
  ggboxplot(df, x = "Season", y = yvar,
            color = "Season",
            palette = c("#00AFBB", "#E7B800", "#FC4E07"),
            xlab = "Season", ylab = ylabname,
            add = "jitter", add.params = list(size = 1)) +
    stat_compare_means(comparisons = groups_season,
                       method = "t.test",
                       label = "p.format",
                       hide.ns = TRUE) +
    theme_clean
}

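hide.ns = TRUE: hides the "ns" symbol when results are displayed as significance levels (label = "p.signif"); with FALSE, non-significant comparisons are labeled "ns".
label = "p.format" shows the formatted p-value; change it to "p.signif" to show asterisks instead.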
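
How a p-value turns into asterisks can be sketched in a few lines of base R. The helper p_to_stars below is hypothetical, and the cutpoints (0.0001, 0.001, 0.01, 0.05) are the conventional ones; check the ggpubr documentation for the exact defaults in your installed version.

```r
# Map p-values to significance symbols with the conventional cutpoints
p_to_stars <- function(p) {
  as.character(cut(p,
                   breaks = c(0, 1e-04, 0.001, 0.01, 0.05, Inf),
                   labels = c("****", "***", "**", "*", "ns"),
                   include.lowest = TRUE, right = TRUE))
}

# Hypothetical p-values, one per significance band
p_values <- c(0.00003, 0.0004, 0.004, 0.03, 0.2)
data.frame(p      = format.pval(p_values, digits = 2),
           signif = p_to_stars(p_values))
```

Exact p-values carry more information than stars, so "p.format" is usually the safer default for a manuscript unless the journal asks for symbols.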

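The function above relies on theme_clean, which must be defined before the function is called. It turns the legend off and draws inward-pointing tick marks, a common detail in publication-quality figures: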

theme_clean <- theme(
  legend.position = "none",
  axis.ticks.length = unit(-0.1, "cm"),
  axis.text.x = element_text(margin = margin(t = 4)),
  axis.text.y = element_text(margin = margin(l = 4))
)

axis.ticks.length = unit(-0.1, "cm") makes the tick marks point inward, which is fairly common in ecology journals.

Call the function:

p3_stat <- plot_single_factor("Shannon", "Shannon index")
p3_stat

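5. Two-factor boxplot: season x habitat

A single-factor plot only answers whether the seasons differ, but the seasonal pattern may itself vary by habitat. Showing that is the value of a two-factor analysis.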
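
Whether the seasonal pattern really depends on habitat is an interaction question, and it can be tested formally with a two-way ANOVA. The sketch below uses base R's aov() on hypothetical data and assumes the ANOVA assumptions are acceptable; it is an illustration rather than part of the article's workflow.

```r
# Hypothetical balanced design: 7 replicates per season x habitat cell
set.seed(3)
demo <- expand.grid(Rep     = 1:7,
                    Season  = c("Spring", "Summer", "Autumn"),
                    Habitat = c("Forest", "Grassland", "Farmland"))
demo$Shannon <- rnorm(nrow(demo), mean = 3, sd = 0.25)  # made-up response

# Season * Habitat expands to Season + Habitat + Season:Habitat
fit <- aov(Shannon ~ Season * Habitat, data = demo)
tab <- summary(fit)[[1]]
tab  # the Season:Habitat row tests the interaction
```

A significant Season:Habitat term supports plotting the nine season-by-habitat groups separately instead of pooling habitats.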

Combined variable

Paste Season and Habitat together into Season_Habitat so that the x axis holds all 9 groups directly:

df$Season_Habitat <- factor(paste(df$Season, df$Habitat, sep = "_"),
    levels = c("Spring_Forest", "Spring_Grassland", "Spring_Farmland",
               "Summer_Forest", "Summer_Grassland", "Summer_Farmland",
               "Autumn_Forest", "Autumn_Grassland", "Autumn_Farmland"))
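
As an alternative to typing out all nine levels by hand, base R's interaction() can build the combined factor from the two existing factors. This is a sketch on a tiny hypothetical vector; lex.order = TRUE is what yields the season-major ordering.

```r
season  <- factor(c("Spring", "Summer", "Autumn", "Spring"),
                  levels = c("Spring", "Summer", "Autumn"))
habitat <- factor(c("Forest", "Farmland", "Grassland", "Grassland"),
                  levels = c("Forest", "Grassland", "Farmland"))

# lex.order = TRUE makes the first factor vary slowest, so levels come out
# grouped by season and, within a season, in habitat order
combo <- interaction(season, habitat, sep = "_", lex.order = TRUE)

levels(combo)
as.character(combo)
```

This avoids typos in hand-written level names, which would otherwise silently turn observations into NA.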

Each habitat gets a fixed color that stays the same across seasons:

habitat_colors <- c("#1B9E77", "#D95F02", "#7570B3")

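Significance letters

With 9 groups, bracket annotations become very crowded, so compact letter display (CLD) is used instead. Groups that share a letter do not differ significantly; groups with no letter in common do.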
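
The reading rule for letters can be written down as a tiny base R check. The labels and the helper differs below are hypothetical and exist only to illustrate the rule.

```r
# Hypothetical letter labels for four groups
cld <- c(A = "a", B = "ab", C = "b", D = "c")

# Two groups differ significantly only if their labels share no letter
differs <- function(g1, g2, labels) {
  l1 <- strsplit(labels[[g1]], "")[[1]]
  l2 <- strsplit(labels[[g2]], "")[[1]]
  length(intersect(l1, l2)) == 0
}

differs("A", "B", cld)  # FALSE: both carry "a"
differs("A", "C", cld)  # TRUE: "a" versus "b"
differs("B", "C", cld)  # FALSE: both carry "b"
differs("B", "D", cld)  # TRUE: "ab" versus "c"
```

Group B illustrates the typical intermediate case: it is not distinguishable from either A or C, even though A and C differ from each other.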

The implementation has three steps: pairwise_t_test computes the p-values, multcompLetters converts them to letters, and geom_text places the letters on the plot:

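library(rstatix)
library(multcompView)
library(dplyr)   # provides the %>% pipe

# Run all pairwise t-tests on Season_Habitat and convert the p-values
# into compact letter display labels
generate_letters <- function(varname) {
  pw <- df %>%
    pairwise_t_test(as.formula(paste(varname, "~ Season_Habitat")),
                    # no multiplicity correction; with "holm" or "BH", use pw$p.adj below
                    p.adjust.method = "none")
  # multcompLetters expects a named vector with names of the form "group1-group2"
  pvals <- pw$p
  names(pvals) <- paste(pw$group1, pw$group2, sep = "-")
  cld <- multcompView::multcompLetters(pvals)
  temp <- data.frame(
    Season_Habitat = names(cld$Letters),
    label = cld$Letters,
    stringsAsFactors = FALSE
  )
  # Restore the factor order so the letters line up with the x axis
  temp$Season_Habitat <- factor(temp$Season_Habitat, levels = levels(df$Season_Habitat))
  return(temp)
}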

plot_two_factor <- function(varname, ylabname) {
  letters_df <- generate_letters(varname)
  ggboxplot(df, x = "Season_Habitat", y = varname,
            color = "Season_Habitat",
            palette = rep(habitat_colors, 3),
            xlab = "", ylab = ylabname,
            add = "jitter", add.params = list(size = 1)) +
    geom_text(data = letters_df,
              aes(x = Season_Habitat, y = max(df[[varname]]) * 1.05, label = label),
              vjust = 0) +
    theme_clean +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
}

Call the function:

q1 <- plot_two_factor("Shannon", "Shannon index")
q1
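
With nine groups there are many pairwise comparisons, so it is worth seeing what skipping a p-value adjustment implies. The arithmetic below is a base R sketch; the independence assumption in the error-rate formula is a simplification, and the raw p-values are hypothetical.

```r
n_groups <- 9
n_tests  <- choose(n_groups, 2)  # 36 pairwise comparisons
alpha    <- 0.05

# Chance of at least one false positive if all 36 tests were independent
fwer <- 1 - (1 - alpha)^n_tests

# Hypothetical raw p-values, adjusted in two common ways
raw <- c(0.001, 0.004, 0.012, 0.030, 0.045, 0.200)
adjusted <- data.frame(raw  = raw,
                       holm = p.adjust(raw, method = "holm"),
                       BH   = p.adjust(raw, method = "BH"))

n_tests
round(fwer, 3)
adjusted
```

If an adjustment is wanted, pairwise_t_test() accepts the same method names through p.adjust.method and reports the adjusted values in its p.adj column.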

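Interpreting the results

  • Forest has the highest diversity in every season, with little seasonal fluctuation
  • Grassland is intermediate, with a fairly clear decline in autumn
  • Farmland is the lowest and fluctuates most between seasons, suggesting that habitats under strong human disturbance have weaker buffering capacity

Single-factor and two-factor plots answer questions at different levels, and the two complement each other.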
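
Visual impressions such as "highest" or "fluctuates most" are easier to defend with a small summary table. Here is one way to compute it with base R's aggregate(); the data are hypothetical random values, so the numbers will not reproduce the pattern described above.

```r
# Hypothetical balanced design: 7 replicates per season x habitat cell
set.seed(4)
demo <- expand.grid(Rep     = 1:7,
                    Season  = c("Spring", "Summer", "Autumn"),
                    Habitat = c("Forest", "Grassland", "Farmland"))
demo$Shannon <- rnorm(nrow(demo), mean = 3, sd = 0.25)  # made-up values

# Median per season x habitat cell
cell_median <- aggregate(Shannon ~ Season + Habitat, data = demo, FUN = median)

# Seasonal fluctuation per habitat: range of the three seasonal medians
fluct <- aggregate(Shannon ~ Habitat, data = cell_median,
                   FUN = function(v) diff(range(v)))
names(fluct)[2] <- "median_range"

cell_median
fluct
```

The median_range column gives one simple number per habitat for "seasonal fluctuation"; other measures, such as the coefficient of variation, would work as well.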


6. Multi-metric panels and multi-format export

Combine Shannon and Simpson into one composite figure for easier journal layout.

Single-factor panel

library(cowplot)
library(export)

p1 <- plot_single_factor("Shannon", "Shannon index")
p2 <- plot_single_factor("Simpson", "Simpson index")

p_final <- plot_grid(p1, p2, ncol = 2)

tiff("boxplot_publication.tif", width = 3100, height = 1600,
     pointsize = 8, res = 280, compression = "lzw")
print(p_final); dev.off()

png("boxplot_publication.png", width = 3100, height = 1600, res = 280)
print(p_final); dev.off()

graph2ppt(p_final, file = "boxplot_publication.pptx")

Two-factor panel

library(export)

q1 <- plot_two_factor("Shannon", "Shannon index")
q2 <- plot_two_factor("Simpson", "Simpson index")

q_final <- plot_grid(q1, q2, ncol = 2)

tiff("boxplot_two_way_comb_publication.tif", width = 3400, height = 1600,
     pointsize = 8, res = 280, compression = "lzw")
print(q_final); dev.off()

png("boxplot_two_way_comb_publication.png", width = 3400, height = 1600,
    res = 280)
print(q_final); dev.off()

graph2ppt(q_final, file = "boxplot_two_way_comb_publication.pptx")

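Each of the three export formats has its own use:

| Format | Use | Characteristics |
|---|---|---|
| TIFF | Journal submission | Lossless (LZW) compression; meets typical submission requirements |
| PNG | Preview / web | General-purpose format, moderate file size |
| PPT | Later touch-ups | Keeps vector information; editable |

graph2ppt() is especially recommended: after export, labels, colors, and sizes can be edited directly in PowerPoint.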
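
The width, height, and res arguments of tiff() and png() are linked by simple arithmetic: pixels divided by dpi gives the physical size in inches. The helpers px_to_inch and inch_to_px below are hypothetical names for that arithmetic, and the 300 and 600 dpi targets are only examples; the required resolution should be taken from the target journal's guidelines.

```r
# Convert between pixel dimensions, resolution (dpi), and physical size
px_to_inch <- function(px, dpi) px / dpi
inch_to_px <- function(inch, dpi) round(inch * dpi)

# The single-factor figure above: 3100 x 1600 px at 280 dpi
size_in <- px_to_inch(c(width = 3100, height = 1600), dpi = 280)
round(size_in, 2)

# Pixels needed for the same physical size at 300 dpi and 600 dpi
inch_to_px(size_in, dpi = 300)
inch_to_px(size_in, dpi = 600)
```

Changing res without scaling width and height changes the physical size of the figure, and with it the apparent size of text and points.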


7. Complete code

Below is all the code from this article. Change the file name in read.csv and it is ready to run.

# ===================== Load packages =====================
library(ggplot2)
library(ggpubr)
library(rstatix)
library(multcompView)
library(cowplot)
library(export)
library(dplyr)

# ===================== Global settings =====================
df <- read.csv("ecology_diversity_data.csv", header = TRUE)
df$Season <- factor(df$Season, levels = c("Spring", "Summer", "Autumn"))
df$Habitat <- factor(df$Habitat, levels = c("Forest", "Grassland", "Farmland"))

season_colors <- c("#00AFBB", "#E7B800", "#FC4E07")
habitat_colors <- c("#1B9E77", "#D95F02", "#7570B3")

theme_clean <- theme(
  legend.position = "none",
  axis.ticks.length = unit(-0.1, "cm"),
  axis.text.x = element_text(margin = margin(t = 4)),
  axis.text.y = element_text(margin = margin(l = 4))
)

groups_season <- list(c("Spring", "Summer"),
                       c("Spring", "Autumn"),
                       c("Summer", "Autumn"))

# ===================== Single-factor plotting function =====================
plot_single_factor <- function(yvar, ylabname) {
  ggboxplot(df, x = "Season", y = yvar,
            color = "Season",
            palette = season_colors,
            xlab = "Season", ylab = ylabname,
            add = "jitter", add.params = list(size = 1)) +
    stat_compare_means(comparisons = groups_season, method = "t.test",
                       label = "p.format", hide.ns = TRUE) +
    theme_clean
}

# ===================== Two-factor setup: combined variable =====================
df$Season_Habitat <- factor(paste(df$Season, df$Habitat, sep = "_"),
    levels = c("Spring_Forest", "Spring_Grassland", "Spring_Farmland",
               "Summer_Forest", "Summer_Grassland", "Summer_Farmland",
               "Autumn_Forest", "Autumn_Grassland", "Autumn_Farmland"))

# ===================== Two-factor significance-letter function =====================
generate_letters <- function(varname) {
  pw <- df %>%
    pairwise_t_test(as.formula(paste(varname, "~ Season_Habitat")),
                    # no multiplicity correction; with "holm" or "BH", use pw$p.adj below
                    p.adjust.method = "none")
  # multcompLetters expects a named vector with names of the form "group1-group2"
  pvals <- pw$p
  names(pvals) <- paste(pw$group1, pw$group2, sep = "-")
  cld <- multcompView::multcompLetters(pvals)
  temp <- data.frame(
    Season_Habitat = names(cld$Letters),
    label = cld$Letters,
    stringsAsFactors = FALSE
  )
  # Restore the factor order so the letters line up with the x axis
  temp$Season_Habitat <- factor(temp$Season_Habitat, levels = levels(df$Season_Habitat))
  return(temp)
}

# ===================== Two-factor plotting function =====================
plot_two_factor <- function(varname, ylabname) {
  letters_df <- generate_letters(varname)
  ggboxplot(df, x = "Season_Habitat", y = varname,
            color = "Season_Habitat",
            palette = rep(habitat_colors, 3),
            xlab = "", ylab = ylabname,
            add = "jitter", add.params = list(size = 1)) +
    geom_text(data = letters_df,
              aes(x = Season_Habitat, y = max(df[[varname]]) * 1.05, label = label),
              vjust = 0) +
    theme_clean +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
}

# ===================== Single-factor panel =====================
p1 <- plot_single_factor("Shannon", "Shannon index")
p2 <- plot_single_factor("Simpson", "Simpson index")

p_final <- plot_grid(p1, p2, ncol = 2,
                     labels = c("(a)", "(b)"), label_x = 0)

tiff("boxplot_publication.tif", width = 3100, height = 1600,
     pointsize = 8, res = 280, compression = "lzw")
print(p_final); dev.off()

png("boxplot_publication.png", width = 3100, height = 1600, res = 280)
print(p_final); dev.off()

graph2ppt(p_final, file = "boxplot_publication.pptx")

# ===================== Two-factor panel =====================
q1 <- plot_two_factor("Shannon", "Shannon index")
q2 <- plot_two_factor("Simpson", "Simpson index")

q_final <- plot_grid(q1, q2, ncol = 2,
                     labels = c("(a)", "(b)"), label_x = 0)

tiff("boxplot_two_way_comb_publication.tif", width = 3400, height = 1600,
     pointsize = 8, res = 280, compression = "lzw")
print(q_final); dev.off()

png("boxplot_two_way_comb_publication.png", width = 3400, height = 1600,
    res = 280)
print(q_final); dev.off()

graph2ppt(q_final, file = "boxplot_two_way_comb_publication.pptx")

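Adapting to your own data

Replace the file name in read.csv() with your own data file and update the x, y, and color variable mappings accordingly. Color palettes from the ggsci package, such as pal_npg() and pal_aaas(), can also be used directly.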
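
When swapping in a different data set, a quick check that the expected columns exist gives a clearer error than a failure deep inside a plotting call. The helper check_columns and the my_data example below are hypothetical, written only to illustrate the idea.

```r
# Hypothetical helper: stop early if required columns are missing
check_columns <- function(data, required) {
  missing_cols <- setdiff(required, names(data))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# Stand-in for a user's own data with differently named variables
my_data <- data.frame(Treatment = c("A", "B"),
                      Site      = c("S1", "S2"),
                      Yield     = c(1.2, 1.5))

# Passes silently: all three columns are present
check_columns(my_data, c("Treatment", "Site", "Yield"))

# Fails: the article's column names are not in this data set
result <- try(check_columns(my_data, c("Season", "Shannon")), silent = TRUE)
inherits(result, "try-error")
```

The same idea extends to checking that the grouping columns are factors with the intended level order before plotting.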


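8. Key takeaways

  • Separate palettes: Season and Habitat use independent palettes so the two variables are not confused
  • Global test first, then pairwise comparisons: Kruskal-Wallis establishes an overall difference, and t-tests locate the specific group differences
  • Letters for two-factor plots: with many groups, significance letters are more readable than bracket annotations
  • Jitter overlay: shows the detail of the data distribution; adjust the width parameter when there are many points
  • Vector PPT export: graph2ppt() keeps editable vector information for later fine-tuning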
