Scientific Plotting in R: A Complete Box Plot Tutorial, from Basics to Publication Quality
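Still comparing groups with bar charts and error bars? A box plot gives you five statistics in one figure (minimum, Q1, median, Q3, maximum), so skewness, spread, and outliers are visible at a glance. Note that ggplot2 draws the whisker ends at the most extreme values within 1.5 × IQR of the box and plots anything beyond them as separate outlier points. Starting from scratch, this article uses R's ggplot2 to add styling, statistical tests, two-factor analysis, and panel export step by step until the figure meets journal publication standards. All code has been tested and runs.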
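1. Getting to Know the Raw Data
The demo data are simulated ecological diversity survey data: plant surveys of forest, grassland, and farmland carried out in 7 regions in spring, summer, and autumn. The core variables are:
| Field | Description | Example |
|---|---|---|
| Plot | Plot ID | For-Sp01 |
| Season | Sampling season | Spring, Summer, Autumn |
| Habitat | Habitat type | Forest, Grassland, Farmland |
| Shannon | Shannon diversity index | 2.5 - 3.6 |
| Simpson | Simpson dominance index | 0.7 - 0.96 |
Read the data and set the factor level order (the resulting structure can be checked with str(df)):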
r
df <- read.csv("ecology_diversity_data.csv", header = TRUE)
df$Season <- factor(df$Season, levels = c("Spring", "Summer", "Autumn"))
df$Habitat <- factor(df$Habitat, levels = c("Forest", "Grassland", "Farmland"))
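The CSV file itself is not included with the article, so here is a minimal sketch of a stand-in data frame with the same columns. The object name `sim`, the `Region` helper column, and all values are assumptions made up for illustration; only the column names, example plot ID format, and value ranges come from the table above.

```r
# Hypothetical stand-in for ecology_diversity_data.csv:
# 7 regions x 3 seasons x 3 habitats = 63 rows
set.seed(42)
seasons  <- c("Spring", "Summer", "Autumn")
habitats <- c("Forest", "Grassland", "Farmland")

sim <- expand.grid(Region = 1:7, Season = seasons, Habitat = habitats,
                   stringsAsFactors = FALSE)

# Plot IDs in the style of "For-Sp01": habitat prefix, season prefix, region number
sim$Plot <- sprintf("%s-%s%02d", substr(sim$Habitat, 1, 3),
                    substr(sim$Season, 1, 2), sim$Region)

# Made-up values drawn uniformly inside the ranges listed in the table
sim$Shannon <- round(runif(nrow(sim), 2.5, 3.6), 2)
sim$Simpson <- round(runif(nrow(sim), 0.70, 0.96), 3)

# Same factor level order as in the article
sim$Season  <- factor(sim$Season,  levels = seasons)
sim$Habitat <- factor(sim$Habitat, levels = habitats)

str(sim)                        # column types and factor levels
table(sim$Season, sim$Habitat)  # 7 plots per Season x Habitat cell
```

Assigning `df <- sim` would let the rest of the tutorial run without the CSV, although the resulting plots will of course not match the article's figures.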
2. A Basic Box Plot
geom_boxplot() in ggplot2 is the core function for drawing box plots; it automatically computes each group's median, quartiles, and whiskers.
r
library(ggplot2)
ggplot(df, aes(x = Season, y = Shannon)) +
geom_boxplot()
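With the default parameters, the boxes are white-filled with dark gray outlines on ggplot2's gray panel background. Add axis labels on top of that: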
r
p_base <- ggplot(df, aes(x = Season, y = Shannon)) +
geom_boxplot() +
labs(x = "Season", y = "Shannon index")
p_base
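To see which numbers a box plot actually draws, the statistics can be reproduced by hand in base R. This is a sketch on made-up values, assuming the usual 1.5 × IQR whisker rule that ggplot2 documents for geom_boxplot(); the vector `x` is hypothetical.

```r
# Made-up Shannon values for one group; 4.9 is a deliberately extreme point
x <- c(2.6, 2.8, 2.9, 3.0, 3.1, 3.2, 3.3, 3.4, 4.9)

q   <- quantile(x, c(0.25, 0.5, 0.75))  # Q1, median, Q3 (the box)
iqr <- q[3] - q[1]

# Whiskers stop at the most extreme observations within 1.5 * IQR of the box
lower_whisker <- min(x[x >= q[1] - 1.5 * iqr])
upper_whisker <- max(x[x <= q[3] + 1.5 * iqr])

# Anything beyond the whiskers is drawn as an individual outlier point
outliers <- x[x < lower_whisker | x > upper_whisker]

q               # 2.9, 3.1, 3.3
lower_whisker   # 2.6
upper_whisker   # 3.4
outliers        # 4.9
```

Here the upper whisker ends at 3.4 rather than at the maximum 4.9, which is why "minimum" and "maximum" in a box plot only equal the data extremes when there are no outliers.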
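3. Styling: Colors and Jittered Points
Color scheme
| Season | Color | Hex code |
|---|---|---|
| Spring | Teal | #00AFBB |
| Summer | Golden yellow | #E7B800 |
| Autumn | Orange-red | #FC4E07 |
Overlaying jittered points
Overlaying jittered points on the box plot shows where each individual observation actually falls. Using ggpubr::ggboxplot keeps the code more concise: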
r
library(ggpubr)
p_beauty <- ggboxplot(df, x = "Season", y = "Shannon",
color = "Season",
palette = c("#00AFBB", "#E7B800", "#FC4E07"),
xlab = "Season", ylab = "Shannon index",
add = "jitter", add.params = list(size = 1))
p_beauty
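For readers new to jittering, the idea can be shown with base R's jitter(): only the horizontal position is perturbed, so points that would overlap are spread out while their measured values stay untouched. The values and the amount 0.2 below are assumptions for illustration, not the settings ggpubr uses internally.

```r
set.seed(1)
x_pos <- rep(1, 6)                        # six observations in the first group
y_val <- c(2.7, 2.9, 3.0, 3.0, 3.2, 3.4)  # made-up Shannon values (note the tie at 3.0)

# Add uniform noise in [-0.2, 0.2] to the x positions only
x_jit <- jitter(x_pos, amount = 0.2)

data.frame(x = round(x_jit, 3), y = y_val)
range(x_jit)  # stays within 0.8 and 1.2
```

The two tied values at 3.0 now get different x positions, which is the whole point: the y-axis reading is still exact.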
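4. Adding Statistical Tests
Choosing a test
ANOVA followed by Tukey HSD is the standard approach for comparing multiple groups, but ecological data often violate the normality and homogeneity-of-variance assumptions. The Kruskal-Wallis test is nonparametric and does not assume normality, so it applies more broadly. If the global test is significant, pairwise tests then locate the differences. This tutorial uses pairwise t-tests for that step; because t-tests bring the normality assumption back, method = "wilcox.test" is the nonparametric alternative when that assumption is in doubt.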
r
# Global test
compare_means(Shannon ~ Season, data = df, method = "kruskal.test")
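The same "global test first, then pairwise comparisons" workflow can be sketched with base R's stats functions, which is useful for checking what the ggpubr wrapper reports. The data frame `demo` and its values are simulated assumptions; the pairwise step here uses Wilcoxon rank-sum tests with a Holm correction, a swapped-in nonparametric alternative to the pairwise t-tests used in this tutorial.

```r
set.seed(123)
# Simulated stand-in for df: 21 made-up Shannon values per season
demo <- data.frame(
  Season  = factor(rep(c("Spring", "Summer", "Autumn"), each = 21),
                   levels = c("Spring", "Summer", "Autumn")),
  Shannon = c(rnorm(21, 3.0, 0.2), rnorm(21, 3.3, 0.2), rnorm(21, 2.8, 0.2))
)

# Global test: do the three seasons share one distribution?
kw <- kruskal.test(Shannon ~ Season, data = demo)
kw$p.value

# Follow-up: pairwise Wilcoxon rank-sum tests with Holm-adjusted p-values
pw <- pairwise.wilcox.test(demo$Shannon, demo$Season, p.adjust.method = "holm")
pw$p.value  # lower-triangular matrix: rows Summer/Autumn, columns Spring/Summer
```

Only interpret the pairwise matrix if the global p-value is below your chosen significance level.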
Annotating the plot
ggpubr::stat_compare_means() can annotate p-values directly on the plot. Wrap the plotting logic in a function so it can be reused later:
r
# Define the pairs of groups to compare
groups_season <- list(c("Spring", "Summer"),
c("Spring", "Autumn"),
c("Summer", "Autumn"))
plot_single_factor <- function(yvar, ylabname) {
ggboxplot(df, x = "Season", y = yvar,
color = "Season",
palette = c("#00AFBB", "#E7B800", "#FC4E07"),
xlab = "Season", ylab = ylabname,
add = "jitter", add.params = list(size = 1)) +
stat_compare_means(comparisons = groups_season,
method = "t.test",
label = "p.format",
hide.ns = TRUE) +
theme_clean
}
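hide.ns = TRUE: hides the "ns" symbol for non-significant comparisons, so it matters mainly when significance symbols are shown (label = "p.signif"); set it to FALSE to print "ns" for those comparisons.
label = "p.format": shows the formatted p-value; change it to "p.signif" to show asterisks instead.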
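To make the asterisk notation concrete, here is a small sketch that maps p-values to symbols using the cutoffs documented for ggpubr's "p.signif" labels (ns: p > 0.05, *: p <= 0.05, **: p <= 0.01, ***: p <= 0.001, ****: p <= 0.0001). The helper name `p_to_stars` is hypothetical and not part of ggpubr.

```r
# Hypothetical helper: convert p-values to significance symbols
p_to_stars <- function(p) {
  as.character(cut(p,
                   breaks = c(0, 1e-04, 0.001, 0.01, 0.05, 1),
                   labels = c("****", "***", "**", "*", "ns"),
                   include.lowest = TRUE))  # right-closed intervals, so 0.05 maps to "*"
}

p_to_stars(c(0.00003, 0.0004, 0.02, 0.3))
# "****" "***" "*" "ns"
```

When a figure uses symbols instead of numbers, state these cutoffs in the figure legend so readers do not have to guess them.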
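Inward-pointing tick marks are a common detail in publication-quality figures. Define theme_clean before calling plot_single_factor(), because the function looks it up when it is called: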
r
theme_clean <- theme(
legend.position = "none",
axis.ticks.length = unit(-0.1, "cm"),
axis.text.x = element_text(margin = margin(t = 4)),
axis.text.y = element_text(margin = margin(l = 4))
)
axis.ticks.length = unit(-0.1, "cm") makes the tick marks point inward, a style that is fairly common in ecology journals.
Call the function:
r
p3_stat <- plot_single_factor("Shannon", "Shannon index")
p3_stat

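5. Two-Factor Box Plots: Season × Habitat
A single-factor plot only answers whether the seasons differ, but the seasonal pattern may itself depend on habitat. That is where a two-factor analysis pays off.
Combined variable
Paste Season and Habitat together into Season_Habitat so that the x-axis holds all 9 groups directly: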
r
df$Season_Habitat <- factor(paste(df$Season, df$Habitat, sep = "_"),
levels = c("Spring_Forest", "Spring_Grassland", "Spring_Farmland",
"Summer_Forest", "Summer_Grassland", "Summer_Farmland",
"Autumn_Forest", "Autumn_Grassland", "Autumn_Farmland"))
Each habitat gets a fixed color that stays the same across seasons:
r
habitat_colors <- c("#1B9E77", "#D95F02", "#7570B3")
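As an alternative to typing the nine level names by hand, base R's interaction() can build the combined factor, and a named palette makes the habitat-to-color mapping explicit. This is a sketch on a four-row toy example; the object names `season_habitat`, `level_habitat`, and `palette_9` are assumptions, not names used elsewhere in this tutorial.

```r
season  <- factor(c("Spring", "Summer", "Autumn", "Spring"),
                  levels = c("Spring", "Summer", "Autumn"))
habitat <- factor(c("Forest", "Farmland", "Grassland", "Grassland"),
                  levels = c("Forest", "Grassland", "Farmland"))

# lex.order = TRUE makes the first factor (season) vary slowest,
# giving Spring_Forest, Spring_Grassland, Spring_Farmland, Summer_Forest, ...
season_habitat <- interaction(season, habitat, sep = "_", lex.order = TRUE)
levels(season_habitat)  # all 9 combinations, even those absent from the toy data

habitat_colors <- c(Forest = "#1B9E77", Grassland = "#D95F02", Farmland = "#7570B3")

# Look up each level's color by the habitat part of its name
level_habitat <- sub("^[^_]*_", "", levels(season_habitat))
palette_9 <- setNames(habitat_colors[level_habitat], levels(season_habitat))
palette_9
```

The unnamed values of `palette_9` equal rep(habitat_colors, 3), so the result is the same palette; the named version simply protects against a silent mismatch if the level order ever changes.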
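Significance letters
With 9 groups, bracket annotations become very crowded, so switch to significance letters (a compact letter display). Groups that share a letter do not differ significantly; groups with no letter in common do.
The implementation has three steps: pairwise_t_test computes the p-values, multcompLetters converts them to letters, and geom_text places the letters on the plot: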
r
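library(rstatix)       # pairwise_t_test()
library(multcompView)  # multcompLetters()
library(dplyr)         # provides the %>% pipe
generate_letters <- function(varname) {
  # Pairwise t-tests between all Season_Habitat groups.
  # p.adjust.method = "none" leaves the 36 p-values uncorrected;
  # consider "holm" or "BH" to limit false positives.
  pw <- df %>%
    pairwise_t_test(as.formula(paste(varname, "~ Season_Habitat")),
                    p.adjust.method = "none")
  pvals <- pw$p
  # multcompLetters expects names of the form "group1-group2"
  names(pvals) <- paste(pw$group1, pw$group2, sep = "-")
  cld <- multcompView::multcompLetters(pvals)  # named cld to avoid shadowing base::letters
  temp <- data.frame(
    Season_Habitat = names(cld$Letters),
    label = cld$Letters,
    stringsAsFactors = FALSE
  )
  # Restore the factor level order so the labels line up with the x-axis
  temp$Season_Habitat <- factor(temp$Season_Habitat, levels = levels(df$Season_Habitat))
  return(temp)
}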
plot_two_factor <- function(varname, ylabname) {
letters_df <- generate_letters(varname)
ggboxplot(df, x = "Season_Habitat", y = varname,
color = "Season_Habitat",
palette = rep(habitat_colors, 3),
xlab = "", ylab = ylabname,
add = "jitter", add.params = list(size = 1)) +
geom_text(data = letters_df,
aes(x = Season_Habitat, y = max(df[[varname]]) * 1.05, label = label),
vjust = 0) +
theme_clean +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
}
Call the function:
r
q1 <- plot_two_factor("Shannon", "Shannon index")
q1
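One caveat about the letters: generate_letters() runs every pairwise comparison with p.adjust.method = "none". The sketch below, using made-up p-values, shows why that matters with 9 groups and how base R's p.adjust() changes the count of "significant" pairs. If you want adjusted letters, passing "holm" or "BH" to pairwise_t_test and reading its adjusted p-value column (p.adj, as far as I know) instead of p would be one way to do it.

```r
n_groups <- 9
n_pairs  <- choose(n_groups, 2)  # 36 pairwise comparisons
alpha    <- 0.05

# Expected number of false positives if no group truly differs
n_pairs * alpha                  # 1.8

# Made-up raw p-values, for illustration only
set.seed(7)
p_raw  <- sort(runif(n_pairs, 0, 0.2))
p_holm <- p.adjust(p_raw, method = "holm")  # controls the family-wise error rate
p_bh   <- p.adjust(p_raw, method = "BH")    # controls the false discovery rate

# How many comparisons fall below alpha under each scheme
c(raw = sum(p_raw < alpha), holm = sum(p_holm < alpha), BH = sum(p_bh < alpha))
```

Whichever choice you make, report it in the figure legend, since the letters change with the correction method.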

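Interpreting the results
- Forest has the highest diversity in every season, with little seasonal fluctuation
- Grassland is intermediate, with a noticeable drop in autumn
- Farmland is the lowest and fluctuates most between seasons: habitats under strong human disturbance have weaker buffering capacity
Single-factor and two-factor plots answer questions at different levels, and the two complement each other.
6. Multi-Metric Panels and Multi-Format Export
Combine Shannon and Simpson into one composite figure, which makes journal layout easier.
Single-factor panel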
r
library(cowplot)
library(export)
p1 <- plot_single_factor("Shannon", "Shannon index")
p2 <- plot_single_factor("Simpson", "Simpson index")
p_final <- plot_grid(p1, p2, ncol = 2)
tiff("boxplot_publication.tif", width = 3100, height = 1600,
pointsize = 8, res = 280, compression = "lzw")
print(p_final); dev.off()
png("boxplot_publication.png", width = 3100, height = 1600, res = 280)
print(p_final); dev.off()
graph2ppt(p_final, file = "boxplot_publication.pptx")
Two-factor panel
r
library(export)
q1 <- plot_two_factor("Shannon", "Shannon index")
q2 <- plot_two_factor("Simpson", "Simpson index")
q_final <- plot_grid(q1, q2, ncol = 2)
tiff("boxplot_two_way_comb_publication.tif", width = 3400, height = 1600,
pointsize = 8, res = 280, compression = "lzw")
print(q_final); dev.off()
png("boxplot_two_way_comb_publication.png", width = 3400, height = 1600,
res = 280)
print(q_final); dev.off()
graph2ppt(q_final, file = "boxplot_two_way_comb_publication.pptx")
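The three export formats serve different purposes:
| Format | Use | Characteristics |
|---|---|---|
| TIFF | Journal submission | Lossless (LZW) compression; meets typical submission requirements |
| PNG | Preview / web | Universal format, moderate file size |
| PPT | Post-editing | Keeps vector information; editable |
graph2ppt() is especially recommended: after export, labels, colors, and sizes can be edited directly in PowerPoint, which is very convenient.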
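The width, height, and res arguments in the tiff() and png() calls are linked by pixels = inches × dpi. The short sketch below works through that arithmetic; the helper name `px` and the 7 × 3.5 inch, 300 dpi target are assumptions for illustration, so check your target journal's figure guidelines for the actual size and resolution.

```r
# Hypothetical helper: pixel count for a physical size at a given resolution
px <- function(inches, dpi) as.integer(round(inches * dpi))

# Physical size implied by the single-factor export above (3100 x 1600 px at 280 dpi)
round(c(width = 3100, height = 1600) / 280, 2)  # 11.07 x 5.71 inches

# Pixel dimensions for an assumed 7 x 3.5 inch figure at 300 dpi
dims <- c(width = px(7, 300), height = px(3.5, 300))
dims  # 2100 x 1050 px
```

These values can then be passed on, for example tiff("figure.tif", width = dims[["width"]], height = dims[["height"]], res = 300, compression = "lzw"). Keeping res in step with the pixel size matters because text and point sizes are scaled by res.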
7. Complete Code
Below is all the code from this article. Copy it and change the file name in read.csv to run it.
r
# ===================== Load packages =====================
library(ggplot2)
library(ggpubr)
library(rstatix)
library(multcompView)
library(cowplot)
library(export)
library(dplyr)
# ===================== Global settings =====================
df <- read.csv("ecology_diversity_data.csv", header = TRUE)
df$Season <- factor(df$Season, levels = c("Spring", "Summer", "Autumn"))
df$Habitat <- factor(df$Habitat, levels = c("Forest", "Grassland", "Farmland"))
season_colors <- c("#00AFBB", "#E7B800", "#FC4E07")
habitat_colors <- c("#1B9E77", "#D95F02", "#7570B3")
theme_clean <- theme(
legend.position = "none",
axis.ticks.length = unit(-0.1, "cm"),
axis.text.x = element_text(margin = margin(t = 4)),
axis.text.y = element_text(margin = margin(l = 4))
)
groups_season <- list(c("Spring", "Summer"),
c("Spring", "Autumn"),
c("Summer", "Autumn"))
# ===================== Single-factor plotting function =====================
plot_single_factor <- function(yvar, ylabname) {
ggboxplot(df, x = "Season", y = yvar,
color = "Season",
palette = season_colors,
xlab = "Season", ylab = ylabname,
add = "jitter", add.params = list(size = 1)) +
stat_compare_means(comparisons = groups_season, method = "t.test",
label = "p.format", hide.ns = TRUE) +
theme_clean
}
# ===================== Two-factor preparation: combined variable =====================
df$Season_Habitat <- factor(paste(df$Season, df$Habitat, sep = "_"),
levels = c("Spring_Forest", "Spring_Grassland", "Spring_Farmland",
"Summer_Forest", "Summer_Grassland", "Summer_Farmland",
"Autumn_Forest", "Autumn_Grassland", "Autumn_Farmland"))
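# ===================== Two-factor significance-letter function =====================
generate_letters <- function(varname) {
  # Pairwise t-tests between all Season_Habitat groups.
  # p.adjust.method = "none" leaves the 36 p-values uncorrected;
  # consider "holm" or "BH" to limit false positives.
  pw <- df %>%
    pairwise_t_test(as.formula(paste(varname, "~ Season_Habitat")),
                    p.adjust.method = "none")
  pvals <- pw$p
  # multcompLetters expects names of the form "group1-group2"
  names(pvals) <- paste(pw$group1, pw$group2, sep = "-")
  cld <- multcompView::multcompLetters(pvals)  # named cld to avoid shadowing base::letters
  temp <- data.frame(
    Season_Habitat = names(cld$Letters),
    label = cld$Letters,
    stringsAsFactors = FALSE
  )
  # Restore the factor level order so the labels line up with the x-axis
  temp$Season_Habitat <- factor(temp$Season_Habitat, levels = levels(df$Season_Habitat))
  return(temp)
}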
# ===================== Two-factor plotting function =====================
plot_two_factor <- function(varname, ylabname) {
letters_df <- generate_letters(varname)
ggboxplot(df, x = "Season_Habitat", y = varname,
color = "Season_Habitat",
palette = rep(habitat_colors, 3),
xlab = "", ylab = ylabname,
add = "jitter", add.params = list(size = 1)) +
geom_text(data = letters_df,
aes(x = Season_Habitat, y = max(df[[varname]]) * 1.05, label = label),
vjust = 0) +
theme_clean +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
}
# ===================== Single-factor panel =====================
p1 <- plot_single_factor("Shannon", "Shannon index")
p2 <- plot_single_factor("Simpson", "Simpson index")
p_final <- plot_grid(p1, p2, ncol = 2,
labels = c("(a)", "(b)"), label_x = 0)
tiff("boxplot_publication.tif", width = 3100, height = 1600,
pointsize = 8, res = 280, compression = "lzw")
print(p_final); dev.off()
png("boxplot_publication.png", width = 3100, height = 1600, res = 280)
print(p_final); dev.off()
graph2ppt(p_final, file = "boxplot_publication.pptx")
# ===================== Two-factor panel =====================
q1 <- plot_two_factor("Shannon", "Shannon index")
q2 <- plot_two_factor("Simpson", "Simpson index")
q_final <- plot_grid(q1, q2, ncol = 2,
labels = c("(a)", "(b)"), label_x = 0)
tiff("boxplot_two_way_comb_publication.tif", width = 3400, height = 1600,
pointsize = 8, res = 280, compression = "lzw")
print(q_final); dev.off()
png("boxplot_two_way_comb_publication.png", width = 3400, height = 1600,
res = 280)
print(q_final); dev.off()
graph2ppt(q_final, file = "boxplot_two_way_comb_publication.pptx")
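Adapting to your own data
Replace the file name in read.csv() with your own data file and adjust the x, y, and color variable mappings accordingly. For colors, palettes from the ggsci package such as pal_npg() and pal_aaas() can also be used directly.
8. Key Takeaways
- Separate palettes: Season and Habitat use independent palettes to avoid confusing the two variables
- Global test first, then pairwise comparisons: Kruskal-Wallis decides whether there is an overall difference, and t-tests locate the specific group differences
- Letters for two-factor plots: with many groups, significance letters are more readable than bracket lines
- Jitter overlay: shows the details of the data distribution; with large datasets, adjust the width parameter
- Vector PPT export: graph2ppt() keeps editable vector information, which makes fine-tuning easier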

