Visualizing Data Distributions in R
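Overview of key topics
- Histogram: shows the shape of a distribution as counts or densities per bin
- Kernel density plot: a smooth estimate of the probability density function
- Boxplot: shows the median, quartiles, outliers, and data range
- Violin plot: a boxplot combined with a mirrored kernel density, showing the shape of the distribution
- Pyramid plot: compares two populations side by side (e.g., age structure by sex)
- Ridgeline plot: density curves for several groups, stacked and partially overlapping
- Dot plot (strip chart): plots every raw data point, using jitter or stacking to reduce overlap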
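1. Histogram
Syntax
hist(x, breaks, freq, col, border, main, xlab, ylab)
- x: numeric vector
- breaks: number of bins, or a vector of breakpoints (a single number is only a suggestion; R rounds it to "pretty" breakpoints)
- freq: TRUE plots counts; FALSE plots probability densities (bar areas sum to 1)
- col: bar fill color
- border: bar border color
- main, xlab, ylab: title and axis labels
Example code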
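```r
# Reproducible example: 1000 normal random numbers, mean = 50, sd = 10
set.seed(42)
data <- rnorm(1000, mean = 50, sd = 10)
# Draw the histogram
hist(data,
     breaks = 30,           # suggest about 30 bins
     freq = FALSE,          # plot densities instead of counts
     col = "lightblue",     # bar fill color
     border = "darkblue",   # bar border color
     main = "Histogram: normally distributed data",  # title
     xlab = "Value",        # x-axis label
     ylab = "Density")      # y-axis label
# Overlay a kernel density curve (see the next section)
lines(density(data), col = "red", lwd = 2)
```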
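To see what `breaks` and `freq` actually do, a minimal sketch can call `hist()` with `plot = FALSE` and inspect the returned object. It regenerates the same simulated data so it stands alone; the explicit breakpoint vector (`my_breaks`, width 5) is an arbitrary illustrative choice.

```r
# Simulated data (same setup as the example above)
set.seed(42)
data <- rnorm(1000, mean = 50, sd = 10)

# A single number is only a suggestion: R rounds it to "pretty" breakpoints
h_auto <- hist(data, breaks = 30, plot = FALSE)
length(h_auto$breaks) - 1   # actual number of bins (may differ from 30)

# A vector of breakpoints is used exactly; it must cover the whole data range
my_breaks <- seq(floor(min(data)), ceiling(max(data)) + 5, by = 5)
h_fixed <- hist(data, breaks = my_breaks, plot = FALSE)

# freq = FALSE plots density = count / (n * bin width)
bin_width <- diff(h_fixed$breaks)
all.equal(h_fixed$density, h_fixed$counts / (sum(h_fixed$counts) * bin_width))
sum(h_fixed$density * bin_width)   # total bar area is 1
```

This is why a density histogram and a `density()` curve can share one y-axis: both integrate to 1.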
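2. Kernel Density Plot
Syntax
density(x, bw, kernel) computes a kernel density estimate:
- bw: bandwidth, which controls smoothness (larger values give smoother curves); the default "nrd0" is Silverman's rule of thumb
- kernel: "gaussian" (default), "rectangular", "triangular", "epanechnikov", etc.
Draw the result with plot(density_object), or add it to an existing plot with lines(density_object).
Example code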
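```r
# Bimodal data: a mixture of two normal distributions
set.seed(42)
data_bimodal <- c(rnorm(500, mean = 40, sd = 5),
                  rnorm(500, mean = 70, sd = 6))
# Compute the kernel density estimate
dens <- density(data_bimodal,
                bw = "nrd0",          # rule-of-thumb bandwidth (the default)
                kernel = "gaussian")  # Gaussian kernel (the default)
# Plot the density curve
plot(dens,
     main = "Kernel density plot: bimodal data",
     xlab = "Value",
     ylab = "Density",
     col = "darkgreen",
     lwd = 2,
     type = "l")                      # draw as a line
# Shade the area under the curve with a semi-transparent polygon
polygon(dens, col = rgb(0, 1, 0, 0.3), border = NA)
# Add a rug of the raw observations along the x-axis
rug(data_bimodal, col = "gray")
```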
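Because the bandwidth matters so much, one way to build intuition is to overlay estimates with different bandwidths on the same data. This sketch reuses the bimodal setup above; `adjust` multiplies the selected bandwidth, and `bw = "SJ"` (the Sheather-Jones selector) is shown as one alternative to the default, not as a recommendation.

```r
# Bimodal data (same setup as the example above)
set.seed(42)
data_bimodal <- c(rnorm(500, mean = 40, sd = 5),
                  rnorm(500, mean = 70, sd = 6))

d_default <- density(data_bimodal)                 # bw = "nrd0", adjust = 1
d_narrow  <- density(data_bimodal, adjust = 0.25)  # 1/4 of the default bandwidth: bumpy
d_wide    <- density(data_bimodal, adjust = 3)     # 3x the default: over-smoothed
d_sj      <- density(data_bimodal, bw = "SJ")      # Sheather-Jones selector

# Choose ylim so that no curve is clipped
plot(d_narrow, col = "gray40", lwd = 1,
     ylim = c(0, max(d_narrow$y, d_default$y, d_wide$y, d_sj$y)),
     main = "Effect of bandwidth on a kernel density estimate",
     xlab = "Value")
lines(d_default, col = "darkgreen", lwd = 2)
lines(d_wide, col = "red", lwd = 2)
lines(d_sj, col = "blue", lwd = 2, lty = 2)
legend("topright", bty = "n",
       legend = sprintf("%s (bw = %.2f)",
                        c("adjust = 0.25", "default nrd0", "adjust = 3", "SJ"),
                        c(d_narrow$bw, d_default$bw, d_wide$bw, d_sj$bw)),
       col = c("gray40", "darkgreen", "red", "blue"),
       lwd = c(1, 2, 2, 2), lty = c(1, 1, 1, 2))
```

With a very wide bandwidth the two peaks can blur together, which is exactly the kind of structure a density plot is meant to reveal.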
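3. Boxplot
Syntax
boxplot(formula, data, range, varwidth, notch, horizontal, col), or boxplot(x, ...) for a vector or list
- x: numeric vector or list of vectors
- formula: y ~ group draws one box per group
- range: whiskers extend to the most extreme data point no more than range times the box length (interquartile range) from the box; default 1.5; points beyond are drawn as outliers
- varwidth: if TRUE, box widths are proportional to the square root of the group sizes
- notch: if TRUE, draws notches (an approximate confidence interval for the median); non-overlapping notches are strong evidence that two medians differ
- horizontal: if TRUE, draws the boxes horizontally
Example code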
```r
# Create grouped data
set.seed(123)
group_a <- rnorm(100, mean = 60, sd = 8)
group_b <- rnorm(100, mean = 65, sd = 10)
group_c <- rnorm(100, mean = 55, sd = 7)
# Combine into a long-format data frame
df_box <- data.frame(
  value = c(group_a, group_b, group_c),
  group = rep(c("A", "B", "C"), each = 100)
)
# Grouped boxplot
boxplot(value ~ group,
        data = df_box,
        range = 1.5,       # whiskers reach at most 1.5 x IQR beyond the box
        varwidth = TRUE,   # width proportional to sqrt(n); no visible effect here, since every n = 100
        notch = FALSE,     # no notches
        col = c("gold", "tomato", "lightblue"),
        main = "Boxplot: comparing three groups",
        xlab = "Group",
        ylab = "Value",
        border = "darkgray")
# Add group means as red points (aggregate sorts groups A, B, C, the same order as the boxes)
means <- aggregate(value ~ group, data = df_box, FUN = mean)
points(1:3, means$value, col = "red", pch = 19, cex = 1.5)
```
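The numbers behind each box come from `boxplot.stats()`, which `boxplot()` uses internally. The sketch below checks the 1.5 × IQR rule by hand on hypothetical data with two planted extreme values (95 and 20). Note that R's box edges are Tukey's hinges from `fivenum()`, which can differ slightly from the quartiles returned by `quantile()`.

```r
# Hypothetical data: 100 normal values plus two planted extremes
set.seed(123)
x <- c(rnorm(100, mean = 60, sd = 8), 95, 20)

bs <- boxplot.stats(x, coef = 1.5)  # coef plays the role of boxplot()'s range
bs$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out    # values drawn individually as outliers

# Reproduce the whisker rule by hand
hinges <- fivenum(x)[c(2, 4)]        # Tukey's lower and upper hinges
box_length <- diff(hinges)           # the "IQR" that boxplot uses
lower_fence <- hinges[1] - 1.5 * box_length
upper_fence <- hinges[2] + 1.5 * box_length
inside <- x[x >= lower_fence & x <= upper_fence]
# Whiskers stop at the most extreme points inside the fences (same as bs$stats[c(1, 5)])
c(lower_whisker = min(inside), upper_whisker = max(inside))
```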
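4. Violin Plot
Syntax
- Requires the vioplot package: vioplot(x1, x2, ..., names, col, horizontal)
- Combines a boxplot with a mirrored kernel density curve: the width at each value shows how densely the data are concentrated there
Example code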
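```r
# Install vioplot if needed, then load it
if (!requireNamespace("vioplot", quietly = TRUE)) install.packages("vioplot")
library(vioplot)
# Three groups with different distributions
set.seed(456)
violin1 <- rnorm(200, mean = 50, sd = 8)
violin2 <- rnorm(200, mean = 60, sd = 12)
violin3 <- rnorm(200, mean = 45, sd = 6)
# Draw the violin plot
vioplot(violin1, violin2, violin3,
        names = c("Group 1", "Group 2", "Group 3"),  # group labels
        col = "lightgreen",                          # fill color
        border = "darkgreen",                        # outline color
        main = "Violin plot: comparing distribution shapes",
        xlab = "Group",
        ylab = "Value",       # the y-axis shows data values; density is shown by width
        horizontal = FALSE,   # vertical violins
        drawRect = TRUE)      # draw a small boxplot inside each violin
# Add group means as red diamonds
means_v <- c(mean(violin1), mean(violin2), mean(violin3))
points(1:3, means_v, col = "red", pch = 18, cex = 1.5)
```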
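At its core, a violin is a kernel density curve mirrored around the group's position. The base-R sketch below draws violins by hand with `density()` and `polygon()`, without vioplot. The helper `violin_polygon()` and its `half_width` scaling are hypothetical choices for illustration, not vioplot's exact algorithm.

```r
# Hypothetical helper: polygon coordinates for one vertical violin centered at `at`
violin_polygon <- function(values, at, half_width = 0.4) {
  d <- density(values)                 # kernel density estimate
  w <- half_width * d$y / max(d$y)     # the widest point gets half_width
  list(x = c(at - w, rev(at + w)),     # up the left edge, back down the right edge
       y = c(d$x, rev(d$x)))
}

set.seed(456)
groups <- list(rnorm(200, mean = 50, sd = 8),
               rnorm(200, mean = 60, sd = 12),
               rnorm(200, mean = 45, sd = 6))
shapes <- Map(violin_polygon, groups, seq_along(groups))

plot(NA, xlim = c(0.5, 3.5), ylim = range(unlist(lapply(shapes, `[[`, "y"))),
     xaxt = "n", xlab = "Group", ylab = "Value",
     main = "Hand-made violins: mirrored density curves")
axis(1, at = 1:3, labels = c("Group 1", "Group 2", "Group 3"))
for (s in shapes) polygon(s, col = "lightgreen", border = "darkgreen")
points(1:3, sapply(groups, median), pch = 19, col = "white")  # group medians
```

Each violin here is scaled to the same maximum width; other scalings (for example, equal area) are also common.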
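5. Pyramid Plot
Syntax
- Commonly used for population age structure (males on the left, females on the right)
- Uses pyramid.plot from the plotrix package
- Takes two vectors: the left-side and right-side values for each category
- Bar colors are set with lxcol and rxcol; unit is the label under the axis (default "%"); gap is half the width of the central label area, in data units
Example code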
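```r
# Install plotrix if needed, then load it
if (!requireNamespace("plotrix", quietly = TRUE)) install.packages("plotrix")
library(plotrix)
# Age groups and population counts by sex
age_groups <- c("0-9", "10-19", "20-29", "30-39", "40-49", "50+")
male_pop <- c(120, 140, 130, 110, 90, 60)
female_pop <- c(115, 135, 125, 105, 95, 70)
# Draw the pyramid; pyramid.plot returns the margins that were in effect before the call
old_mar <- pyramid.plot(male_pop, female_pop,
                        labels = age_groups,
                        top.labels = c("Male", "Age group", "Female"),
                        main = "Population pyramid",
                        unit = "Population",       # the default "%" would mislabel counts
                        gap = 20,                  # half-width of the central label area, in data units
                        laxlab = seq(0, 150, 50),  # left-axis ticks
                        raxlab = seq(0, 150, 50),  # right-axis ticks
                        lxcol = "lightblue",       # left (male) bar color
                        rxcol = "pink")            # right (female) bar color
# Add a legend
legend("topright", legend = c("Male", "Female"),
       fill = c("lightblue", "pink"), bty = "n")
# Restore the margins that pyramid.plot changed
par(mar = old_mar)
```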
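A pyramid is essentially two horizontal bar charts placed back to back. As a rough base-R alternative to plotrix (an illustrative sketch, not how pyramid.plot is implemented), the left bars can be drawn with negated values and the axis relabeled with absolute values. This version also converts the counts to percentages of the total population, matching pyramid.plot's default "%" unit label.

```r
age_groups <- c("0-9", "10-19", "20-29", "30-39", "40-49", "50+")
male_pop <- c(120, 140, 130, 110, 90, 60)
female_pop <- c(115, 135, 125, 105, 95, 70)

# Convert counts to percentages of the whole population
total <- sum(male_pop, female_pop)
male_pct <- 100 * male_pop / total
female_pct <- 100 * female_pop / total

# Symmetric axis: the same tick values on both sides
ticks <- pretty(c(0, male_pct, female_pct))
lim <- max(ticks)

# Left side: negative values so male bars extend to the left
barplot(-male_pct, horiz = TRUE, names.arg = age_groups, las = 1,
        col = "lightblue", xlim = c(-lim, lim), axes = FALSE,
        main = "Population pyramid (base R)", xlab = "Share of population (%)")
# Right side: female bars drawn on the same axes
barplot(female_pct, horiz = TRUE, col = "pink", add = TRUE, axes = FALSE)
# Label ticks with absolute values so both sides read as positive
axis(1, at = c(-rev(ticks), ticks[-1]), labels = c(rev(ticks), ticks[-1]))
abline(v = 0, col = "gray")
legend("topright", legend = c("Male", "Female"),
       fill = c("lightblue", "pink"), bty = "n")
```

Using percentages makes pyramids from populations of different sizes directly comparable.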
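6. Ridgeline Plot
Syntax
- Requires the ggridges package (built on ggplot2)
- geom_density_ridges(): draws one density curve per group, stacked vertically
- scale sets ridge height relative to the row spacing (at 1, the tallest ridge just touches the baseline above; larger values overlap); alpha sets transparency
Example code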
```r
# Install the required packages if needed, then load them
for (pkg in c("ggplot2", "ggridges")) {
  if (!requireNamespace(pkg, quietly = TRUE)) install.packages(pkg)
}
library(ggplot2)
library(ggridges)
# Four groups with different distributions
set.seed(789)
n_per_group <- 300
ridge_data <- data.frame(
  value = c(rnorm(n_per_group, 0, 1),
            rnorm(n_per_group, 2, 1.2),
            rnorm(n_per_group, 4, 0.8),
            rnorm(n_per_group, 6, 1.5)),
  group = rep(c("Group A", "Group B", "Group C", "Group D"), each = n_per_group)
)
# Draw the ridgeline plot
ggplot(ridge_data, aes(x = value, y = group)) +
  geom_density_ridges(
    scale = 2,          # tallest ridge reaches 2 rows up, so neighbors overlap
    alpha = 0.6,        # transparency
    fill = "steelblue",
    color = "white"
  ) +
  labs(title = "Ridgeline plot: comparing group densities",
       x = "Value", y = "Group") +
  theme_ridges()        # theme designed for ridgeline plots
```
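To make the meaning of `scale` concrete, the base-R sketch below draws ridgelines by hand: every density is rescaled by one common factor so the tallest ridge rises exactly `ridge_scale` rows above its baseline. This is a simplification for illustration. Unlike ggridges, which picks one joint bandwidth for all groups, each density here uses its own default bandwidth.

```r
set.seed(789)
groups <- c("Group A", "Group B", "Group C", "Group D")
samples <- Map(function(m, s) rnorm(300, mean = m, sd = s),
               c(0, 2, 4, 6), c(1, 1.2, 0.8, 1.5))
dens <- lapply(samples, density)

ridge_scale <- 2   # same meaning as geom_density_ridges(scale = 2)
max_height <- max(sapply(dens, function(d) max(d$y)))
# One common factor for all curves, so the tallest ridge is ridge_scale rows high
heights <- lapply(dens, function(d) ridge_scale * d$y / max_height)

plot(NA, xlim = range(sapply(dens, function(d) range(d$x))),
     ylim = c(1, length(dens) + ridge_scale), yaxt = "n",
     xlab = "Value", ylab = "", main = "Hand-made ridgelines (scale = 2)")
axis(2, at = seq_along(groups), labels = groups, las = 1)
# Draw from the top row down so lower ridges are painted in front
for (i in rev(seq_along(dens))) {
  d <- dens[[i]]
  polygon(c(d$x, rev(d$x)), c(i + heights[[i]], rep(i, length(d$x))),
          col = adjustcolor("steelblue", alpha.f = 0.6), border = "white")
}
```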
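7. Dot Plot / Strip Plot
Syntax
stripchart(x, method, jitter, col, vertical, pch)
- method: "overplot" (points drawn on top of each other), "jitter" (random offsets), or "stack" (tied values stacked)
- jitter: amount of jitter when method = "jitter"
- vertical: if TRUE, draws the groups vertically
Example code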
```r
# Three groups of data
set.seed(321)
group1 <- rnorm(80, mean = 55, sd = 6)
group2 <- rnorm(80, mean = 62, sd = 7)
group3 <- rnorm(80, mean = 50, sd = 5)
# Combine into a named list (one element per group)
dot_data <- list(Group1 = group1, Group2 = group2, Group3 = group3)
# Strip chart, jittered to reduce overlap
stripchart(dot_data,
           method = "jitter",   # random horizontal offsets
           jitter = 0.2,        # jitter amount
           vertical = TRUE,     # groups along the x-axis
           col = c("darkred", "darkblue", "darkgreen"),
           pch = 19,            # solid circles
           main = "Strip chart: raw data distribution",
           xlab = "Group",
           ylab = "Value",
           cex = 0.8)           # point size
# Mark each group mean with a large "+"
means_dot <- c(mean(group1), mean(group2), mean(group3))
points(1:3, means_dot, col = "black", pch = 3, cex = 2, lwd = 2)
# Optional: overlay unfilled box outlines (outliers hidden, since every point is already shown)
boxplot(dot_data, add = TRUE, boxwex = 0.3,
        col = NA, border = "gray", outline = FALSE)
```
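The example above uses `method = "jitter"`. The `"stack"` method from the syntax list suits discrete values such as rounded scores, because tied values are stacked instead of drawn on top of each other. This sketch uses hypothetical integer scores for two classes; `offset` sets the spacing between stacked points.

```r
set.seed(321)
# Hypothetical integer scores, so many values are tied
scores <- list(ClassA = round(rnorm(40, mean = 75, sd = 8)),
               ClassB = round(rnorm(40, mean = 80, sd = 6)))

stripchart(scores,
           method = "stack",   # stack tied values
           offset = 0.5,       # spacing between stacked points
           pch = 19, cex = 0.8,
           col = c("darkred", "darkblue"),
           main = "Strip chart with stacked ties",
           xlab = "Score")
# How often does the most frequent score occur in each class?
sapply(scores, function(s) max(table(s)))
```

For continuous data with few exact ties, stacking behaves almost like overplotting, which is why the main example uses jitter instead.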
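Chapter Summary
| Chart type | Main use | Key function / package | Notes |
|---|---|---|---|
| Histogram | Distribution shape, peaks | hist | The number of bins changes the apparent shape |
| Kernel density plot | Smooth density estimate | density + plot | Bandwidth choice is critical |
| Boxplot | Quartiles, outliers | boxplot | Whiskers at 1.5 × IQR is the usual convention |
| Violin plot | Distribution shape plus quartiles | vioplot | Well suited to comparing several groups |
| Pyramid plot | Two-sided comparison | pyramid.plot (plotrix) | Keep left and right axis ticks symmetric |
| Ridgeline plot | Stacked densities for many groups | geom_density_ridges (ggridges) | Watch the amount of overlap (scale) |
| Dot plot / strip chart | Every individual data point | stripchart | Use jitter to avoid heavy overplotting |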
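All code can be copied into R and run directly (installing missing packages requires an internet connection). Try it with your own data and adjust the parameters to see how the plots change.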