从0开始学习R语言--Day38--辛普森多样性指数

面对数据特点为不同种类，但具有不同影响的数据，需要根据需求侧重使用不同的方法。我们一般会将目光集中在某些地方可以做得更好的数据，但前提是要先对数据做分类判断。而相比其他方法，辛普森多样性指数在分类时就已经计算出了哪个数据是优势的概率更大，而其他的方法一般都倾向于判断种类稀有度，即判断类别的数据量，会多出很多计算量。

以下是一个例子：

R 复制代码

set.seed(123)
# 生成数据：5个树种，随机分布
species <- c("Oak", "Pine", "Birch", "Maple", "Redwood")
counts <- sample(10:100, 5, replace = TRUE)  # 每个树种的个体数
names(counts) <- species

# 构建数据框
forest_data <- data.frame(
  Species = species,
  Count = counts
)
print(forest_data)

# 计算原始辛普森指数 (D)
simpson_D <- function(counts) {
  p <- counts / sum(counts)
  sum(p^2)
}

# 计算改进的辛普森指数 (1 - D 或 1/D)
simpson_diversity <- function(counts, inverse = FALSE) {
  D <- simpson_D(counts)
  if (inverse) 1 / D else 1 - D
}

# 示例
D_value <- simpson_D(counts)
diversity_value <- simpson_diversity(counts, inverse = FALSE)

cat("原始辛普森指数 (D):", round(D_value, 4), "\n")
cat("改进的辛普森指数 (1 - D):", round(diversity_value, 4), "\n")
cat("逆辛普森指数 (1/D):", round(1/D_value, 4), "\n")


library(vegan)
# 计算逆辛普森指数 (1/D)
diversity(counts, index = "invsimpson")  # 输出: 4.1389

# 计算 Shannon 熵（对比）
diversity(counts, index = "shannon")    # 输出: 1.423

library(ggplot2)
ggplot(forest_data, aes(x = Species, y = Count, fill = Species)) +
  geom_bar(stat = "identity") +
  labs(title = paste("树种分布 (辛普森多样性 =", round(diversity_value, 2)),
       x = "树种", y = "个体数") +
  theme_minimal()

输出：

R 复制代码

set.seed(123)
# 生成数据：5个树种，随机分布
species <- c("Oak", "Pine", "Birch", "Maple", "Redwood")
counts <- sample(10:100, 5, replace = TRUE)  # 每个树种的个体数
names(counts) <- species

# 构建数据框
forest_data <- data.frame(
  Species = species,
  Count = counts
)
print(forest_data)

# 计算原始辛普森指数 (D)
simpson_D <- function(counts) {
  p <- counts / sum(counts)
  sum(p^2)
}

# 计算改进的辛普森指数 (1 - D 或 1/D)
simpson_diversity <- function(counts, inverse = FALSE) {
  D <- simpson_D(counts)
  if (inverse) 1 / D else 1 - D
}

# 示例
D_value <- simpson_D(counts)
diversity_value <- simpson_diversity(counts, inverse = FALSE)

cat("原始辛普森指数 (D):", round(D_value, 4), "\n")
cat("改进的辛普森指数 (1 - D):", round(diversity_value, 4), "\n")
cat("逆辛普森指数 (1/D):", round(1/D_value, 4), "\n")


library(vegan)
# 计算逆辛普森指数 (1/D)
diversity(counts, index = "invsimpson")  # 输出: 4.1389

# 计算 Shannon 熵（对比）
diversity(counts, index = "shannon")    # 输出: 1.423

library(ggplot2)
ggplot(forest_data, aes(x = Species, y = Count, fill = Species)) +
  geom_bar(stat = "identity") +
  labs(title = paste("树种分布 (辛普森多样性 =", round(diversity_value, 2)),
       x = "树种", y = "个体数") +
  theme_minimal()

输出表明，随机抽取两个个体属于同一物种的概率为0.2337，也就意味着这个数据的多样性较高，用1减去概率的方式能更明显地展现结果。逆指数代表着均匀分布的水平线，如果实际物种数大于该值，则说明存在优势物种，而香浓熵的结果代表物种为中等多样性，满足稀有物种的保护需求。