R [dplyr]: case_when() is a general vectorised if-else; it lets you vectorise multiple if_else() statements

Package dplyr version 1.1.4


Parameters

Usage:
case_when(..., .default = NULL, .ptype = NULL, .size = NULL)

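Argument [...]: <dynamic-dots> A sequence of two-sided formulas.

  • The left-hand side (LHS) of each formula determines which values match this case.
  • The right-hand side (RHS) provides the replacement value.
  • The LHS inputs must evaluate to logical vectors.
  • The RHS inputs will be coerced to their common type.
  • All inputs will be recycled to their common size. That said, we encourage all LHS inputs to be the same size.
  • Recycling is mainly useful for RHS inputs, where you might supply a size 1 input that will be recycled to the size of the LHS inputs.
  • NULL inputs are ignored.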
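
A minimal sketch of the recycling and common-type rules above, using made-up inputs (the vector x and the labels are illustrative, not taken from the dplyr documentation). Each size 1 RHS is recycled to the size of the LHS, an integer RHS combined with a double RHS should give a double result, and RHS values with no common type should be rejected:

```r
library(dplyr)

x <- c(1, 5, 10)

# Each RHS has size 1 and is recycled to length(x).
labels <- case_when(
  x < 3 ~ "low",
  x < 8 ~ "mid",
  .default = "high"
)
print(labels)

# Integer and double RHS values are combined into their common type.
mixed <- case_when(
  x < 3 ~ 1L,
  .default = 2.5
)
print(typeof(mixed))

# RHS values without a common type (character vs. double) raise an error,
# which try() captures here so the script keeps running.
incompatible <- try(case_when(x < 3 ~ "low", .default = 0), silent = TRUE)
print(inherits(incompatible, "try-error"))
```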

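Argument [.default]: The value used when all of the LHS inputs return either FALSE or NA.

  • [.default] must be size 1 or the same size as the common size computed from [...].
  • [.default] participates in the computation of the common type together with the RHS inputs.
  • NA values in the LHS conditions are treated like FALSE, meaning that the result at those positions will be assigned the [.default] value. To handle missing values in the conditions differently, you must explicitly catch them with another condition before they fall through to [.default]. This typically involves some variation of is.na(x) ~ value tailored to your usage of case_when().
  • If NULL (the default), a missing value will be used.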

Argument [.ptype]: An optional prototype declaring the desired output type. If supplied, this overrides the common type of the RHS inputs.

Argument [.size]: An optional size declaring the desired output size. If supplied, this overrides the common size computed from [...].
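
The two optional arguments can be illustrated with a short sketch. The inputs are illustrative assumptions, not examples from the dplyr documentation: [.ptype] is used to request a double result from integer RHS values, and [.size] is used to recycle a size 1 condition and value to three elements.

```r
library(dplyr)

x <- 1:3

# Without .ptype, the common type of the integer RHS values is integer.
as_int <- case_when(x > 1 ~ 1L, .default = 0L)

# With .ptype = double(), the result is cast to double instead.
as_dbl <- case_when(x > 1 ~ 1L, .default = 0L, .ptype = double())

print(typeof(as_int))
print(typeof(as_dbl))

# .size declares the output size; the size 1 condition and value
# are recycled to three elements.
sized <- case_when(TRUE ~ "a", .size = 3)
print(length(sized))
```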


Value

A vector with the same size as the common size computed from the inputs in [...] and the same type as the common type of the RHS inputs in [...].


Examples

1. A basic example

R code:
library(dplyr)  # provides case_when(), mutate(), and the starwars data used below
x <- 1:70
case_when(
  x %% 35 == 0 ~ "fizz buzz",
  x %% 5 == 0 ~ "fizz",
  x %% 7 == 0 ~ "buzz",
  .default = as.character(x)
)
Output:
 [1] "1"         "2"         "3"         "4"         "fizz"      "6"        
 [7] "buzz"      "8"         "9"         "fizz"      "11"        "12"       
[13] "13"        "buzz"      "fizz"      "16"        "17"        "18"       
[19] "19"        "fizz"      "buzz"      "22"        "23"        "24"       
[25] "fizz"      "26"        "27"        "buzz"      "29"        "fizz"     
[31] "31"        "32"        "33"        "34"        "fizz buzz" "36"       
[37] "37"        "38"        "39"        "fizz"      "41"        "buzz"     
[43] "43"        "44"        "fizz"      "46"        "47"        "48"       
[49] "buzz"      "fizz"      "51"        "52"        "53"        "54"       
[55] "fizz"      "buzz"      "57"        "58"        "59"        "fizz"     
[61] "61"        "62"        "buzz"      "64"        "fizz"      "66"       
[67] "67"        "68"        "69"        "fizz buzz"

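2. As with an if statement, the conditions are evaluated in order, so you should list them from most specific to most general. Otherwise you get the following, where "fizz buzz" is never produced because multiples of 35 already match x %% 5 == 0

R code: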
x <- 1:70
case_when(
  x %%  5 == 0 ~ "fizz",
  x %%  7 == 0 ~ "buzz",
  x %% 35 == 0 ~ "fizz buzz",
  .default = as.character(x)
)
Output:
 [1] "1"    "2"    "3"    "4"    "fizz" "6"    "buzz" "8"    "9"   
[10] "fizz" "11"   "12"   "13"   "buzz" "fizz" "16"   "17"   "18"  
[19] "19"   "fizz" "buzz" "22"   "23"   "24"   "fizz" "26"   "27"  
[28] "buzz" "29"   "fizz" "31"   "32"   "33"   "34"   "fizz" "36"  
[37] "37"   "38"   "39"   "fizz" "41"   "buzz" "43"   "44"   "fizz"
[46] "46"   "47"   "48"   "buzz" "fizz" "51"   "52"   "53"   "54"  
[55] "fizz" "buzz" "57"   "58"   "59"   "fizz" "61"   "62"   "buzz"
[64] "64"   "fizz" "66"   "67"   "68"   "69"   "fizz"

3. If an element matches none of the cases, [.default] is used, which defaults to NA

R code:
x <- 1:70
case_when(
  x %% 35 == 0 ~ "fizz buzz",
  x %% 5 == 0 ~ "fizz",
  x %% 7 == 0 ~ "buzz",
)
Output:
 [1] NA          NA          NA          NA          "fizz"     
 [6] NA          "buzz"      NA          NA          "fizz"     
[11] NA          NA          NA          "buzz"      "fizz"     
[16] NA          NA          NA          NA          "fizz"     
[21] "buzz"      NA          NA          NA          "fizz"     
[26] NA          NA          "buzz"      NA          "fizz"     
[31] NA          NA          NA          NA          "fizz buzz"
[36] NA          NA          NA          NA          "fizz"     
[41] NA          "buzz"      NA          NA          "fizz"     
[46] NA          NA          NA          "buzz"      "fizz"     
[51] NA          NA          NA          NA          "fizz"     
[56] "buzz"      NA          NA          NA          "fizz"     
[61] NA          NA          "buzz"      NA          "fizz"     
[66] NA          NA          NA          NA          "fizz buzz"

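4. Note that NA values on the LHS are treated like FALSE and trigger [.default]. If you want a different value, you must handle them explicitly. The exact way to handle missing values depends on the set of LHS conditions you use

R code: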
x <- 1:70
x[2:4] <- NA_real_
case_when(
  x %% 35 == 0 ~ "fizz buzz",
  x %% 5 == 0 ~ "fizz",
  x %% 7 == 0 ~ "buzz",
  is.na(x) ~ "nope",
  .default = as.character(x)
)
Output:
 [1] "1"         "nope"      "nope"      "nope"      "fizz"     
 [6] "6"         "buzz"      "8"         "9"         "fizz"     
[11] "11"        "12"        "13"        "buzz"      "fizz"     
[16] "16"        "17"        "18"        "19"        "fizz"     
[21] "buzz"      "22"        "23"        "24"        "fizz"     
[26] "26"        "27"        "buzz"      "29"        "fizz"     
[31] "31"        "32"        "33"        "34"        "fizz buzz"
[36] "36"        "37"        "38"        "39"        "fizz"     
[41] "41"        "buzz"      "43"        "44"        "fizz"     
[46] "46"        "47"        "48"        "buzz"      "fizz"     
[51] "51"        "52"        "53"        "54"        "fizz"     
[56] "buzz"      "57"        "58"        "59"        "fizz"     
[61] "61"        "62"        "buzz"      "64"        "fizz"     
[66] "66"        "67"        "68"        "69"        "fizz buzz"

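5. case_when() evaluates all RHS expressions and then constructs its result by extracting the parts selected by the LHS expressions. In the example below, sqrt(y) is therefore also computed for the negative values, which is why a warning appears even though those results are not used

R code: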
y <- seq(-2, 2, by = .5)
case_when(
  y >= 0 ~ sqrt(y),
  .default = y
)
Output:
[1] -2.0000000 -1.5000000 -1.0000000 -0.5000000  0.0000000  0.7071068
[7]  1.0000000  1.2247449  1.4142136
Warning message:
In sqrt(y) : NaNs produced
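
Because every RHS is evaluated in full, one way to avoid the warning is to make the RHS safe for all elements rather than relying on the LHS to guard it. The sketch below clamps negative inputs with pmax() before calling sqrt(). This is one possible workaround under that assumption, not the only one, and the name safe is illustrative.

```r
library(dplyr)

y <- seq(-2, 2, by = .5)

# sqrt() only ever sees non-negative numbers, so no NaN warning is raised.
# The clamped values at the negative positions are discarded by case_when(),
# because those positions take their value from .default.
safe <- case_when(
  y >= 0 ~ sqrt(pmax(y, 0)),
  .default = y
)
print(safe)
```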

6. case_when() is particularly useful inside mutate() when you want to create a new variable that depends on a complex combination of existing variables

R code:
starwars
Output:
# A tibble: 87 × 14
   name       height  mass hair_color skin_color eye_color birth_year
   <chr>       <int> <dbl> <chr>      <chr>      <chr>          <dbl>
 1 Luke Skyw...    172    77 blond      fair       blue            19  
 2 C-3PO         167    75 NA         gold       yellow         112  
 3 R2-D2          96    32 NA         white, bl... red             33  
 4 Darth Vad...    202   136 none       white      yellow          41.9
 5 Leia Orga...    150    49 brown      light      brown           19  
 6 Owen Lars     178   120 brown, gr... light      blue            52  
 7 Beru Whit...    165    75 brown      light      blue            47  
 8 R5-D4          97    32 NA         white, red red             NA  
 9 Biggs Dar...    183    84 black      light      brown           24  
10 Obi-Wan K...    182    77 auburn, w... fair       blue-gray       57  
# ℹ 77 more rows
# ℹ 7 more variables: sex <chr>, gender <chr>, homeworld <chr>,
#   species <chr>, films <list>, vehicles <list>, starships <list>
# ℹ Use `print(n = ...)` to see more rows

R code:
starwars %>%
  select(name:mass, gender, species) %>%
  mutate(
    type = case_when(
      height > 200 | mass > 200 ~ "large",
      species == "Droid" ~ "robot",
      .default = "other"
    )
  )
Output:
# A tibble: 87 × 6
   name               height  mass gender    species type 
   <chr>               <int> <dbl> <chr>     <chr>   <chr>
 1 Luke Skywalker        172    77 masculine Human   other
 2 C-3PO                 167    75 masculine Droid   robot
 3 R2-D2                  96    32 masculine Droid   robot
 4 Darth Vader           202   136 masculine Human   large
 5 Leia Organa           150    49 feminine  Human   other
 6 Owen Lars             178   120 masculine Human   other
 7 Beru Whitesun Lars    165    75 feminine  Human   other
 8 R5-D4                  97    32 masculine Droid   robot
 9 Biggs Darklighter     183    84 masculine Human   other
10 Obi-Wan Kenobi        182    77 masculine Human   other
# ℹ 77 more rows
# ℹ Use `print(n = ...)` to see more rows

7. case_when() is not a tidy eval function. If you want to reuse the same patterns, wrap the case_when() call in an ordinary function

R code:
case_character_type <- function(height, mass, species) {
  case_when(
    height > 200 | mass > 200 ~ "large",
    species == "Droid" ~ "robot",
    .default = "other"
  )
}

case_character_type(150, 250, "Droid")
case_character_type(150, 150, "Droid")
Output:
[1] "large"
[1] "robot"

8. The function above can also be used inside mutate()

R code:
starwars %>%
  mutate(type = case_character_type(height, mass, species)) %>%
  pull(type)
Output:
 [1] "other" "robot" "robot" "large" "other" "other" "other" "robot"
 [9] "other" "other" "other" "other" "large" "other" "other" "large"
[17] "other" "other" "other" "other" "other" "robot" "other" "other"
[25] "other" "other" "other" "other" "other" "other" "other" "other"
[33] "other" "other" "other" "large" "large" "other" "other" "other"
[41] "other" "other" "other" "other" "other" "other" "other" "other"
[49] "other" "other" "other" "other" "other" "other" "other" "large"
[57] "other" "other" "other" "other" "other" "other" "other" "other"
[65] "other" "other" "other" "other" "other" "other" "large" "large"
[73] "other" "robot" "other" "other" "other" "large" "large" "other"
[81] "other" "large" "other" "other" "other" "robot" "other"
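
9. case_when() ignores NULL inputs. This is useful when you want to use a pattern only under certain conditions. Here we take advantage of the fact that if returns NULL when its condition is FALSE and there is no else clause

R code: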
case_character_type <- function(height, mass, species, robots = TRUE) {
  case_when(
    height > 200 | mass > 200 ~ "large",
    if (robots) species == "Droid" ~ "robot",
    .default = "other"
  )
}

starwars %>%
  mutate(type = case_character_type(height, mass, species, robots = FALSE)) %>%
  pull(type)
Output:
 [1] "other" "other" "other" "large" "other" "other" "other" "other"
 [9] "other" "other" "other" "other" "large" "other" "other" "large"
[17] "other" "other" "other" "other" "other" "other" "other" "other"
[25] "other" "other" "other" "other" "other" "other" "other" "other"
[33] "other" "other" "other" "large" "large" "other" "other" "other"
[41] "other" "other" "other" "other" "other" "other" "other" "other"
[49] "other" "other" "other" "other" "other" "other" "other" "large"
[57] "other" "other" "other" "other" "other" "other" "other" "other"
[65] "other" "other" "other" "other" "other" "other" "large" "large"
[73] "other" "other" "other" "other" "other" "large" "large" "other"
[81] "other" "large" "other" "other" "other" "other" "other"
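
The behaviour relied on above can be checked in isolation. This small sketch (the variable names and values are illustrative) shows that an if without an else yields NULL when its condition is FALSE, and that case_when() skips a NULL input:

```r
library(dplyr)

# An `if` whose condition is FALSE and that has no `else` returns NULL.
skipped <- if (FALSE) "never used"
print(is.null(skipped))

x <- c(1, 10)

# The NULL input is dropped, so only the first formula and .default apply.
res <- case_when(
  x > 5 ~ "big",
  NULL,
  .default = "small"
)
print(res)
```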

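Each case is evaluated sequentially, and the first match for each element determines its value in the output vector. If no case matches, [.default] is used as the final "else" statement.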
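
To make the "first match wins" rule concrete, the sketch below compares case_when() with the equivalent chain of nested if_else() calls on a small illustrative vector. The two forms are expected to give the same result; the variable names are assumptions for this example.

```r
library(dplyr)

x <- c(5, 7, 35, 4)

# Cases are checked top to bottom; the first TRUE condition decides the value.
with_case_when <- case_when(
  x %% 35 == 0 ~ "fizz buzz",
  x %% 5 == 0 ~ "fizz",
  x %% 7 == 0 ~ "buzz",
  .default = as.character(x)
)

# The same logic written as nested if_else() calls.
with_if_else <- if_else(
  x %% 35 == 0, "fizz buzz",
  if_else(
    x %% 5 == 0, "fizz",
    if_else(x %% 7 == 0, "buzz", as.character(x))
  )
)

print(identical(with_case_when, with_if_else))
```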
