数据分析:筛选多组交集特征

介绍

有时候需要在多个组间筛选它们的交集特征,本文利用R语言实现该目的

加载R包

R 复制代码
library(UpSetR)
library(tidyverse)

Upset画图

R 复制代码
movies <- read.csv(system.file("extdata", "movies.csv", package = "UpSetR"), 
                   header = T, sep = ";")
movies_list <- list(
  Action = movies %>%
    dplyr::filter(Action == 1) %>%
    dplyr::pull(Name),
  Adventure = movies %>%
    dplyr::filter(Adventure == 1) %>%
    dplyr::pull(Name),
  Children = movies %>%
    dplyr::filter(Children == 1) %>%
    dplyr::pull(Name),
  Comedy = movies %>%
    dplyr::filter(Comedy == 1) %>%
    dplyr::pull(Name),
  Crime = movies %>%
    dplyr::filter(Crime == 1) %>%
    dplyr::pull(Name),
  Documentary = movies %>%
    dplyr::filter(Documentary == 1) %>%
    dplyr::pull(Name)  
)

movies_pl <- UpSetR::upset(
  data = fromList(movies_list),
  nsets = 3, 
  sets = c("Action", "Adventure", "Children", 
           "Comedy", "Crime", "Documentary"),
  order.by = "freq",
  main.bar.color = "gray10",
  sets.bar.color = "gray",
  matrix.color = "gray10",
  mainbar.y.label = "NO. of movies",
  sets.x.label = "NO. of movies")

movies_pl

判断交集特征

  • 去冗余变量 df_uniq_movie

  • 分组变量标签 df_group_movie

R 复制代码
df_uniq_movie <- data.frame(feature = unique(unlist(movies_list)))
df_group_movie <- lapply(movies_list, function(x){
  data.frame(feature = x)
}) %>% 
  dplyr::bind_rows(.id = "Sequence")
  • 给变量打上交集标签
R 复制代码
df_int_movie <- lapply(df_uniq_movie$feature, function(x){
  intersection <- df_group_movie %>% 
    dplyr::filter(feature == x) %>% 
    dplyr::arrange(Sequence) %>% 
    dplyr::pull(Sequence) %>% 
    paste0(collapse = "|")
  # build the dataframe
  return(data.frame(feature = x, int = intersection))
}) %>% 
  dplyr::bind_rows()

head(df_int_movie)
相关推荐
编程界一哥8 小时前
永劫无间vcruntime140.dll缺失修复教程:2026最新Win10/11安全解决指南
数据挖掘
编程界一哥9 小时前
d3dx9_43.dll缺失 赛博朋克2077启动报错解决方法:安全有效指南
数据挖掘
MediaTea12 小时前
Pandas 操作指南(二):数据选取与条件筛选
人工智能·python·机器学习·数据挖掘·pandas
jiang_changsheng14 小时前
亚马逊的(A9、COSMO)和视频推流(如ABR)点击推广算法
大数据·数据挖掘
观远数据14 小时前
未来3年企业数据分析的核心:基于自然语言的AI优先决策体系如何搭建
数据库·人工智能·数据分析
programhelp_14 小时前
IBM OA 高频真题分享|2026最新-Programhelp 独家整理
人工智能·机器学习·面试·职场和发展·数据分析
编程界一哥14 小时前
2026最新修复:赛博朋克2077 d3dx9_43.dll丢失的终极解决步骤
数据挖掘
编程界一哥15 小时前
永劫无间打不开闪退?vcruntime140.dll错误一键修复工具哪个好?2026对比
数据挖掘
hui-梦苑15 小时前
[GROMACS]模拟数据分析前轨迹文件生成-轨迹预处理
人工智能·算法·数据分析
赵钰老师1 天前
【ADCIRC】基于“python+”潮汐、风驱动循环、风暴潮等海洋水动力模拟实践技术应用
python·信息可视化·数据分析