4.2 Sets (set)

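Table of contents

Preface
I. Creating sets and their properties
- [1. Creating sets](#1. Creating sets)
- [2. Basic properties of sets](#2. Basic properties of sets)
- [3. Set methods by category](#3. Set methods by category)
II. Set operations (intersection, union, difference)
- [1. Basic set operations](#1. Basic set operations)
- [2. In-place operations (modifying the original set)](#2. In-place operations (modifying the original set))
- [3. Set relationship tests](#3. Set relationship tests)
III. Use cases for sets
- [1. Removing duplicates (the most common use)](#1. Removing duplicates (the most common use))
- [2. Membership testing (very fast)](#2. Membership testing (very fast))
- [3. Comparing and analyzing data](#3. Comparing and analyzing data)
- [4. Math and logic](#4. Math and logic)
- [5. Network and relationship analysis](#5. Network and relationship analysis)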

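Preface

This article covers how to create sets and what properties they have, set operations (intersection, union, difference), and common use cases for sets.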

I. Creating sets and their properties

1. Creating sets

# 1. Set literal with curly braces (cannot create an empty set: {} is an empty dict)
empty_set = set()          # empty set (must use set())
fruits = {"apple", "banana", "orange"}  # set of strings
numbers = {1, 2, 3, 4, 5}               # set of numbers

# 2. The set() constructor, which accepts any iterable
set_from_list = set([1, 2, 2, 3, 3, 3])  # {1, 2, 3} (duplicates removed)
set_from_string = set("hello")           # {'h', 'e', 'l', 'o'} (order not guaranteed)
set_from_tuple = set((1, 2, 3, 2, 1))    # {1, 2, 3}

# 3. Set comprehensions
squares = {x**2 for x in range(5)}       # {0, 1, 4, 9, 16}
even_numbers = {x for x in range(10) if x % 2 == 0}  # {0, 2, 4, 6, 8}

# 4. Immutable sets (frozenset)
immutable_set = frozenset([1, 2, 3, 4, 5])  # immutable and hashable, so it can be used as a dict key
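
Two of the points above are easy to check directly: that `{}` is a dict rather than a set, and that a frozenset can serve as a dictionary key or as an element of another set. The sketch below is only an illustration; the names `link_cost` and `groups` are made up for this example.

```python
# {} is an empty dict, not an empty set
print(type({}).__name__)     # dict
print(type(set()).__name__)  # set

# A frozenset is hashable, so it can key a dict; element order does not matter.
# `link_cost` is a hypothetical mapping from an unordered pair of nodes to a cost.
link_cost = {frozenset({"A", "B"}): 5}
print(link_cost[frozenset({"B", "A"})])  # 5

# frozensets can also be elements of an ordinary set (a "set of sets")
groups = {frozenset({1, 2}), frozenset({2, 1}), frozenset({3})}
print(len(groups))  # 2, because the first two frozensets are equal
```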

2. Basic properties of sets

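# 1. Unordered: elements have no defined position, so never rely on iteration order
s = {3, 1, 4, 1, 5, 9, 2, 6}
print(s)  # CPython typically shows {1, 2, 3, 4, 5, 6, 9}, but the order is not guaranteed

# 2. Unique elements: duplicates are dropped automatically
s = {1, 2, 2, 3, 3, 3, 4, 4, 4, 4}
print(s)  # {1, 2, 3, 4}

# 3. Unhashable objects cannot be set elements
# invalid_set = {[1, 2], [3, 4]}  # TypeError: unhashable type: 'list'
valid_set = {(1, 2), (3, 4)}    # tuples of hashable items are hashable, so they can go in a set

# 4. A set itself is mutable, but its elements must be hashable
s = {1, 2, 3}
s.add(4)          # adding an element is fine
# s.add([5, 6])   # TypeError: a list is mutable and therefore unhashable

# 5. Size and membership tests
s = {1, 2, 3, 4, 5}
print(len(s))     # 5
print(3 in s)     # True
print(6 not in s) # True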

3. Set methods by category

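| Category | Method | Description |
| --- | --- | --- |
| Add | add() | Add a single element |
| | update() | Add every element from one or more iterables |
| Remove | remove() | Remove the given element (raises KeyError if it is missing) |
| | discard() | Remove the given element (no error if it is missing) |
| | pop() | Remove and return an arbitrary element (raises KeyError if the set is empty) |
| | clear() | Remove all elements |
| Set operations | union() | Union |
| | intersection() | Intersection |
| | difference() | Difference |
| | symmetric_difference() | Symmetric difference |
| Relationship tests | issubset() | Subset test |
| | issuperset() | Superset test |
| | isdisjoint() | True if the two sets have no elements in common |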
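
The add and remove methods in the table are not demonstrated elsewhere in this article, so here is a minimal sketch of how they behave. The variable names (`demo`, `popped`, `remove_failed`) are illustrative only.

```python
demo = {1, 2, 3}

demo.add(4)                  # add one element
demo.update([4, 5], (6,))    # add every element from several iterables
print(sorted(demo))          # [1, 2, 3, 4, 5, 6]

demo.remove(6)               # fine: 6 is present
demo.discard(42)             # fine: 42 is absent, but discard() stays silent

remove_failed = False
try:
    demo.remove(42)          # remove() raises KeyError for a missing element
except KeyError:
    remove_failed = True
print(remove_failed)         # True

popped = demo.pop()          # removes an arbitrary element, not a random one
print(popped in demo)        # False

demo.clear()
print(demo)                  # set()
```

When a missing element is a normal situation rather than a bug, `discard()` is usually the more convenient choice because it needs no try/except.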

II. Set operations (intersection, union, difference)

1. Basic set operations

A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7, 8}

# 1. Union: every element that is in A or in B
print(A | B)                      # {1, 2, 3, 4, 5, 6, 7, 8}
print(A.union(B))                 # same result
print(A.union(B, {9, 10}))       # union of several sets: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

# 2. Intersection: elements that are in both A and B
print(A & B)                      # {4, 5}
print(A.intersection(B))          # same result
print(A.intersection(B, {4, 5, 9}))  # {4, 5}

# 3. Difference: elements in A that are not in B
print(A - B)                      # {1, 2, 3}
print(A.difference(B))            # same result
print(B - A)                      # {6, 7, 8} (in B but not in A)

# 4. Symmetric difference: elements in exactly one of A and B
print(A ^ B)                      # {1, 2, 3, 6, 7, 8}
print(A.symmetric_difference(B))  # same result
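
The operator and method forms above give the same result for two sets, but they differ in what they accept: the methods take any iterable, while the operators require both operands to be sets (or frozensets). A small sketch of that difference, with `evens` and `operator_error` as illustrative names:

```python
A = {1, 2, 3, 4, 5}
evens = [2, 4, 6]                     # a plain list, not a set

# Methods accept any iterable
via_method = A.intersection(evens)
print(via_method)                     # {2, 4}

# Operators insist on set operands
operator_error = None
try:
    A & evens
except TypeError as exc:
    operator_error = str(exc)
print(operator_error is not None)     # True

# Converting explicitly makes the operator form work
print(A & set(evens))                 # {2, 4}
```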

2. In-place operations (modifying the original set)

A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7, 8}

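# 1. In-place union (same as A.update(B))
A |= B  # same result as A = A | B, but mutates the existing set instead of building a new one
print(A)  # {1, 2, 3, 4, 5, 6, 7, 8}

# Reset A
A = {1, 2, 3, 4, 5}

# 2. In-place intersection (same as A.intersection_update(B))
A &= B  # same result as A = A & B, but in place
print(A)  # {4, 5}

# Reset A
A = {1, 2, 3, 4, 5}

# 3. In-place difference (same as A.difference_update(B))
A -= B  # same result as A = A - B, but in place
print(A)  # {1, 2, 3}

# Reset A
A = {1, 2, 3, 4, 5}

# 4. In-place symmetric difference (same as A.symmetric_difference_update(B))
A ^= B  # same result as A = A ^ B, but in place
print(A)  # {1, 2, 3, 6, 7, 8}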
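
The augmented operators and their expanded forms produce equal sets, but they are not interchangeable when another name refers to the same set. The following sketch (all variable names are illustrative) shows the difference with an alias:

```python
B = {4, 5, 6}

# In-place: the alias sees the change because it is the same object
inplace = {1, 2, 3}
inplace_alias = inplace
inplace |= B
print(sorted(inplace_alias))      # [1, 2, 3, 4, 5, 6]
print(inplace_alias is inplace)   # True

# Rebinding: A | B builds a new set, so the alias keeps the old contents
rebound = {1, 2, 3}
rebound_alias = rebound
rebound = rebound | B
print(sorted(rebound_alias))      # [1, 2, 3]
print(rebound_alias is rebound)   # False
```

This matters most when a set is passed into a function: an in-place operator inside the function changes the caller's set, while the expanded form does not.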

3. Set relationship tests

A = {1, 2, 3}
B = {1, 2, 3, 4, 5}
C = {4, 5, 6}
D = {1, 2}

# 1. Subset
print(D <= A)            # True (D is a subset of A)
print(D.issubset(A))     # True
print(D < A)             # True (D is a proper subset of A)
print(A <= A)            # True (every set is a subset of itself)
print(A < A)             # False (a set is never a proper subset of itself)

# 2. Superset
print(B >= A)            # True (B is a superset of A)
print(B.issuperset(A))   # True
print(B > A)             # True (B is a proper superset of A)

# 3. Disjoint
print(A.isdisjoint(C))   # True (A and C have no elements in common)
print(A.isdisjoint(B))   # False (A and B share elements)

III. Use cases for sets

1. Removing duplicates (the most common use)

# Remove duplicates from a list
numbers = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
unique_numbers = list(set(numbers))  # the elements 1, 2, 3, 4 (the original order is not guaranteed to survive)
print(f"Before deduplication: {len(numbers)} elements")  # 10
print(f"After deduplication: {len(unique_numbers)} elements")  # 4

# Count the unique words in a text
text = "apple banana apple orange banana apple mango"
words = text.split()
unique_words = set(words)
print(f"Total words: {len(words)}")       # 7
print(f"Unique words: {len(unique_words)}")  # 4
print(f"Unique word set: {unique_words}")     # {'banana', 'apple', 'orange', 'mango'} in some order
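
Converting to a set discards the original order. When the order of first appearance matters, one common alternative is to go through a dict, whose keys are unique and keep insertion order (Python 3.7+). A minimal sketch, with `unique_in_order` and `dedupe_keep_order` as illustrative names:

```python
words = "apple banana apple orange banana apple mango".split()

# dict keys are unique and remember insertion order
unique_in_order = list(dict.fromkeys(words))
print(unique_in_order)  # ['apple', 'banana', 'orange', 'mango']

# The same idea written out with a set that tracks what has been seen
def dedupe_keep_order(items):
    seen = set()
    result = []
    for item in items:
        if item not in seen:   # O(1) average-case set lookup
            seen.add(item)
            result.append(item)
    return result

print(dedupe_keep_order([3, 1, 3, 2, 1]))  # [3, 1, 2]
```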

2. Membership testing (very fast)

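import time

# Set membership is O(1) on average, while list membership is O(n)
big_list = list(range(1000000))     # a list of one million elements
big_set = set(big_list)             # the same elements as a set

# List lookup (worst case: the target is the last element)
start = time.perf_counter()
result = 999999 in big_list
end = time.perf_counter()
print(f"List lookup time: {end - start:.6f} s")  # e.g. about 0.005 s (varies by machine)

# Set lookup
start = time.perf_counter()
result = 999999 in big_set
end = time.perf_counter()
print(f"Set lookup time: {end - start:.6f} s")  # e.g. about 0.000001 s, thousands of times faster (varies by machine)

# Practical use: blocked-word filtering
blocked_words = {"spam", "advertisement", "virus", "hack", "fraud"}
user_message = "this is not a spam message"

# Look up each word of the message in the set (one O(1) check per word).
# This matches whole words only; case and punctuation are not normalized.
if any(word in blocked_words for word in user_message.split()):
    print("Message contains a blocked word")
else:
    print("Message is OK")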
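
A single timed lookup is noisy, so for a steadier comparison the standard-library `timeit` module can repeat each lookup many times. This is only a sketch: the collection size, the repeat count, and the variable names are arbitrary choices, and the printed numbers depend on the machine.

```python
import timeit

haystack_list = list(range(100_000))
haystack_set = set(haystack_list)
target = 99_999  # last element: the worst case for a linear list scan

# Run each membership test 200 times and report the total elapsed time
list_seconds = timeit.timeit(lambda: target in haystack_list, number=200)
set_seconds = timeit.timeit(lambda: target in haystack_set, number=200)

print(f"list: {list_seconds:.6f} s for 200 lookups")
print(f"set:  {set_seconds:.6f} s for 200 lookups")
```

Building the set costs O(n) once, so the conversion pays off when the same collection is queried many times.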

3. Comparing and analyzing data

# 1. Website user analysis
yesterday_users = {"alice", "bob", "charlie", "david"}
today_users = {"bob", "charlie", "eve", "frank"}

# Retained users (visited on both days)
retained_users = yesterday_users & today_users
print(f"Retained users: {retained_users}")  # {'bob', 'charlie'}

# New users
new_users = today_users - yesterday_users
print(f"New users: {new_users}")  # {'eve', 'frank'}

# Lost users
lost_users = yesterday_users - today_users
print(f"Lost users: {lost_users}")  # {'alice', 'david'}

# All distinct visitors
total_users = yesterday_users | today_users
print(f"Total distinct visitors: {len(total_users)}")  # 6

# 2. Shopping-basket analysis
customer1_cart = {"apple", "banana", "milk", "bread"}
customer2_cart = {"banana", "orange", "milk", "eggs"}
customer3_cart = {"apple", "bread", "cheese", "eggs"}

# Every product bought by any customer
all_products = customer1_cart | customer2_cart | customer3_cart
print(f"All products: {all_products}")

# Popular products (bought by at least two customers)
popular_products = (
    (customer1_cart & customer2_cart)
    | (customer1_cart & customer3_cart)
    | (customer2_cart & customer3_cart)
)
print(f"Popular products: {popular_products}")  # {'banana', 'milk', 'apple', 'bread', 'eggs'}

# 3. Course enrollment analysis
math_students = {"alice", "bob", "charlie", "david"}
physics_students = {"bob", "charlie", "eve", "frank"}
chemistry_students = {"alice", "charlie", "frank", "grace"}

# Students enrolled in exactly one course
only_math = math_students - physics_students - chemistry_students
only_physics = physics_students - math_students - chemistry_students
only_chemistry = chemistry_students - math_students - physics_students
print(f"Math only: {only_math}")  # {'david'}
print(f"Physics only: {only_physics}")  # {'eve'}
print(f"Chemistry only: {only_chemistry}")  # {'grace'}

# Students enrolled in all three courses
all_three = math_students & physics_students & chemistry_students
print(f"All three courses: {all_three}")  # {'charlie'}
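
The pairwise-intersection expression for popular products grows quickly as customers are added. One way to generalize it, sketched here under the assumption that each cart is a set, is to count how many carts contain each item with `collections.Counter`. The names `carts`, `purchase_counts`, and `popular` are illustrative.

```python
from collections import Counter

carts = [
    {"apple", "banana", "milk", "bread"},
    {"banana", "orange", "milk", "eggs"},
    {"apple", "bread", "cheese", "eggs"},
]

# Each cart is a set, so an item is counted at most once per customer
purchase_counts = Counter(item for cart in carts for item in cart)

# Keep the items that appear in at least two carts
popular = {item for item, count in purchase_counts.items() if count >= 2}
print(sorted(popular))  # ['apple', 'banana', 'bread', 'eggs', 'milk']
```

The threshold `2` can be raised to find items bought by at least three customers, and so on, without rewriting the expression.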

4. Math and logic

# 1. Prime sieving (a set-based take on the Sieve of Eratosthenes)
def find_primes(limit):
    primes = set()
    numbers = set(range(2, limit + 1))

    while numbers:
        prime = min(numbers)  # the smallest remaining number is always prime
        primes.add(prime)
        multiples = set(range(prime, limit + 1, prime))
        numbers -= multiples  # drop the prime and all of its multiples

    return sorted(primes)

print(f"Primes up to 100: {find_primes(100)[:20]}...")  # the first 20 primes

# 2. Finding missing numbers
def find_missing_numbers(full_set, partial_set):
    return sorted(full_set - partial_set)

# Suppose the complete ID range is 1-100 and some IDs are missing
all_ids = set(range(1, 101))
existing_ids = {1, 2, 3, 5, 6, 8, 9, 10, 15, 20, 50, 99, 100}
missing_ids = find_missing_numbers(all_ids, existing_ids)
print(f"Number of missing IDs: {len(missing_ids)}")  # 87
print(f"First 10 missing IDs: {missing_ids[:10]}")  # [4, 7, 11, 12, 13, 14, 16, 17, 18, 19]

# 3. Finding shared interests
alice_interests = {"reading", "hiking", "cooking", "photography"}
bob_interests = {"gaming", "hiking", "photography", "cycling"}
charlie_interests = {"reading", "cooking", "swimming", "photography"}

# Interests shared by everyone
common_interests = alice_interests & bob_interests & charlie_interests
print(f"Shared by all: {common_interests}")  # {'photography'}

# Interests shared by at least two people
pairwise_common = (
    (alice_interests & bob_interests)
    | (alice_interests & charlie_interests)
    | (bob_interests & charlie_interests)
)
print(f"Shared by at least two people: {pairwise_common}")  # {'hiking', 'reading', 'cooking', 'photography'}

5. Network and relationship analysis

# Social-network friend analysis
alice_friends = {"bob", "charlie", "david", "eve"}
bob_friends = {"alice", "charlie", "frank", "grace"}
charlie_friends = {"alice", "bob", "david", "frank"}

# Mutual friends
alice_bob_common = alice_friends & bob_friends
print(f"Mutual friends of Alice and Bob: {alice_bob_common}")  # {'charlie'}

# Friend recommendations (friends of a friend who are not already my friends)
def recommend_friends(me, my_friends, friends_friends):
    return friends_friends - my_friends - {me}  # exclude existing friends and myself

# Recommend friends for Alice, using the friend lists that are known
known_friend_lists = {"bob": bob_friends, "charlie": charlie_friends}
alice_recommendations = set()
for friend in alice_friends:
    if friend in known_friend_lists:
        alice_recommendations |= recommend_friends(
            "alice", alice_friends, known_friend_lists[friend]
        )

print(f"Recommended for Alice: {alice_recommendations}")  # {'frank', 'grace'}

# Closed-group detection (three people who are all friends with each other)
def find_cliques(friends_dict):
    cliques = []
    people = list(friends_dict.keys())

    for i in range(len(people)):
        for j in range(i + 1, len(people)):
            for k in range(j + 1, len(people)):
                p1, p2, p3 = people[i], people[j], people[k]
                if (p2 in friends_dict[p1] and p3 in friends_dict[p1] and
                    p1 in friends_dict[p2] and p3 in friends_dict[p2] and
                    p1 in friends_dict[p3] and p2 in friends_dict[p3]):
                    cliques.append({p1, p2, p3})

    return cliques

friends_dict = {
    "alice": alice_friends,
    "bob": bob_friends,
    "charlie": charlie_friends,
    "david": {"alice", "charlie"},
    "eve": {"alice"},
    "frank": {"bob", "charlie"},
    "grace": {"bob"}
}

# Three groups are found: alice/bob/charlie, alice/charlie/david, and bob/charlie/frank
print(f"Groups of three: {find_cliques(friends_dict)}")
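
The triple loop with six membership checks can be written more compactly with `itertools.combinations` and a subset test, which keeps the example in the spirit of set operations. This is one possible sketch, not the only way to do it; `friends` and `find_triangles` are names introduced for this example, and the data repeats the friend lists used above.

```python
from itertools import combinations

friends = {
    "alice": {"bob", "charlie", "david", "eve"},
    "bob": {"alice", "charlie", "frank", "grace"},
    "charlie": {"alice", "bob", "david", "frank"},
    "david": {"alice", "charlie"},
    "eve": {"alice"},
    "frank": {"bob", "charlie"},
    "grace": {"bob"},
}

def find_triangles(graph):
    """Return every group of three people who are all mutual friends."""
    triangles = []
    for trio in combinations(graph, 3):  # iterates over the dict keys
        members = set(trio)
        # Each person's friend set must contain the other two members
        if all(members - {person} <= graph[person] for person in trio):
            triangles.append(members)
    return triangles

triangles = find_triangles(friends)
print(len(triangles))  # 3
```

The same pattern extends to larger groups by changing the `3` passed to `combinations`, although the number of candidate groups grows quickly with group size.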