Contents
- Preface
- I. Creating Sets and Their Properties
  - 1. Ways to Create a Set
  - 2. Basic Properties of Sets
  - 3. Set Methods by Category
- II. Set Operations (Intersection, Union, Difference)
  - 1. Basic Set Operations
  - 2. In-Place Operations (Modify the Original Set)
  - 3. Testing Set Relationships
- III. Use Cases for Sets
  - 1. Deduplicating Data (Most Common)
  - 2. Membership Tests (Very Fast)
  - 3. Comparing and Analyzing Data
  - 4. Math and Logic
  - 5. Network and Relationship Analysis
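Preface
This article covers how to create Python sets and what properties they have, the set operations (intersection, union, difference), and common use cases for sets.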
I. Creating Sets and Their Properties
1. Ways to Create a Set
```python
# 1. Curly-brace literal (cannot create an empty set, because {} is an empty dict)
empty_set = set()  # empty set (must use set())
fruits = {"apple", "banana", "orange"}  # set of strings
numbers = {1, 2, 3, 4, 5}  # set of numbers

# 2. The set() constructor
set_from_list = set([1, 2, 2, 3, 3, 3])  # {1, 2, 3} (duplicates removed)
set_from_string = set("hello")  # {'h', 'e', 'l', 'o'} (in no particular order)
set_from_tuple = set((1, 2, 3, 2, 1))  # {1, 2, 3}

# 3. Set comprehensions
squares = {x**2 for x in range(5)}  # {0, 1, 4, 9, 16}
even_numbers = {x for x in range(10) if x % 2 == 0}  # {0, 2, 4, 6, 8}

# 4. Immutable sets (frozenset)
immutable_set = frozenset([1, 2, 3, 4, 5])  # immutable and hashable, so usable as a dict key
```
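Two points from the comments above are easy to verify directly: `{}` is a dict rather than a set, and a frozenset can serve as a dict key. The sketch below is illustrative only; `edge_weight` and its value are made-up names and data, not part of the original article.

```python
# {} is an empty dict, not an empty set
print(type({}).__name__)     # dict
print(type(set()).__name__)  # set

# A frozenset is hashable, so it can be a dict key.
# edge_weight and the value 7 are invented for illustration.
edge_weight = {frozenset({"a", "b"}): 7}

# The lookup succeeds regardless of the order the endpoints are written in
print(edge_weight[frozenset({"b", "a"})])  # 7

# Frozensets can also be elements of another set; equal ones collapse
nested = {frozenset({1, 2}), frozenset({2, 1}), frozenset({3})}
print(len(nested))  # 2
```

Using a frozenset as the key makes the pair unordered, which is convenient for things like undirected edges.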
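2. Basic Properties of Sets
```python
# 1. Unordered: elements have no guaranteed order
s = {3, 1, 4, 1, 5, 9, 2, 6}
print(s)  # e.g. {1, 2, 3, 4, 5, 6, 9}; the order is an implementation detail, so never rely on it

# 2. Unique elements: duplicates are removed automatically
s = {1, 2, 2, 3, 3, 3, 4, 4, 4, 4}
print(s)  # {1, 2, 3, 4}

# 3. Unhashable objects cannot be set elements
# invalid_set = {[1, 2], [3, 4]}  # TypeError: unhashable type: 'list'
valid_set = {(1, 2), (3, 4)}  # tuples of hashable items are hashable, so they can be stored

# 4. A set itself is mutable, but its elements must be hashable
s = {1, 2, 3}
s.add(4)  # adding an element is fine
# s.add([5, 6])  # TypeError: a list is mutable and therefore unhashable

# 5. Size and membership tests
s = {1, 2, 3, 4, 5}
print(len(s))  # 5
print(3 in s)  # True
print(6 not in s)  # True
```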
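Uniqueness and hashability have two consequences that the examples above do not show: values that compare equal count as one element, and a tuple is hashable only if everything inside it is. The following is a minimal sketch; `is_hashable` is a hypothetical helper written for this illustration.

```python
# Uniqueness is decided by hash() and ==, so equal values collapse into one element
mixed = {1, 1.0, True}
print(len(mixed))  # 1

# A tuple is hashable only if every item inside it is hashable
try:
    bad = {(1, [2, 3])}
except TypeError as exc:
    print(f"TypeError: {exc}")

def is_hashable(obj):
    """Return True if obj can be stored in a set (hypothetical helper)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable((1, 2)))    # True
print(is_hashable([1, 2]))    # False
print(is_hashable((1, [2])))  # False
```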
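3. Set Methods by Category
| Category | Method | Description |
|---|---|---|
| Add | add() | Add a single element |
| Add | update() | Add every element from one or more iterables |
| Remove | remove() | Remove the given element (raises KeyError if it is absent) |
| Remove | discard() | Remove the given element if present (no error if it is absent) |
| Remove | pop() | Remove and return an arbitrary element (raises KeyError if the set is empty) |
| Remove | clear() | Remove all elements |
| Set operations | union() | Union |
| Set operations | intersection() | Intersection |
| Set operations | difference() | Difference |
| Set operations | symmetric_difference() | Symmetric difference |
| Relationship tests | issubset() | Subset test |
| Relationship tests | issuperset() | Superset test |
| Relationship tests | isdisjoint() | True if the two sets share no elements |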
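The add and remove methods in the table can be exercised in a few lines. This is a small sketch rather than a complete tour; the variable names are chosen only for this example.

```python
s = {1, 2, 3}

s.add(4)               # add one element
s.update([5, 6], {7})  # add every element from each iterable
print(sorted(s))       # [1, 2, 3, 4, 5, 6, 7]

s.remove(7)            # the element exists, so this succeeds
remove_raised = False
try:
    s.remove(99)       # the element is absent
except KeyError:
    remove_raised = True
print(remove_raised)   # True

s.discard(99)          # absent element, silently ignored

popped = s.pop()       # removes an arbitrary element; which one is not specified
print(popped in s)     # False

s.clear()
print(len(s))          # 0
```

Prefer `discard()` when a missing element is not an error, and `remove()` when it signals a bug you want to hear about.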
II. Set Operations (Intersection, Union, Difference)
1. Basic Set Operations
```python
A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7, 8}

# 1. Union: every element that is in either set
print(A | B)  # {1, 2, 3, 4, 5, 6, 7, 8}
print(A.union(B))  # same result
print(A.union(B, {9, 10}))  # union of several sets: {1, 2, ..., 10}

# 2. Intersection: elements common to both sets
print(A & B)  # {4, 5}
print(A.intersection(B))  # same result
print(A.intersection(B, {4, 5, 9}))  # {4, 5}

# 3. Difference: elements in A that are not in B
print(A - B)  # {1, 2, 3}
print(A.difference(B))  # same result
print(B - A)  # {6, 7, 8} (in B but not in A)

# 4. Symmetric difference: elements in exactly one of the two sets
print(A ^ B)  # {1, 2, 3, 6, 7, 8}
print(A.symmetric_difference(B))  # same result
```
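The operator and method forms above differ in one practical way: the methods accept any iterable, while the operators require a set or frozenset on both sides. A short sketch, with `evens` and `operator_rejected_list` as names invented for the example:

```python
A = {1, 2, 3, 4, 5}
evens = [2, 4, 6]  # a list, not a set

# Methods accept any iterable
print(sorted(A.intersection(evens)))            # [2, 4]
print(sorted(A.union(evens, range(7, 9))))      # [1, 2, 3, 4, 5, 6, 7, 8]

# Operators require a set (or frozenset) on both sides
operator_rejected_list = False
try:
    A & evens
except TypeError as exc:
    operator_rejected_list = True
    print(f"TypeError: {exc}")

# Convert first if you want the operator form
print(sorted(A & set(evens)))  # [2, 4]
```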
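2. In-Place Operations (Modify the Original Set)
```python
A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7, 8}

# 1. Union update
A |= B  # same result as A = A | B, but A is modified in place instead of being rebound
print(A)  # {1, 2, 3, 4, 5, 6, 7, 8}

# Reset A
A = {1, 2, 3, 4, 5}

# 2. Intersection update
A &= B  # same result as A = A & B, applied in place
print(A)  # {4, 5}

# Reset A
A = {1, 2, 3, 4, 5}

# 3. Difference update
A -= B  # same result as A = A - B, applied in place
print(A)  # {1, 2, 3}

# Reset A
A = {1, 2, 3, 4, 5}

# 4. Symmetric difference update
A ^= B  # same result as A = A ^ B, applied in place
print(A)  # {1, 2, 3, 6, 7, 8}
```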
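The difference between an in-place operator and plain reassignment only becomes visible when a second name refers to the same set. The sketch below shows this, and also lists the method equivalents of the four augmented operators; `alias`, `in_place_same` and `rebound_same` are names chosen for this illustration.

```python
B = {4, 5, 6}

# In-place: the alias sees the change because both names refer to one object
A = {1, 2, 3}
alias = A
A |= B
in_place_same = alias is A
print(in_place_same, sorted(alias))  # True [1, 2, 3, 4, 5, 6]

# Rebinding: A now names a new set, while the alias keeps the old one
A = {1, 2, 3}
alias = A
A = A | B
rebound_same = alias is A
print(rebound_same, sorted(alias))  # False [1, 2, 3]

# Method equivalents of the augmented operators
C = {1, 2, 3, 4, 5}
C.update(B)                             # C |= B   -> {1, 2, 3, 4, 5, 6}
C.intersection_update({4, 5, 6, 7})     # C &= ... -> {4, 5, 6}
C.difference_update({6})                # C -= ... -> {4, 5}
C.symmetric_difference_update({5, 9})   # C ^= ... -> {4, 9}
print(sorted(C))  # [4, 9]
```

This matters most when a set is passed into a function: an in-place operator inside the function changes the caller's set, while reassignment does not.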
3. Testing Set Relationships
```python
A = {1, 2, 3}
B = {1, 2, 3, 4, 5}
C = {4, 5, 6}
D = {1, 2}

# 1. Subset
print(D <= A)  # True (D is a subset of A)
print(D.issubset(A))  # True
print(D < A)  # True (D is a proper subset of A)
print(A <= A)  # True (every set is a subset of itself)
print(A < A)  # False (but not a proper subset of itself)

# 2. Superset
print(B >= A)  # True (B is a superset of A)
print(B.issuperset(A))  # True
print(B > A)  # True (B is a proper superset of A)

# 3. Disjoint
print(A.isdisjoint(C))  # True (A and C share no elements)
print(A.isdisjoint(B))  # False (A and B share elements)
```
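A typical use of the subset test is checking that everything required is present. The permission names and the `can_access` helper below are hypothetical, made up for this sketch. It also shows that subset comparison is only a partial order: two sets can fail the test in both directions.

```python
required = {"read", "write"}
user_perms = {"read", "write", "delete"}
guest_perms = {"read"}

def can_access(granted, needed):
    """True if every needed permission was granted (hypothetical helper)."""
    return needed <= granted

print(can_access(user_perms, required))   # True
print(can_access(guest_perms, required))  # False
print(sorted(required - guest_perms))     # ['write'], what the guest is missing

# Subset comparison is a partial order: two sets can be unrelated in both directions
X = {1, 2}
Y = {2, 3}
print(X <= Y, X >= Y)  # False False
```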
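III. Use Cases for Sets
1. Deduplicating Data (Most Common)
```python
# Remove duplicate elements from a list
numbers = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
unique_numbers = list(set(numbers))  # the values 1, 2, 3, 4; the original list order is not guaranteed
print(f"Before deduplication: {len(numbers)} elements")  # 10
print(f"After deduplication: {len(unique_numbers)} elements")  # 4

# Count the distinct words in a text
text = "apple banana apple orange banana apple mango"
words = text.split()
unique_words = set(words)
print(f"Total words: {len(words)}")  # 7
print(f"Distinct words: {len(unique_words)}")  # 4
print(f"Distinct words: {unique_words}")  # e.g. {'banana', 'apple', 'orange', 'mango'}; order varies between runs
```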
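Converting to a set discards the original order. When the order of first appearance matters, two common alternatives are `dict.fromkeys` and a generator that tracks what it has seen. A minimal sketch, with `dedupe` as a hypothetical helper name:

```python
numbers = [3, 1, 3, 2, 1, 4]

# dict keys are unique and keep insertion order (guaranteed since Python 3.7)
unique_in_order = list(dict.fromkeys(numbers))
print(unique_in_order)  # [3, 1, 2, 4]

def dedupe(items):
    """Yield each item the first time it appears (hypothetical helper)."""
    seen = set()
    for item in items:
        if item not in seen:
            seen.add(item)
            yield item

print(list(dedupe(numbers)))  # [3, 1, 2, 4]

# If only the distinct values matter, sort the set for a stable result
print(sorted(set(numbers)))  # [1, 2, 3, 4]
```

The generator version works on any iterable, including streams too large to hold in memory, as long as the set of distinct items fits.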
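2. Membership Tests (Very Fast)
```python
import time

# Set membership is O(1) on average, while list membership is O(n)
big_list = list(range(1000000))  # list with one million elements
big_set = set(big_list)  # the same elements as a set

# List lookup (worst case: the target is the last element)
start = time.perf_counter()
result = 999999 in big_list
end = time.perf_counter()
print(f"List lookup time: {end - start:.6f} s")  # about 0.005 s in the original run; varies by machine

# Set lookup
start = time.perf_counter()
result = 999999 in big_set
end = time.perf_counter()
print(f"Set lookup time: {end - start:.6f} s")  # about 0.000001 s in the original run, thousands of times faster

# Practical use: blocked-word filtering
blocked_words = {"spam", "advertisement", "virus", "hack", "fraud"}
user_message = "this is not a spam message"

# Test each word of the message against the set (one O(1) lookup per word)
if any(word in blocked_words for word in user_message.split()):
    print("Message contains a blocked word")
else:
    print("Message is clean")
```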
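A single timed lookup is dominated by timer resolution and noise, so a repeated measurement gives a more trustworthy comparison. The sketch below uses the standard `timeit` module; the size `N`, the repeat count, and the variable names are arbitrary choices for this example, and no particular speed-up figure is claimed.

```python
import timeit

N = 100_000
haystack_list = list(range(N))
haystack_set = set(haystack_list)
target = N - 1  # last element: the worst case for a list scan

# Repeat each lookup many times so timer resolution does not dominate
list_time = timeit.timeit(lambda: target in haystack_list, number=100)
set_time = timeit.timeit(lambda: target in haystack_set, number=100)

print(f"list: {list_time:.6f} s for 100 lookups")
print(f"set:  {set_time:.6f} s for 100 lookups")
# The exact ratio depends on the machine and on N
```

Note that building the set costs O(n) up front, so the conversion pays off only when the same collection is searched many times.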
3. Comparing and Analyzing Data
```python
# 1. Website user analysis
yesterday_users = {"alice", "bob", "charlie", "david"}
today_users = {"bob", "charlie", "eve", "frank"}

# Returning users (visited on both days)
retained_users = yesterday_users & today_users
print(f"Returning users: {retained_users}")  # {'bob', 'charlie'}

# New users
new_users = today_users - yesterday_users
print(f"New users: {new_users}")  # {'eve', 'frank'}

# Lost users
lost_users = yesterday_users - today_users
print(f"Lost users: {lost_users}")  # {'alice', 'david'}

# All users across both days
total_users = yesterday_users | today_users
print(f"Total distinct users: {len(total_users)}")  # 6

# 2. Shopping-basket analysis
customer1_cart = {"apple", "banana", "milk", "bread"}
customer2_cart = {"banana", "orange", "milk", "eggs"}
customer3_cart = {"apple", "bread", "cheese", "eggs"}

# Every product bought by any customer
all_products = customer1_cart | customer2_cart | customer3_cart
print(f"All products: {all_products}")

# Popular products (bought by at least two customers)
popular_products = (
    (customer1_cart & customer2_cart)
    | (customer1_cart & customer3_cart)
    | (customer2_cart & customer3_cart)
)
print(f"Popular products: {popular_products}")  # {'banana', 'milk', 'apple', 'bread', 'eggs'}

# 3. Course enrollment analysis
math_students = {"alice", "bob", "charlie", "david"}
physics_students = {"bob", "charlie", "eve", "frank"}
chemistry_students = {"alice", "charlie", "frank", "grace"}

# Students enrolled in exactly one course
only_math = math_students - physics_students - chemistry_students
only_physics = physics_students - math_students - chemistry_students
only_chemistry = chemistry_students - math_students - physics_students
print(f"Math only: {only_math}")  # {'david'}
print(f"Physics only: {only_physics}")  # {'eve'}
print(f"Chemistry only: {only_chemistry}")  # {'grace'}

# Students enrolled in all three courses
all_three = math_students & physics_students & chemistry_students
print(f"All three courses: {all_three}")  # {'charlie'}
```
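The pairwise-intersection approach to finding popular products grows quickly as more customers are added. One alternative, sketched here under the assumption that the carts are collected in a list, is to count buyers per product with `collections.Counter`. The names `carts`, `buyers_per_product`, `popular` and `bought_once` are introduced for this example.

```python
from collections import Counter

carts = [
    {"apple", "banana", "milk", "bread"},
    {"banana", "orange", "milk", "eggs"},
    {"apple", "bread", "cheese", "eggs"},
]

# Each cart is a set, so a product is counted at most once per customer
buyers_per_product = Counter(item for cart in carts for item in cart)

# Products bought by at least two customers
popular = {item for item, n in buyers_per_product.items() if n >= 2}
print(sorted(popular))  # ['apple', 'banana', 'bread', 'eggs', 'milk']

# Products bought by exactly one customer
bought_once = {item for item, n in buyers_per_product.items() if n == 1}
print(sorted(bought_once))  # ['cheese', 'orange']
```

The threshold `n >= 2` can be changed without rewriting the expression, which the pairwise form does not allow.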
4. Math and Logic
```python
# 1. Prime sieve (a set-based take on the Sieve of Eratosthenes)
def find_primes(limit):
    primes = set()
    numbers = set(range(2, limit + 1))
    while numbers:
        prime = min(numbers)  # the smallest remaining number is always prime
        primes.add(prime)
        multiples = set(range(prime, limit + 1, prime))
        numbers -= multiples  # drop the prime and all of its multiples
    return sorted(primes)

print(f"Primes below 100: {find_primes(100)[:20]}...")  # the first 20 primes

# 2. Finding missing numbers
def find_missing_numbers(full_set, partial_set):
    return sorted(full_set - partial_set)

# Suppose the full ID range is 1-100 and some IDs are missing
all_ids = set(range(1, 101))
existing_ids = {1, 2, 3, 5, 6, 8, 9, 10, 15, 20, 50, 99, 100}
missing_ids = find_missing_numbers(all_ids, existing_ids)
print(f"Number of missing IDs: {len(missing_ids)}")  # 87
print(f"First 10 missing IDs: {missing_ids[:10]}")  # [4, 7, 11, 12, 13, 14, 16, 17, 18, 19]

# 3. Finding shared interests
alice_interests = {"reading", "hiking", "cooking", "photography"}
bob_interests = {"gaming", "hiking", "photography", "cycling"}
charlie_interests = {"reading", "cooking", "swimming", "photography"}

# Interests shared by everyone
common_interests = alice_interests & bob_interests & charlie_interests
print(f"Shared by all: {common_interests}")  # {'photography'}

# Interests shared by at least two people
pairwise_common = (
    (alice_interests & bob_interests)
    | (alice_interests & charlie_interests)
    | (bob_interests & charlie_interests)
)
print(f"Shared by at least two people: {pairwise_common}")  # {'hiking', 'reading', 'cooking', 'photography'}
```
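Chaining `&` by hand works for three people but not for a group of unknown size. One way to generalize, sketched here with the interests gathered into a dict named `interests` (a name assumed for this example), is to unpack the sets into `set.intersection` and `set.union`.

```python
interests = {
    "alice": {"reading", "hiking", "cooking", "photography"},
    "bob": {"gaming", "hiking", "photography", "cycling"},
    "charlie": {"reading", "cooking", "swimming", "photography"},
}

# Unpack the sets to combine any number of them.
# Both calls need at least one set, so guard against an empty dict in real code.
shared_by_all = set.intersection(*interests.values())
mentioned_by_anyone = set.union(*interests.values())

print(shared_by_all)             # {'photography'}
print(len(mentioned_by_anyone))  # 7
```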
5. Network and Relationship Analysis
```python
# Social-network friend analysis
alice_friends = {"bob", "charlie", "david", "eve"}
bob_friends = {"alice", "charlie", "frank", "grace"}
charlie_friends = {"alice", "bob", "david", "frank"}

# Mutual friends
alice_bob_common = alice_friends & bob_friends
print(f"Mutual friends of Alice and Bob: {alice_bob_common}")  # {'charlie'}

# Friend recommendation (friends of friends who are not already friends)
def recommend_friends(me, my_friends, friends_friends):
    # Exclude the user themself and anyone who is already a friend
    return friends_friends - my_friends - {me}

# Recommend friends for Alice (only Bob's and Charlie's friend lists are used here)
known_friend_lists = {"bob": bob_friends, "charlie": charlie_friends}
alice_recommendations = set()
for friend in alice_friends:
    if friend in known_friend_lists:
        alice_recommendations |= recommend_friends(
            "alice", alice_friends, known_friend_lists[friend]
        )
print(f"Recommended for Alice: {alice_recommendations}")  # {'frank', 'grace'}

# Closed-group detection (three people who are all friends with each other)
def find_cliques(friends_dict):
    cliques = []
    people = list(friends_dict.keys())
    for i in range(len(people)):
        for j in range(i + 1, len(people)):
            for k in range(j + 1, len(people)):
                p1, p2, p3 = people[i], people[j], people[k]
                if (p2 in friends_dict[p1] and p3 in friends_dict[p1] and
                        p1 in friends_dict[p2] and p3 in friends_dict[p2] and
                        p1 in friends_dict[p3] and p2 in friends_dict[p3]):
                    cliques.append({p1, p2, p3})
    return cliques

friends_dict = {
    "alice": alice_friends,
    "bob": bob_friends,
    "charlie": charlie_friends,
    "david": {"alice", "charlie"},
    "eve": {"alice"},
    "frank": {"bob", "charlie"},
    "grace": {"bob"},
}
# Three triangles are found: alice/bob/charlie, alice/charlie/david, bob/charlie/frank
print(f"Three-person cliques: {find_cliques(friends_dict)}")
```
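The recommendation idea above can be generalized to any person in the graph and ranked by how many mutual friends each candidate has. This is one possible sketch, not the article's own method; `friends` repeats the friendship data so the block runs on its own, and `recommend` is a hypothetical helper.

```python
from collections import Counter

friends = {
    "alice": {"bob", "charlie", "david", "eve"},
    "bob": {"alice", "charlie", "frank", "grace"},
    "charlie": {"alice", "bob", "david", "frank"},
    "david": {"alice", "charlie"},
    "eve": {"alice"},
    "frank": {"bob", "charlie"},
    "grace": {"bob"},
}

def recommend(person, graph):
    """Rank non-friends by number of mutual friends (hypothetical helper)."""
    mine = graph[person]
    scores = Counter()
    for friend in mine:
        # Friends of this friend, minus the person and their existing friends
        for candidate in graph.get(friend, set()) - mine - {person}:
            scores[candidate] += 1
    # Sort by score descending, then by name for a deterministic order
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

print(recommend("alice", friends))  # [('frank', 2), ('grace', 1)]
```

Frank ranks first for Alice because both Bob and Charlie know him, while Grace is reachable only through Bob.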