55_Python Dictionaries and Sets Explained


Table of Contents

- Preface
- I. Dictionary (dict) Basics
  - 1.1 Creating Dictionaries
  - 1.2 Key Requirements
  - 1.3 Accessing and Modifying Values
- II. Iterating Over a Dictionary
  - 2.1 Three Basic Ways to Iterate
  - 2.2 Modifying a Dictionary While Iterating
- III. Common Dictionary Operations
  - 3.1 Adding and Removing Entries
  - 3.2 Views of Keys, Values, and Items
  - 3.3 Merging Dictionaries (Python 3.9+)
- IV. Set Basics
  - 4.1 Properties of Sets
  - 4.2 Set Elements Must Be Immutable
- V. Set Operations
  - 5.1 Basic Methods
  - 5.2 Mathematical Set Operations
  - 5.3 In-Place Operations (Modifying the Original Set)
- VI. Practical Techniques and Scenarios
  - 6.1 A Simple Cache Built on a Dictionary
  - 6.2 Efficient Deduplication with Sets
  - 6.3 Working with Nested Dictionaries
- VII. Dictionary vs. Set: Comparison Summary
- VIII. Hands-On Project: Word Frequency Counter
- Summary
- ✅ Highlights
- Use Cases
- Further Directions

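Preface

If a list is an "ordered, numbered container", then a dictionary (dict) is a "smart labeled container": you fetch a value directly through its key (key), with no need to scan. A set (set) is Python's implementation of the mathematical concept of a set and excels at deduplication and operations such as intersection and union. Both are built on hash tables, so lookups run in O(1) time on average. Lookup speed stays roughly constant no matter how much data there is, although heavy hash collisions can degrade the worst case to O(n). A list, by contrast, must scan its elements one by one (O(n)). That is why dictionaries and sets are far more efficient than lists for "does this item exist?" checks over large amounts of data.

Many developers coming from other languages are used to matching and searching with nested loops over lists, which is an inefficient anti-pattern in Python. Besides covering the basic operations on dictionaries and sets, this article focuses on several questions:

- why keys must be hashable (in practice, immutable);
- how to modify a dictionary safely while iterating over it;
- how set operations apply to data analysis;
- how dictionaries are used in practice for caching and counting.

Mastering dictionaries and sets will substantially improve your data modeling and algorithm skills.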

I. Dictionary (dict) Basics

1.1 Creating Dictionaries

```python
# Literal syntax
person = {
    "name": "Alice",
    "age": 25,
    "city": "Shanghai"
}

# dict() constructor with keyword arguments
person2 = dict(name="Bob", age=30, city="Beijing")

# From an iterable of key-value pairs
pairs = [("name", "Charlie"), ("age", 28)]
person3 = dict(pairs)

# Dictionary comprehension
squares = {x: x**2 for x in range(1, 6)}
print(squares)  # {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}

# fromkeys(): create many keys that share one default value
keys = ["a", "b", "c"]
default_dict = dict.fromkeys(keys, 0)
print(default_dict)  # {'a': 0, 'b': 0, 'c': 0}

# Empty dictionaries
empty = {}
empty2 = dict()
```
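One detail is worth knowing about fromkeys(): it stores the same value object under every key. A mutable default such as a list is therefore shared across all keys. The minimal sketch below contrasts this with a comprehension that builds a fresh list per key. The names shared and separate are illustrative.

```python
keys = ["a", "b", "c"]

# fromkeys() puts the SAME list object under every key
shared = dict.fromkeys(keys, [])
shared["a"].append(1)
print(shared)    # {'a': [1], 'b': [1], 'c': [1]}

# A comprehension evaluates [] once per key, giving independent lists
separate = {k: [] for k in keys}
separate["a"].append(1)
print(separate)  # {'a': [1], 'b': [], 'c': []}
```

fromkeys() is perfectly safe with immutable defaults such as 0 or None. The comprehension form matters only when the default is mutable.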

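1.2 Key Requirements

Dictionary keys must be hashable. In practice this means immutable types:

- strings, numbers, and tuples can be keys, provided every element of the tuple is itself hashable;
- lists, dictionaries, and sets cannot be keys.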

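```python
# Valid keys
d = {
    "name": "Alice",       # string
    42: "answer",          # integer
    3.14: "pi",            # float
    (1, 2): "point",       # tuple
    True: "boolean key"    # bool (note: True == 1, so it would collide with an integer key 1)
}

# Invalid keys (each raises TypeError: unhashable type)
# d2 = {[1, 2]: "list key"}      # TypeError!
# d3 = {{1: 2}: "dict key"}      # TypeError!
# d4 = {{1, 2}: "set key"}       # TypeError!
```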
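The short sketch below makes "hashable" concrete. It uses the built-in hash() and wraps a try/except around a tuple that contains a list. It also shows the True/1 collision: True == 1 and both hash to the same value, so they address the same dictionary slot. Variable names are illustrative.

```python
# Equal hashable objects must produce equal hashes
print(hash((1, 2)) == hash((1, 2)))  # True

# A tuple is hashable only if all of its elements are hashable
bad_key = (1, [2, 3])
try:
    {bad_key: "value"}
    raised = False
except TypeError:
    raised = True
print(raised)  # True: the list inside the tuple is unhashable

# True == 1 and hash(True) == hash(1), so they count as the same key
d = {1: "int one"}
d[True] = "bool true"   # updates the value; the original key 1 is kept
print(d)                # {1: 'bool true'}
```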

1.3 Accessing and Modifying Values

```python
user = {"name": "Alice", "age": 25}

# Access by key
print(user["name"])        # Alice
print(user["age"])         # 25
# print(user["email"])     # KeyError! A missing key raises an error

# get(): safe access (recommended)
print(user.get("name"))          # Alice
print(user.get("email"))         # None (returned when the key is missing)
print(user.get("email", "N/A"))  # N/A (custom default)

# Modify / add
user["age"] = 26                # update an existing key
user["email"] = "a@test.com"    # add a new key-value pair
print(user)

# Bulk update
user.update({"age": 27, "phone": "13800138000"})
print(user)  # {'name': 'Alice', 'age': 27, 'email': 'a@test.com', 'phone': '13800138000'}
```

II. Iterating Over a Dictionary

2.1 Three Basic Ways to Iterate

```python
scores = {"Chinese": 88, "Math": 92, "English": 79}

# Iterate over keys
for subject in scores:
    print(subject, end=" ")  # Chinese Math English
print()

# Iterate over values
for score in scores.values():
    print(score, end=" ")  # 88 92 79
print()

# Iterate over keys and values together (the most common form)
for subject, score in scores.items():
    print(f"{subject}: {score} points")
```
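Because items() yields (key, value) pairs, you can combine it with sorted() and enumerate() to iterate in a chosen order, for example to rank subjects by score. The minimal sketch below uses the same sample data. ranked is an illustrative name.

```python
scores = {"Chinese": 88, "Math": 92, "English": 79}

# Sort the (key, value) pairs by value, highest first
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
for rank, (subject, score) in enumerate(ranked, start=1):
    print(f"{rank}. {subject}: {score}")
# 1. Math: 92
# 2. Chinese: 88
# 3. English: 79

# Sorting the dict itself sorts its keys (alphabetically here)
print(sorted(scores))  # ['Chinese', 'English', 'Math']
```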

2.2 Modifying a Dictionary While Iterating

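```python
# Wrong: changing the dict's size while iterating raises
# RuntimeError: dictionary changed size during iteration
# for k in scores:
#     if scores[k] < 80:
#         del scores[k]  # RuntimeError!

# Correct approach 1: iterate over a snapshot of the keys
for k in list(scores.keys()):
    if scores[k] < 80:
        del scores[k]

# Correct approach 2: build a new dict with a comprehension
# (either approach is enough on its own)
scores = {k: v for k, v in scores.items() if v >= 80}
print(scores)  # {'Chinese': 88, 'Math': 92}
```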
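The RuntimeError above is triggered by changing the dictionary's size. Reassigning the values of existing keys leaves the size unchanged. As a minimal sketch, updating scores in place while looping over the dict itself therefore works fine:

```python
scores = {"Chinese": 88, "Math": 92, "English": 79}

# Only values change, no keys are added or removed, so this is safe
for subject in scores:
    scores[subject] += 5

print(scores)  # {'Chinese': 93, 'Math': 97, 'English': 84}
```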

III. Common Dictionary Operations

3.1 Adding and Removing Entries

```python
d = {"a": 1, "b": 2, "c": 3}

# setdefault(): return the value if the key exists; otherwise insert the default and return it
value = d.setdefault("d", 4)
print(value)  # 4
print(d)      # {'a': 1, 'b': 2, 'c': 3, 'd': 4}

# If the key already exists, setdefault leaves the value unchanged
value = d.setdefault("a", 100)
print(value)  # 1 (not 100)
print(d)      # {'a': 1, 'b': 2, 'c': 3, 'd': 4}

# pop(): remove a key and return its value (a default avoids KeyError)
val = d.pop("b")
print(val)   # 2
val2 = d.pop("z", "missing")
print(val2)  # missing

# popitem(): remove and return the most recently inserted pair
# (LIFO order is guaranteed since Python 3.7)
d["e"] = 5
last_item = d.popitem()
print(last_item)  # ('e', 5)

# del statement
del d["a"]
print(d)  # {'c': 3, 'd': 4}
```
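A common use of setdefault() is grouping. It returns the existing list for a key, or inserts and returns a new empty one, so one line covers both cases. The minimal sketch below groups an illustrative, made-up word list by first letter:

```python
words = ["apple", "avocado", "banana", "blueberry", "cherry"]

groups = {}
for word in words:
    # Existing key: returns its list; new key: inserts [] and returns it
    groups.setdefault(word[0], []).append(word)

print(groups)
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
```

One side effect: the [] argument is created on every call, even when the key already exists. collections.defaultdict(list) avoids that, and for simple grouping either choice works.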

3.2 Views of Keys, Values, and Items

```python
d = {"x": 10, "y": 20, "z": 30}

keys = d.keys()
values = d.values()
items = d.items()

print(keys)    # dict_keys(['x', 'y', 'z'])
print(values)  # dict_values([10, 20, 30])
print(items)   # dict_items([('x', 10), ('y', 20), ('z', 30)])

# Views are "live": they reflect later changes to the dict
d["w"] = 40
print(keys)    # dict_keys(['x', 'y', 'z', 'w'])  (updated automatically)

# Membership tests check keys by default
print("x" in d)      # True
print(10 in d)       # False (values are not checked)
```
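The view returned by keys() is set-like, so it supports set operators such as & and - directly, without first converting to a set. Membership among values has to be requested explicitly through values(), and that check is a linear scan. A minimal sketch with illustrative inventory data:

```python
stock_a = {"apple": 3, "banana": 5, "cherry": 2}
stock_b = {"banana": 1, "cherry": 4, "durian": 7}

# dict_keys supports set operators and returns a plain set
print(stock_a.keys() & stock_b.keys())  # {'banana', 'cherry'} (display order may vary)
print(stock_a.keys() - stock_b.keys())  # {'apple'}

# Checking values must go through values() and scans them one by one
print(5 in stock_a.values())  # True
```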

3.3 Merging Dictionaries (Python 3.9+)

```python
d1 = {"a": 1, "b": 2}
d2 = {"b": 3, "c": 4}

# | creates a new dict; on duplicate keys the right-hand value wins
merged = d1 | d2
print(merged)  # {'a': 1, 'b': 3, 'c': 4}

# |= updates d1 in place
d1 |= d2
print(d1)  # {'a': 1, 'b': 3, 'c': 4}
```
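On Python versions before 3.9, you can write the same merge with dictionary unpacking (available since 3.5) or with copy() followed by update(). Both sketches below leave the original d1 unchanged and let later keys win, matching the | behavior shown above:

```python
d1 = {"a": 1, "b": 2}
d2 = {"b": 3, "c": 4}

# Unpack both dicts into a new literal; later keys overwrite earlier ones
merged = {**d1, **d2}
print(merged)   # {'a': 1, 'b': 3, 'c': 4}

# Copy first, then update the copy so d1 stays untouched
merged2 = d1.copy()
merged2.update(d2)
print(merged2)  # {'a': 1, 'b': 3, 'c': 4}
print(d1)       # {'a': 1, 'b': 2}
```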

IV. Set Basics

4.1 Properties of Sets

A set is an unordered collection of unique elements, implemented with a hash table:

```python
# Creating sets
s1 = {1, 2, 3, 4}
s2 = set([1, 2, 2, 3, 3, 3])  # duplicates are removed automatically
print(s2)  # {1, 2, 3}

# An empty set must be created with set(); {} creates an empty dict!
empty_set = set()
not_a_set = {}  # this is an empty dict!
print(type(empty_set))  # <class 'set'>
print(type(not_a_set))  # <class 'dict'>
```

Key distinction: {} is an empty dict, {1, 2, 3} is a set, and {"key": "val"} is a dict.

4.2 Set Elements Must Be Immutable

As with dictionary keys, set elements must be hashable, which in practice means immutable:

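```python
# Valid elements
s = {1, 3.14, "hello", (1, 2), True}
print(len(s))  # 4: True == 1, so True counts as a duplicate of 1

# Invalid elements
# s2 = {[1, 2]}         # TypeError
# s3 = {{1: 2}}         # TypeError
# s4 = {{1, 2}}         # TypeError (a set is itself mutable)
```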

When you need an immutable set, use frozenset:

```python
fs = frozenset([1, 2, 3])
# fs.add(4)  # AttributeError! frozenset is immutable and has no add()

# A frozenset can be a set element or a dictionary key
s = {frozenset([1, 2]), frozenset([3, 4])}
print(s)  # {frozenset({1, 2}), frozenset({3, 4})} (display order may vary)
```
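A frozenset is hashable and ignores element order, which makes it a natural key for unordered pairs. The graph, its node names and weights, and the edge_weight helper below are all hypothetical. The sketch looks up an undirected edge no matter which endpoint comes first:

```python
# Hypothetical undirected graph: {"A", "B"} and {"B", "A"} are the same key
weights = {
    frozenset({"A", "B"}): 5,
    frozenset({"B", "C"}): 3,
}

def edge_weight(u, v):
    """Illustrative helper: weight of the edge u-v, or None if absent."""
    return weights.get(frozenset({u, v}))

print(edge_weight("A", "B"))  # 5
print(edge_weight("B", "A"))  # 5
print(edge_weight("A", "C"))  # None
```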

V. Set Operations

5.1 Basic Methods

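```python
animals = {"cat", "dog", "rabbit"}

# Add
animals.add("fish")
print(animals)  # e.g. {'cat', 'dog', 'rabbit', 'fish'} (order is arbitrary)

# Remove: remove() raises if the element is missing
animals.remove("dog")
# animals.remove("dragon")  # KeyError!

# Remove: discard() does nothing if the element is missing (safer)
animals.discard("dragon")   # no effect, no error

# pop() removes and returns an arbitrary element (not truly random);
# it raises KeyError on an empty set
item = animals.pop()
print(item)

# Clear
animals.clear()
```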

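5.2 Mathematical Set Operations

Python's set operators map directly onto their mathematical counterparts, which makes intersections, unions, and similar tasks very intuitive: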

```python
A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7, 8}

# Union
print(A | B)              # {1, 2, 3, 4, 5, 6, 7, 8}
print(A.union(B))         # same as above

# Intersection
print(A & B)              # {4, 5}
print(A.intersection(B))  # same as above

# Difference
print(A - B)              # {1, 2, 3}
print(A.difference(B))    # same as above

# Symmetric difference (elements in exactly one of the two sets)
print(A ^ B)                       # {1, 2, 3, 6, 7, 8}
print(A.symmetric_difference(B))   # same as above

# Subset and superset checks
print(A <= A)            # True (subset)
print(A < A)             # False (proper subset)
print({1, 2} <= A)       # True
print(A >= {1, 2})       # True (superset)

# Disjointness check
print(A.isdisjoint({10, 20}))  # True (no common elements)
print(A.isdisjoint({4, 5}))    # False (they overlap)
```

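These operations are very useful in data analysis. For example, intersecting two users' friend lists gives their "mutual friends", and the difference gives "friends A has that B doesn't".

The operator forms (| & - ^) require both operands to be sets (set or frozenset). The method forms (union(), intersection(), and so on) accept any iterable, such as a list or tuple, which makes them more flexible.

In practice, if all you need is to check whether an element belongs to a large collection, use a set rather than a list. Set membership is O(1) on average, while a list is scanned in O(n), so for large collections the set can be orders of magnitude faster.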
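The sketch below illustrates the operator/method distinction with the mutual-friends example. The friend names are made up. intersection() accepts a plain list, while & raises TypeError until the list is converted to a set:

```python
alice_friends = {"Bob", "Carol", "Dave"}
bob_friends = ["Carol", "Dave", "Erin"]   # a list, not a set

# Method form: accepts any iterable, so the list works directly
mutual = alice_friends.intersection(bob_friends)
print(sorted(mutual))        # ['Carol', 'Dave']

# Operator form: both operands must be set or frozenset
try:
    alice_friends & bob_friends
    operator_ok = True
except TypeError:
    operator_ok = False
print(operator_ok)           # False

# Convert first, then the operators work as usual
only_alice = alice_friends - set(bob_friends)
print(only_alice)            # {'Bob'}
```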

5.3 In-Place Operations (Modifying the Original Set)

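```python
A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7}

# Each line modifies A in place; the statements run in sequence,
# so every step starts from the previous result
A |= B     # A.update(B)                          -> {1, 2, 3, 4, 5, 6, 7}
A &= B     # A.intersection_update(B)             -> {4, 5, 6, 7}
A -= B     # A.difference_update(B)               -> set()
A ^= B     # A.symmetric_difference_update(B)     -> {4, 5, 6, 7}
```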
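Augmented assignment on a set mutates the existing object. That matters when another name refers to the same set. By contrast, A = A - {1} builds a new set and rebinds A to it. A minimal sketch, where alias is just an illustrative second name:

```python
A = {1, 2, 3}
B = {3, 4}
alias = A            # a second name for the same set object

A |= B               # in place: the object alias points to changes too
print(alias)         # {1, 2, 3, 4}

A = A - {1}          # rebinding: a new set is created and bound to A
print(A)             # {2, 3, 4}
print(alias)         # {1, 2, 3, 4} (the original object is untouched)
print(A is alias)    # False
```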

VI. Practical Techniques and Scenarios

6.1 A Simple Cache Built on a Dictionary

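```python
def fibonacci_with_cache():
    """Speed up Fibonacci with a dictionary cache (memoization)"""
    cache = {0: 0, 1: 1}

    def fib(n):
        # Each n is computed once; later calls read it from the cache.
        # Recursion depth still grows with n, so a very large n can hit
        # Python's default recursion limit.
        if n not in cache:
            cache[n] = fib(n - 1) + fib(n - 2)
        return cache[n]

    return fib

fib = fibonacci_with_cache()
print(fib(100))  # 354224848179261915075 (computed instantly)
```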
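The standard library offers the same memoization idea as a decorator. functools.lru_cache(maxsize=None) keeps an unbounded cache of results keyed by the call arguments. Python 3.9+ also provides functools.cache as a shorthand. Below is a sketch equivalent to the closure above. It has the same recursion-depth caveat.

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # unbounded cache, same idea as the dict above
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(100))                   # 354224848179261915075
print(fib.cache_info().currsize)  # 101 (one entry for each n from 0 to 100)
```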

6.2 Efficient Deduplication with Sets

```python
# Deduplicating a list
items = [1, 2, 2, 3, 3, 3, 4, 5, 5]
unique_items = list(set(items))
print(unique_items)  # [1, 2, 3, 4, 5] here, but set order is not guaranteed in general

# Fast membership checks on large data (O(1) on average)
blacklist = {"user_a", "user_b", "user_c"}

users = ["user_b", "user_x", "user_a"]
for user in users:
    if user in blacklist:
        print(f"{user} is blacklisted")
    else:
        print(f"{user} access allowed")
```
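list(set(items)) does not preserve the original order. A common alternative relies on dictionaries keeping insertion order, which has been guaranteed since Python 3.7: dict.fromkeys() drops duplicates and keeps each item where it first appeared. A minimal sketch with string data:

```python
items = ["b", "a", "b", "c", "a"]

# A set removes duplicates, but its iteration order is arbitrary
print(len(set(items)))          # 3

# dict keys keep first-seen order, so fromkeys() dedups while preserving it
unique_in_order = list(dict.fromkeys(items))
print(unique_in_order)          # ['b', 'a', 'c']
```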

6.3 Working with Nested Dictionaries

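```python
# A nested dictionary
company = {
    "name": "TechCorp",
    "departments": {
        "R&D": {
            "frontend": ["Alice", "Bob"],
            "backend": ["Charlie", "David"]
        },
        "marketing": ["Eve", "Frank"]
    }
}

# Safely read a nested value
def safe_get(data, *keys, default=None):
    """Follow keys through nested dicts; return default if any level is missing.

    Unlike using {} as a "missing" marker, this still returns values
    that legitimately are empty dicts.
    """
    for key in keys:
        if not isinstance(data, dict) or key not in data:
            return default
        data = data[key]
    return data

print(safe_get(company, "departments", "R&D", "backend"))  # ['Charlie', 'David']
print(safe_get(company, "departments", "finance"))         # None (missing)
```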

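VII. Dictionary vs. Set: Comparison Summary

| Feature | Dictionary (dict) | Set (set) |
| --- | --- | --- |
| Structure | Key-value pairs `{k: v}` | Single elements `{v}` |
| Ordering | Preserves insertion order (guaranteed since Python 3.7) | Unordered; iteration order is arbitrary (small integers may merely look sorted) |
| Key/element requirement | Hashable (immutable) | Hashable (immutable) |
| Duplicates | Keys are unique; values may repeat | Elements are unique |
| Lookup complexity | O(1) on average | O(1) on average |
| Main uses | Mapping, caching, configuration | Deduplication, membership tests, set algebra |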

VIII. Hands-On Project: Word Frequency Counter

```python
import re

def word_frequency_analyzer(text):
    """Count how often each word appears in the text"""

    # Normalize case and split into words
    words = re.findall(r'\w+', text.lower())

    # Count with a dictionary
    freq = {}
    for word in words:
        freq[word] = freq.get(word, 0) + 1

    # Alternative: dict comprehension over a set of unique words.
    # Shorter, but words.count() rescans the list for every word, so it is O(n^2)
    # freq = {w: words.count(w) for w in set(words)}

    # Sort by frequency, descending, and keep the top 10
    top_words = sorted(freq.items(), key=lambda x: x[1], reverse=True)[:10]

    # Words that appear exactly once, excluding common stop words
    stop_words = {"the", "a", "an", "is", "are", "was", "were", "of", "to", "in", "and", "for", "on", "it", "that", "this"}
    rare_words = {word for word, count in freq.items() if count == 1}
    interesting_rare = rare_words - stop_words

    print("=== Word Frequency Report ===")
    print(f"Distinct words: {len(freq)}")
    print(f"Words appearing once: {len(rare_words)}")
    print(f"Meaningful words appearing once: {len(interesting_rare)}")
    print("\nTop 10 words:")
    for word, count in top_words:
        bar = "█" * count
        print(f"  {word:<12} {count:>3} {bar}")

    return freq

# Test
sample_text = """
Python is a powerful programming language. Python is easy to learn
and Python has a very clean syntax. Many developers love Python
because Python is versatile and Python has great libraries.
Python Python Python!
"""

word_frequency_analyzer(sample_text)
```
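collections.Counter packages up the manual get() counting loop, and its most_common() method replaces the sorted(...)[:n] step. The compact sketch below uses an illustrative top_words function and sample sentence, which are not part of the analyzer above. Words with equal counts come back in the order they were first seen.

```python
import re
from collections import Counter

def top_words(text, n=3):
    """Illustrative helper: the n most frequent words, using Counter."""
    words = re.findall(r"\w+", text.lower())
    return Counter(words).most_common(n)

sample = "Python is easy. Python is fun. Python!"
print(top_words(sample))  # [('python', 3), ('is', 2), ('easy', 1)]
```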

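Summary

This article took an in-depth look at dictionaries and sets, Python's two core hash-based data structures:

Dictionary (dict): key-value mapping with O(1) average lookup; many ways to create one (literals, the constructor, comprehensions)
Key requirements: keys must be hashable, which in practice means immutable types (strings, numbers, tuples of hashable elements)
Safe access: prefer get() and setdefault() to avoid KeyError
Dictionary merging: Python 3.9+ supports the | and |= operators
Set: unordered, no duplicates; excels at deduplication and mathematical set operations
Set operations: union |, intersection &, difference -, symmetric difference ^; the method names are more explicit
frozenset: an immutable set that can serve as a set element or dictionary key
O(1) lookup: dictionaries and sets are both built on hash tables, so membership tests are very fast (O(1) on average)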

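Dictionaries and sets are two sharp tools for data processing in Python. In the next article we will focus on string-processing techniques and see how elegantly Python handles text data.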

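✅ Highlights

O(1) constant-time lookup: hash-table-backed dictionaries and sets make membership tests and key retrieval very fast on average, which suits frequent lookups
Safe access patterns: get() and setdefault() handle missing keys gracefully, so you avoid KeyError
Dictionary merge operators: Python 3.9+'s | and |= syntax merges dictionaries in a single expression
Set algebra: union |, intersection &, difference -, and symmetric difference ^ mirror the mathematical concepts
frozenset, the immutable set: usable as a set element or dictionary key wherever immutability is required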

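Use Cases

Caching (memoization): store computed results in a dictionary to avoid repeating work
Deduplication and data analysis: sets remove duplicates automatically, and intersections/differences support mutual-friend lookups and data-diff analysis
Configuration management: dictionaries are a natural fit for key-value configuration parameters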

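Further Directions

Learn the advanced dictionary variants in the collections module, such as OrderedDict, ChainMap, and Counter
Master JSON serialization and deserialization with the json module; dictionaries map naturally onto JSON objects
Recommended next read: Python string-processing techniques, a full toolkit for working with text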