01 - Programming Fundamentals and Mathematical Foundations: A Complete Guide to Python's Core Data Structures

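From lists to sets: master the building blocks of data storage and manipulation

1. Why Do We Need Data Structures?

Program = algorithm + data structure. A data structure determines how data is organized, stored, and accessed. Choosing the right one makes code shorter and faster.

Everyday analogies:

List = shopping list (ordered; items can be added, removed, or changed)
Tuple = ID card details (fixed; cannot be modified)
Dict = contact book (look up a phone number quickly by name)
Set = pens in a pencil case (no duplicates, no order)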

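2. List: An Ordered, Mutable Sequence

2.1 What a List Is

A list is an ordered, mutable container that can hold elements of any type. In CPython it is implemented as a dynamic array of object references:

List in memory:
┌────┬────┬────┬────┬────┬────┐
│ 10 │ 20 │ 30 │ 40 │    │    │  ← preallocated space
└────┴────┴────┴────┴────┴────┘
  ↑                        ↑
  │                        │
 stored elements     spare capacity (for appends)

When the list runs out of room, CPython allocates a larger block and copies the existing references into it. The over-allocation is mild: roughly one eighth extra (about 1.125 times the required size, plus a small constant), not a doubling. Because resizes become rarer as the list grows, append() runs in amortized O(1) time.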
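The over-allocation described above can be observed indirectly by watching `sys.getsizeof` while appending. This is a minimal sketch that assumes CPython; the helper name `capacity_changes` is made up for illustration, and the exact lengths at which the size changes differ between Python versions.

```python
import sys


def capacity_changes(n_appends):
    """Return the list lengths at which the reported size of the list changed."""
    items = []
    last_size = sys.getsizeof(items)
    changes = []
    for i in range(n_appends):
        items.append(i)
        size = sys.getsizeof(items)
        if size != last_size:
            # The list object grew: a new, larger block was allocated.
            changes.append(len(items))
            last_size = size
    return changes


changes = capacity_changes(1000)
# Far fewer resizes than appends: most appends reuse the spare capacity.
print(len(changes), "resizes for 1000 appends")
print("first few resize points:", changes[:8])
```

The gaps between resize points widen as the list grows, which is why the average cost per append stays constant.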

2.2 Creating Lists and Basic Operations

# Create lists
empty_list = []                    # empty list
numbers = [1, 2, 3, 4, 5]          # list of integers
mixed = [1, "hello", 3.14, True]   # mixed types
nested = [[1, 2], [3, 4]]          # nested list

# Using the list() constructor
chars = list("hello")              # ['h', 'e', 'l', 'l', 'o']
range_list = list(range(5))        # [0, 1, 2, 3, 4]

# Access elements (indexing starts at 0)
fruits = ["apple", "banana", "orange", "grape"]
print(fruits[0])     # apple
print(fruits[-1])    # grape (negative indices count from the end)

# Slicing [start:stop:step]
print(fruits[1:3])   # ['banana', 'orange']
print(fruits[::2])   # ['apple', 'orange']
print(fruits[::-1])  # ['grape', 'orange', 'banana', 'apple'] (reversed)
2.3 Common Methods

# Add elements
fruits = ["apple", "banana"]
fruits.append("orange")        # add at the end: ['apple', 'banana', 'orange']
fruits.insert(1, "mango")      # insert at an index: ['apple', 'mango', 'banana', 'orange']
fruits.extend(["grape", "watermelon"])  # extend: ['apple', 'mango', 'banana', 'orange', 'grape', 'watermelon']

# Remove elements
fruits.remove("banana")        # remove by value (first match only)
popped = fruits.pop()          # remove and return the last element
popped = fruits.pop(1)         # remove and return the element at index 1
del fruits[0]                  # delete the element at index 0
fruits.clear()                 # empty the list

# Search and count
fruits = ["apple", "banana", "apple", "orange"]
print(fruits.index("banana"))  # 1 (index of the first match)
print(fruits.count("apple"))   # 2
print(len(fruits))             # 4

# Sort
numbers = [3, 1, 4, 1, 5, 9]
numbers.sort()                 # in-place sort: [1, 1, 3, 4, 5, 9]
numbers.sort(reverse=True)     # descending: [9, 5, 4, 3, 1, 1]
sorted_numbers = sorted(numbers)  # returns a new list; the original is unchanged
numbers.reverse()              # reverse the list in place
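Both `sort()` and `sorted()` also accept a `key` function, and the sort is stable: elements with equal keys keep their original relative order, even with `reverse=True`. A small sketch with made-up records (the names and scores are illustrative only):

```python
# (name, score) pairs; two pairs of students share the same score.
records = [("Li", 90), ("Wang", 85), ("Zhao", 90), ("Chen", 85)]

# Sort by score, highest first. Ties keep their original order (stable sort).
by_score = sorted(records, key=lambda record: record[1], reverse=True)
print(by_score)

# sorted() returned a new list; the input list is untouched.
print(records[0])
```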

2.5 Lists as Stacks and Queues

# Stack (LIFO: last in, first out): use append/pop
stack = []
stack.append(1)   # push
stack.append(2)
stack.append(3)
print(stack.pop())  # 3 (last in, first out)
print(stack.pop())  # 2

# Queue (FIFO: first in, first out): use collections.deque,
# because list.pop(0) is O(n) while deque.popleft() is O(1)
from collections import deque
queue = deque([1, 2, 3])
queue.append(4)        # enqueue
print(queue.popleft()) # 1 (first in, first out)
print(queue.popleft()) # 2
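A classic use of the append/pop stack pattern is checking whether brackets are balanced. The function below is a minimal sketch; the name `is_balanced` and the set of supported brackets are choices made for this example.

```python
def is_balanced(text):
    """Return True if every (, [ and { in text is closed in the right order."""
    pairs = {")": "(", "]": "[", "}": "{"}
    openers = set(pairs.values())
    stack = []
    for ch in text:
        if ch in openers:
            stack.append(ch)              # push the opening bracket
        elif ch in pairs:
            # A closer must match the most recently opened bracket.
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack                      # leftovers mean unclosed brackets


print(is_balanced("a[b(c)]{d}"))
print(is_balanced("(]"))
```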

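2.6 Time Complexity of Common List Operations

Operation	Time complexity	Notes
`append()`	O(1)	Amortized constant time
`pop()`	O(1)	Removes the last element
`pop(i)`	O(n)	Removing from the middle shifts the following elements
`insert(i, item)`	O(n)	Inserting shifts the following elements
`index(item)`	O(n)	Linear scan
`in` operator	O(n)	Linear scan
Slice `[i:j]`	O(k)	k is the slice length
`sort()`	O(n log n)	Timsort algorithm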

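3. Tuple: An Immutable Sequence

3.1 What a Tuple Is

A tuple is an ordered, immutable container: once created, its elements cannot be reassigned. Because it is immutable, a tuple can be used as a dict key (as long as every element in it is hashable), whereas a list cannot.

# Create tuples
empty_tuple = ()
single = (1,)                    # note the comma; (1) is just the integer 1
colors = ("red", "green", "blue")
mixed = (1, "hello", 3.14)

# Parentheses are optional
point = 10, 20                   # (10, 20)

# Type conversion
tuple_from_list = tuple([1, 2, 3])  # (1, 2, 3)
tuple_from_string = tuple("abc")    # ('a', 'b', 'c')

# Access (same as lists)
print(colors[0])     # red
print(colors[-1])    # blue
print(colors[1:3])   # ('green', 'blue')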

3.2 Tuple Immutability

t = (1, 2, [3, 4])   # a tuple containing a mutable list
# t[0] = 10          # ❌ TypeError: tuple elements cannot be reassigned
t[2].append(5)       # ✅ a mutable object inside the tuple can still change
print(t)             # (1, 2, [3, 4, 5])

# Tuple unpacking
point = (10, 20)
x, y = point          # unpack
print(x, y)           # 10 20

# Swap variables (classic use)
a, b = 5, 10
a, b = b, a           # swap in one line
print(a, b)           # 10 5

# Return multiple values from a function
def get_min_max(numbers):
    return min(numbers), max(numbers)

min_val, max_val = get_min_max([3, 1, 4, 1, 5])
print(min_val, max_val)  # 1 5

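3.3 List vs Tuple: How to Choose

Scenario	Recommended	Reason
Data needs to change	List	Mutable
Data is fixed	Tuple	Safer; slightly smaller and cheaper to create
Used as a dict key	Tuple	Only immutable (hashable) objects can be keys
Returning multiple values from a function	Tuple	Python convention
Configuration parameters	Tuple	Prevents accidental modification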
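The "dict key" row can be made concrete with a sparse grid keyed by `(x, y)` coordinates. This is an illustrative sketch; the names `grid` and `describe` are invented for the example.

```python
# A sparse grid: only occupied cells are stored, keyed by (x, y) tuples.
grid = {}
grid[(0, 0)] = "start"
grid[(2, 3)] = "treasure"


def describe(x, y):
    """Look up a cell by its coordinates, falling back to 'empty'."""
    return grid.get((x, y), "empty")


print(describe(2, 3))
print(describe(1, 1))

# A list cannot be used the same way, because lists are not hashable.
try:
    grid[[1, 1]] = "wall"
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)
```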

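4. Dict: A Key-Value Mapping

4.1 What a Dict Is

A dict is a collection of key-value pairs. It is implemented as a hash table, which gives O(1) lookup time on average.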
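To make the hash-table idea tangible, here is a toy map that hashes each key to pick a bucket and then searches only that bucket. It is a teaching sketch, not how CPython's dict is built (CPython uses open addressing and resizes the table); the class name `TinyHashMap` and its methods are invented for this example.

```python
class TinyHashMap:
    """Toy hash map using separate chaining and a fixed number of buckets."""

    def __init__(self, n_buckets=8):
        self._buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        # hash(key) -> integer -> bucket index
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)   # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        return default


table = TinyHashMap()
table.put("name", "Zhang San")
table.put("age", 18)
table.put("age", 19)                # same key: the value is replaced
print(table.get("name"), table.get("age"), table.get("city", "unknown"))
```

With a reasonable hash function and enough buckets, each bucket stays short, so a lookup inspects only a handful of entries regardless of how many keys are stored.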

How a hash table works:
key → hash function → hash value → array index → value

"name" ──► hash() ──► 12345 ──► index 5 ──► "Zhang San"
"age"  ──► hash() ──► 67890 ──► index 2 ──► 18

4.2 Creating Dicts

# Several ways to create a dict
person = {"name": "Zhang San", "age": 18, "city": "Beijing"}

# Using the dict() constructor
person2 = dict(name="Li Si", age=20)      # {'name': 'Li Si', 'age': 20}
person3 = dict([("name", "Wang Wu"), ("age", 22)])  # from a list of key-value pairs

# Dict comprehension
squares = {x: x**2 for x in range(5)}
print(squares)  # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

# Use fromkeys to give every key the same initial value
keys = ["a", "b", "c"]
default_dict = dict.fromkeys(keys, 0)  # {'a': 0, 'b': 0, 'c': 0}

4.3 ⭐ Core Method: get() (Important)

get() is one of the most frequently used dict methods. It retrieves a value safely, avoiding KeyError.

person = {"name": "Zhang San", "age": 18}

# Plain indexing: a missing key raises an error
# print(person["city"])  # ❌ KeyError: 'city'

# get(): a missing key returns None (or the default you pass)
print(person.get("city"))             # None
print(person.get("city", "unknown"))  # unknown
print(person.get("name"))             # Zhang San

# Practical use: counting
words = ["apple", "banana", "apple", "orange", "banana", "apple"]
count = {}

for word in words:
    # Safely read the current count, starting from 0 if the word is new
    count[word] = count.get(word, 0) + 1

print(count)  # {'apple': 3, 'banana': 2, 'orange': 1}

# A tidier alternative (using defaultdict)
from collections import defaultdict
count = defaultdict(int)
for word in words:
    count[word] += 1
4.4 Common Dict Operations

person = {"name": "Zhang San", "age": 18, "city": "Beijing"}

# Access and modify
print(person["name"])            # Zhang San
person["age"] = 19               # update
person["job"] = "engineer"       # add a new key-value pair

# Delete
del person["city"]               # delete a key-value pair
age = person.pop("age")          # delete and return the value (19)
last = person.popitem()          # delete and return the last inserted pair, here ('job', 'engineer') (LIFO since Python 3.7)

# Merge dicts (the | operator requires Python 3.9+)
dict1 = {"a": 1, "b": 2}
dict2 = {"b": 3, "c": 4}
merged = dict1 | dict2            # {'a': 1, 'b': 3, 'c': 4}
dict1.update(dict2)               # update in place

# Iterate over a dict
for key in person:
    print(key, person[key])

for key, value in person.items():
    print(f"{key}: {value}")

for value in person.values():
    print(value)

# Check whether a key exists
if "name" in person:
    print(person["name"])

# Views of keys / values / pairs (after the deletions above, person is {'name': 'Zhang San'})
keys = person.keys()       # dict_keys(['name'])
values = person.values()   # dict_values(['Zhang San'])
items = person.items()     # dict_items([('name', 'Zhang San')])

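4.5 The setdefault() Method

# setdefault: return the value if the key exists; otherwise store the default and return it
person = {"name": "Zhang San"}

name = person.setdefault("name", "unknown")  # key exists: returns "Zhang San"
city = person.setdefault("city", "Beijing")  # key missing: stores "city": "Beijing" and returns "Beijing"

print(person)  # {'name': 'Zhang San', 'city': 'Beijing'}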
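A common use of `setdefault()` is grouping: the first time a key is seen it creates an empty list, and every call returns the list to append to. A minimal sketch with a made-up word list:

```python
words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# Group words by their first letter.
groups = {}
for word in words:
    # Creates groups[letter] = [] on first sight, then appends to that list.
    groups.setdefault(word[0], []).append(word)

print(groups)
```

`collections.defaultdict(list)` achieves the same result; `setdefault()` is handy when the result should stay a plain dict.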

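5. ⭐ List Comprehensions

List comprehensions are one of Python's most elegant features: a single expression replaces a multi-line loop that builds a list.

5.1 Basic Syntax

# Syntax: [expression for variable in iterable if condition]

# Traditional loop vs list comprehension
squares = []
for x in range(10):
    squares.append(x ** 2)

# The same in one line
squares = [x ** 2 for x in range(10)]
print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]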

5.2 Comprehensions with a Condition

# Filter: keep even numbers
evens = [x for x in range(10) if x % 2 == 0]
print(evens)  # [0, 2, 4, 6, 8]

# Transform only (no filter)
numbers = [-3, -2, -1, 0, 1, 2, 3]
abs_values = [abs(x) for x in numbers]
print(abs_values)  # [3, 2, 1, 0, 1, 2, 3]

# Filter and transform together
even_squares = [x**2 for x in range(10) if x % 2 == 0]
print(even_squares)  # [0, 4, 16, 36, 64]

5.3 Comprehensions with Nested Loops

# Traditional nested loop
pairs = []
for x in [1, 2, 3]:
    for y in ['a', 'b']:
        pairs.append((x, y))

# List comprehension (the for clauses appear in the same order as the nested loops)
pairs = [(x, y) for x in [1, 2, 3] for y in ['a', 'b']]
print(pairs)  # [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b'), (3, 'a'), (3, 'b')]

# Flatten a matrix
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flattened = [num for row in matrix for num in row]
print(flattened)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

5.4 if-else Inside a Comprehension

# Note: if-else goes in the expression position; it is not a filter
labels = [x if x % 2 == 0 else "odd" for x in range(5)]
print(labels)  # [0, 'odd', 2, 'odd', 4]

# Practical use: data cleaning
raw_data = ["apple", "", "banana", None, "orange", ""]
cleaned = [item for item in raw_data if item]  # drop falsy values ("" and None)
print(cleaned)  # ['apple', 'banana', 'orange']

5.5 Other Comprehensions

# Dict comprehension
squares_dict = {x: x**2 for x in range(5)}
print(squares_dict)  # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

# Set comprehension
unique_squares = {x**2 for x in [1, 2, 2, 3, 3, 3]}
print(unique_squares)  # {1, 4, 9}

# Generator expression (saves memory)
sum_squares = sum(x**2 for x in range(1000000))  # never builds the full list
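The memory saving of a generator expression can be checked with `sys.getsizeof`. The sketch below assumes CPython; the sizes it prints depend on the Python version, so only the relative difference matters. It also shows the trade-off: a generator can be consumed only once.

```python
import sys

n = 100_000
squares_list = [x * x for x in range(n)]   # all values stored at once
squares_gen = (x * x for x in range(n))    # values produced on demand

list_size = sys.getsizeof(squares_list)    # grows with n
gen_size = sys.getsizeof(squares_gen)      # small and independent of n
print(f"list: {list_size} bytes, generator: {gen_size} bytes")

total = sum(squares_gen)                   # consumes the generator
leftover = list(squares_gen)               # nothing left the second time
print(total, leftover)
```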

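6. Set: An Unordered Collection of Unique Elements

6.1 What a Set Is

A set is an unordered collection of unique elements, implemented with a hash table. Its main uses are deduplication and set operations (intersection, union, difference).

# Create sets
empty_set = set()           # note: {} is an empty dict
numbers = {1, 2, 3, 3, 2}   # {1, 2, 3} (duplicates removed automatically)
from_list = set([1, 2, 2, 3])  # {1, 2, 3}

# String to set (unique characters)
chars = set("hello")        # {'h', 'e', 'l', 'o'} (display order may vary)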

6.2 Basic Set Operations

# Add and remove
s = {1, 2, 3}
s.add(4)                    # {1, 2, 3, 4}
s.add(2)                    # {1, 2, 3, 4} (no change)
s.remove(3)                 # remove; raises KeyError if missing
s.discard(5)                # remove; no error if missing
popped = s.pop()            # remove and return an arbitrary element
s.clear()                   # empty the set

# Membership test (O(1) on average)
s = {1, 2, 3, 4, 5}
print(3 in s)      # True
print(10 in s)     # False

# Deduplicate a list (classic use)
duplicates = [1, 2, 2, 3, 3, 3, 4]
unique = list(set(duplicates))
print(unique)  # [1, 2, 3, 4] (order is not guaranteed)
# Deduplicate while preserving order
unique_ordered = list(dict.fromkeys(duplicates))
print(unique_ordered)  # [1, 2, 3, 4] (order preserved)

6.3 Set Operations

A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

# Union (| or union)
print(A | B)        # {1, 2, 3, 4, 5, 6}
print(A.union(B))   # same as above

# Intersection (& or intersection)
print(A & B)        # {3, 4}
print(A.intersection(B))

# Difference (- or difference)
print(A - B)        # {1, 2} (in A but not in B)
print(B - A)        # {5, 6} (in B but not in A)

# Symmetric difference (^ or symmetric_difference)
print(A ^ B)        # {1, 2, 5, 6} (in exactly one of the two sets)

# Subset and superset tests
print({1, 2}.issubset(A))   # True
print(A.issuperset({1, 2})) # True

6.4 Practical Uses of Sets

# 1. Find the elements two lists have in common
list1 = [1, 2, 3, 4, 5]
list2 = [4, 5, 6, 7, 8]
common = set(list1) & set(list2)
print(common)  # {4, 5}

# 2. Find the elements that appear exactly once in a list
from collections import Counter
data = [1, 2, 2, 3, 3, 4]
unique = [x for x, count in Counter(data).items() if count == 1]
print(unique)  # [1, 4]

# 3. Data cleaning: keep only values from an allowed set
valid_ids = {101, 102, 103, 104}
records = [101, 105, 102, 106, 103]
filtered = [x for x in records if x in valid_ids]
print(filtered)  # [101, 102, 103]

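7. Performance Comparison and Selection Guide

7.1 Time Complexity Comparison

Data structure	Lookup (by value or key)	Insert	Delete	Ordered
List	O(n)	O(n) (append: amortized O(1))	O(n) (pop from the end: O(1))	✅
Tuple	O(n)	-	-	✅
Dict	O(1) average	O(1) average	O(1) average	Insertion order (Python 3.7+)
Set	O(1) average	O(1) average	O(1) average	❌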
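The gap between O(n) and O(1) lookup can be made visible without timing anything, by counting how many equality comparisons a membership test performs. This is a sketch that assumes CPython's behaviour for `in`; the class `CountingKey` and the helper `count_comparisons` are invented for the demonstration.

```python
class CountingKey:
    """Hashable wrapper that counts how often == is evaluated."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        CountingKey.comparisons += 1
        return isinstance(other, CountingKey) and self.value == other.value

    def __hash__(self):
        return hash(self.value)


def count_comparisons(container, target):
    """Run `target in container`; return (found, number of == calls)."""
    CountingKey.comparisons = 0
    found = target in container
    return found, CountingKey.comparisons


keys = [CountingKey(i) for i in range(1000)]
as_list = list(keys)
as_set = set(keys)

target = CountingKey(999)  # equal to the last element, but a distinct object
list_found, list_comparisons = count_comparisons(as_list, target)
set_found, set_comparisons = count_comparisons(as_set, target)

# The list compares element by element; the set jumps to the entry via its hash.
print("list:", list_found, list_comparisons)
print("set: ", set_found, set_comparisons)
```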

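7.2 Memory Usage Comparison

import sys

a_list = list(range(1000))
a_tuple = tuple(range(1000))
a_dict = {i: i for i in range(1000)}
a_set = set(range(1000))

# Note: sys.getsizeof reports only the container itself, not the objects it references
print(f"list:  {sys.getsizeof(a_list)} bytes")
print(f"tuple: {sys.getsizeof(a_tuple)} bytes")
print(f"dict:  {sys.getsizeof(a_dict)} bytes")
print(f"set:   {sys.getsizeof(a_set)} bytes")

# Typical output (for reference only; exact values vary with Python version and platform):
# list:  8056 bytes
# tuple: 8024 bytes
# dict:  36872 bytes
# set:   32936 bytes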

7.3 Decision Tree

Need to store data?
    │
    ├── Need a key-value mapping? ──► dict
    │
    ├── Need to remove duplicates? ──► set
    │
    └── Need to keep order?
            │
            ├── Needs to change? ──► list
            │
            └── Fixed? ──► tuple

8. Worked Examples

8.1 Student Score Statistics

"""
Features:
1. Store student scores (dict + list)
2. Compute the average score per subject
3. Find students with a failing score
4. Rank students by total score
"""

# Student scores: {name: {"Chinese": score, "Math": score, "English": score}}
students = {
    "Zhang San": {"Chinese": 85, "Math": 92, "English": 78},
    "Li Si": {"Chinese": 58, "Math": 65, "English": 70},
    "Wang Wu": {"Chinese": 95, "Math": 88, "English": 92},
    "Zhao Liu": {"Chinese": 45, "Math": 52, "English": 60},
    "Xiao Ming": {"Chinese": 88, "Math": 79, "English": 85},
}

# 1. Average score per subject
subjects = ["Chinese", "Math", "English"]
averages = {}

for subject in subjects:
    scores = [students[name][subject] for name in students]
    averages[subject] = sum(scores) / len(scores)

print("=== Average per subject ===")
for subject, avg in averages.items():
    print(f"{subject}: {avg:.1f}")

# 2. Students with a failing score (any subject < 60)
failed_students = []
for name, scores in students.items():
    if any(score < 60 for score in scores.values()):
        failed_students.append(name)

print("\n=== Students with a failing score ===")
for name in failed_students:
    print(f"{name}: {students[name]}")

# 3. Total and average score per student
student_stats = []
for name, scores in students.items():
    total = sum(scores.values())
    avg = total / len(scores)
    student_stats.append((name, total, avg))

# Sort by total score, highest first
student_stats.sort(key=lambda x: x[1], reverse=True)

print("\n=== Ranking ===")
print(f"{'Rank':<6}{'Name':<12}{'Total':<7}{'Average'}")
for i, (name, total, avg) in enumerate(student_stats, 1):
    print(f"{i:<6}{name:<12}{total:<7}{avg:.1f}")

# 4. Count scores in each band
score_ranges = {
    "Excellent (≥90)": 0,
    "Good (80-89)": 0,
    "Fair (70-79)": 0,
    "Pass (60-69)": 0,
    "Fail (<60)": 0,
}

all_scores = [score for scores in students.values() for score in scores.values()]
for score in all_scores:
    if score >= 90:
        score_ranges["Excellent (≥90)"] += 1
    elif score >= 80:
        score_ranges["Good (80-89)"] += 1
    elif score >= 70:
        score_ranges["Fair (70-79)"] += 1
    elif score >= 60:
        score_ranges["Pass (60-69)"] += 1
    else:
        score_ranges["Fail (<60)"] += 1

print("\n=== Score bands ===")
for range_name, count in score_ranges.items():
    print(f"{range_name}: {count} scores")

8.2 Word Frequency Count

"""
Count word frequencies with a dict and the get() method
"""

text = """
Python is a powerful programming language. Python is easy to learn.
Python is used in data science, web development, and AI.
"""

# Clean the text: lowercase it and replace punctuation with spaces
import string
text = text.lower()
for punct in string.punctuation:
    text = text.replace(punct, " ")

# Split into words
words = text.split()

# Method 1: count with get()
word_count = {}
for word in words:
    word_count[word] = word_count.get(word, 0) + 1

# Method 2: use defaultdict
from collections import defaultdict
word_count2 = defaultdict(int)
for word in words:
    word_count2[word] += 1

# Method 3: use Counter (the most concise)
from collections import Counter
word_count3 = Counter(words)

print("=== Word frequencies ===")
for word, count in sorted(word_count.items(), key=lambda x: x[1], reverse=True):
    print(f"{word}: {count}")

# The 3 most common words
print(f"\nTop 3 words: {Counter(words).most_common(3)}")

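9. Common Pitfalls and Best Practices

9.1 Common Pitfalls

# Pitfall 1: a list as a default argument
def bad_append(item, lst=[]):   # ❌ the default list is created once and shared across calls
    lst.append(item)
    return lst

print(bad_append(1))  # [1]
print(bad_append(2))  # [1, 2]  ❌ not [2]

def good_append(item, lst=None):  # ✅ correct approach
    if lst is None:
        lst = []
    lst.append(item)
    return lst

# Pitfall 2: modifying a list while iterating over it
numbers = [1, 2, 2, 3]
for x in numbers:      # ❌ removing shifts later elements, so some are skipped
    if x % 2 == 0:
        numbers.remove(x)
print(numbers)         # [1, 2, 3] (one 2 survived)

# ✅ Correct approach: iterate over a copy
numbers = [1, 2, 2, 3]
for x in numbers[:]:
    if x % 2 == 0:
        numbers.remove(x)
print(numbers)         # [1, 3]

# Pitfall 3: dict keys must be hashable (immutable) types
# d = {[1, 2]: "value"}  # ❌ a list cannot be a key
d = {(1, 2): "value"}    # ✅ a tuple can

# Pitfall 4: fromkeys with a mutable default
# d = dict.fromkeys(["a", "b"], [])  # every key shares the same list object
# d["a"].append(1)  # d["b"] changes too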
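Pitfall 4 is easy to verify, and a dict comprehension avoids it because it evaluates `[]` once per key. A short sketch (the variable names `shared` and `separate` are illustrative):

```python
# Every key points at the same list object.
shared = dict.fromkeys(["a", "b"], [])
shared["a"].append(1)
print(shared)                        # both values changed
print(shared["a"] is shared["b"])    # same object

# A dict comprehension builds a fresh list for each key.
separate = {key: [] for key in ["a", "b"]}
separate["a"].append(1)
print(separate)                      # only "a" changed
```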

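9.2 Best Practices Cheat Sheet

Practice	Why
Use list comprehensions instead of simple for loops	Shorter, more Pythonic code
Use `get()` to read from a dict safely	Avoids KeyError
Use `defaultdict` instead of repeated `get()` calls	Cleaner code
Use `Counter` for counting	More concise and efficient than a hand-rolled dict
Use `set` to remove duplicates	Much faster than manual deduplication
Use `items()` when iterating over a dict	Gives the key and the value together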

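10. Summary

┌────────────────────────────────────────────────────────────────────┐
│                   Core data structure takeaways                    │
├────────────────────────────────────────────────────────────────────┤
│                                                                    │
│  List: ordered, mutable, duplicates allowed                        │
│  ├── Use for: ordered data that changes often                      │
│  └── Techniques: comprehensions, slicing, append/pop               │
│                                                                    │
│  Tuple: ordered, immutable, duplicates allowed                     │
│  ├── Use for: fixed data, dict keys, multiple return values        │
│  └── Techniques: unpacking, swapping variables                     │
│                                                                    │
│  Dict: insertion-ordered (Python 3.7+), mutable, unique keys       │
│  ├── Use for: fast lookup, key-value mapping, counting             │
│  └── Techniques: get(), setdefault(), defaultdict, comprehensions  │
│                                                                    │
│  Set: unordered, mutable, unique elements                          │
│  ├── Use for: deduplication, membership tests, set operations      │
│  └── Techniques: intersection &, union |, difference -             │
│                                                                    │
└────────────────────────────────────────────────────────────────────┘

In one sentence: lists hold sequences, tuples lock values in place; dicts map keys to values, sets remove duplicates; a comprehension replaces several lines of loop, and get() keeps lookups safe.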