对python的再认识-基于数据结构进行-a004-列表-实用事务

Python 列表实用事务

本文聚焦 Python 列表的去重、计数与统计操作，包含多种去重方法、高效计数技巧及常用统计函数。

我也将给出个人思维导图

一、去重

1.1 使用 set()

最简单的去重方法，利用集合的特性：

python 复制代码

nums = [1, 2, 2, 3, 3, 3, 4]

# 转为集合再转回列表
unique = list(set(nums))
# [1, 2, 3, 4] - 顺序可能改变

# 字符串去重
chars = list('hello')
unique = list(set(chars))
# ['h', 'e', 'l', 'o']

注意：set() 不保证顺序，元素顺序可能改变。

1.2 保持顺序去重

python 复制代码

# 方法 1：使用 dict.fromkeys()（Python 3.7+，推荐）
nums = [1, 2, 2, 3, 3, 3, 4]
unique = list(dict.fromkeys(nums))
# [1, 2, 3, 4] - 保持原顺序

# 方法 2：列表推导式（巧妙但不易读）
def unique_ordered(lst):
    seen = []
    return [x for x in lst if not (x in seen or seen.append(x))]

# 逐行解释 [x for x in lst if not (x in seen or seen.append(x))]：
# 遍历 lst = [1, 2, 2, 3]：
#
#   x = 1:  (1 in seen) = False ← 事件流 A：没见过
#           → 继续执行右边：seen.append(1) → seen = [1]，返回 None
#           → (False or None) = None（假值）
#           → not None = True
#           → 结果：保留 1
#
#   x = 2:  (2 in seen) = False ← 事件流 A：没见过
#           → 继续执行右边：seen.append(2) → seen = [1, 2]，返回 None
#           → (False or None) = None
#           → not None = True
#           → 结果：保留 2
#
#   x = 2:  (2 in seen) = True  ← 事件流 B：已经见过了
#           → 短路！不执行右边（不 append，seen 保持不变）
#           → (True or ...) = True
#           → not True = False
#           → 结果：过滤掉这个 2
#
# 事件流 A：x in seen 为 False（没见过）
#   1. 继续计算 or 右边：seen.append(x)
#   2. append() 将 x 加入 seen，返回 None
#   3. (False or None) = None
#   4. not None = True → 保留 x
#
# 事件流 B：x in seen 为 True（已见过）
#   1. or 短路，不计算右边（不执行 append）
#   2. (True or ...) = True
#   3. not True = False → 过滤 x
#
# 关键技巧：
# - or 短路：左边为真就不算右边
# - append() 返回 None（假值）
# - not None = True（保留），not True = False（过滤）

# 等价的传统写法（更易理解）：
# def unique_ordered(lst):
#     result = []
#     for x in lst:
#         if x not in result:      # 如果 x 没见过
#             result.append(x)     # 添加到结果
#     return result

nums = [1, 2, 2, 3, 3, 3, 4]
unique = unique_ordered(nums)
# [1, 2, 3, 4]

# 方法 3：使用 set 辅助（更高效）
def unique_ordered_set(lst):
    seen = set()
    result = []
    for item in lst:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

1.3 字符串列表去重

python 复制代码

words = ['apple', 'banana', 'apple', 'cherry', 'banana']

# 基础去重
unique = list(set(words))
# ['cherry', 'banana', 'apple'] - 顺序可能改变

# 保持顺序
unique = list(dict.fromkeys(words))
# ['apple', 'banana', 'cherry']

# 不区分大小写去重
words = ['Apple', 'apple', 'BANANA', 'banana']
unique = list(dict.fromkeys(word.lower() for word in words))
# ['apple', 'banana']

1.4 嵌套列表去重

python 复制代码

# 嵌套列表（可哈希）
matrix = [[1, 2], [3, 4], [1, 2], [5, 6]]

# 方法 1：转为元组去重
unique = [list(x) for x in set(tuple(x) for x in matrix)]
# [[1, 2], [3, 4], [5, 6]] - 顺序可能改变

# 方法 2：保持顺序
def unique_nested(lst):
    seen = set()
    result = []
    for item in lst:
        t = tuple(item)
        if t not in seen:
            seen.add(t)
            result.append(item)
    return result

matrix = [[1, 2], [3, 4], [1, 2], [5, 6]]
unique = unique_nested(matrix)
# [[1, 2], [3, 4], [5, 6]]

1.5 按条件去重

python 复制代码

# 按某个属性去重（嵌套列表）
students = [['Alice', 20], ['Bob', 22], ['Alice', 21]]

# 按姓名去重（保留第一次出现）
def unique_by_key(lst, key_func):
    seen = set()
    result = []
    for item in lst:
        key = key_func(item)
        if key not in seen:
            seen.add(key)
            result.append(item)
    return result

unique = unique_by_key(students, lambda x: x[0])
# [['Alice', 20], ['Bob', 22]]

# 按值去重（数值列表）
nums = [1, -1, 2, -2, 1, 2]

# 按绝对值去重
unique = list(dict.fromkeys(nums, key=abs))
# [1, -1, 2, -2]

二、计数

2.1 count() 方法

列表内置的计数方法：

python 复制代码

nums = [1, 2, 2, 3, 3, 3, 4]

# 统计单个元素出现次数
nums.count(3)   # 3
nums.count(5)   # 0

# 字符串列表
words = ['apple', 'banana', 'apple', 'cherry']
words.count('apple')  # 2

# 统计多个元素
nums = [1, 2, 2, 3, 3, 3]
counts = {x: nums.count(x) for x in set(nums)}
# {1: 1, 2: 2, 3: 3}

注意：count() 时间复杂度为 O(n)，大量统计时效率较低。

2.2 collections.Counter

更高效的计数工具：

python 复制代码

from collections import Counter

nums = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]

# 创建计数器
counter = Counter(nums)
# Counter({4: 4, 3: 3, 2: 2, 1: 1})

# 获取元素出现次数
counter[3]      # 3
counter.get(5, 0)  # 0（不存在返回默认值）

# 获取最常见的 n 个元素
counter.most_common(2)
# [(4, 4), (3, 3)]

# 更新计数器
counter.update([3, 3, 5])
# Counter({4: 4, 3: 5, 2: 2, 1: 1, 5: 1})

注意：Counter 是 dict 的子类，Python 3.7+ 中保持插入顺序，元素按首次出现的顺序排列。

2.3 统计所有元素

python 复制代码

from collections import Counter

words = ['apple', 'banana', 'apple', 'cherry', 'banana', 'apple']

# 方法 1：Counter
counter = Counter(words)
# Counter({'apple': 3, 'banana': 2, 'cherry': 1})

# 转为字典
dict(counter)
# {'apple': 3, 'banana': 2, 'cherry': 1}

# 方法 2：字典推导式
counts = {word: words.count(word) for word in set(words)}
# {'apple': 3, 'banana': 2, 'cherry': 1}

# 方法 3：循环统计
def count_all(lst):
    counts = {}
    for item in lst:
        counts[item] = counts.get(item, 0) + 1
    return counts

2.4 条件计数

python 复制代码

nums = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# 统计偶数个数
sum(1 for x in nums if x % 2 == 0)  # 4

# 统计大于 5 的个数
sum(1 for x in nums if x > 5)  # 4

# 使用 len + filter
len(list(filter(lambda x: x % 2 == 0, nums)))  # 4

三、统计

3.1 基本统计函数

python 复制代码

nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# 求和
sum(nums)           # 55

# 个数
len(nums)           # 10

# 最大值
max(nums)           # 10

# 最小值
min(nums)           # 1

# 最大值的索引
nums.index(max(nums))  # 9

3.2 statistics 模块

Python 标准库的统计模块：

python 复制代码

import statistics

nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# 平均值
statistics.mean(nums)       # 5.5

# 中位数
statistics.median(nums)     # 5.5

# 众数（出现最频繁）
data = [1, 2, 2, 3, 3, 3, 4]
statistics.mode(data)      # 3

# 标准差
statistics.stdev(nums)      # 3.027...

# 方差
statistics.variance(nums)   # 9.166...

3.3 列表推导式统计

python 复制代码

# 提取嵌套数据统计
students = [['Alice', 85], ['Bob', 92], ['Charlie', 78]]

# 平均成绩
sum(s[1] for s in students) / len(students)  # 85.0

# 最高分学生
max(students, key=lambda s: s[1])  # ['Bob', 92]

# 及格人数
sum(1 for s in students if s[1] >= 80)  # 2

3.4 分组统计

python 复制代码

from collections import defaultdict

scores = [
    ['Alice', 'A', 85],
    ['Bob', 'B', 92],
    ['Charlie', 'A', 78],
    ['David', 'B', 88],
]

# 按班级分组
by_class = defaultdict(list)
for name, cls, score in scores:
    by_class[cls].append(score)

# 各班平均分
{cls: sum(scores) / len(scores) for cls, scores in by_class.items()}
# {'A': 81.5, 'B': 90.0}

四、性能对比

4.1 去重方法性能

python 复制代码

import timeit

TEST_LIST = "list(range(10000)) * 10"  # 100000 个元素

# set() 去重
timeit.timeit("list(set(lst))", setup="lst = " + TEST_LIST, number=100)
# 相对速度：1.0x（最快）

# dict.fromkeys() 去重
timeit.timeit("list(dict.fromkeys(lst))", setup="lst = " + TEST_LIST, number=100)
# 相对速度：1.2x

# 循环判断去重
timeit.timeit("unique_ordered_loop(lst)", setup="from __main__ import unique_ordered_loop; lst = " + TEST_LIST, number=100)
# 相对速度：10x（最慢）

方法	是否保持顺序	相对速度	适用场景
`set()`	否	1.0x	顺序无关
`dict.fromkeys()`	是	1.2x	保持顺序
循环判断	是	10x	小数据量

4.2 计数方法性能

方法	时间复杂度	适用场景
`list.count()`	O(n)	单个元素计数
`Counter`	O(n)	多元素计数（推荐）
字典推导式+count	O(n²)	不推荐（效率低）

五、常见陷阱

5.1 set() 改变顺序

python 复制代码

# 问题：set() 不保证顺序
nums = [3, 1, 2, 1, 4, 3]
list(set(nums))  # [1, 2, 3, 4] - 顺序改变

# 解决：使用 dict.fromkeys()
list(dict.fromkeys(nums))  # [3, 1, 2, 4]

5.2 不可哈希元素去重

python 复制代码

# 错误：嵌套列表无法用 set()
matrix = [[1, 2], [3, 4], [1, 2]]
set(matrix)  # TypeError: unhashable type: 'list'

# 解决：转为元组
set(tuple(x) for x in matrix)  # {(1, 2), (3, 4)}

5.3 count() 效率问题

python 复制代码

# 不推荐：多次调用 count()
nums = [1, 2, 2, 3, 3, 3]
counts = {x: nums.count(x) for x in set(nums)}  # O(n²)

# 推荐：使用 Counter
from collections import Counter
counts = Counter(nums)  # O(n)

六、总结

操作	推荐方法	注意事项
简单去重	`list(set(lst))`	顺序可能改变
保持顺序去重	`list(dict.fromkeys(lst))`	Python 3.7+
嵌套去重	转元组 + set	需处理不可哈希问题
单个计数	`lst.count(x)`	-
多个计数	`Counter(lst)`	高效，推荐使用
条件计数	`sum(1 for x in lst if ...)`	列表推导式
基本统计	`sum`/`len`/`max`/`min`	内置函数
高级统计	`statistics` 模块	需导入模块

对python的再认识-基于数据结构进行-a004-列表-实用事务

Python 列表 实用事务

一、去重

1.1 使用 set()

1.2 保持顺序去重

1.3 字符串列表去重

1.4 嵌套列表去重

1.5 按条件去重

二、计数

2.1 count() 方法

2.2 collections.Counter

2.3 统计所有元素

2.4 条件计数

三、统计

3.1 基本统计函数

3.2 statistics 模块

3.3 列表推导式统计

3.4 分组统计

四、性能对比

4.1 去重方法性能

4.2 计数方法性能

五、常见陷阱

5.1 set() 改变顺序

5.2 不可哈希元素去重

5.3 count() 效率问题

六、总结

Python 列表实用事务