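Chapter 6: Exception Handling and File Operations

Exception handling and file operations underpin the reliability and data persistence of Python programs. An exception is an unexpected event at run time; left unhandled, it terminates the program. File operations are the basis for interacting with external storage, and they carry data persistence and sharing between programs. Combining the two is key to building robust applications: exception handling catches errors and recovers from them, while file operations manage data across its whole life cycle. This chapter describes the hierarchy of Python exceptions and strategies for catching them, explains how to read and write text, CSV, and JSON files and how to handle encodings, and uses engineering case studies to show how the two work together in a production-grade data processing workflow.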

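6.1 Exception Basics and Logging Practice

An exception is an unexpected event triggered during execution by a logic error, a missing resource, or a similar cause; concretely, it is an exception object raised by the Python interpreter. Structured exception handling catches the exception and runs recovery logic, while a logging system provides the evidence needed to monitor exceptions and trace problems, which makes it essential in production.

6.1.1 The Exception Type Hierarchy and Common Triggers

Python exceptions form a class hierarchy in which every exception inherits from `BaseException`. `Exception` is the parent class of the ordinary errors that application code normally catches (for example `ZeroDivisionError`); special exceptions such as `SystemExit` and `KeyboardInterrupt` inherit directly from `BaseException` and signal system-level events.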
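The hierarchy described above can be checked directly with `issubclass()`. The following is a minimal sketch; the helper name `is_ordinary_error` is introduced here purely for illustration.

```python
def is_ordinary_error(exc_type: type) -> bool:
    """Return True if an `except Exception` clause would catch this type."""
    return issubclass(exc_type, Exception)


for exc_type in (ZeroDivisionError, FileNotFoundError, KeyboardInterrupt, SystemExit):
    print(f"{exc_type.__name__:20} caught by 'except Exception': {is_ordinary_error(exc_type)}")

# Walk the inheritance chain of one concrete exception class.
chain = [cls.__name__ for cls in FileNotFoundError.__mro__]
print(" -> ".join(chain))
```

`KeyboardInterrupt` and `SystemExit` report `False`, which is why a plain `except Exception` leaves Ctrl+C and `sys.exit()` untouched.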

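Core exception types and how they are triggered

| Exception type | Trigger | Example code | Message summary |
| --- | --- | --- | --- |
| `ZeroDivisionError` | Division by zero | `print(5 / 0)` | `division by zero` |
| `FileNotFoundError` | The path passed to `open()` does not exist | `open("missing.txt")` | `[Errno 2] No such file or directory: 'missing.txt'` |
| `TypeError` | An operation is applied to incompatible types (such as string + integer) | `"age: " + 25` | `can only concatenate str (not "int") to str` |
| `KeyError` | Accessing a dictionary key that does not exist | `user = {"name": "Alice"}; user["age"]` | `'age'` |
| `IndexError` | Sequence index out of range | `nums = [1,2,3]; nums[5]` | `list index out of range` |
| `UnicodeDecodeError` | The declared text encoding does not match the file's actual encoding, so decoding fails | `open("gbk_file.txt", encoding="utf-8").read()` | `'utf-8' codec can't decode byte 0xb0` |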

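Key points:

- Catch `Exception` and its subclasses first, and avoid catching `BaseException` directly (doing so can swallow system events such as Ctrl+C);
- Exception class names are case-sensitive and must match exactly;
- Custom exceptions should inherit from `Exception` so that they fit into Python's exception hierarchy.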
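As an illustration of the last point, here is a minimal sketch of a custom exception. The names `ConfigError` and `parse_port` are hypothetical and exist only for this example.

```python
class ConfigError(Exception):
    """Hypothetical application-level error for an invalid configuration value."""

    def __init__(self, key: str, reason: str) -> None:
        super().__init__(f"invalid config '{key}': {reason}")
        self.key = key
        self.reason = reason


def parse_port(raw: str) -> int:
    """Convert a raw config value to a port number, raising ConfigError on bad input."""
    try:
        port = int(raw)
    except ValueError as exc:
        # "raise ... from" keeps the original error available as __cause__.
        raise ConfigError("port", f"{raw!r} is not an integer") from exc
    if not 0 < port < 65536:
        raise ConfigError("port", f"{port} is out of range")
    return port


try:
    parse_port("eighty")
except ConfigError as err:
    caught = err
    print(f"{type(err).__name__}: {err}")
```

Because `ConfigError` derives from `Exception`, callers can handle it specifically or let a general `except Exception` clause pick it up.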

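6.1.2 Structured Exception Handling Syntax: `try-except-else-finally`

Python catches exceptions with the `try-except` statement, supplemented by `else` (the normal path) and `finally` (resource cleanup), to form a complete handling flow.

1. Basic catch pattern: `try-except`

Put code that may raise an exception in the `try` block and the handling logic in the `except` block: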

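```python
def integer_division(a: int, b: int) -> None:
    try:
        result = a / b  # true division, so the result is a float
        print(f"Result: {result}")
    except ZeroDivisionError:
        print("Error: the divisor must not be 0")

integer_division(10, 2)  # Output: Result: 5.0
integer_division(10, 0)  # Output: Error: the divisor must not be 0
```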

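2. Catching multiple exceptions

Use several `except` blocks to handle different exceptions differently, or a tuple to handle related exceptions in one block: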

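```python
def get_sequence_element(sequence: list | tuple, index: int) -> None:
    try:
        element = sequence[index]
        print(f"Element: {element}")
    except TypeError:
        print("Error: input is not a list/tuple and does not support indexing")
    except IndexError:
        print(f"Error: index {index} is out of range")


# Unified form, for when both exceptions are handled the same way
def get_sequence_element_unified(sequence: list | tuple, index: int) -> None:
    try:
        print(f"Element: {sequence[index]}")
    except (TypeError, IndexError) as e:
        print(f"Operation failed: {e}")
```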

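3. Using `else` and `finally`

- `else`: runs only when the `try` block raises no exception; it keeps normal-path logic separate;
- `finally`: runs whether or not an exception occurred; it is used for resource cleanup (such as closing a file).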

```python
def read_text_file(file_path: str) -> None:
    file = None
    try:
        file = open(file_path, "r", encoding="utf-8")
        content = file.read()
    except FileNotFoundError:
        print(f"File {file_path} does not exist")
    else:
        print(f"Content (first 50 characters): {content[:50]}")
    finally:
        if file is not None:  # make sure the file is closed to avoid leaking the handle
            file.close()
            print(f"File {file_path} closed")
```
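The order in which the clauses run can be observed by recording each step. The following sketch assumes a hypothetical helper `safe_divide` that appends the name of every executed clause to a list; it also shows that `finally` still runs when another clause returns.

```python
from typing import Optional


def safe_divide(a: float, b: float, trace: list) -> Optional[float]:
    """Divide a by b, appending the name of each executed clause to `trace`."""
    try:
        trace.append("try")
        result = a / b
    except ZeroDivisionError:
        trace.append("except")
        return None  # finally still runs before this return completes
    else:
        trace.append("else")
        return result
    finally:
        trace.append("finally")


ok_trace = []
ok_value = safe_divide(10, 2, ok_trace)
print(ok_value, ok_trace)

err_trace = []
err_value = safe_divide(10, 0, err_trace)
print(err_value, err_trace)
```

The successful call records `try`, `else`, `finally`; the failing call records `try`, `except`, `finally`.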

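6.1.3 Standardized Exception Logging

In production, use the `logging` module to record the context of an exception (time, type, stack trace, and so on); it supports multiple output targets and severity levels.

1. Configuring the logging system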

```python
import logging

logging.basicConfig(
    level=logging.ERROR,  # record ERROR and above only
    format="%(asctime)s - %(module)s - %(levelname)s - %(message)s",
    handlers=[
        logging.FileHandler("error.log"),  # write to a file
        logging.StreamHandler()  # write to the console
    ]
)

def calculate_square(num: int) -> int:
    try:
        return num ** 2
    except TypeError as e:
        logging.error(f"Calculation failed: {e}", exc_info=True)  # include the stack trace
        raise  # re-raise so the caller still sees the error

# Raises TypeError: logged to file and console, then propagated to the caller
calculate_square("five")
```

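2. Log levels and their uses

| Level | Numeric value | Purpose | Example scenarios |
| --- | --- | --- | --- |
| `DEBUG` | 10 | Development debugging; records detailed context | Variable values, function call traces |
| `INFO` | 20 | Production monitoring; records key points in the workflow | Service start, data sync completed |
| `WARNING` | 30 | Potential risk that does not affect the current flow | Missing setting replaced by a default, low disk space |
| `ERROR` | 40 | One module failed without stopping the whole program | File read failure, API call timeout |
| `CRITICAL` | 50 | System-level error; the program cannot continue | Database connection failure, corrupted core file |

Recommendations: use `DEBUG` in development and `WARNING` or `ERROR` in production; rotate logs by time so that no single file grows too large; log exceptions with `exc_info=True` so that the stack trace is recorded.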
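One way to follow the time-based rotation advice is the standard library's `TimedRotatingFileHandler`. The sketch below is an assumption-laden example rather than a prescribed setup: the function name `build_logger`, the logger name `app`, the midnight rotation, and the seven-file retention are all illustrative choices, and a temporary directory stands in for a real log directory.

```python
import logging
import tempfile
from logging.handlers import TimedRotatingFileHandler
from pathlib import Path


def build_logger(log_dir: Path, name: str = "app") -> logging.Logger:
    """Create a logger whose file is rotated at midnight, keeping 7 old files."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.WARNING)
    handler = TimedRotatingFileHandler(
        log_dir / f"{name}.log", when="midnight", backupCount=7, encoding="utf-8"
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s - %(module)s - %(levelname)s - %(message)s")
    )
    logger.addHandler(handler)
    return logger


with tempfile.TemporaryDirectory() as tmp:
    logger = build_logger(Path(tmp))
    try:
        int("not a number")
    except ValueError:
        # logger.exception logs at ERROR level and attaches the stack trace.
        logger.exception("conversion failed")
    # Close the handler so the file is flushed and the directory can be removed.
    for handler in list(logger.handlers):
        handler.close()
        logger.removeHandler(handler)
    log_text = (Path(tmp) / "app.log").read_text(encoding="utf-8")

print(log_text)
```

`logger.exception(...)` is equivalent to calling `logger.error(..., exc_info=True)` inside an `except` block.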

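6.2 File Operations (Text, CSV, JSON, and Encoding Handling)

Files are the main medium through which a program interacts with external storage. Python provides a concise I/O interface that supports the mainstream formats, and the `with` statement (a context manager) manages the file handle automatically and prevents handle leaks.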
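To make the automatic resource management concrete, the sketch below checks the file object's `closed` attribute after a normal `with` block and after one that raises. The file name `demo.txt` and the simulated `RuntimeError` are assumptions made for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.txt"

    # Normal exit: the file is closed as soon as the block ends.
    with open(path, "w", encoding="utf-8") as f:
        f.write("hello\n")
    closed_after_with = f.closed

    # Exit through an exception: the file is closed before the error propagates.
    try:
        with open(path, "r", encoding="utf-8") as g:
            raise RuntimeError("simulated failure while reading")
    except RuntimeError:
        closed_after_error = g.closed

print(closed_after_with, closed_after_error)
```

Both values are `True`, so no explicit `close()` call or `finally` clause is needed.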

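6.2.1 Text Files: Read/Write Conventions and Context Managers

Text files store plain character data. `open()` creates the file object, and the `with` statement closes it automatically.

1. Ways to read text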

- **Full read (`read()`)**: suitable for small files; loads the whole file into one string:

```python
with open("config.txt", "r", encoding="utf-8") as f:
    content = f.read()  # read the entire file
```

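- **Line-by-line read (iterating over the file object)**: suitable for large files; memory use stays low:

```python
with open("large.log", "r", encoding="utf-8") as f:
    for line in f:  # read one line at a time
        print(line.rstrip("\n"))  # strip the trailing newline
```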

- **Read into a list of lines (`readlines()`)**: suitable for medium-sized files; returns a list of lines:

```python
with open("user.txt", "r", encoding="utf-8") as f:
    lines = f.readlines()  # each line becomes one list element
```

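2. Text write modes

| Mode | Behavior | Example use | Notes |
| --- | --- | --- | --- |
| `"w"` | Overwrite (creates the file if it does not exist) | Generating a report, replacing old data | Truncates existing content; use with care |
| `"a"` | Append (creates the file if it does not exist) | Logging, incremental data writes | Content is added at the end; existing data is untouched |
| `"r+"` | Read and write (raises an error if the file does not exist) | Editing a configuration file | Use `seek()` to reposition the file pointer |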

```python
# Overwrite
with open("msg.txt", "w", encoding="utf-8") as f:
    f.write("Python exception handling\n")  # add the newline explicitly
# Append
with open("msg.txt", "a", encoding="utf-8") as f:
    f.write("Production-grade practice")
```
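The `"r+"` row in the table mentions `seek()`. A minimal sketch of an in-place edit follows; the file name `settings.txt` and its content are assumptions for the example. After reading, the pointer sits at the end of the file, so it has to be moved back before writing, and `truncate()` removes leftover bytes when the new text is shorter.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "settings.txt"
    path.write_text("mode=verbose\n", encoding="utf-8")

    with open(path, "r+", encoding="utf-8") as f:
        original = f.read()            # pointer is now at the end of the file
        updated = original.replace("verbose", "quiet")
        f.seek(0)                      # move back to the start before writing
        f.write(updated)
        f.truncate()                   # drop the tail of the longer old content

    final_text = path.read_text(encoding="utf-8")

print(repr(final_text))
```

Without the `truncate()` call, the trailing characters of the old, longer line would remain in the file.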

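6.2.2 Encoding Handling: Resolving `UnicodeDecodeError`

An encoding mismatch is the main cause of `UnicodeDecodeError`. Default encodings differ across systems (Chinese-language Windows typically uses GBK, while Linux and macOS use UTF-8), so the encoding has to be handled explicitly to make sure files are read correctly.

1. Characteristics of common encodings

| Encoding | Character coverage | Typical environment | Recommendation |
| --- | --- | --- | --- |
| UTF-8 | ASCII-compatible; supports all Unicode characters | Cross-platform applications, web pages, internationalized systems | Recommended as the default; the most widely compatible |
| GBK | Chinese and English; cannot represent most other Unicode characters | Chinese-language Windows, older documents | Use only to read legacy documents; write new files as UTF-8 |
| Latin-1 | Western European characters; no Chinese | Older Linux installations, legacy systems | Use only for legacy data; avoid in new development |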

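2. Strategies for handling encoding errors

- **Determine the file's encoding**: check it in an editor or detect it with the `chardet` library (`pip install chardet`):

```python
import chardet

with open("unknown.txt", "rb") as f:
    raw = f.read(1024)  # sample the first 1 KB
    print(chardet.detect(raw))  # detected encoding and confidence
```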

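- **Catch and retry**: try a list of common encodings:

```python
def read_with_encoding(file_path: str) -> str | None:
    # Order matters: latin-1 maps every byte to a character, so it never raises
    # UnicodeDecodeError. It acts as a last resort and may return garbled text.
    encodings = ["utf-8", "gbk", "latin-1"]
    for enc in encodings:
        try:
            with open(file_path, "r", encoding=enc) as f:
                return f.read()
        except UnicodeDecodeError:
            continue
    print(f"Unable to decode {file_path}")
    return None
```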
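The retry strategy depends on which encodings actually reject the data. The sketch below encodes a short Chinese sample as GBK and tries three decoders on the bytes; `try_decode` and `SAMPLE_TEXT` are names introduced only for this illustration.

```python
from typing import Optional

SAMPLE_TEXT = "编码"
gbk_bytes = SAMPLE_TEXT.encode("gbk")


def try_decode(data: bytes, encoding: str) -> Optional[str]:
    """Return the decoded text, or None if the bytes are invalid in `encoding`."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


results = {enc: try_decode(gbk_bytes, enc) for enc in ("utf-8", "gbk", "latin-1")}
for enc, decoded in results.items():
    status = "failed" if decoded is None else repr(decoded)
    print(f"{enc:8} -> {status}")
```

UTF-8 rejects the bytes and GBK recovers the original text, while Latin-1 "succeeds" with the wrong characters. A successful decode therefore does not prove that the encoding was right, which is why a detected or documented encoding is preferable to guessing.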

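6.2.3 CSV Files: Reading and Writing Structured Tabular Data

CSV is a common format for storing tabular data. The `csv` module handles field separators and escaping automatically, which is more reliable than splitting lines by hand.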
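The difference from manual splitting shows up as soon as a field contains a comma or a quote. The sketch below uses made-up sample data held in memory to compare `str.split(",")` with `csv.reader`.

```python
import csv
import io

# One header row and one data row; the first and last fields are quoted.
raw = 'name,city,note\r\n"Smith, John",Beijing,"said ""hi"""\r\n'

# Naive approach: split the data row on every comma.
naive_fields = raw.splitlines()[1].split(",")

# csv.reader understands quoting and doubled-quote escapes.
rows = list(csv.reader(io.StringIO(raw, newline="")))

print(len(naive_fields), naive_fields)
print(len(rows[1]), rows[1])
```

The naive split yields four broken fragments, while `csv.reader` returns the three intended fields with the quotes resolved.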

1. Reading CSV

Suppose "student_scores.csv" contains:

```csv
name,student_id,math,chinese
Alice,2024001,95,92
Bob,2024002,88,85
```

Code to read it:

```python
import csv

with open("student_scores.csv", "r", encoding="utf-8", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)  # first row is the header
    print("Header:", header)
    for row in reader:  # iterate over the data rows
        name, sid, math, chinese = row
        total = int(math) + int(chinese)  # csv yields strings, so convert first
        print(f"{name} total: {total}")
```

2. Writing CSV

```python
import csv

header = ["name", "city", "age"]
data = [["David", "Beijing", 22], ["Ella", "Shanghai", 21]]

with open("users.csv", "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(header)  # write the header row
    writer.writerows(data)   # write all data rows at once
```

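6.2.4 JSON Files: Structured Data Interchange

JSON is a lightweight data interchange format. The `json` module converts between Python data structures and JSON, which makes it a first choice for configuration files and API data exchange.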
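The conversion is not perfectly symmetric, which is worth seeing once. The sketch below round-trips a made-up record through `json.dumps` and `json.loads` and then tries to serialize a type that JSON cannot represent.

```python
import json

record = {
    "name": "Alice",
    "scores": (95, 92),   # tuple
    "active": True,
    "nickname": None,
    1: "integer key",     # non-string key
}

encoded = json.dumps(record)
decoded = json.loads(encoded)
print(decoded)

# Types without a JSON equivalent, such as set, are rejected with TypeError.
try:
    json.dumps({"tags": {"a", "b"}})
    set_rejected = False
except TypeError:
    set_rejected = True
print("set rejected:", set_rejected)
```

After the round trip the tuple has become a list and the integer key has become the string `"1"`, so code that reloads saved data should not assume the original Python types survive.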

1. Writing JSON

```python
import json

user_config = {
    "username": "alice123",
    "theme": "dark",
    "enabled_features": ["notifications", "cloud_sync"]
}

with open("config.json", "w", encoding="utf-8") as f:
    # indent pretty-prints the output; ensure_ascii=False writes non-ASCII characters as-is
    json.dump(user_config, f, indent=4, ensure_ascii=False)
```

2. Reading JSON

```python
import json

def load_config(file_path: str) -> dict | None:
    try:
        with open(file_path, "r", encoding="utf-8") as f:
            return json.load(f)  # parse into a dictionary
    except FileNotFoundError:
        print(f"File {file_path} does not exist")
    except json.JSONDecodeError:
        print(f"File {file_path} is not valid JSON")
    return None
```

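6.2.5 File System Interaction: Paths and Directory Management

The `pathlib` (object-oriented) and `os` (function-based) modules handle file system interaction; `pathlib` is recommended because its API is more intuitive.

1. Core `pathlib` usage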

```python
from pathlib import Path

p = Path("data/config.json")
print("File name:", p.name)      # config.json
print("Suffix:", p.suffix)       # .json
print("Parent:", p.parent)       # data

# Create a directory (nested; no error if it already exists)
log_dir = Path("logs/202410")
log_dir.mkdir(parents=True, exist_ok=True)

# Iterate over the .txt files in the current directory
for txt in Path(".").glob("*.txt"):
    print(txt)
```

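2. `os` module usage (for compatibility with older code)

```python
import os

# Join path components (uses the platform's separator automatically)
p = os.path.join("data", "config.json")
# Create a directory
os.makedirs("logs/202410", exist_ok=True)
```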

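6.3 Combining Exception Handling and File Operations in Practice

Three case studies show how exception handling and file operations work together to build a robust data processing system.

6.3.1 Case 1: Batch Text File Processing

Requirement: iterate over the .txt files in the "source_files" directory, replace "Python" with "Python 3.12", and save the results to "processed_files"; catch exceptions so that a failure in one file does not interrupt the batch.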

```python
import logging
from pathlib import Path

logging.basicConfig(
    level=logging.ERROR,
    format="%(asctime)s - %(module)s - %(levelname)s - %(message)s",
    handlers=[logging.FileHandler("batch.log"), logging.StreamHandler()]
)

def process_file(input_path: Path, output_dir: Path) -> None:
    output_dir.mkdir(parents=True, exist_ok=True)
    output_path = output_dir / input_path.name
    try:
        with open(input_path, "r", encoding="utf-8") as f:
            content = f.read()
        modified = content.replace("Python", "Python 3.12")
        with open(output_path, "w", encoding="utf-8") as f:
            f.write(modified)
        print(f"Processed: {input_path.name}")
    except FileNotFoundError:
        logging.error(f"File not found: {input_path}")
    except UnicodeDecodeError as e:
        logging.error(f"Encoding error ({input_path}): {e}")
    except Exception as e:
        # Catch-all so that one bad file does not stop the batch
        logging.error(f"Unexpected error ({input_path}): {e}", exc_info=True)

def batch_process(source_dir: str, target_dir: str) -> None:
    source = Path(source_dir)
    if not source.exists() or not source.is_dir():
        logging.error(f"Source directory {source} is invalid")
        return
    for file in source.glob("*.txt"):
        if file.is_file():
            process_file(file, Path(target_dir))

if __name__ == "__main__":
    batch_process("source_files", "processed_files")
```

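6.3.2 Case 2: Large Log File Analysis

Requirement: analyze a Web log "access.log" of 1 GB or more and count requests to the "/api/login" endpoint; keep memory use bounded, log progress, and catch I/O and memory errors.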

```python
import logging
from pathlib import Path

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(module)s - %(levelname)s - %(message)s",
    handlers=[logging.FileHandler("log_analysis.log")]
)

def count_api(log_path: str, target_api: str) -> int:
    log = Path(log_path)
    if not log.exists() or not log.is_file():
        logging.error(f"Log file {log} is invalid")
        return 0
    try:
        count = 0
        with open(log, "rb", buffering=1024*1024) as f:  # 1 MB buffer
            line_num = 0
            for line in f:  # stream line by line instead of loading the whole file
                line_num += 1
                try:
                    line_str = line.decode("utf-8")
                except UnicodeDecodeError:
                    # Fall back to GBK and drop bytes that still cannot be decoded
                    line_str = line.decode("gbk", errors="ignore")
                if target_api in line_str:
                    count += 1
                if line_num % 1_000_000 == 0:
                    logging.info(f"Processed {line_num} lines, count: {count}")
        logging.info(f"Total requests: {count}")
        return count
    except MemoryError:
        logging.error("Out of memory")
    except OSError as e:
        logging.error(f"I/O error: {e}")
    return 0

if __name__ == "__main__":
    print(f"Requests: {count_api('access.log', '/api/login')}")
```

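6.3.3 Case 3: User Data Management

Requirement: build a user management system that supports adding and looking up users and persists data to "users.json"; handle problems such as duplicate usernames and invalid input, and write an audit log.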

```python
import json
import logging
from pathlib import Path
from datetime import datetime
from typing import Dict, Optional

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(module)s - %(levelname)s - %(message)s",
    handlers=[logging.FileHandler("user_manager.log")]
)

class UserManager:
    def __init__(self, data_file: str = "users.json"):
        self.data_file = Path(data_file)
        self.users: Dict[str, Dict] = self._load()

    def _load(self) -> Dict:
        """Load saved users, or start empty if the file is missing or invalid."""
        try:
            if self.data_file.exists():
                with open(self.data_file, "r", encoding="utf-8") as f:
                    return json.load(f)
            return {}
        except json.JSONDecodeError:
            logging.error(f"{self.data_file} is not valid JSON")
            return {}

    def add_user(self, username: str, email: str, age: int) -> bool:
        if not username or not email:
            logging.warning(f"{username}: username or email is empty")
            return False
        if not isinstance(age, int) or age < 0:
            logging.warning(f"{username}: invalid age")
            return False
        if username in self.users:
            logging.warning(f"{username}: already exists")
            return False
        self.users[username] = {
            "email": email, "age": age,
            "create_time": datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        }
        logging.info(f"Added: {username}")
        return True

    def get_user(self, username: str) -> Optional[Dict]:
        if username in self.users:
            logging.info(f"Lookup succeeded: {username}")
            return self.users[username]
        logging.warning(f"{username}: not found")
        return None

    def save(self) -> bool:
        try:
            with open(self.data_file, "w", encoding="utf-8") as f:
                json.dump(self.users, f, indent=4, ensure_ascii=False)
            logging.info(f"Saved {len(self.users)} users")
            return True
        except Exception as e:
            logging.error(f"Save failed: {e}")
            return False

if __name__ == "__main__":
    mgr = UserManager()
    mgr.add_user("alice", "alice@ex.com", 25)
    mgr.add_user("bob", "bob@ex.com", "23")  # fails (age is not an integer)
    print("Lookup alice:", mgr.get_user("alice"))
    mgr.save()
```
