The Art of Serialization: A Complete Guide to JSON Handling in Python

🔎 Hi everyone, I'm ZTLJQ. I hope this article helps you; if you spot any shortcomings, please point them out so we can learn together.

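Series: Python from Zero to Enterprise Applications: Becoming an In-Demand Programmer in a Short Time

✔ Note ⇢ My tutorials mainly cover Python web scraping, JS reverse engineering, and enterprise applications of Python.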

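Why JSON?

JavaScript Object Notation (JSON) is a lightweight data-interchange format. It is so popular because it is:

Easy to read and write: friendly to humans.
Easy for machines to parse and generate: the format is simple and the overhead is low.
Language-independent: although it originated in JavaScript, almost every programming language supports it.
A natural fit for Python data structures: a JSON object maps to a Python dict, an array to a list, and string, number, boolean, and null map to str, int/float, bool, and None.

Python's built-in json module provides complete JSON encoding (serialization) and decoding (deserialization). Mastering it gives you the ability to talk to the modern web.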
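The type mapping is easiest to see in a round trip. The following is a small sketch with made-up sample data; it also shows two places where the mapping is not symmetric: a tuple is written as a JSON array and comes back as a list, and a non-string key is written as a string.

```python
import json

# Made-up sample data covering each basic JSON type
original = {
    "text": "hello",
    "count": 3,
    "ratio": 0.5,
    "flag": True,
    "nothing": None,
    "items": (1, 2, 3),   # a tuple has no JSON counterpart of its own
    1: "integer key",     # JSON object keys are always strings
}

encoded = json.dumps(original)
decoded = json.loads(encoded)

print(encoded)
print(decoded["items"])   # the tuple comes back as a list
print("1" in decoded)     # the integer key comes back as the string "1"
```

Because of these two conversions, `decoded == original` is not guaranteed even when nothing goes wrong.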

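Part 1: Core Concepts and Basic Operations

1.1 Basic Terminology

Serialization (Encoding): the process of converting a Python object into a JSON string. The corresponding functions in the json module are dumps (dump string) and dump.
Deserialization (Decoding): the process of converting a JSON string into a Python object. The corresponding functions are loads (load string) and load.

Mnemonic: the functions ending in s work with strings (String); those without the s work with files (file-like objects).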
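To make the mnemonic concrete, here is a minimal sketch that runs the same record through both pairs of functions. It uses `io.StringIO` as a stand-in for a real file, which is an assumption made only so the example needs no file on disk.

```python
import io
import json

record = {"id": 7, "tags": ["a", "b"]}

# dumps / loads: the "s" variants produce and consume strings
as_text = json.dumps(record)
from_text = json.loads(as_text)

# dump / load: the variants without "s" write to and read from a file-like object
buffer = io.StringIO()
json.dump(record, buffer)
buffer.seek(0)                      # rewind before reading back
from_file_like = json.load(buffer)

print(type(as_text).__name__)       # str
print(from_text == from_file_like)  # True
```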

1.2 Basic Serialization (`json.dumps`)

```python
import json

# Prepare a nested Python data structure
data = {
    "name": "Alice",
    "age": 30,
    "is_student": False,
    "courses": ["Math", "Physics"],
    "address": {
        "street": "123 Main St",
        "city": "Beijing",
        "zipcode": None  # None becomes null
    }
}

# Serialize the Python dict to a JSON string
json_string = json.dumps(data)
print(json_string)
# Output: {"name": "Alice", "age": 30, "is_student": false, "courses": ["Math", "Physics"], "address": {"street": "123 Main St", "city": "Beijing", "zipcode": null}}

# Pretty-print the output (add indentation)
pretty_json = json.dumps(data, indent=4)
print(pretty_json)
# Output:
# {
#     "name": "Alice",
#     "age": 30,
#     "is_student": false,
#     "courses": [
#         "Math",
#         "Physics"
#     ],
#     "address": {
#         "street": "123 Main St",
#         "city": "Beijing",
#         "zipcode": null
#     }
# }

# Sort keys (optional; makes the output deterministic)
sorted_json = json.dumps(data, sort_keys=True, indent=2)
```
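Besides `indent` and `sort_keys`, two other `json.dumps` parameters come up often: `separators` for compact output and `ensure_ascii` for non-ASCII text. The sketch below compares them on a made-up payload.

```python
import json

payload = {"b": 1, "a": [1, 2], "city": "北京"}

default_text = json.dumps(payload)
compact_text = json.dumps(payload, separators=(",", ":"))
readable_text = json.dumps(payload, ensure_ascii=False)
stable_text = json.dumps(payload, sort_keys=True)

print(default_text)   # non-ASCII characters are escaped as \uXXXX
print(compact_text)   # no spaces after "," and ":"
print(readable_text)  # "北京" is written as-is
print(stable_text)    # keys in alphabetical order
```

All four strings decode back to the same dict; the options only change how the text looks, which matters for file size, readability, and diff-friendly output.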

1.3 Basic Deserialization (`json.loads`)

```python
import json

# A JSON string
json_str = '{"name": "Bob", "age": 25, "hobbies": ["gaming", "reading"]}'

# Deserialize the JSON string into a Python dict
python_dict = json.loads(json_str)
print(python_dict)  # {'name': 'Bob', 'age': 25, 'hobbies': ['gaming', 'reading']}
print(type(python_dict))  # <class 'dict'>

# Access the data
print(f"Hello, {python_dict['name']}! You are {python_dict['age']} years old.")
```

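Part 2: Practical Cases - Files and API Interaction

Case 1: Reading and Writing a JSON Configuration File

This is one of the most common use cases.

```python
import json
from pathlib import Path

# ---- Write the configuration file ----
config = {
    "database_url": "sqlite:///app.db",
    "debug": True,
    "allowed_hosts": ["localhost", "127.0.0.1"],
    "api_keys": {
        "openweathermap": "your_api_key_here"
    }
}

# Use json.dump to write to a file
config_path = Path("config.json")
with open(config_path, 'w', encoding='utf-8') as f:
    json.dump(config, f, indent=4, ensure_ascii=False)
    # ensure_ascii=False stores non-ASCII characters (such as Chinese) directly instead of escaping them

print(f"Configuration saved to {config_path}")

# ---- Read the configuration file ----
def load_config(path: Path) -> dict:
    """Load the configuration file safely; return a default configuration if the file does not exist."""
    if not path.exists():
        print(f"Configuration file {path} does not exist; using the default configuration.")
        return {"debug": False, "allowed_hosts": []}

    try:
        with open(path, 'r', encoding='utf-8') as f:
            return json.load(f)  # load directly from the file object
    except json.JSONDecodeError as e:
        print(f"Configuration file is not valid JSON: {e}")
        return {}
    except OSError as e:
        # Covers PermissionError and other I/O failures
        print(f"Error while reading the configuration file: {e}")
        return {}

# Load and use the configuration
current_config = load_config(config_path)
print(f"Debug mode: {current_config.get('debug', False)}")
```

Analysis: json.load() and json.dump() operate directly on file objects, which is very convenient. Always use try-except to handle a missing file or malformed JSON.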
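One design question the example leaves open is what the caller gets back after a failure. A possible variation, sketched below, always overlays the file's contents on a set of defaults so that every expected key is present. The names `DEFAULT_CONFIG` and `load_config_with_defaults` are hypothetical and not part of the original example.

```python
import copy
import json
import tempfile
from pathlib import Path

# Hypothetical defaults, for illustration only
DEFAULT_CONFIG = {"debug": False, "allowed_hosts": []}

def load_config_with_defaults(path: Path) -> dict:
    """Return a copy of DEFAULT_CONFIG updated with whatever the file provides."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            loaded = json.load(f)
    except (OSError, json.JSONDecodeError):
        loaded = {}          # missing, unreadable, or malformed file
    if not isinstance(loaded, dict):
        loaded = {}          # valid JSON, but not an object
    config = copy.deepcopy(DEFAULT_CONFIG)
    config.update(loaded)
    return config

# Try it against a missing, a broken, and a valid file
with tempfile.TemporaryDirectory() as tmp:
    missing = load_config_with_defaults(Path(tmp) / "missing.json")

    broken_path = Path(tmp) / "broken.json"
    broken_path.write_text("{not valid json", encoding="utf-8")
    broken = load_config_with_defaults(broken_path)

    good_path = Path(tmp) / "good.json"
    good_path.write_text('{"debug": true}', encoding="utf-8")
    good = load_config_with_defaults(good_path)

print(missing, broken, good)
```

The merge is shallow: a nested dict in the file replaces the default nested dict as a whole rather than being merged key by key.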

Case 2: Interacting with a Web API (Fetching the Weather)

We will use the requests library (install it with pip install requests) for this demonstration.

```python
import json
import requests
from typing import Optional, Dict, Any

def get_weather(city: str, api_key: str) -> Optional[Dict[str, Any]]:
    """
    Call the OpenWeatherMap API to fetch weather information.
    Return the parsed Python dict, or None on failure.
    """
    base_url = "http://api.openweathermap.org/data/2.5/weather"
    params = {
        'q': city,
        'appid': api_key,
        'units': 'metric'  # use Celsius
    }

    try:
        response = requests.get(base_url, params=params, timeout=10)

        # Check the HTTP status code
        if response.status_code == 200:
            # Option 1: let requests parse the JSON body
            data = response.json()

            # Option 2: parse manually (roughly equivalent)
            # data = json.loads(response.text)

            return data
        else:
            print(f"Request failed: HTTP {response.status_code}, {response.text}")
            return None

    except json.JSONDecodeError as e:
        # Checked first: in recent requests versions the error raised by
        # response.json() is also a RequestException
        print(f"Response is not valid JSON: {e}")
        return None
    except requests.exceptions.RequestException as e:
        print(f"Network request error: {e}")
        return None

# Use the function
api_key = "your_actual_api_key"  # replace with your real API key
weather_data = get_weather("Beijing", api_key)

if weather_data:
    # Extract the key information
    main_info = weather_data['main']
    weather_desc = weather_data['weather'][0]['description']

    print(f"City: {weather_data['name']}")
    print(f"Temperature: {main_info['temp']}°C")
    print(f"Feels like: {main_info['feels_like']}°C")
    print(f"Weather: {weather_desc.capitalize()}")
    print(f"Pressure: {main_info['pressure']} hPa")
else:
    print("Could not fetch weather information.")
```

Analysis:

requests.get().json() is the most common approach; it handles the response encoding and parses the body as JSON for you.

Even when the API returns JSON, the body may describe an error (such as an invalid city name), so check status_code.

Network requests can time out or fail, so RequestException must be caught with try-except.
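A status check alone does not guarantee that the decoded dict has the shape the caller expects. The sketch below pulls out the same fields the example reads, but returns None instead of raising KeyError when they are absent. The function name `summarize_weather` and both sample payloads are hand-written assumptions for illustration, not captured API responses.

```python
from typing import Any, Dict, Optional

def summarize_weather(payload: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Extract a few fields defensively; return None if the shape is unexpected."""
    main = payload.get("main")
    weather = payload.get("weather")
    if not isinstance(main, dict) or not isinstance(weather, list) or not weather:
        return None
    first = weather[0]
    if not isinstance(first, dict):
        return None
    return {
        "city": payload.get("name", "unknown"),
        "temp_c": main.get("temp"),
        "description": first.get("description", ""),
    }

# Hand-written payloads shaped like the fields used above (assumed, not real data)
ok_payload = {
    "name": "Beijing",
    "main": {"temp": 21.5, "feels_like": 20.9, "pressure": 1012},
    "weather": [{"description": "clear sky"}],
}
error_payload = {"cod": "404", "message": "city not found"}

print(summarize_weather(ok_payload))
print(summarize_weather(error_payload))   # None
```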

Part 3: Advanced Topics and Custom Handling

3.1 Handling Unsupported Data Types

The json module cannot serialize every Python type directly; examples include datetime, set, and instances of custom classes.

```python
import json
from datetime import datetime, date
from decimal import Decimal

# A dict containing unsupported types
complex_data = {
    "event_name": "Conference",
    "start_time": datetime.now(),      # datetime is not a standard JSON type
    "participants": {"Alice", "Bob"},  # neither is set
    "price": Decimal("99.99")          # Decimal must be converted (float or str)
}

# ❌ Serializing directly raises an error
# json.dumps(complex_data)  # TypeError: Object of type datetime is not JSON serializable

# ---- Solution 1: use the default parameter ----
def custom_serializer(obj):
    """Custom serialization function"""
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()  # convert to an ISO 8601 string
    elif isinstance(obj, set):
        return list(obj)  # convert to a list
    elif isinstance(obj, Decimal):
        return float(obj)  # or str(obj) to keep the exact digits
    else:
        raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

serialized = json.dumps(complex_data, default=custom_serializer, indent=2)
print(serialized)
# The value of "start_time" is now a string such as "2026-03-18T10:30:45.123456"

# ---- Solution 2: subclass JSONEncoder ----
class CustomJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (datetime, date)):
            return obj.strftime("%Y-%m-%d %H:%M:%S")  # custom format
        elif isinstance(obj, set):
            return sorted(obj)  # convert to a sorted list
        elif isinstance(obj, Decimal):
            return str(obj)  # keep the exact digits as a string
        return super().default(obj)  # let the parent class handle everything else

# Use the custom encoder
serialized_v2 = json.dumps(complex_data, cls=CustomJSONEncoder, indent=2)
print(serialized_v2)
```
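The choice between `float(obj)` and `str(obj)` for Decimal also matters on the way back in. As a small sketch, `json.loads` accepts a `parse_float` argument, so JSON numbers can be decoded as Decimal instead of float.

```python
import json
from decimal import Decimal

price = Decimal("99.99")

as_number = json.dumps({"price": float(price)})  # written as a JSON number
as_string = json.dumps({"price": str(price)})    # written as a JSON string

# Decoding a JSON number: float by default, Decimal with parse_float
default_value = json.loads('{"price": 0.1}')["price"]
exact_value = json.loads('{"price": 0.1}', parse_float=Decimal)["price"]

print(as_number, as_string)
print(default_value * 3 == 0.3)          # False: binary floating-point rounding
print(exact_value * 3 == Decimal("0.3")) # True: decimal arithmetic is exact here
```

`parse_float` applies only to numbers with a fractional part or exponent; integers are still decoded as Python int.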

3.2 Custom Handling During Deserialization (`object_hook`)

Sometimes you want to deserialize a particular JSON structure into a particular Python object.

```python
import json
from datetime import datetime

# A JSON string whose "time" field is an ISO 8601 timestamp
json_str_with_date = '''
{
    "event": "Meeting",
    "time": "2026-03-18T14:00:00",
    "attendees": ["Charlie", "Diana"]
}
'''

# Custom hook function
def datetime_hook(dct):
    """
    Hook: convert the "time" field to a datetime object when it holds an ISO-format string.
    """
    for key, value in dct.items():
        if key == "time" and isinstance(value, str):
            try:
                # Try to parse the ISO format
                dct[key] = datetime.fromisoformat(value)
            except ValueError:
                pass  # if the format does not match, keep the original value
    return dct

# Use object_hook
data = json.loads(json_str_with_date, object_hook=datetime_hook)
print(data)
# {'event': 'Meeting', 'time': datetime.datetime(2026, 3, 18, 14, 0), 'attendees': ['Charlie', 'Diana']}
print(type(data['time']))  # <class 'datetime.datetime'>
```

Analysis: object_hook is called with every JSON object (dict) the decoder builds, nested ones included, and whatever it returns is used in place of that dict. This is very useful for building domain models.
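As one way to build a domain model with the hook, the sketch below turns tagged JSON objects into dataclass instances. The `"__type__"` tag and the `Point` class are conventions invented for this example; the json module attaches no meaning to them.

```python
import json
from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float

def point_hook(dct):
    """Replace any object tagged as a Point with a Point instance."""
    if dct.get("__type__") == "Point":
        return Point(dct["x"], dct["y"])
    return dct  # leave every other object as a plain dict

text = '''
{
    "name": "segment",
    "start": {"__type__": "Point", "x": 0, "y": 0},
    "end": {"__type__": "Point", "x": 3, "y": 4}
}
'''

segment = json.loads(text, object_hook=point_hook)
print(segment["start"], segment["end"])
print(type(segment).__name__)  # the untagged outer object stays a dict
```

Nested objects are decoded first, so by the time the hook sees the outer object its "start" and "end" values are already Point instances.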

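3.3 Handling Large File Streams (`json.JSONDecoder`)

For huge JSON files, loading everything into memory at once may not be realistic. The ijson library (requires installation) provides streaming parsing, but the standard library json module can also handle some scenarios.

```python
import json

# For a huge JSON array file
# huge_array.json: [ {...}, {...}, ... ]

def process_single_record(record):
    """Placeholder for your own per-record logic (hypothetical helper)."""
    return record

def process_large_json_array(file_path, chunk_size=65536):
    """Yield the elements of a large JSON array one at a time.

    A sketch built on JSONDecoder.raw_decode. It assumes the whole file is one
    top-level array whose elements are objects; ijson is usually the better choice.
    """
    decoder = json.JSONDecoder()
    with open(file_path, 'r', encoding='utf-8') as f:
        buffer = f.read(chunk_size).lstrip()
        if not buffer.startswith('['):
            raise ValueError("expected a top-level JSON array")
        buffer = buffer[1:]
        while True:
            # Drop whitespace and the comma that separates elements
            buffer = buffer.lstrip(' \t\r\n,')
            if buffer.startswith(']'):
                return
            try:
                obj, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                chunk = f.read(chunk_size)
                if not chunk:
                    raise  # end of file reached with incomplete JSON
                buffer += chunk  # the element was cut off; read more and retry
                continue
            yield obj
            buffer = buffer[end:]

# A more common solution for "large files": JSON Lines (.jsonl)
# Each line is an independent, valid JSON value
def process_json_lines(file_path):
    """Process a file in JSON Lines format"""
    results = []
    with open(file_path, 'r', encoding='utf-8') as f:
        for line_num, line in enumerate(f, 1):
            line = line.strip()
            if not line:  # skip blank lines
                continue
            try:
                data = json.loads(line)
                # Process a single object
                result = process_single_record(data)
                results.append(result)
            except json.JSONDecodeError as e:
                print(f"Invalid JSON on line {line_num}: {e}")
    return results
```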
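For completeness, here is a sketch of the writing side of JSON Lines together with a generator-based reader that yields records one at a time instead of collecting them in a list. The helper names `write_json_lines` and `iter_json_lines` and the sample records are assumptions made for this example.

```python
import json
import os
import tempfile

def write_json_lines(records, file_path):
    """Write one compact JSON document per line."""
    with open(file_path, "w", encoding="utf-8") as f:
        for record in records:
            # Without indent, json.dumps never emits a raw newline,
            # so each record stays on a single line
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

def iter_json_lines(file_path):
    """Yield records lazily so only one line is held in memory at a time."""
    with open(file_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

records = [{"id": 1, "city": "Beijing"}, {"id": 2, "city": "上海"}]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "records.jsonl")
    write_json_lines(records, path)
    loaded = list(iter_json_lines(path))

print(loaded == records)  # True
```

Appending a record to such a file only requires opening it in "a" mode and writing one more line, which is not possible with a single large JSON array.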

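Part 4: Best Practices and Pitfalls

Always handle exceptions: json.loads() and json.load() raise json.JSONDecodeError on invalid JSON. File I/O can also raise FileNotFoundError, PermissionError, and so on. Always use try-except.
Mind the encoding: when reading or writing files, always specify encoding='utf-8' so that Chinese and other Unicode characters are handled correctly.
Floating-point precision: the JSON format itself does not fix the precision of numbers, but Python decodes any number with a fractional part or exponent as a double-precision float, and many other parsers do the same. Converting a Decimal or other high-precision value to float can therefore lose precision. If precision is critical, consider serializing such values as strings.
Circular references: Python objects may contain circular references (an attribute of A points to B, and an attribute of B points back to A). By default (check_circular=True), json.dumps() detects this and raises ValueError ("Circular reference detected"); with check_circular=False the result is a RecursionError or worse. Make sure your data structure is tree-shaped, with no cycles.
Performance: in CPython the json module uses a C accelerator for its core encoding and decoding routines, so it is fast. For the vast majority of applications, performance is not a problem.
Don't trust input: JSON data from external sources may be malicious. Validating the data after deserialization (for example, checking types and ranges) is a necessary security measure.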
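Three of these points can be shown in a few lines each. The sketch below inspects the position information carried by JSONDecodeError, triggers the circular-reference check, and validates a decoded payload by hand. The `validate_user` function and its rules are hypothetical; real projects often use a schema or validation library instead.

```python
import json

# 1. JSONDecodeError reports where parsing failed
try:
    json.loads('{"name": "Alice", "age": }')
except json.JSONDecodeError as e:
    error_position = (e.lineno, e.colno)
    print(f"{e.msg} at line {e.lineno}, column {e.colno}")

# 2. A circular reference is rejected with ValueError by default
node = {"name": "a"}
node["self"] = node
try:
    json.dumps(node)
except ValueError as e:
    circular_message = str(e)
    print(circular_message)

# 3. Validate types and ranges after decoding (hypothetical rules)
def validate_user(payload):
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")
    name = payload.get("name")
    age = payload.get("age")
    if not isinstance(name, str) or not name:
        raise ValueError("name must be a non-empty string")
    # bool is a subclass of int, so exclude it explicitly
    if isinstance(age, bool) or not isinstance(age, int) or not 0 <= age <= 150:
        raise ValueError("age must be an integer between 0 and 150")
    return {"name": name, "age": age}

user = validate_user(json.loads('{"name": "Alice", "age": 30, "extra": 1}'))
print(user)
```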

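Conclusion

JSON is the common language that connects different systems and different programming languages. Python's json module, with its concise API and powerful features, makes that connection remarkably easy.

After working through this post, you should know:

How to convert in both directions between Python objects and JSON strings or files.
How to handle real-world cases such as configuration files and Web APIs.
How to extend the module with default, cls, and object_hook to handle custom types.
The basic ideas behind stream processing of large data.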
