Mastering Python Design Patterns: Performance Patterns

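In the previous chapter, we covered concurrency and asynchronous patterns, which help us write efficient software that can handle multiple tasks at the same time. Next, we discuss a specific set of performance patterns designed to improve an application's speed and resource usage.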

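Performance patterns address common bottlenecks and optimization challenges. They give developers proven approaches for reducing execution time, cutting memory usage, and scaling effectively.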

In this chapter, we will cover the following topics:

The Cache-Aside pattern
The Memoization pattern
The Lazy Loading pattern

Technical requirements

Start with the general requirements introduced in Chapter 1. The code in this chapter has the following additional requirements:

Install the Faker module:
python -m pip install faker
Install the Redis module:
python -m pip install redis
Install and run a Redis server (using Docker):
docker run --name myredis -p 6379:6379 redis
If needed, see the documentation: redis.io/docs/latest...

The Cache-Aside pattern

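When data is read much more often than it is written, applications use a cache to speed up repeated access to information held in a database or data store. Some systems provide this kind of mechanism out of the box and apply it automatically. When they don't, we need to implement a caching strategy in the application that fits the specific use case.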

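One such strategy is Cache-Aside. To improve performance, frequently accessed data is stored in a cache, which reduces the number of times that data has to be fetched from the data store.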

Real-world examples

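Memcached: a widely used caching server. It is an in-memory key-value store for small chunks of data, commonly used to cache the results of database calls, API calls, or HTML fragments.
Redis: also commonly used as a caching server. Today, Redis is a popular choice both for caching and as an in-memory data store for applications.
Amazon ElastiCache: its official documentation describes it as a web service that makes it easy to set up, manage, and scale a distributed in-memory data store or cache environment in the cloud (docs.aws.amazon.com/elasticache...).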

Use cases

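Cache-Aside is useful when we need to reduce the load on a database. Caching frequently accessed data means fewer database queries. It also improves responsiveness, because reading from the cache has lower latency than querying the database.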

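Note: the pattern works best for data that does not change frequently, and for storage that does not depend on consistency across multiple keys. Examples include document stores or databases where keys are never updated and items are only occasionally deleted. It also requires that serving cached data for some time, until the cache is refreshed, is acceptable.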

Steps for implementing Cache-Aside

The pattern involves a database and a cache. The flow can be summarized as follows:

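Case 1 (reading data):
If the item is in the cache (a cache hit), return it directly. If it is not (a cache miss), read it from the database, store it in the cache, and then return it.
Case 2 (updating data):
Write the item to the database first, then delete the corresponding entry from the cache.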
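Both cases can be sketched without Redis or SQLite. The sketch below is minimal and illustrative. It assumes plain dicts stand in for the database and the cache. The names read, update, and CACHE_TTL_SECONDS are hypothetical and are not part of the chapter's scripts.

```python
import time

# Illustrative stand-ins: one dict plays the database, another plays the cache.
# In the chapter's example, these roles are played by SQLite and Redis.
database = {"1": "Simple is better than complex."}
cache = {}  # key -> (value, expiry time on the time.monotonic() clock)
CACHE_TTL_SECONDS = 60  # assumed expiry, similar to ex=60 in the Redis example


def read(key):
    """Case 1: check the cache first; on a miss, load from the database and cache it."""
    entry = cache.get(key)
    if entry is not None and entry[1] > time.monotonic():
        return entry[0]  # cache hit
    value = database.get(key)  # miss or expired entry: go to the database
    if value is not None:
        cache[key] = (value, time.monotonic() + CACHE_TTL_SECONDS)
    return value


def update(key, value):
    """Case 2: write to the database first, then invalidate the cached entry."""
    database[key] = value
    cache.pop(key, None)  # the next read() reloads the fresh value from the database
```

On update, the entry is deleted rather than overwritten. This keeps the database as the single source of truth, because the next read repopulates the cache with the stored value.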

Below, we demonstrate Case 1 (reading data) with a small quotes example. We will need:

A SQLite database (which we can work with directly through Python's standard sqlite3 module)
A Redis server and the redis-py module

We use the script ch08/cache_aside/populate_db.py to create the database and the quotes table and to fill it with sample data. For convenience, we use Faker to generate fake quotes.

The data population script (excerpt)

import sqlite3
from pathlib import Path
from random import randint
import redis
from faker import Faker
fake = Faker()
DB_PATH = Path(__file__).parent / Path("quotes.sqlite3")
cache = redis.StrictRedis(host="localhost", port=6379, decode_responses=True)

def setup_db():
    try:
        with sqlite3.connect(DB_PATH) as db:
            cursor = db.cursor()
            cursor.execute(
                """
                CREATE TABLE quotes(id INTEGER PRIMARY KEY, text TEXT)
            """
            )
            db.commit()
            print("Table 'quotes' created")
    except Exception as e:
        print(e)

def add_quotes(quotes_list):
    added = []
    try:
        with sqlite3.connect(DB_PATH) as db:
            cursor = db.cursor()
            for quote_text in quotes_list:
                quote_id = randint(1, 100) # nosec
                quote = (quote_id, quote_text)
                cursor.execute(
                    """INSERT OR IGNORE INTO quotes(id, text) VALUES(?, ?)""", quote
                )
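                # INSERT OR IGNORE skips ids that already exist, so only
                # record (and later cache) rows that were actually inserted
                if cursor.rowcount == 1:
                    added.append(quote)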
            db.commit()
    except Exception as e:
        print(e)
    return added

def main():
    msg = "Choose your mode! Enter 'init' or 'update_db_only' or 'update_all': "
    mode = input(msg)
    if mode.lower() == "init":
        setup_db()
    elif mode.lower() == "update_all":
        quotes_list = [fake.sentence() for _ in range(1, 11)]
        added = add_quotes(quotes_list)
        if added:
            print("New (fake) quotes added to the database:")
            for q in added:
                print(f"Added to DB: {q}")
                print("  - Also adding to the cache")
                # Use the same "quote.<id>" key format that cache_aside.py reads
                cache.set(f"quote.{q[0]}", q[1], ex=60)
    elif mode.lower() == "update_db_only":
        quotes_list = [fake.sentence() for _ in range(1, 11)]
        added = add_quotes(quotes_list)
        if added:
            print("New (fake) quotes added to the database ONLY:")
            for q in added:
                print(f"Added to DB: {q}")


if __name__ == "__main__":
    main()

Next, in ch08/cache_aside/cache_aside.py, we write the read logic that implements Cache-Aside:

import sqlite3
from pathlib import Path
import redis
CACHE_KEY_PREFIX = "quote"
DB_PATH = Path(__file__).parent / Path("quotes.sqlite3")
cache = redis.StrictRedis(host="localhost", port=6379, decode_responses=True)

def get_quote(quote_id: str) -> str:
    out = []
    quote = cache.get(f"{CACHE_KEY_PREFIX}.{quote_id}")
    if quote is None:
        # Get from the database with a parameterized query (formatting user
        # input directly into SQL would allow SQL injection)
        query = "SELECT text FROM quotes WHERE id = ?"
        try:
            with sqlite3.connect(DB_PATH) as db:
                cursor = db.cursor()
                res = cursor.execute(query, (int(quote_id),)).fetchone()
                if not res:
                    return "There was no quote stored matching that id!"
                quote = res[0]
                out.append(f"Got '{quote}' FROM DB")
        except Exception as e:
            print(e)
            quote = ""
        # Add to the cache
        if quote:
            key = f"{CACHE_KEY_PREFIX}.{quote_id}"
            cache.set(key, quote, ex=60)
            out.append(f"Added TO CACHE, with key '{key}'")
    else:
        out.append(f"Got '{quote}' FROM CACHE")
    if out:
        return " - ".join(out)
    else:
        return ""

def main():
    while True:
        quote_id = input("Enter the ID of the quote: ")
        if quote_id.isdigit():
            out = get_quote(quote_id)
            print(out)
        else:
            print("You must enter a number. Please retry.")


if __name__ == "__main__":
    main()

Testing the example

Run python ch08/cache_aside/populate_db.py and choose init. This creates the quotes.sqlite3 database file and the quotes table.

Run python ch08/cache_aside/populate_db.py again and choose update_all. The output looks similar to this:

Choose your mode! Enter 'init' or 'update_db_only' or 'update_all': update_all
New (fake) quotes added to the database:
Added to DB: (62, 'Instead not here public.')
- Also adding to the cache
...

Alternatively, choose update_db_only to write only to the database, without touching the cache:

Choose your mode! Enter 'init' or 'update_db_only' or 'update_all': update_db_only
New (fake) quotes added to the database ONLY:
Added to DB: (73, 'Whose determine group what site.')
...

Run the reader program with python ch08/cache_aside/cache_aside.py and enter an ID at each prompt, as shown here:

Enter the ID of the quote: 23
Got 'Dark team exactly really wind.' FROM DB - Added TO CACHE, with key 'quote.23'
Enter the ID of the quote: 12
There was no quote stored matching that id!
Enter the ID of the quote: 43
Got 'Significant hot those think heart shake ago.' FROM DB - Added TO CACHE, with key 'quote.43'
...

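When the requested ID exists in the database but not in the cache (a cache miss), the program reads the quote from the database and immediately writes it to the cache. A repeat query for the same ID within the 60-second expiry set by ex=60 is then served from the cache, as Cache-Aside intends.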

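Tip: the update case (write to the database, then delete the corresponding cache key) is left as an exercise. You could add an update_quote() function that takes a quote_id, updates the quote in the database, and deletes its cache key. You could then trigger it from the command line (for example, python cache_aside.py update).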
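One possible shape for that exercise is sketched below. It is a hypothetical implementation, not the author's solution. update_quote() takes the database path and the cache client as parameters, so it can be exercised without a running Redis server. The cache client is assumed to be a redis-py client such as redis.StrictRedis, whose delete() method removes a key.

```python
import sqlite3

CACHE_KEY_PREFIX = "quote"  # same key prefix as cache_aside.py


def update_quote(db_path, cache, quote_id, new_text):
    """Hypothetical Case 2 for the quotes example: update the DB, then drop the cache key.

    `cache` is assumed to be a redis-py client (or any object with delete(key)).
    Returns True if a row was updated.
    """
    with sqlite3.connect(db_path) as db:
        cursor = db.execute(
            "UPDATE quotes SET text = ? WHERE id = ?", (new_text, int(quote_id))
        )
        updated = cursor.rowcount > 0
    # Leaving the `with` block committed the transaction, so the database now
    # holds the new text before the cache entry is invalidated.
    if updated:
        # The next get_quote() call misses the cache and reloads from the database
        cache.delete(f"{CACHE_KEY_PREFIX}.{quote_id}")
    return updated
```

Wiring this to a command such as python cache_aside.py update is left out here.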

The Memoization pattern

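The Memoization pattern is a key optimization technique that improves program efficiency by caching the results of expensive function calls. When a function is called several times with the same inputs, the cached result is returned directly and the expensive computation is not repeated. This assumes the function's result depends only on its inputs.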

Real-world examples

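The Fibonacci sequence: the classic example. By storing previously computed values, the algorithm avoids recalculating them, which dramatically speeds up the computation for larger values of n.
Text search algorithms: applications that process large volumes of text (search engines, document analysis tools) cache the results of previous queries. Repeated queries then return instantly, which noticeably improves the user experience.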

Use cases

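Speeding up recursive algorithms: memoization can drastically cut the cost of recursive algorithms that would otherwise be very expensive. Computing Fibonacci numbers is the typical example: it goes from an exponential number of calls to a linear one.
Reducing computational overhead: avoiding unnecessary recomputation saves CPU resources, which matters in resource-constrained environments and in large-scale data processing.
Improving application performance: the direct result is a faster, more efficient application and a better user experience.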

Implementing the Memoization pattern

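Below, we implement memoization with Python's functools.lru_cache decorator. It is especially effective for expensive functions that are called repeatedly with the same arguments. The results are cached, so later calls with the same arguments are served from the cache, which greatly reduces execution time.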
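The maxsize argument bounds the cache. Once the cache is full, the least recently used entry is evicted; maxsize=None lets the cache grow without limit. lru_cache also adds a cache_info() method to the decorated function. The small sketch below (square is a hypothetical function) uses cache_info() to show the eviction in action:

```python
from functools import lru_cache


@lru_cache(maxsize=2)  # keep at most 2 results; the least recently used is evicted
def square(n):
    return n * n


for n in (1, 2, 1, 3, 2):
    square(n)

# 1 miss, 2 miss, 1 hit, 3 miss (evicts 2), 2 miss (evicts 1)
print(square.cache_info())  # CacheInfo(hits=1, misses=4, maxsize=2, currsize=2)
```

cache_clear(), also added by lru_cache, empties the cache when the stored results must be discarded.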

Consider the recursive computation of the Fibonacci sequence:

Imports:

from datetime import timedelta
from functools import lru_cache

The plain recursive version (no caching, for comparison):

def fibonacci_func1(n):
    if n < 2:
        return n
    return fibonacci_func1(n - 1) + fibonacci_func1(n - 2)

The recursive version with memoization (using lru_cache):

@lru_cache(maxsize=None)
def fibonacci_func2(n):
    if n < 2:
        return n
    return fibonacci_func2(n - 1) + fibonacci_func2(n - 2)

Comparing the performance of the two (with n=30):

def main():
    import time

    n = 30
    start_time = time.time()
    result = fibonacci_func1(n)
    # timedelta's first positional argument is days, so pass seconds explicitly
    duration = timedelta(seconds=time.time() - start_time)
    print(f"Fibonacci_func1({n}) = {result}, calculated in {duration}")

    start_time = time.time()
    result = fibonacci_func2(n)
    duration = timedelta(seconds=time.time() - start_time)
    print(f"Fibonacci_func2({n}) = {result}, calculated in {duration}")


if __name__ == "__main__":
    main()

Run it: python ch08/memoization.py
Possible output:

Fibonacci_func1(30) = 832040, calculated in 0:00:00.318670
Fibonacci_func2(30) = 832040, calculated in 0:00:00.000032

Actual timings vary by environment, but the cached version is typically much faster, and the gap can be very large.

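This example shows that memoization greatly reduces the number of recursive calls needed to compute Fibonacci numbers, especially for large n. Because it cuts computational overhead, memoization both speeds up the calculation and saves system resources, making the application more efficient and responsive.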
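To make the reduction in recursive calls concrete, we can count how many times each function body actually runs. In this sketch, fib_plain and fib_memo are illustrative copies of fibonacci_func1 and fibonacci_func2 with a counter added. Unlike the timings, these counts are deterministic.

```python
from functools import lru_cache

plain_calls = 0
memo_calls = 0


def fib_plain(n):
    global plain_calls
    plain_calls += 1  # runs on every invocation
    if n < 2:
        return n
    return fib_plain(n - 1) + fib_plain(n - 2)


@lru_cache(maxsize=None)
def fib_memo(n):
    global memo_calls
    memo_calls += 1  # runs only on a cache miss
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)


assert fib_plain(30) == fib_memo(30) == 832040
print(plain_calls)              # 2692537
print(memo_calls)               # 31: one body execution per distinct n in 0..30
print(fib_memo.cache_info())    # CacheInfo(hits=28, misses=31, maxsize=None, currsize=31)
```

The naive count follows C(n) = 1 + C(n-1) + C(n-2) with C(0) = C(1) = 1, which works out to 2·F(n+1) - 1. For n = 30, that is 2 × 1,346,269 - 1 = 2,692,537 calls.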

The Lazy Loading pattern

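Lazy Loading is an important design approach in software engineering, particularly for performance optimization and resource management. The idea is to defer the initialization or loading of a resource until the moment it is actually needed. As a result, the application uses resources more efficiently, starts up faster, and offers a better overall user experience.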

Real-world examples

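Browsing an online art gallery: instead of loading hundreds or thousands of high-resolution images up front, the page loads only the images in the current viewport. As you scroll, more images load seamlessly. This improves the experience without overwhelming the device's memory or network bandwidth.
Video-on-demand streaming services (such as Netflix and YouTube): video is loaded in chunks. This shortens the buffering time before playback starts and lets the service adapt to changing network conditions, keeping playback smooth and quality as high as possible.
Spreadsheet applications such as Excel and Google Sheets: when working with large datasets, they load on demand only the data relevant to the current view or operation (a particular sheet or cell range). This speeds up operations and reduces memory usage.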
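The chunk-by-chunk loading in these examples can be sketched in Python with a generator, which produces each item only when the consumer asks for it. This is a minimal sketch. load_page() is a hypothetical stand-in for fetching one chunk, such as a batch of gallery images, a video segment, or a range of rows.

```python
def load_page(page_number, page_size=3):
    """Hypothetical loader standing in for a network or disk read of one chunk."""
    print(f"Loading page {page_number}...")
    start = page_number * page_size
    return list(range(start, start + page_size))


def lazy_pages(total_pages):
    """Yield pages one at a time; nothing is loaded until the consumer asks."""
    for page_number in range(total_pages):
        yield load_page(page_number)


pages = lazy_pages(1000)  # creating the generator loads nothing yet
first = next(pages)       # only now is page 0 loaded
print(first)              # [0, 1, 2]
```

Even though 1,000 pages are declared, only the pages actually consumed are ever loaded.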

Use cases

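Reducing initial load time: this is especially important in web development, where faster loading usually translates into better user retention and engagement.
Conserving system resources: devices now range from high-end desktops to entry-level phones, so using resources only when needed is key to delivering a consistent experience.
Improving interactivity: users expect fast, responsive interactions, and lazy loading makes an application feel more responsive by reducing waiting.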

Implementing Lazy Loading: lazy attribute loading

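Consider an application that performs complex data analysis or visualization based on user input, where the underlying computation may be very time-consuming. Introducing lazy loading can significantly improve its performance. For demonstration, we use an example that simulates an expensive computation and assigns the result to an instance attribute.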

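The idea is that an attribute of an object is initialized only when it is first accessed. This is common when initializing the attribute is expensive and you want to postpone that work until it is actually needed.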

**Initialization:** _data is first set to None, so the expensive data has not been loaded yet.

class LazyLoadedData:
    def __init__(self):
        self._data = None

**Attribute access:** @property implements the lazy logic. If _data is None, load_data() is called to load it.

    @property
    def data(self):
        if self._data is None:
            self._data = self.load_data()
        return self._data

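**Simulating an expensive load:** a sum stands in for the expensive work. In a real application, this might be fetching remote data or running a complex computation.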

    def load_data(self):
        print("Loading expensive data...")
        return sum(i * i for i in range(100000))

**Testing:** the first access triggers the load, and later accesses reuse the stored value.

def main():
    obj = LazyLoadedData()
    print("Object created, expensive attribute not loaded yet.")
    print("Accessing expensive attribute:")
    print(obj.data)
    print("Accessing expensive attribute again, no reloading occurs:")
    print(obj.data)


if __name__ == "__main__":
    main()

Run it: python ch08/lazy_loading/lazy_attribute_loading.py
Sample output:

Object created, expensive attribute not loaded yet.
Accessing expensive attribute:
Loading expensive data...
333328333350000
Accessing expensive attribute again, no reloading occurs:
333328333350000

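As the output shows, the data is loaded and stored in _data only on first access. Later accesses return it directly without repeating the expensive operation. For data or computations that are costly but needed only occasionally, this kind of lazy loading is very effective.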
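Since Python 3.8, the standard library has offered functools.cached_property. It implements the same first-access pattern without a separate _data attribute. The sketch below is an alternative to the chapter's code, not part of it; LazyLoadedDataCP is a hypothetical name.

```python
from functools import cached_property


class LazyLoadedDataCP:
    """Same behavior as LazyLoadedData, built on functools.cached_property."""

    @cached_property
    def data(self):
        # Runs on first access only; the result is stored in the instance __dict__
        print("Loading expensive data...")
        return sum(i * i for i in range(100000))


obj = LazyLoadedDataCP()
print(obj.data)  # prints "Loading expensive data..." and then 333328333350000
print(obj.data)  # prints 333328333350000 without reloading
```

This differs from the _data is None check in one way: cached_property also caches a result of None. Deleting the attribute (del obj.data) forces the next access to reload. cached_property also needs an instance __dict__, so it does not work on classes that define only __slots__.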

Implementing Lazy Loading: using caching

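The second example uses a recursive factorial to simulate an expensive computation. Python's math.factorial is already efficient, but a recursive implementation is a good way to demonstrate an expensive computation that caching can speed up. We cache results with lru_cache, treating it here as a mechanism for lazy loading.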

Imports:

import time
from datetime import timedelta
from functools import lru_cache

Recursive factorial (no caching):

def recursive_factorial(n):
    """Calculate factorial (expensive for large n)"""
    if n <= 1:  # also covers n == 0, which would otherwise recurse until RecursionError
        return 1
    else:
        return n * recursive_factorial(n - 1)

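**Cached factorial:** after the first computation for a given n, the result is stored in the cache, and later calls with the same argument are served directly from it. Only the top-level result is cached; the recursive_factorial() calls it makes are not.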

@lru_cache(maxsize=128)
def cached_factorial(n):
    return recursive_factorial(n)

Performance comparison:

def main():
    # Testing the performance
    n = 20
    # Without caching
    start_time = time.time()
    print(f"Recursive factorial of {n}: {recursive_factorial(n)}")
    # Pass seconds explicitly: timedelta's first positional argument is days
    duration = timedelta(seconds=time.time() - start_time)
    print(f"Calculation time without caching: {duration}.")
    # With caching (first call is a cache miss, so the full recursion still runs)
    start_time = time.time()
    print(f"Cached factorial of {n}: {cached_factorial(n)}")
    duration = timedelta(seconds=time.time() - start_time)
    print(f"Calculation time with caching: {duration}.")
    # Same argument again: served from the cache
    start_time = time.time()
    print(f"Cached factorial of {n}, repeated: {cached_factorial(n)}")
    duration = timedelta(seconds=time.time() - start_time)
    print(f"Second calculation time with caching: {duration}.")


if __name__ == "__main__":
    main()

Run it: python ch08/lazy_loading/lazy_loading_with_caching.py
Sample output:

Recursive factorial of 20: 2432902008176640000
Calculation time without caching: 0:00:00.000056.
Cached factorial of 20: 2432902008176640000
Calculation time with caching: 0:00:00.000010.
Cached factorial of 20, repeated: 2432902008176640000
Second calculation time with caching: 0:00:00.000004.

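For n=20, all three timings are in the microsecond range. The first call to cached_factorial() is a cache miss and runs the full recursion, so it does the same work as the uncached call. Any difference between those two timings therefore comes from measurement noise and warm-up effects, not from caching. The repeated call with the same argument is served from the cache and returns almost immediately.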

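Note that lru_cache is fundamentally a memoization tool. It can also be used to make an expensive initialization run only when it is needed, so that it does not slow down application startup or everyday interactions. In this example, the factorial computation simulates such an expensive initialization.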

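How does this differ from memoization? Here, the emphasis is on using the cache to control when a resource is initialized (deferring it until it is needed), rather than only on avoiding repeated computation.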
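This difference can be sketched with a zero-argument function wrapped in lru_cache. Nothing runs when the function is defined, the first call performs the initialization, and every later call returns the same object. get_settings() and its return value are hypothetical placeholders for an expensive resource, such as a parsed configuration or a client connection.

```python
from functools import lru_cache

init_count = 0


@lru_cache(maxsize=None)
def get_settings():
    """Hypothetical expensive resource, e.g. a parsed config file or a client object."""
    global init_count
    init_count += 1  # counts how many times the initialization actually runs
    return {"timeout": 30, "retries": 3}  # placeholder values


# Defining get_settings() has not initialized anything yet: init_count is still 0.
first = get_settings()   # the first call performs the initialization
second = get_settings()  # later calls return the cached object
print(init_count, first is second)  # 1 True
```

Because every caller shares the returned object, mutating it affects all callers.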

Summary

In this chapter, we explored several patterns that developers can use to improve efficiency and scalability:

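Cache-Aside: showed how to manage a cache effectively, optimizing performance and consistency when working with dynamic data sources.
Memoization: avoids repeated computation by caching function results, which significantly speeds up recursive and computation-heavy workloads.
Lazy Loading: defers initialization until it is actually required, which improves startup time and reduces memory overhead. It is especially well suited to expensive operations that are not always needed.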

In the next chapter, we will discuss patterns for distributed systems.