Python LRU Cache: Applications and Examples

Table of Contents

I. What Is LRU
II. Core Implementation
- 1. Data Structure
- 2. Key Methods
III. Usage Examples
- 1. Basic Usage
- 2. Suitable Scenarios
IV. Optimization Suggestions
- 1. Thread Safety
- 2. Hit Rate Statistics
- 3. TTL Support
V. Summary
- 1. Advantages
- 2. Suitable Scenarios
- 3. Unsuitable Scenarios

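I. What Is LRU

LRU (Least Recently Used) is a widely used cache eviction algorithm. When the cache runs out of space, it decides which entries to remove.

Core Idea

Data that was accessed recently is more likely to be accessed again soon. When the cache is full, the entry that has gone unaccessed the longest should therefore be evicted first.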

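How It Works

On access

If the data is in the cache (a cache hit), it is marked as most recently used and moved to the front of the cache (or the back, depending on the implementation).

If the data is not in the cache (a cache miss), it is loaded from the original data source.

When the cache is full

When a new entry must be inserted, the least recently used entry (the LRU entry) is removed first, and the new entry is then inserted at the most recent position.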
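
The hit, miss, and eviction rules above can be traced with a small simulation. This is only a sketch for following the order by hand, not an efficient cache: `simulate_lru` is a hypothetical helper that keeps the keys in a plain list (O(n) per access) and assumes a capacity of at least 1.

```python
def simulate_lru(capacity, accesses):
    """Replay a sequence of key accesses and report what LRU would evict.

    Hypothetical helper for illustration only; assumes capacity >= 1.
    """
    order = []    # keys from least recently used (front) to most recent (back)
    evicted = []  # keys removed because the cache was full
    for key in accesses:
        if key in order:
            order.remove(key)             # hit: take the key out of its old position
        elif len(order) >= capacity:
            evicted.append(order.pop(0))  # miss on a full cache: drop the LRU key
        order.append(key)                 # the key is now the most recently used
    return order, evicted


order, evicted = simulate_lru(3, ["a", "b", "c", "a", "d"])
print(order)    # ['c', 'a', 'd']
print(evicted)  # ['b']
```

Re-accessing "a" before inserting "d" is what saves it: "b" becomes the oldest key and is the one evicted.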

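Key Properties

Fixed capacity: the cache size is bounded, so memory cannot grow without limit.

Automatic eviction: when the cache is full, the least recently used entry is removed.

Fast access: get() and put() both run in O(1) time on average.

Access order is maintained: every read or update moves the entry to the most recent position.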
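
To see where the O(1) bound can come from, here is a minimal sketch of the classic pairing of a hash map with a doubly linked list. It is not the implementation this article uses later (that one relies on OrderedDict); the names `LinkedLRU` and `_Node` are made up for this illustration. The dict finds a node in constant time, and the linked list lets that node be moved or removed without scanning.

```python
class _Node:
    """Doubly linked list node holding one cache entry."""
    __slots__ = ("key", "value", "prev", "next")

    def __init__(self, key=None, value=None):
        self.key, self.value = key, value
        self.prev = self.next = None


class LinkedLRU:
    """Hypothetical LRU cache: dict for lookup, linked list for recency."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.index = {}                           # key -> node
        self.head, self.tail = _Node(), _Node()   # sentinels; oldest entry follows head
        self.head.next, self.tail.prev = self.tail, self.head

    def _unlink(self, node):
        # Detach the node by connecting its two neighbours to each other.
        node.prev.next, node.next.prev = node.next, node.prev

    def _append(self, node):
        # Insert just before the tail sentinel (the most recently used position).
        last = self.tail.prev
        last.next, node.prev = node, last
        node.next, self.tail.prev = self.tail, node

    def get(self, key):
        node = self.index.get(key)
        if node is None:
            return None
        self._unlink(node)
        self._append(node)
        return node.value

    def put(self, key, value):
        node = self.index.get(key)
        if node is not None:          # update in place and refresh recency
            node.value = value
            self._unlink(node)
            self._append(node)
            return
        if len(self.index) >= self.capacity:
            oldest = self.head.next   # least recently used entry
            self._unlink(oldest)
            del self.index[oldest.key]
        node = _Node(key, value)
        self.index[key] = node
        self._append(node)


cache = LinkedLRU(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # "a" becomes the most recently used
cache.put("c", 3)      # evicts "b"
print(cache.get("b"))  # None
print(cache.get("a"))  # 1
```

Every operation touches a fixed number of pointers plus one dict lookup, which is why neither get() nor put() depends on the number of cached entries.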

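II. Core Implementation

1. Data Structure

An OrderedDict stores the key-value pairs and tracks the access order:

The most recently accessed entry sits at the end of the dictionary.
The least recently accessed entry sits at the beginning of the dictionary.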
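
The two OrderedDict calls that maintain this order are `move_to_end()` and `popitem(last=False)`. A short demonstration on a throwaway dictionary (the name `entries` is only for this example):

```python
from collections import OrderedDict

entries = OrderedDict([("a", 1), ("b", 2), ("c", 3)])

entries.move_to_end("a")              # "a" was just accessed: move it to the end
print(list(entries))                  # ['b', 'c', 'a']

oldest = entries.popitem(last=False)  # remove and return the entry at the beginning
print(oldest)                         # ('b', 2)
print(list(entries))                  # ['c', 'a']
```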

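2. Key Methods

__init__(self, capacity)

Initializes the cache and sets its maximum capacity.

Parameter: capacity (int), the maximum number of entries the cache can hold.

Example:

cache = LatestCache(1000)  # holds at most 1000 entries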

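get(self, key)

Returns the cached value for a key, or None if the key is absent. A hit also moves the entry to the most recent position.

Parameter: key, the key to look up.

Return value: the stored value if the key exists, otherwise None. A stored value of None is therefore indistinguishable from a miss.

Example:

value = cache.get("some_key")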

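put(self, key, value)

Adds a key-value pair to the cache or updates an existing one.

Parameters: key, the key to store; value, the value to store.

Behavior: if key already exists, its value is updated and the entry is moved to the most recent position. If the insertion pushes the cache past its capacity, the least recently used entry is removed.

Example:

cache.put("some_key", "some_value")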

III. Usage Examples

1. Basic Usage

from collections import OrderedDict

class LatestCache:
    def __init__(self, capacity):
        self.cache = OrderedDict()
        self.capacity = capacity

    def get(self, key):
        if key not in self.cache:
            return None
        self.cache.move_to_end(key)  # mark as most recently used
        return self.cache[key]

    def put(self, key, value):
        if key in self.cache:
            self.cache.move_to_end(key)  # an update also counts as a use
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry


# Create the cache
cache = LatestCache(3)

# Add entries
cache.put("a", 1)
cache.put("b", 2)
cache.put("c", 3)

# Look up an entry
print(cache.get("a"))  # prints: 1

# Automatic eviction when the cache is full
cache.put("d", 4)      # evicts the least recently used key "b"
print(cache.get("b"))  # prints: None (already evicted)

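2. Suitable Scenarios

Frequent reads, infrequent writes: for example configuration caches and static data caches.
Avoiding repeated computation: for example caching function results.
Speeding up database or API queries: caching hot data reduces I/O overhead.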
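
For the function-result case, the standard library already ships an LRU cache as the decorator `functools.lru_cache`, so a hand-written class is not always needed. A small sketch, using a deliberately naive Fibonacci function as the expensive computation:

```python
from functools import lru_cache


@lru_cache(maxsize=128)
def fib(n):
    """Naive recursive Fibonacci; the cache removes the repeated subcalls."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


print(fib(30))           # 832040
info = fib.cache_info()  # named tuple with hits, misses, maxsize, currsize
print(info.misses)       # 31 (one real computation per distinct n from 0 to 30)
```

The arguments must be hashable, because they form the cache key. `fib.cache_clear()` empties the cache when the underlying data changes.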

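IV. Optimization Suggestions

1. Thread Safety

The implementation above is not thread-safe, and concurrent use from multiple threads can cause data races. A threading.RLock can guard every operation: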

from collections import OrderedDict
from threading import RLock

class LatestCache:
    def __init__(self, capacity):
        self._lock = RLock()
        self.cache = OrderedDict()
        self.capacity = capacity

    def get(self, key):
        with self._lock:
            if key not in self.cache:
                return None
            self.cache.move_to_end(key)
            return self.cache[key]

    def put(self, key, value):
        with self._lock:
            if key in self.cache:
                self.cache.move_to_end(key)
            self.cache[key] = value
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)
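
One reason to prefer a reentrant lock shows up as soon as one locked method calls another. The sketch below adds a hypothetical `get_or_load` method (not part of the original class; `LoadingCache` and `slow_square` are also illustrative names) that holds the lock across the lookup and the insert, so concurrent callers that miss on the same key do not all run the loader.

```python
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor
from threading import RLock


class LoadingCache:
    """Hypothetical variant of the locked cache with an atomic get-or-load."""

    def __init__(self, capacity):
        self._lock = RLock()
        self.cache = OrderedDict()
        self.capacity = capacity

    def get(self, key):
        with self._lock:
            if key not in self.cache:
                return None
            self.cache.move_to_end(key)
            return self.cache[key]

    def put(self, key, value):
        with self._lock:
            if key in self.cache:
                self.cache.move_to_end(key)
            self.cache[key] = value
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)

    def get_or_load(self, key, loader):
        # The lock is held across the check and the insert, so two threads
        # cannot both miss and both call loader(key). The nested get()/put()
        # calls re-acquire the same lock, which RLock permits; a plain Lock
        # would deadlock here.
        with self._lock:
            if key in self.cache:
                return self.get(key)
            value = loader(key)
            self.put(key, value)
            return value


calls = []  # records every real computation


def slow_square(n):
    calls.append(n)
    return n * n


cache = LoadingCache(10)
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(lambda _: cache.get_or_load(4, slow_square), range(50)))

print(set(results))  # {16}
print(len(calls))    # 1
```

The trade-off is that the loader runs while the lock is held, so a slow loader blocks every other thread using the cache. Whether that is acceptable depends on the workload.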

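2. Hit Rate Statistics

Adding hits and misses counters makes it possible to measure how effective the cache is:

hits: the number of lookups served from the cache

misses: the number of lookups that found nothing in the cache

cache: the OrderedDict that stores the entries, ordered from least to most recently used

capacity: the maximum number of entries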

from threading import RLock
from collections import OrderedDict


class LatestCache:
    def __init__(self, capacity):
        self._lock = RLock()
        self.hits = 0
        self.misses = 0
        self.cache = OrderedDict()
        self.capacity = capacity

    def get(self, key):
        with self._lock:
            if key in self.cache:
                self.hits += 1
                self.cache.move_to_end(key)
                return self.cache[key]
            self.misses += 1
            return None

    def put(self, key, value):
        with self._lock:
            if key in self.cache:
                self.cache.move_to_end(key)
            self.cache[key] = value
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)

    def hit_rate(self):
        with self._lock:
            total = self.hits + self.misses
            return (self.hits / total) if total > 0 else 0.0


# Create the cache
cache = LatestCache(3)

# Add entries
cache.put("a", 1)
cache.put("b", 2)
cache.put("c", 3)

# Look up entries
print(cache.get("a"))  # hit, prints: 1
print(cache.get("b"))  # hit, prints: 2
print(cache.get("a"))  # hit, prints: 1
print(cache.get("x"))  # miss, prints: None

# Automatic eviction when the cache is full
cache.put("d", 4)  # evicts the least recently used key "c"
print(cache.get("c"))  # miss (already evicted), prints: None

# Inspect the hit-rate statistics
print(f"Hits: {cache.hits}")  # prints "Hits: 3" (a, b, a)
print(f"Misses: {cache.misses}")  # prints "Misses: 2" (x, c)
print(f"Hit rate: {cache.hit_rate():.2%}")  # prints "Hit rate: 60.00%" (3 hits out of 5 lookups)

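3. TTL Support

TTL (Time To Live) is how long an entry stays valid in the cache. Once it has expired, the entry is treated as missing and is removed the next time it is read.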

from collections import OrderedDict
import time


class LatestCache:
    def __init__(self, capacity):
        self.cache = OrderedDict()
        self.capacity = capacity

    def get(self, key):
        if key not in self.cache:
            return None
        value, expire_time = self.cache[key]
        if expire_time is not None and time.time() > expire_time:
            del self.cache[key]  # lazily remove the expired entry
            return None
        self.cache.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value, ttl=None):
        # ttl is in seconds; None means the entry never expires
        expire_time = time.time() + ttl if ttl is not None else None
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = (value, expire_time)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry


# Create the cache (capacity 3)
cache = LatestCache(3)

# Add entries, some with a TTL and some without
cache.put("a", 1, ttl=2)  # expires after 2 seconds
cache.put("b", 2)  # never expires
cache.put("c", 3, ttl=4)  # expires after 4 seconds

# Query immediately (all hits)
print(f"Initial query: a={cache.get('a')}, b={cache.get('b')}, c={cache.get('c')}")
# Output: Initial query: a=1, b=2, c=3

# Wait 2 seconds so that 'a' expires
print("Waiting 2 seconds...")
time.sleep(2)

print(f"Query: a={cache.get('a')}, b={cache.get('b')}, c={cache.get('c')}")
# Output: Query: a=None, b=2, c=3
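
Two things in the example above may be worth refining: expired entries are only removed when they are read, so they can occupy slots while live entries get evicted, and testing expiry requires real sleeping. The sketch below is one possible variation, not the article's implementation. `TTLCache`, its `clock` parameter, and `purge_expired` are hypothetical names. It defaults to `time.monotonic` so that wall-clock adjustments do not affect expiry.

```python
import time
from collections import OrderedDict


class TTLCache:
    """Hypothetical TTL-aware LRU cache with a replaceable clock."""

    def __init__(self, capacity, clock=time.monotonic):
        self.cache = OrderedDict()  # key -> (value, expire_at or None)
        self.capacity = capacity
        self.clock = clock          # callable returning the current time in seconds

    def _expired(self, expire_at):
        return expire_at is not None and self.clock() >= expire_at

    def get(self, key):
        if key not in self.cache:
            return None
        value, expire_at = self.cache[key]
        if self._expired(expire_at):
            del self.cache[key]
            return None
        self.cache.move_to_end(key)
        return value

    def put(self, key, value, ttl=None):
        expire_at = None if ttl is None else self.clock() + ttl
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = (value, expire_at)
        if len(self.cache) > self.capacity:
            self.purge_expired()            # first reclaim slots held by dead entries
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # still full: evict the LRU entry

    def purge_expired(self):
        """Drop every expired entry; O(n) in the number of cached entries."""
        dead = [k for k, (_, exp) in self.cache.items() if self._expired(exp)]
        for k in dead:
            del self.cache[k]
        return len(dead)


now = [0.0]  # fake clock value, advanced by hand instead of sleeping
cache = TTLCache(2, clock=lambda: now[0])
cache.put("a", 1, ttl=5)
cache.put("b", 2)
now[0] = 10.0      # "a" is now past its expiry time
cache.put("c", 3)  # purges "a", so the live entry "b" is kept
print(cache.get("a"), cache.get("b"), cache.get("c"))  # None 2 3
```

The purge only runs when the cache is over capacity, so put() stays O(1) until the cache fills up and costs O(n) only on inserts that overflow it.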

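V. Summary

1. Advantages

Simple and efficient: built on OrderedDict, with get() and put() both running in O(1) time.

Automatic eviction: the LRU policy keeps memory from growing without bound.

Easy to extend: TTL, thread safety, and hit statistics can all be added.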

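2. Suitable Scenarios

Web applications: caching API responses and database query results.

Compute-intensive tasks: caching intermediate results to avoid recomputation.

Configuration management: caching frequently read configuration data.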

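3. Unsuitable Scenarios

Strong consistency requirements: a cache can leave data briefly inconsistent with its source, for example through delayed cache updates, the choice of invalidation strategy, or synchronization across nodes in a distributed environment.

Very large datasets: memory on a single machine is limited, so a distributed cache such as Redis is a better fit.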
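
The consistency concern can be made concrete with a toy example. Everything here is a stand-in chosen for illustration: `database` is a plain dict playing the authoritative source, `cache` is a bare OrderedDict, and `read` and `write` are hypothetical helpers.

```python
from collections import OrderedDict

database = {"price": 100}  # stand-in for the authoritative data source
cache = OrderedDict()      # stand-in for the LRU cache


def read(key):
    if key not in cache:
        cache[key] = database[key]  # miss: load from the source
    cache.move_to_end(key)
    return cache[key]


def write(key, value, invalidate=True):
    database[key] = value
    if invalidate:
        cache.pop(key, None)        # drop the cached copy so the next read reloads


read("price")                          # caches 100
write("price", 120, invalidate=False)  # the source changes, the cache is not told
stale = read("price")
print(stale)                           # 100 (stale)

write("price", 150)                    # same write path, with invalidation
fresh = read("price")
print(fresh)                           # 150
```

Inside a single process, invalidating on every write is enough. With several processes or machines, each holds its own cache, and the invalidation has to reach all of them, which is where the synchronization cost mentioned above comes from.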