Bloom Filters: Principles and Practice for Preventing Cache Penetration with Redis

1. Why Do We Need a Bloom Filter?

1.1 The Cache Penetration Problem

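Cache penetration happens when requests ask for data that does not exist. Because neither the cache nor the database holds it, every such request falls through to the database.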
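Request → cache → hit → return the cached value
Request → cache → miss → database → no data → return NULL (nothing is cached)
Because nothing is cached, every such request queries the database again, and under heavy traffic the database can be overloaded or crash.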

1.2 The Bloom Filter Solution

A Bloom filter is a highly space-efficient probabilistic data structure that tells you whether an element definitely does not exist or might exist.

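Core idea: several hash functions map each element to positions in a bit array. Inserting an element sets its positions to 1; positions no element maps to stay 0.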

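```
Bloom filter principle

Element "user:1001" ──→ Hash1 → bit[3] = 1
                    └──→ Hash2 → bit[7] = 1
                    └──→ Hash3 → bit[12] = 1

Query whether "user:1001" exists:
  → check whether bit[3], bit[7] and bit[12] are all 1
  → all 1 → might exist (subject to a false-positive rate)
  → any 0 → definitely does not exist ✅
```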

2. How a Bloom Filter Works

2.1 Data Structure

A Bloom filter consists of two parts:

Bit array: an array of length m in which every position holds 0 or 1
k hash functions: each maps any element to a position in the range [0, m−1]

2.2 Insert and Query Flow

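Insert element X → compute k hash values → set those k positions to 1
Query element X → compute k hash values → are all k positions 1? Yes → might exist; No → definitely does not exist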
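The insert and query flow above can be sketched in plain Java. This is a minimal illustration, not Guava's or Redis's implementation: the class name `SimpleBloomFilter` is hypothetical, and instead of k truly independent hash functions it derives the k positions by double hashing (h1 + i·h2 mod m) from two FNV-1a hashes with different starting values, an assumption made to keep the code short.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Minimal Bloom filter sketch: m bits plus k positions per element.
// Assumption: positions come from double hashing (h1 + i * h2) mod m,
// with h1/h2 being FNV-1a hashes using two different offset values.
class SimpleBloomFilter {
    private final BitSet bits;
    private final int m; // length of the bit array
    private final int k; // number of positions per element

    SimpleBloomFilter(int m, int k) {
        this.m = m;
        this.k = k;
        this.bits = new BitSet(m);
    }

    // 32-bit FNV-1a over the element's bytes, starting from the given offset value
    private static int fnv1a(byte[] data, int offset) {
        int h = offset;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 16777619; // FNV prime
        }
        return h;
    }

    // The k positions of an element, each in [0, m-1]
    private int[] positions(String element) {
        byte[] data = element.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(data, 0x811c9dc5);
        int h2 = fnv1a(data, 0x01000193) | 1; // force odd so the step is never zero
        int[] result = new int[k];
        for (int i = 0; i < k; i++) {
            result[i] = Math.floorMod(h1 + i * h2, m);
        }
        return result;
    }

    // Insert: set all k positions to 1
    void put(String element) {
        for (int p : positions(element)) {
            bits.set(p);
        }
    }

    // Query: any 0 means "definitely absent"; all 1 means "might be present"
    boolean mightContain(String element) {
        for (int p : positions(element)) {
            if (!bits.get(p)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        SimpleBloomFilter filter = new SimpleBloomFilter(1 << 20, 7);
        filter.put("user:1001");
        System.out.println(filter.mightContain("user:1001")); // true: inserted elements always pass
        System.out.println(filter.mightContain("user:9999")); // expected false; a false positive is possible but very unlikely here
    }
}
```

Because `put` only ever sets bits, an inserted element always passes `mightContain`; a `false` answer is therefore a definite "not present".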

2.3 False-Positive Rate

The false-positive rate of a Bloom filter is approximately:

$$p \approx \left(1 - e^{-kn/m}\right)^k$$

where:

m: length of the bit array
k: number of hash functions
n: number of elements inserted so far

Optimal number of hash functions:

$$k_{opt} = \frac{m}{n} \ln 2$$

Optimal bit-array length (given n and a target false-positive rate p):

$$m_{opt} = -\frac{n \ln p}{(\ln 2)^2}$$
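As a worked example with numbers used later in this article (n = 1,000,000 expected elements, target p = 1%), the two formulas give, rounded:

$$m_{opt} = -\frac{10^6 \ln 0.01}{(\ln 2)^2} \approx \frac{10^6 \times 4.6052}{0.4805} \approx 9.59 \times 10^6 \ \text{bits} \approx 1.14\ \text{MiB}$$

$$k_{opt} = \frac{m}{n} \ln 2 \approx 9.585 \times 0.6931 \approx 6.64 \;\Rightarrow\; k = 7$$

That is about 9.6 bits (roughly 1.2 bytes) per element for a 1% false-positive rate.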

2.4 Trade-off Between False-Positive Rate and Space

n (elements)	m/n (bits per element)	False-positive rate p (optimal k)
1 million	10	≈0.82%
1 million	13.8	≈0.13%
1 million	20	≈0.0067%
10 million	10	≈0.82%
10 million	13.8	≈0.13%

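Conclusion: about 10 bits per element (m/n ≈ 10) is a common sweet spot, giving a false-positive rate of roughly 1%. The rate depends only on m/n and k, not on n itself, which is why the 1 million and 10 million rows match.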
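The rates in the table can be reproduced from the formula in 2.3. A short sketch, assuming the integer k nearest the optimum (m/n)·ln 2 is used; the class name `FalsePositiveTable` is hypothetical:

```java
// Sketch: reproduce the trade-off table from p = (1 - e^{-kn/m})^k.
// Assumption: k is the integer closest to (m/n) * ln 2.
class FalsePositiveTable {

    // Approximate false-positive rate for a bits-per-element ratio m/n and k hash functions
    static double falsePositiveRate(double bitsPerElement, int k) {
        return Math.pow(1 - Math.exp(-k / bitsPerElement), k);
    }

    // Optimal integer k for a given m/n (at least 1)
    static int optimalK(double bitsPerElement) {
        return Math.max(1, (int) Math.round(bitsPerElement * Math.log(2)));
    }

    public static void main(String[] args) {
        for (double bitsPerElement : new double[] {10, 13.8, 20}) {
            int k = optimalK(bitsPerElement);
            double p = falsePositiveRate(bitsPerElement, k);
            System.out.printf("m/n=%.1f  k=%d  p=%.4f%%%n", bitsPerElement, k, 100 * p);
        }
    }
}
```

Note that n cancels out: only the ratio m/n enters the formula, so scaling n and m together leaves p unchanged.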

3. Guava's Bloom Filter

3.1 Add the Dependency

```xml
<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>32.1.3-jre</version>
</dependency>
```

3.2 Basic Usage

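```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

public class GuavaBloomFilterDemo {

    public static void main(String[] args) {
        // Expect 1 million insertions with a target false-positive rate of 1% (0.01)
        BloomFilter<Integer> filter = BloomFilter.create(
            Funnels.integerFunnel(),
            1_000_000,      // expected number of insertions
            0.01            // target false-positive probability (1%)
        );

        // Insert 0 .. 999_999
        for (int i = 0; i < 1_000_000; i++) {
            filter.put(i);
        }

        // Inserted elements are always reported as present (no false negatives)
        System.out.println(filter.mightContain(1));         // true
        System.out.println(filter.mightContain(999_999));   // true
        System.out.println(filter.mightContain(1_000_000)); // usually false; true would be a false positive

        // Measure the actual false-positive rate on 1 million values that were never inserted
        int falsePositives = 0;
        for (int i = 1_000_000; i < 2_000_000; i++) {
            if (filter.mightContain(i)) {
                falsePositives++;
            }
        }
        System.out.println("False positives: " + falsePositives
            + " (" + (falsePositives / 10_000.0) + "%)");
    }
}
```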

3.3 Custom Funnel (Controlling What Gets Hashed)

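```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnel;

public class User {
    private Long userId;

    public Long getUserId() {
        return userId;
    }

    // A Funnel decides which bytes of a User are fed to the hash functions; here only userId.
    // Guava has no built-in JSON funnel, so the serialization is defined explicitly.
    public static final Funnel<User> FUNNEL =
        (user, into) -> into.putLong(user.getUserId());
}

// Pass the custom Funnel when creating the filter
BloomFilter<User> userFilter = BloomFilter.create(
    User.FUNNEL,
    1_000_000,
    0.01
);
```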

4. Bloom Filters in Redis

4.1 The RedisBloom Module

Redis 4.0+ can load the RedisBloom module (redis-bloom), which adds Bloom filter commands:

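```bash
# Custom parameters: create the filter explicitly BEFORE it is first written
# (BF.RESERVE fails if the key already exists)
# Arguments: key, error rate, expected number of elements (capacity)
BF.RESERVE myfilter 0.01 100000

# Add an element (on a missing key, BF.ADD creates a filter with default parameters)
BF.ADD myfilter item1

# Check whether an element exists
BF.EXISTS myfilter item1

# Add several elements
BF.MADD myfilter item2 item3

# Check several elements
BF.MEXISTS myfilter item2 item4
```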

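4.2 Java Client (Redisson)
Redisson's RBloomFilter is implemented by Redisson itself on top of ordinary Redis bitmaps; it does not use the BF.* commands above and does not require the RedisBloom module.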

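```java
import org.redisson.Redisson;
import org.redisson.api.RBloomFilter;
import org.redisson.api.RedissonClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class RedisBloomConfig {

    // Redisson.create() without a Config connects to a single node at 127.0.0.1:6379
    @Bean(destroyMethod = "shutdown")
    public RedissonClient redissonClient() {
        return Redisson.create();
    }

    @Bean
    public RBloomFilter<String> userBloomFilter(RedissonClient redissonClient) {
        RBloomFilter<String> filter = redissonClient.getBloomFilter("user:bloom");
        // Initialize: 10 million expected elements, 1% false-positive rate.
        // tryInit returns false and keeps the existing filter if it was already initialized.
        filter.tryInit(10_000_000, 0.01);
        return filter;
    }
}
```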

4.3 Spring Boot Integration

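```java
import com.alibaba.fastjson.JSON; // assumes fastjson for JSON (de)serialization
import java.util.concurrent.TimeUnit;
import org.redisson.api.RBloomFilter;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.stereotype.Service;

@Service
public class UserService {

    @Autowired
    private RBloomFilter<String> userBloomFilter;

    @Autowired
    private StringRedisTemplate redisTemplate;

    @Autowired
    private UserMapper userMapper; // MyBatis(-Plus) mapper for the user table

    private static final String USER_CACHE_PREFIX = "user:cache:";
    private static final String NULL_VALUE = "NULL";

    public User getUser(Long userId) {
        String key = USER_CACHE_PREFIX + userId;
        String cacheValue = redisTemplate.opsForValue().get(key);

        if (cacheValue != null) {
            // A cached "NULL" marks an ID already known to be missing
            return NULL_VALUE.equals(cacheValue) ? null : JSON.parseObject(cacheValue, User.class);
        }

        // ⭐ Bloom filter guard: contains() == false means the ID definitely does not exist
        String userIdStr = String.valueOf(userId);
        if (!userBloomFilter.contains(userIdStr)) {
            return null; // skip the database entirely
        }

        // Query the database (reached for real IDs and for Bloom filter false positives)
        User user = userMapper.selectById(userId);

        if (user != null) {
            redisTemplate.opsForValue().set(key, JSON.toJSONString(user),
                30, TimeUnit.MINUTES);
        } else {
            // ⭐ Cache the miss briefly so repeated false positives do not keep hitting the database
            redisTemplate.opsForValue().set(key, NULL_VALUE,
                5, TimeUnit.MINUTES);
        }

        return user;
    }
}
```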

5. The Deletion Problem

5.1 Cuckoo Filter

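A standard Bloom filter does not support deletion: a bit may be shared by several elements, so clearing it to delete one element can make other elements look absent (false negatives). A cuckoo filter supports deletion, and RedisBloom provides one: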

```bash
# Cuckoo filter commands (RedisBloom module)
CF.ADD mycuckoofilter item1
CF.DEL mycuckoofilter item1
CF.EXISTS mycuckoofilter item1
```
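Why a standard Bloom filter cannot simply clear bits on deletion can be shown with a tiny deterministic Java sketch. The bit positions for elements A and B are hypothetical values chosen so that the two elements share bit 7, as real hash functions can produce; no hashing is involved, and the class name `DeletionProblemDemo` is illustrative.

```java
import java.util.BitSet;

// Sketch: a naive "delete" in a standard Bloom filter.
// Assumption: the positions below are made up so that A and B share bit 7.
class DeletionProblemDemo {
    static final int[] POSITIONS_A = {3, 7, 12};
    static final int[] POSITIONS_B = {7, 20, 31};

    static void setAll(BitSet bits, int[] positions) {
        for (int p : positions) {
            bits.set(p);
        }
    }

    static void clearAll(BitSet bits, int[] positions) {
        for (int p : positions) {
            bits.clear(p);
        }
    }

    static boolean allSet(BitSet bits, int[] positions) {
        for (int p : positions) {
            if (!bits.get(p)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet(64);
        setAll(bits, POSITIONS_A); // insert A
        setAll(bits, POSITIONS_B); // insert B
        System.out.println(allSet(bits, POSITIONS_B)); // true

        clearAll(bits, POSITIONS_A); // naive "delete A" also clears the shared bit 7
        System.out.println(allSet(bits, POSITIONS_B)); // false: B is now a false negative
    }
}
```

Counting Bloom filters (a counter per slot) and cuckoo filters (stored fingerprints) avoid this by keeping more than one bit of state per slot.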

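5.2 Counting Bloom Filter
A counting Bloom filter replaces each bit with a small counter: insertion increments the k counters and deletion decrements them, so deleting one element does not zero out slots still used by others.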

```java
import com.google.common.hash.Funnel;
import com.google.common.hash.Hashing;

// Hand-written counting Bloom filter: each slot is a counter instead of a single bit
public class CountingBloomFilter<K> {
    private final int[] counters;   // int counters replace the bit array
    private final int size;
    private final Funnel<K> funnel;
    private final int hashCount;

    public CountingBloomFilter(int size, int hashCount, Funnel<K> funnel) {
        this.size = size;
        this.hashCount = hashCount;
        this.funnel = funnel;
        this.counters = new int[size];
    }

    public void put(K element) {
        for (int i = 0; i < hashCount; i++) {
            counters[hash(element, i)]++;  // ⭐ increment instead of setting to 1
        }
    }

    public boolean mightContain(K element) {
        for (int i = 0; i < hashCount; i++) {
            if (counters[hash(element, i)] == 0) {
                return false;
            }
        }
        return true;
    }

    // Only delete elements that were actually inserted: "deleting" a false positive
    // decrements other elements' counters and can create false negatives.
    public boolean delete(K element) {
        if (!mightContain(element)) {
            return false;
        }
        for (int i = 0; i < hashCount; i++) {
            counters[hash(element, i)]--;
        }
        return true;
    }

    // The i-th hash: 128-bit MurmurHash3 seeded with i, reduced to [0, size)
    private int hash(K element, int seed) {
        // floorMod avoids the negative index Math.abs(Integer.MIN_VALUE) % size would produce
        return Math.floorMod(Hashing.murmur3_128(seed).hashObject(element, funnel).asInt(), size);
    }
}
```

6. A Combined Anti-Penetration Scheme

6.1 Three-Layer Protection Architecture

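Request → Layer 1: Bloom filter (memory-level check, far cheaper than a database query) → does not exist → return immediately
Layer 1 says "might exist" but the cache misses → Layer 2: Redis null-value cache (short TTL, so cached misses expire quickly) → miss → Layer 3: database query (last line of defense)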
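The three layers can be simulated with in-memory stand-ins to count how often the database is reached. Everything here is a hypothetical sketch: `ThreeLayerLookup`, a `HashSet` standing in for the Bloom filter (it is exact, so a false positive for ID "2" is injected by hand), and `HashMap`s standing in for Redis and the database; TTLs are not modeled. It follows the diagram's order (Bloom filter first), whereas the code in 6.2 checks the cache first so hot keys skip the filter lookup.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

// Sketch of the three-layer lookup with in-memory stand-ins (all hypothetical):
// bloom = "might exist" predicate, cache = Redis stand-in, db = database stand-in.
class ThreeLayerLookup {
    static final String NULL_MARKER = "NULL";

    private final Predicate<String> bloom;
    private final Map<String, String> cache = new HashMap<>();
    private final Map<String, String> db;
    int dbQueries = 0; // how many lookups reached the database

    ThreeLayerLookup(Predicate<String> bloom, Map<String, String> db) {
        this.bloom = bloom;
        this.db = db;
    }

    Optional<String> get(String id) {
        // Layer 1: "definitely absent" -> return without touching cache or database
        if (!bloom.test(id)) {
            return Optional.empty();
        }
        // Layer 2: cache, including cached misses
        String cached = cache.get(id);
        if (cached != null) {
            return NULL_MARKER.equals(cached) ? Optional.empty() : Optional.of(cached);
        }
        // Layer 3: database; cache the value, or NULL_MARKER for a miss
        dbQueries++;
        String value = db.get(id);
        cache.put(id, value != null ? value : NULL_MARKER);
        return Optional.ofNullable(value);
    }

    public static void main(String[] args) {
        Map<String, String> db = new HashMap<>(Map.of("1", "apple"));
        Set<String> bloomContents = new HashSet<>(Set.of("1", "2")); // "2" simulates a false positive
        ThreeLayerLookup lookup = new ThreeLayerLookup(bloomContents::contains, db);

        lookup.get("999"); // rejected by layer 1
        lookup.get("2");   // passes layer 1, misses the database, miss is cached
        lookup.get("2");   // served from the cached miss
        lookup.get("1");   // database hit, then cached
        lookup.get("1");   // cache hit
        System.out.println("DB queries: " + lookup.dbQueries); // DB queries: 2
    }
}
```

Of the five lookups in `main`, only two reach the database: the first lookup of the false positive "2" and the first lookup of the real ID "1".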

6.2 Complete Implementation

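```java
import com.alibaba.fastjson.JSON; // assumes fastjson for JSON (de)serialization
import com.baomidou.mybatisplus.core.conditions.query.LambdaQueryWrapper;
import jakarta.annotation.PostConstruct; // javax.annotation.PostConstruct on Spring Boot 2
import java.util.List;
import java.util.concurrent.TimeUnit;
import lombok.extern.slf4j.Slf4j;
import org.redisson.api.RBloomFilter;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.stereotype.Service;

@Service
@Slf4j
public class ProductService {

    // Assumes an RBloomFilter<String> bean named productBloomFilter, defined like userBloomFilter in 4.2
    @Autowired
    private RBloomFilter<String> productBloomFilter;

    @Autowired
    private RedisTemplate<String, Object> redisTemplate;

    @Autowired
    private ProductMapper productMapper;

    private static final String NULL_VALUE = "NULL";

    // Warm up the Bloom filter with every existing product ID at startup.
    // For large tables, page through the IDs instead of loading them all at once.
    // Products created later must also be added to the filter when they are inserted.
    @PostConstruct
    public void warmUpBloomFilter() {
        log.info("Warming up the Bloom filter...");
        List<Product> allProducts = productMapper.selectList(
            new LambdaQueryWrapper<Product>().select(Product::getId)
        );
        for (Product product : allProducts) {
            productBloomFilter.add(String.valueOf(product.getId()));
        }
        log.info("Bloom filter warm-up finished: {} records", allProducts.size());
    }

    public Product getProduct(Long productId) {
        String cacheKey = "product:" + productId;
        String idStr = String.valueOf(productId);

        // 1. Check the cache
        Object cached = redisTemplate.opsForValue().get(cacheKey);
        if (cached != null) {
            if (NULL_VALUE.equals(cached)) {
                return null;
            }
            return JSON.parseObject(cached.toString(), Product.class);
        }

        // 2. Bloom filter check (⭐ core anti-penetration logic)
        if (!productBloomFilter.contains(idStr)) {
            log.info("Bloom filter: productId={} does not exist, returning null", productId);
            // Cache the miss with a short TTL
            redisTemplate.opsForValue().set(cacheKey, NULL_VALUE,
                30, TimeUnit.SECONDS);
            return null;
        }

        // 3. Query the database
        Product product = productMapper.selectById(productId);

        if (product != null) {
            redisTemplate.opsForValue().set(cacheKey, JSON.toJSONString(product),
                1, TimeUnit.HOURS);
        } else {
            // Bloom filter false positive: cache the miss to stop repeated database hits
            redisTemplate.opsForValue().set(cacheKey, NULL_VALUE,
                1, TimeUnit.MINUTES);
        }

        return product;
    }
}
```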

7. Pitfalls

7.1 Choosing the False-Positive Rate

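Scenario	Expected data volume	Suggested false-positive rate
User ID filtering	10 million	0.01%
Product ID filtering	1 million	0.1%
Log deduplication	100 million	1%
URL deduplication	10 million	0.01%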

7.2 Estimating Capacity

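```java
// Capacity estimation tool.
// Guava computes these values internally, but those helpers are not public API,
// so the formulas from section 2.3 are implemented directly.
public class BloomFilterCalculator {

    // m = -n * ln(p) / (ln 2)^2
    static long optimalNumBits(long expectedInsertions, double fpp) {
        return (long) Math.ceil(-expectedInsertions * Math.log(fpp) / (Math.log(2) * Math.log(2)));
    }

    // k = (m / n) * ln 2, rounded, at least 1
    static int optimalNumHashFunctions(long expectedInsertions, long numBits) {
        return Math.max(1, (int) Math.round((double) numBits / expectedInsertions * Math.log(2)));
    }

    public static void main(String[] args) {
        // Given a target false-positive rate, compute the optimal size
        double fpp = 0.01;  // 1% false-positive rate
        long expectedInsertions = 1_000_000;

        long numBits = optimalNumBits(expectedInsertions, fpp);
        int numHashFunctions = optimalNumHashFunctions(expectedInsertions, numBits);

        System.out.println("Optimal number of bits: " + numBits);
        System.out.println("Optimal number of hash functions: " + numHashFunctions);
        // Floating-point division; integer division would truncate to whole MB
        System.out.printf("Memory: %.2f MB%n", numBits / 8.0 / 1024 / 1024);
    }
}
```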

7.3 Common Mistakes

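```java
// ❌ Mistake 1: using a Redisson Bloom filter before initializing it
RBloomFilter<String> filter = redissonClient.getBloomFilter("test");
// Without tryInit(), calling add()/contains() throws an exception

// ✅ Correct: initialize first (expected insertions, false-positive rate)
filter.tryInit(1000, 0.01);

// ❌ Mistake 2: inconsistent key formats
// "1001" and "01001" are different strings and hash to different positions,
// so an ID added as "1001" but queried as "01001" is reported as absent
// ✅ Normalize keys to one fixed format everywhere (e.g. String.valueOf(id) for a Long ID)

// ❌ Mistake 3: inserting far more elements than the expected capacity
// The bit array fills up and the false-positive rate climbs sharply
// ✅ Size the capacity with headroom for growth, and rebuild the filter periodically
```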
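Mistake 3 can be quantified with the formula from 2.3. The sketch below (class name hypothetical) sizes a filter for 1 million elements at 1%, keeps m and k fixed, and evaluates the false-positive rate at 1×, 2× and 5× the planned number of elements.

```java
// Sketch: false-positive rate when more elements are inserted than planned.
// m and k are fixed at design time; p = (1 - e^{-kn/m})^k is evaluated for growing n.
class OverfillDemo {

    static double falsePositiveRate(long m, int k, long n) {
        return Math.pow(1 - Math.exp(-(double) k * n / m), k);
    }

    public static void main(String[] args) {
        long designN = 1_000_000;
        double designP = 0.01;

        // Size the filter with the optimal m and k for the planned capacity
        long m = (long) Math.ceil(-designN * Math.log(designP) / (Math.log(2) * Math.log(2)));
        int k = Math.max(1, (int) Math.round((double) m / designN * Math.log(2)));

        for (long n : new long[] {designN, 2 * designN, 5 * designN}) {
            System.out.printf("n=%,d  p=%.2f%%%n", n, 100 * falsePositiveRate(m, k, n));
        }
    }
}
```

With these parameters (m ≈ 9.59 million bits, k = 7) the rate goes from about 1% at the planned size to roughly 16% at 2× and over 80% at 5×, which is why capacity should include headroom.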

8. Summary

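Bloom filter
✅ Strengths: very high space efficiency; fast O(k) queries; a memory-level check is much cheaper than filtering in the database
❌ Limitations: no deletion (standard version); false positives; no range queries
🎯 Use cases: cache-penetration protection, crawler URL deduplication, spam filtering, recommendation deduplication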

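A Bloom filter is a powerful tool against cache penetration. Combined with a Redis-hosted filter shared by all application instances and with null-value caching, it forms a complete anti-penetration setup. The key is to size the capacity and false-positive rate from the expected data volume, balancing space against accuracy.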