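Probabilistic Filters in Redis: Bloom Filters vs. Cuckoo Filters in Practice

As a Java developer, you have probably hit this problem when working with large data sets: how do you quickly tell whether an element is in a huge collection? Querying the database is too slow, and holding everything in memory is too expensive. Cache penetration makes it worse: requests for keys that do not exist bypass the cache and hit the database every time, so throughput drops and CPU usage spikes. Almost every high-concurrency system runs into this sooner or later. The two probabilistic filters available for Redis, the Bloom filter and the cuckoo filter, are built for exactly this kind of problem.

Probabilistic filters: trading accuracy for space

A probabilistic filter is a data structure that answers membership queries quickly while using very little memory. That efficiency has a price: the filter can only answer "possibly present" or "definitely absent", so some rate of false positives is unavoidable.

When is a probabilistic filter a good fit?

Cache penetration protection
URL deduplication for crawlers
Spam filtering
Fast duplicate detection over large data sets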

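The Bloom filter: the classic

How it works

A Bloom filter is essentially a long bit array combined with a set of hash functions that map each element to positions in that array.

graph LR
    A[Input element] --> B[Hash function 1]
    A --> C[Hash function 2]
    A --> D[Hash function 3]
    B --> E[Bit array]
    C --> E
    D --> E
    E --> F[Decide whether the element is present]

To insert an element:

Compute several hash values with the hash functions
Set the corresponding positions in the bit array to 1

To query an element:

Compute the hash values with the same hash functions
Check whether all of the corresponding positions in the bit array are 1

If any position is 0, the element is definitely absent
If all positions are 1, the element is possibly present (false positives can occur)

A simple example: imagine an 8-bit array, initially all zeros, and three hash functions that map the word "hello" to positions 1, 4, and 7. Inserting "hello" sets those three positions to 1. Querying "hello" checks that positions 1, 4, and 7 are all 1. Querying "world" hashes it to its own three positions; if any of them is still 0, "world" is definitely not in the set.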
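The insert and query steps above can be sketched as a toy in-memory filter. This is only an illustration of the idea, not how RedisBloom is implemented: the class name `TinyBloomFilter`, the FNV-style hash, and the double-hashing trick used to derive k positions are all assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Toy in-memory Bloom filter, for illustration only. */
final class TinyBloomFilter {
    private final BitSet bits;
    private final int size;      // m: number of bits
    private final int hashCount; // k: number of hash functions

    TinyBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    /** Sets the k bit positions derived from the element. */
    void add(String element) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(position(element, i));
        }
    }

    /** Returns false for "definitely absent" and true for "possibly present". */
    boolean mightContain(String element) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(position(element, i))) {
                return false;
            }
        }
        return true;
    }

    // Derives the i-th position from two base hashes (double hashing): h1 + i * h2.
    private int position(String element, int i) {
        byte[] data = element.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(data, 0x811C9DC5);
        int h2 = fnv1a(data, 0x01000193) | 1; // forced odd, so it is never zero
        return Math.floorMod(h1 + i * h2, size);
    }

    // FNV-1a style hash with a caller-supplied starting value.
    private static int fnv1a(byte[] data, int seed) {
        int hash = seed;
        for (byte b : data) {
            hash ^= (b & 0xFF);
            hash *= 0x01000193;
        }
        return hash;
    }

    public static void main(String[] args) {
        TinyBloomFilter filter = new TinyBloomFilter(1024, 3);
        filter.add("hello");
        // An element that was added is never reported as absent.
        System.out.println(filter.mightContain("hello"));
        // Usually false; a true here would be a false positive.
        System.out.println(filter.mightContain("world"));
    }
}
```

Note that `mightContain` can return early on the first zero bit, which is why a negative answer is cheap and certain while a positive answer is only probable.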

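The math behind Bloom filters

The false positive rate of a Bloom filter is approximately: P = (1 - e^(-kn/m))^k

where:

k: number of hash functions
n: number of inserted elements
m: length of the bit array in bits
e: the base of the natural logarithm

The false positive rate is lowest when k = (m/n) * ln 2; in practice k is rounded to a whole number.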
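The two formulas can be turned into a small sizing helper. The sketch below is illustrative: the class and method names are made up for this example, and the expression for m is not stated in the text above but follows from substituting k = (m/n) * ln 2 into P, which gives m = -n * ln P / (ln 2)^2.

```java
/** Sizing helpers derived from the Bloom filter formulas above (illustrative names). */
final class BloomSizing {
    /** Bits needed for n elements at target false positive rate p: m = -n ln p / (ln 2)^2. */
    static long optimalBits(long n, double p) {
        return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    /** Optimal number of hash functions: k = (m / n) ln 2, rounded, at least 1. */
    static int optimalHashCount(long m, long n) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    /** Approximate false positive rate: P = (1 - e^(-kn/m))^k. */
    static double falsePositiveRate(long m, long n, int k) {
        return Math.pow(1 - Math.exp(-(double) k * n / m), k);
    }

    public static void main(String[] args) {
        long n = 10_000;
        double target = 0.01;
        long m = optimalBits(n, target);
        int k = optimalHashCount(m, n);
        System.out.println("bits m = " + m + " (" + (double) m / n + " bits per element)");
        System.out.println("hash functions k = " + k);
        System.out.println("predicted false positive rate = " + falsePositiveRate(m, n, k));
    }
}
```

For 10,000 elements at a 1% target this gives a little under 9.6 bits per element and 7 hash functions, which is where the commonly quoted "about 9.6 bits per element" figure comes from.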

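Bloom filters in Redis

Note: the Redis Bloom filter commands are provided by the RedisBloom module, which must be loaded before they can be used.

Installing the RedisBloom module: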

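# Option 1: Docker
docker run -p 6379:6379 redislabs/rebloom

# Option 2: build from source
# (--recursive also fetches the submodules the build depends on)
git clone --recursive https://github.com/RedisBloom/RedisBloom.git
cd RedisBloom
make
# The name and location of the built .so depend on the RedisBloom version
# (newer builds produce redisbloom.so, for example under bin/); adjust the path to your build output.
redis-server --loadmodule ./rebloom.so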

Working with the Bloom filter from Java through the Jedis client:

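import java.nio.charset.StandardCharsets;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.commands.ProtocolCommand;

// Requires a Redis server with the RedisBloom module loaded.
// Import paths assume Jedis 3.x or later.
public class BloomFilterExample {
    // Jedis's built-in Protocol.Command enum has no RedisBloom entries, so
    // Command.valueOf("BF.ADD") would throw IllegalArgumentException.
    // Module commands are sent through small ProtocolCommand instances instead.
    private static final ProtocolCommand BF_RESERVE = command("BF.RESERVE");
    private static final ProtocolCommand BF_ADD = command("BF.ADD");
    private static final ProtocolCommand BF_MADD = command("BF.MADD");
    private static final ProtocolCommand BF_EXISTS = command("BF.EXISTS");

    private final JedisPool jedisPool;

    public BloomFilterExample(String host, int port) {
        this(new JedisPool(host, port));
    }

    // Lets a shared pool (for example a Spring bean) be injected.
    public BloomFilterExample(JedisPool jedisPool) {
        this.jedisPool = jedisPool;
    }

    private static ProtocolCommand command(String name) {
        return () -> name.getBytes(StandardCharsets.UTF_8);
    }

    public void addToFilter(String key, String value) {
        try (Jedis jedis = jedisPool.getResource()) {
            // Create the filter if it does not exist yet.
            // The check and the create are two separate commands, so this is not atomic.
            if (!jedis.exists(key)) {
                jedis.sendCommand(BF_RESERVE,
                                  key,
                                  "0.01",  // target false positive rate
                                  "10000"  // initial capacity (expected number of elements)
                                 );
            }
            // Add the element
            jedis.sendCommand(BF_ADD, key, value);
        } catch (Exception e) {
            // Replace with real logging and error handling in production
            e.printStackTrace();
        }
    }

    public boolean exists(String key, String value) {
        try (Jedis jedis = jedisPool.getResource()) {
            // BF.EXISTS replies 1 for "possibly present" and 0 for "definitely absent"
            Object result = jedis.sendCommand(BF_EXISTS, key, value);
            return result != null && ((Long) result) == 1;
        } catch (Exception e) {
            e.printStackTrace();
            // Fail open: if Redis cannot be reached, answer "possibly present" so that
            // callers fall through to the real data source instead of wrongly
            // treating an existing element as absent.
            return true;
        }
    }

    // Adds several elements in one round trip
    public void addBatch(String key, String... values) {
        try (Jedis jedis = jedisPool.getResource()) {
            // Arguments are the key followed by the values
            String[] params = new String[values.length + 1];
            params[0] = key;
            System.arraycopy(values, 0, params, 1, values.length);
            jedis.sendCommand(BF_MADD, params);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public void close() {
        if (jedisPool != null) {
            jedisPool.close();
        }
    }
}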

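Pros and cons of Bloom filters

Pros:

Very space-efficient (about 9.6 bits per element at a false positive rate of 0.01)
Insert and query both run in O(k) time, where k is the number of hash functions
No false negatives (an element that was added is never reported as absent)

Cons:

False positives are possible (an element may be reported as present when it is not)
Elements cannot be deleted (clearing an element's bits could break lookups for other elements that share them)
The false positive rate climbs quickly as the load factor n/m grows (n is the number of elements, m the length of the bit array)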
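To make the last point concrete, the sketch below evaluates the approximation P = (1 - e^(-kn/m))^k for a filter that was sized for 10,000 elements at a 1% target and is then filled past that. The class name and the chosen values of m and k are assumptions for this example; the numbers come from the formula, not from measurements against Redis.

```java
/** Shows how the predicted false positive rate grows once a filter is overfilled. */
final class BloomOverfill {
    /** Approximate false positive rate: P = (1 - e^(-kn/m))^k. */
    static double falsePositiveRate(long m, long n, int k) {
        return Math.pow(1 - Math.exp(-(double) k * n / m), k);
    }

    public static void main(String[] args) {
        long m = 95_851; // bits: enough for 10,000 elements at roughly 1%
        int k = 7;       // hash functions
        long[] counts = {5_000, 10_000, 20_000, 40_000};
        for (long n : counts) {
            System.out.printf("n = %d -> predicted false positive rate %.4f%n",
                    n, falsePositiveRate(m, n, k));
        }
    }
}
```

Doubling the element count from the designed capacity raises the predicted rate from about 1% to well above 10%, which is why capacity should be estimated generously up front.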

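The cuckoo filter: a newer probabilistic filter

The cuckoo filter was proposed in 2014. For Redis 5.0 and later it is available through the same RedisBloom module.

How it works

A cuckoo filter is based on cuckoo hashing. It stores a short fingerprint of each element in a table of buckets, and every element has two candidate buckets.

graph TD
    A[Input element] --> B[Compute the element's fingerprint]
    A --> C[Hash function 1]
    A --> D[Hash function 2]
    B --> E[Store the fingerprint]
    C --> E
    D --> E
    E --> F[Look up the element]

As an analogy, think of a cuckoo filter as an apartment building in which every new tenant (element) has two possible rooms. If both rooms are taken, the newcomer evicts one occupant, who then moves to their own alternate room, and so on. This keeps space utilization high, but it does not guarantee a home for everyone: once the table is too full the chain of evictions gives up and the insert fails.

To insert an element:

Compute the element's fingerprint, usually a few bits of its hash (for example 8 bits)
Compute its two candidate buckets. In the original design the second bucket is derived from the first bucket and the fingerprint, so it can be recomputed later from a stored fingerprint alone
If either bucket has a free slot, store the fingerprint there
If both are full, evict an existing fingerprint, put the new one in its place, and move the evicted fingerprint to its alternate bucket, repeating as needed (the name comes from cuckoo chicks pushing other eggs out of the nest)

To query an element:

Compute the element's fingerprint and its two candidate buckets
Check whether either bucket contains that fingerprint

If either bucket contains it, the element is possibly present
If neither bucket contains it, the element is definitely absent (provided that only elements that were actually inserted have ever been deleted)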
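The insert, query, and delete steps can be sketched as a toy in-memory filter. This is a minimal illustration under stated assumptions, not RedisBloom's implementation: the class name `TinyCuckooFilter`, the bucket size of 2, the limit of 50 kicks, and the FNV-style hash are all choices made for this example. The alternate bucket is computed as `index XOR hash(fingerprint)`, the partial-key scheme from the original cuckoo filter design.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;

/** Toy in-memory cuckoo filter with 8-bit fingerprints, for illustration only. */
final class TinyCuckooFilter {
    private static final int BUCKET_SIZE = 2; // slots per bucket
    private static final int MAX_KICKS = 50;  // eviction limit before giving up
    private final byte[][] buckets;           // 0 marks an empty slot
    private final int mask;
    private final Random random = new Random(42);

    TinyCuckooFilter(int numBuckets) {
        if (Integer.bitCount(numBuckets) != 1) {
            throw new IllegalArgumentException("numBuckets must be a power of two");
        }
        this.buckets = new byte[numBuckets][BUCKET_SIZE];
        this.mask = numBuckets - 1;
    }

    boolean insert(String item) {
        byte fp = fingerprint(item);
        int i1 = index(item);
        int i2 = altIndex(i1, fp);
        if (place(i1, fp) || place(i2, fp)) {
            return true;
        }
        // Both buckets are full: evict a resident and move it to its alternate bucket.
        int i = random.nextBoolean() ? i1 : i2;
        for (int kick = 0; kick < MAX_KICKS; kick++) {
            int slot = random.nextInt(BUCKET_SIZE);
            byte evicted = buckets[i][slot];
            buckets[i][slot] = fp;
            fp = evicted;
            i = altIndex(i, fp);
            if (place(i, fp)) {
                return true;
            }
        }
        // Considered full. This toy version simply drops the last evicted fingerprint.
        return false;
    }

    boolean contains(String item) {
        byte fp = fingerprint(item);
        int i1 = index(item);
        return has(i1, fp) || has(altIndex(i1, fp), fp);
    }

    /** Only safe for items that were really inserted; otherwise another item's fingerprint may be removed. */
    boolean delete(String item) {
        byte fp = fingerprint(item);
        int i1 = index(item);
        return remove(i1, fp) || remove(altIndex(i1, fp), fp);
    }

    private boolean place(int i, byte fp) {
        for (int s = 0; s < BUCKET_SIZE; s++) {
            if (buckets[i][s] == 0) { buckets[i][s] = fp; return true; }
        }
        return false;
    }

    private boolean has(int i, byte fp) {
        for (int s = 0; s < BUCKET_SIZE; s++) {
            if (buckets[i][s] == fp) return true;
        }
        return false;
    }

    private boolean remove(int i, byte fp) {
        for (int s = 0; s < BUCKET_SIZE; s++) {
            if (buckets[i][s] == fp) { buckets[i][s] = 0; return true; }
        }
        return false;
    }

    private int index(String item) {
        return hash(item.getBytes(StandardCharsets.UTF_8)) & mask;
    }

    // Applying this twice returns the original index, so a stored fingerprint can always be moved back.
    private int altIndex(int i, byte fp) {
        return (i ^ hash(new byte[] {fp})) & mask;
    }

    // Top 8 bits of the hash; 0 is reserved for "empty", so it is mapped to 1.
    private static byte fingerprint(String item) {
        int top = hash(item.getBytes(StandardCharsets.UTF_8)) >>> 24;
        return (byte) (top == 0 ? 1 : top);
    }

    private static int hash(byte[] data) {
        int h = 0x811C9DC5;
        for (byte b : data) { h ^= (b & 0xFF); h *= 0x01000193; }
        return h;
    }

    public static void main(String[] args) {
        TinyCuckooFilter filter = new TinyCuckooFilter(1024);
        filter.insert("hello");
        System.out.println(filter.contains("hello")); // true
        filter.delete("hello");
        System.out.println(filter.contains("hello")); // false
    }
}
```

Because only fingerprints are stored, deletion works by removing one matching fingerprint, which is exactly why deleting an item that was never added can cause a false negative for a different item.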

Cuckoo filters in Redis

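import java.nio.charset.StandardCharsets;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.commands.ProtocolCommand;

// Requires a Redis server with the RedisBloom module loaded.
// Import paths assume Jedis 3.x or later.
public class CuckooFilterExample {
    // Module commands are not part of Jedis's Protocol.Command enum, so
    // Command.valueOf("CF.ADD") would throw; ProtocolCommand instances are used instead.
    private static final ProtocolCommand CF_RESERVE = command("CF.RESERVE");
    private static final ProtocolCommand CF_ADD = command("CF.ADD");
    private static final ProtocolCommand CF_EXISTS = command("CF.EXISTS");
    private static final ProtocolCommand CF_DEL = command("CF.DEL");

    private final JedisPool jedisPool;

    public CuckooFilterExample(String host, int port) {
        this(new JedisPool(host, port));
    }

    // Lets a shared pool (for example a Spring bean) be injected.
    public CuckooFilterExample(JedisPool jedisPool) {
        this.jedisPool = jedisPool;
    }

    private static ProtocolCommand command(String name) {
        return () -> name.getBytes(StandardCharsets.UTF_8);
    }

    public void createFilter(String key, int capacity) {
        try (Jedis jedis = jedisPool.getResource()) {
            // Create the filter only if the key does not exist yet.
            // The check and the create are separate commands, so this is not atomic.
            if (!jedis.exists(key)) {
                // capacity is the expected number of elements, not a memory size.
                // CF.RESERVE takes no error-rate argument; its optional arguments are
                // CF.RESERVE key capacity [BUCKETSIZE n] [MAXITERATIONS n] [EXPANSION n]
                jedis.sendCommand(CF_RESERVE, key, String.valueOf(capacity));
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public boolean addItem(String key, String item) {
        try (Jedis jedis = jedisPool.getResource()) {
            Object result = jedis.sendCommand(CF_ADD, key, item);
            // CF.ADD replies 1 on success. A full filter is reported as an error reply,
            // which Jedis raises as an exception; it is handled by the catch block below.
            return result != null && ((Long) result) == 1;
        } catch (Exception e) {
            e.printStackTrace();
            return false;
        }
    }

    public boolean existsItem(String key, String item) {
        try (Jedis jedis = jedisPool.getResource()) {
            // 1 means "possibly present", 0 means "definitely absent"
            Object result = jedis.sendCommand(CF_EXISTS, key, item);
            return result != null && ((Long) result) == 1;
        } catch (Exception e) {
            e.printStackTrace();
            return false;
        }
    }

    public boolean deleteItem(String key, String item) {
        try (Jedis jedis = jedisPool.getResource()) {
            Object result = jedis.sendCommand(CF_DEL, key, item);
            // 1 means a matching fingerprint was removed, 0 means none was found.
            // Only delete items that were really added: deleting anything else can
            // remove another item's fingerprint and cause false negatives.
            return result != null && ((Long) result) == 1;
        } catch (Exception e) {
            e.printStackTrace();
            return false;
        }
    }

    // Retries a failed insert. This only helps with transient failures such as a
    // dropped connection; a filter that is actually full will keep rejecting the item.
    public boolean addItemWithRetry(String key, String item, int maxRetries) {
        int tries = 0;
        while (tries < maxRetries) {
            boolean success = addItem(key, item);
            if (success) {
                return true;
            }
            tries++;
            // If inserts keep failing, the filter probably needs more capacity:
            // consider creating a larger filter and migrating the data.
        }
        return false;
    }

    public void close() {
        if (jedisPool != null) {
            jedisPool.close();
        }
    }
}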

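Pros and cons of cuckoo filters

Pros:

Elements can be deleted
The false positive rate is set by the fingerprint length f and the bucket size b, roughly 2b/2^f. With 8-bit fingerprints that is about 0.78% for a bucket size of 1 and about 1.56% for a bucket size of 2, which is RedisBloom's documented default
Good space utilization (especially at high load factors)
The false positive rate stays relatively stable under most loads

Cons:

More complex to implement than a Bloom filter
Inserts can fail when the chain of evictions exceeds its limit (500 kicks in the original paper; RedisBloom's MAXITERATIONS option defaults to 20)
Memory use can be somewhat higher than a Bloom filter with a comparable error rate (about 12 bits per element, roughly 25% more); the exact figure depends on the bucket size and on how full the table is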
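The relationship between fingerprint length, bucket size, and error rate can be checked with a few lines of arithmetic. The sketch below uses the bound from the original cuckoo filter design: a lookup compares against at most 2b stored fingerprints, each of which matches by chance with probability 1/2^f. The class and method names are made up for this example, and the load factors passed to `bitsPerItem` are assumptions, not measured RedisBloom values.

```java
/** Back-of-the-envelope cuckoo filter estimates (illustrative names). */
final class CuckooMath {
    /** Upper bound on the false positive rate: 1 - (1 - 1/2^f)^(2b), approximately 2b / 2^f. */
    static double falsePositiveBound(int fingerprintBits, int bucketSize) {
        double perSlot = 1.0 / (1L << fingerprintBits);
        return 1 - Math.pow(1 - perSlot, 2 * bucketSize);
    }

    /** Bits of table space per stored element: fingerprint bits divided by the load factor. */
    static double bitsPerItem(int fingerprintBits, double loadFactor) {
        return fingerprintBits / loadFactor;
    }

    public static void main(String[] args) {
        for (int b = 1; b <= 4; b *= 2) {
            System.out.printf("f = 8, bucket size %d -> bound %.4f%n", b, falsePositiveBound(8, b));
        }
        System.out.printf("f = 8 at 66%% load -> %.1f bits per element%n", bitsPerItem(8, 0.66));
        System.out.printf("f = 8 at 95%% load -> %.1f bits per element%n", bitsPerItem(8, 0.95));
    }
}
```

Larger buckets allow the table to be filled further, which lowers the bits per element, but each extra slot adds another chance of an accidental fingerprint match.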

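Expanding a filter

When a filter approaches its capacity, it needs more room. RedisBloom filters can grow on their own by stacking additional sub-filters (see the EXPANSION option of BF.RESERVE and CF.RESERVE), at the cost of slower lookups and a higher combined error rate. The alternative shown below is to rebuild a single, correctly sized filter and swap it in.

Expanding a Bloom filter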

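// Intended as a method of BloomFilterExample (it uses that class's jedisPool and addBatch).
public void expandBloomFilter(String key, int newCapacity) {
    String tempKey = key + ":new";
    // Module commands need a ProtocolCommand; Command.valueOf("BF.RESERVE") would throw.
    ProtocolCommand bfReserve = () -> "BF.RESERVE".getBytes(StandardCharsets.UTF_8);
    try (Jedis jedis = jedisPool.getResource()) {
        // 1. Create a new, larger filter under a temporary key
        jedis.sendCommand(bfReserve,
                          tempKey,
                          "0.01",
                          String.valueOf(newCapacity));

        // 2. Load the elements to migrate.
        // A Bloom filter cannot list what was added to it, so the elements
        // have to come from the original data source.
        List<String> elements = getAllElementsFromDataSource();

        // 3. Add them to the new filter in batches of 1000
        for (int i = 0; i < elements.size(); i += 1000) {
            int end = Math.min(i + 1000, elements.size());
            List<String> batch = elements.subList(i, end);
            addBatch(tempKey, batch.toArray(new String[0]));
        }

        // 4. Atomically replace the old filter.
        // Elements added to the old key while the rebuild is running are lost unless
        // they are also written to the new key. In Redis Cluster both keys must
        // hash to the same slot for RENAME to work.
        jedis.rename(tempKey, key);
    } catch (Exception e) {
        e.printStackTrace();
    }
}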

Expanding a cuckoo filter

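// Intended as a method of CuckooFilterExample (it uses that class's jedisPool).
public void expandCuckooFilter(String key, int newCapacity) {
    String tempKey = key + ":new";
    // Module commands need a ProtocolCommand; Command.valueOf("CF.RESERVE") would throw.
    ProtocolCommand cfReserve = () -> "CF.RESERVE".getBytes(StandardCharsets.UTF_8);
    ProtocolCommand cfAdd = () -> "CF.ADD".getBytes(StandardCharsets.UTF_8);
    try (Jedis jedis = jedisPool.getResource()) {
        // 1. Create a new, larger filter under a temporary key
        jedis.sendCommand(cfReserve,
                          tempKey,
                          String.valueOf(newCapacity));

        // 2. Load the elements to migrate (normally reloaded from the data source)
        List<String> elements = getAllElementsFromDataSource();

        // 3. Add them to the new filter one by one.
        // RedisBloom also provides CF.INSERT for adding several items in one call;
        // a simple loop is used here to keep the example short.
        for (String element : elements) {
            jedis.sendCommand(cfAdd, tempKey, element);
        }

        // 4. Atomically replace the old filter
        jedis.rename(tempKey, key);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

// Loads every element from the data source (placeholder)
private List<String> getAllElementsFromDataSource() {
    // The real implementation depends on where the data lives:
    // a database, a file, another service, and so on
    return new ArrayList<>();
}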
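The batching in step 3 of the Bloom filter rebuild can be factored into a small helper. This is a hypothetical utility written for this example (the name `Batches.chunks` does not come from any library); it only shows the slicing logic and does not talk to Redis.

```java
import java.util.ArrayList;
import java.util.List;

/** Splits a list into fixed-size batches, e.g. for sending elements to BF.MADD in groups. */
final class Batches {
    static <T> List<List<T>> chunks(List<T> items, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> result = new ArrayList<>();
        for (int start = 0; start < items.size(); start += size) {
            int end = Math.min(start + size, items.size());
            // subList returns a view, so no elements are copied here
            result.add(items.subList(start, end));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            ids.add("id-" + i);
        }
        for (List<String> batch : chunks(ids, 1000)) {
            System.out.println("batch of " + batch.size());
        }
    }
}
```

Batching keeps each command to a bounded size, which avoids one very large request blocking the Redis server while still cutting the number of round trips.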

Case study: cache penetration protection

Here is an example of using a Bloom filter to prevent cache penetration in a Spring Boot project:

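import java.util.List;
import java.util.concurrent.TimeUnit;
import org.redisson.api.RLock;
import org.redisson.api.RedissonClient;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.stereotype.Service;

// Product and ProductRepository are application classes defined elsewhere.
@Service
public class ProductService {
    private final RedisTemplate<String, Object> redisTemplate;
    private final BloomFilterExample bloomFilter;
    private final ProductRepository productRepository;
    private final RedissonClient redissonClient; // source of Redisson distributed locks

    private static final String BLOOM_FILTER_KEY = "product:exists:filter";
    private static final String CACHE_KEY_PREFIX = "product:";
    private static final String LOCK_KEY_PREFIX = "product:lock:";

    public ProductService(RedisTemplate<String, Object> redisTemplate,
                          BloomFilterExample bloomFilter,
                          ProductRepository productRepository,
                          RedissonClient redissonClient) {
        this.redisTemplate = redisTemplate;
        this.bloomFilter = bloomFilter;
        this.productRepository = productRepository;
        this.redissonClient = redissonClient;

        // On startup, add every product ID in the database to the Bloom filter
        initBloomFilter();
    }

    private void initBloomFilter() {
        List<Long> allProductIds = productRepository.findAllProductIds();
        for (Long id : allProductIds) {
            bloomFilter.addToFilter(BLOOM_FILTER_KEY, String.valueOf(id));
        }
    }

    public Product getProductById(Long id) {
        String cacheKey = CACHE_KEY_PREFIX + id;

        // 1. Check the Redis cache first
        Object cached = redisTemplate.opsForValue().get(cacheKey);
        if (cached != null) {
            return unwrap(cached);
        }

        // 2. Cache miss: ask the Bloom filter whether the product ID can exist
        boolean mayExist = bloomFilter.exists(BLOOM_FILTER_KEY, String.valueOf(id));
        if (!mayExist) {
            // The filter says "absent", which is definitive, so return null right away
            return null;
        }

        // 3. The filter says "possibly present": take a distributed lock to prevent
        // a stampede on the database. The lock is per product, so a slow lookup
        // for one ID does not block lookups for all the others.
        RLock lock = redissonClient.getLock(LOCK_KEY_PREFIX + id);
        Product product = null;
        try {
            // Wait up to 500 ms for the lock and hold it for at most 10 seconds
            if (lock.tryLock(500, 10000, TimeUnit.MILLISECONDS)) {
                try {
                    // Double check: another thread may have filled the cache meanwhile
                    cached = redisTemplate.opsForValue().get(cacheKey);
                    if (cached != null) {
                        return unwrap(cached);
                    }

                    // Query the database
                    product = productRepository.findById(id).orElse(null);

                    if (product != null) {
                        // Cache real data for 1 hour
                        redisTemplate.opsForValue().set(cacheKey, product, 1, TimeUnit.HOURS);
                    } else {
                        // Cache a marker for "not found" (a filter false positive),
                        // with a shorter expiry of 5 minutes
                        redisTemplate.opsForValue().set(cacheKey, new EmptyProduct(), 5, TimeUnit.MINUTES);
                    }
                } finally {
                    lock.unlock(); // always release the lock
                }
            }
            // If the lock was not acquired in time, this returns null;
            // a real service might retry or report "busy" instead.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }

        return product;
    }

    // Converts a cached value back to a result: the EmptyProduct marker means "not found"
    private static Product unwrap(Object cached) {
        return cached instanceof EmptyProduct ? null : (Product) cached;
    }

    // When a product is added, update the Bloom filter as well
    public Product addProduct(Product product) {
        Product saved = productRepository.save(product);
        bloomFilter.addToFilter(BLOOM_FILTER_KEY, String.valueOf(saved.getId()));
        return saved;
    }

    // Marker class used to cache "not found" results.
    // It must be serializable by whatever serializer the RedisTemplate uses.
    private static class EmptyProduct extends Product {
    }
}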
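The lookup order used above (cache, then filter, then database, with a cached "missing" marker) can be exercised without Redis. The following is a simplified in-memory sketch: `CacheLookupSketch`, the `MISSING` sentinel, and the `Predicate` standing in for the Bloom filter are all assumptions made for this illustration, and locking is left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

/** In-memory model of the cache -> filter -> database lookup order. */
final class CacheLookupSketch {
    private static final Object MISSING = new Object(); // cached marker for "not in the database"
    private final Map<Long, Object> cache = new HashMap<>();
    private final Map<Long, String> database;
    private final Predicate<Long> mightExist; // stands in for the Bloom filter
    int databaseHits = 0;

    CacheLookupSketch(Map<Long, String> database, Predicate<Long> mightExist) {
        this.database = database;
        this.mightExist = mightExist;
    }

    String find(long id) {
        Object cached = cache.get(id);
        if (cached != null) {
            return cached == MISSING ? null : (String) cached;
        }
        if (!mightExist.test(id)) {
            return null; // "definitely absent": the database is never touched
        }
        databaseHits++;
        String value = database.get(id);
        // Cache the miss too, so a filter false positive costs only one database query
        cache.put(id, value != null ? value : MISSING);
        return value;
    }

    public static void main(String[] args) {
        Map<Long, String> db = new HashMap<>();
        db.put(1L, "apple");
        // The filter wrongly claims that id 2 might exist (a false positive)
        CacheLookupSketch lookup = new CacheLookupSketch(db, id -> id == 1L || id == 2L);

        System.out.println(lookup.find(1));  // apple (one database query)
        System.out.println(lookup.find(99)); // null, rejected by the filter
        System.out.println(lookup.find(2));  // null, after one database query
        System.out.println(lookup.find(2));  // null, served from the cached marker
        System.out.println(lookup.databaseHits); // 2
    }
}
```

The key detail is that a cached marker must be translated back to "not found" on the read path; returning the marker object itself to callers would make a missing product look like a real one.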

Case study: URL deduplication for a web crawler

Using a cuckoo filter to deduplicate crawler URLs:

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.springframework.stereotype.Component;

@Component
public class WebCrawler {
    private final CuckooFilterExample cuckooFilter;
    private final ExecutorService executorService;
    private final Queue<String> urlQueue = new ConcurrentLinkedQueue<>();

    private static final String CRAWLED_URLS_FILTER = "crawler:urls";

    public WebCrawler(CuckooFilterExample cuckooFilter) {
        this.cuckooFilter = cuckooFilter;
        this.executorService = Executors.newFixedThreadPool(10);

        // Create the cuckoo filter with room for 1 million URLs
        this.cuckooFilter.createFilter(CRAWLED_URLS_FILTER, 1_000_000);
    }

    public void addSeed(String url) {
        if (!cuckooFilter.existsItem(CRAWLED_URLS_FILTER, url)) {
            urlQueue.add(url);
        }
    }

    public void start() {
        for (int i = 0; i < 10; i++) {
            executorService.submit(this::crawlTask);
        }
    }

    private void crawlTask() {
        while (!Thread.currentThread().isInterrupted()) {
            String url = urlQueue.poll();
            if (url == null) {
                try {
                    Thread.sleep(1000); // wait when the queue is empty
                    continue;
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }

            // Check again whether the URL was already crawled (another thread may have handled it).
            // This check and the add below are two separate commands, so two workers can
            // still both pass the check for the same URL. RedisBloom's CF.ADDNX
            // (add only if absent) is one way to close that gap.
            if (cuckooFilter.existsItem(CRAWLED_URLS_FILTER, url)) {
                continue;
            }

            try {
                // Mark the URL as handled, retrying a few times on failure
                boolean added = cuckooFilter.addItemWithRetry(CRAWLED_URLS_FILTER, url, 3);
                if (!added) {
                    // Could not record the URL (the filter may be full), so skip it
                    continue;
                }

                // Fetch the page
                Document doc = Jsoup.connect(url)
                        .userAgent("Mozilla/5.0")
                        .timeout(5000)
                        .get();

                // Parse the page and collect new links
                Elements links = doc.select("a[href]");
                for (Element link : links) {
                    String nextUrl = link.absUrl("href");
                    if (!nextUrl.isEmpty() && !cuckooFilter.existsItem(CRAWLED_URLS_FILTER, nextUrl)) {
                        urlQueue.add(nextUrl);
                    }
                }

                // Process the fetched content
                processContent(url, doc);

            } catch (Exception e) {
                // The fetch failed; retry or give up as appropriate
                System.err.println("Failed to crawl URL: " + url + " - " + e.getMessage());
            }
        }
    }

    private void processContent(String url, Document document) {
        // Handle the fetched content: extract information, store data, and so on
    }

    public void shutdown() {
        executorService.shutdown();
        try {
            if (!executorService.awaitTermination(30, TimeUnit.SECONDS)) {
                executorService.shutdownNow();
            }
        } catch (InterruptedException e) {
            executorService.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }
}
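The crawler checks the filter and then adds the URL in two separate steps, so two workers can both pass the check for the same URL. A single atomic "add if absent" operation avoids that. The sketch below demonstrates the pattern with an in-memory concurrent set as a stand-in for the filter; `UrlClaimSketch` and `claim` are hypothetical names for this example. With RedisBloom the closest equivalent is the CF.ADDNX command, whose documentation should be consulted for its caveats around deletion.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Claims each URL at most once using a single atomic add-if-absent step. */
final class UrlClaimSketch {
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    /** Returns true only for the first caller that claims this URL. */
    boolean claim(String url) {
        return seen.add(url);
    }

    public static void main(String[] args) throws InterruptedException {
        UrlClaimSketch claims = new UrlClaimSketch();
        AtomicInteger winners = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 8; i++) {
            pool.submit(() -> {
                if (claims.claim("https://example.com/")) {
                    winners.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(winners.get()); // 1: exactly one worker wins the claim
    }
}
```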

Performance comparison: Bloom filter vs. cuckoo filter

A simple Java test can be used to compare the two:

import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.UUID;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.commands.ProtocolCommand;

// Every call is a network round trip, so the timings are dominated by Redis latency.
// addToFilter also issues an extra EXISTS per insert, which inflates the Bloom insert time.
public class FilterBenchmark {
    private static final int ITEMS_COUNT = 1_000_000;
    private static final int QUERY_COUNT = 100_000;
    private final String redisHost;
    private final int redisPort;
    private final BloomFilterExample bloomFilter;
    private final CuckooFilterExample cuckooFilter;
    private final String bloomKey = "benchmark:bloom";
    private final String cuckooKey = "benchmark:cuckoo";

    public FilterBenchmark(String redisHost, int redisPort) {
        this.redisHost = redisHost;
        this.redisPort = redisPort;
        this.bloomFilter = new BloomFilterExample(redisHost, redisPort);
        this.cuckooFilter = new CuckooFilterExample(redisHost, redisPort);

        // Create both filters, removing leftovers from earlier runs first
        try (Jedis jedis = new Jedis(redisHost, redisPort)) {
            jedis.del(bloomKey, cuckooKey);
            jedis.sendCommand(command("BF.RESERVE"), bloomKey, "0.01", String.valueOf(ITEMS_COUNT));
            jedis.sendCommand(command("CF.RESERVE"), cuckooKey, String.valueOf(ITEMS_COUNT));
        }
    }

    // Module commands are not in Jedis's Protocol.Command enum, so wrap the name directly
    private static ProtocolCommand command(String name) {
        return () -> name.getBytes(StandardCharsets.UTF_8);
    }

    public void runBenchmark() {
        System.out.println("Starting benchmark: inserting " + ITEMS_COUNT + " elements, querying " + QUERY_COUNT);

        // Generate test data
        List<String> itemsToInsert = new ArrayList<>(ITEMS_COUNT);
        for (int i = 0; i < ITEMS_COUNT; i++) {
            itemsToInsert.add(UUID.randomUUID().toString());
        }

        // Keep the inserted elements in a HashSet for fast ground-truth lookups
        Set<String> insertedItems = new HashSet<>(itemsToInsert);

        // Bloom filter insert time
        long bloomInsertStart = System.currentTimeMillis();
        for (String item : itemsToInsert) {
            bloomFilter.addToFilter(bloomKey, item);
        }
        long bloomInsertTime = System.currentTimeMillis() - bloomInsertStart;

        // Cuckoo filter insert time
        long cuckooInsertStart = System.currentTimeMillis();
        for (String item : itemsToInsert) {
            cuckooFilter.addItem(cuckooKey, item);
        }
        long cuckooInsertTime = System.currentTimeMillis() - cuckooInsertStart;

        // Generate queries: 80% present, 20% absent
        int presentCount = (int) (QUERY_COUNT * 0.8);
        int absentCount = QUERY_COUNT - presentCount;
        Random random = new Random();
        List<String> itemsToQuery = new ArrayList<>(QUERY_COUNT);
        for (int i = 0; i < presentCount; i++) {
            itemsToQuery.add(itemsToInsert.get(random.nextInt(ITEMS_COUNT)));
        }
        for (int i = 0; i < absentCount; i++) {
            itemsToQuery.add(UUID.randomUUID().toString());
        }
        Collections.shuffle(itemsToQuery);

        // Bloom filter query time and false positives
        int bloomFalsePositives = 0;
        long bloomQueryStart = System.currentTimeMillis();
        for (String item : itemsToQuery) {
            boolean exists = bloomFilter.exists(bloomKey, item);
            if (exists && !insertedItems.contains(item)) {
                bloomFalsePositives++;
            }
        }
        long bloomQueryTime = System.currentTimeMillis() - bloomQueryStart;

        // Cuckoo filter query time and false positives
        int cuckooFalsePositives = 0;
        long cuckooQueryStart = System.currentTimeMillis();
        for (String item : itemsToQuery) {
            boolean exists = cuckooFilter.existsItem(cuckooKey, item);
            if (exists && !insertedItems.contains(item)) {
                cuckooFalsePositives++;
            }
        }
        long cuckooQueryTime = System.currentTimeMillis() - cuckooQueryStart;

        // Memory use, read from the "Size" field of BF.INFO and CF.INFO
        // (field name taken from the RedisBloom documentation; verify it for your version)
        long bloomMemory = 0;
        long cuckooMemory = 0;
        try (Jedis jedis = new Jedis(redisHost, redisPort)) {
            bloomMemory = infoField(jedis.sendCommand(command("BF.INFO"), bloomKey), "Size");
            cuckooMemory = infoField(jedis.sendCommand(command("CF.INFO"), cuckooKey), "Size");
        } catch (RuntimeException e) {
            System.err.println("Failed to read filter memory usage: " + e.getMessage());
        }

        // Print the results
        System.out.println("Bloom filter insert time: " + bloomInsertTime + "ms");
        System.out.println("Cuckoo filter insert time: " + cuckooInsertTime + "ms");
        System.out.println("Bloom filter query time: " + bloomQueryTime + "ms");
        System.out.println("Cuckoo filter query time: " + cuckooQueryTime + "ms");
        System.out.println("Bloom filter false positive rate: " + (bloomFalsePositives * 100.0 / absentCount) + "%");
        System.out.println("Cuckoo filter false positive rate: " + (cuckooFalsePositives * 100.0 / absentCount) + "%");
        System.out.println("Bloom filter memory: " + bloomMemory + " bytes (about " + (bloomMemory * 8.0 / ITEMS_COUNT) + " bits/element)");
        System.out.println("Cuckoo filter memory: " + cuckooMemory + " bytes (about " + (cuckooMemory * 8.0 / ITEMS_COUNT) + " bits/element)");
    }

    // Under RESP2, the INFO commands reply with a flat list: name, value, name, value, ...
    // sendCommand returns names as byte[] and integer values as Long; it never returns a Map.
    private static long infoField(Object reply, String field) {
        List<?> items = (List<?>) reply;
        for (int i = 0; i + 1 < items.size(); i += 2) {
            if (field.equalsIgnoreCase(text(items.get(i)))) {
                return Long.parseLong(text(items.get(i + 1)));
            }
        }
        throw new IllegalArgumentException("Field not found in INFO reply: " + field);
    }

    private static String text(Object value) {
        return value instanceof byte[]
                ? new String((byte[]) value, StandardCharsets.UTF_8)
                : String.valueOf(value);
    }

    public static void main(String[] args) {
        FilterBenchmark benchmark = new FilterBenchmark("localhost", 6379);
        benchmark.runBenchmark();
    }
}

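The test above supports the following qualitative summary:

graph LR
    A[Metric] --> B[Insert time]
    A --> C[Query time]
    A --> D[Memory use]
    A --> E[False positive rate]
    B --> F[Bloom is faster\nespecially at low load]
    C --> G[Similar\nboth are fast]
    D --> H[Bloom about 9.6 bits/element\nCuckoo about 12 bits/element]
    E --> I[Cuckoo depends on fingerprint and bucket size\nand stays more stable as the filter fills]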

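Production considerations

Persistence and restarts

Filter data is saved by Redis's normal persistence mechanisms (RDB/AOF). Keep the following in mind:

Large filters lengthen RDB saves, which can affect the main thread
With AOF, frequent filter updates make the AOF file grow quickly
After a restart, loading large filters can lengthen Redis startup time

Recommendations:

For very large filters (>100MB), consider a dedicated Redis instance with persistence disabled
Monitor Redis memory use and persistence times regularly

Cluster support

When using filters in a Redis Cluster:

A RedisBloom filter is a single key, so it lives on one shard and cannot span shards
Make sure related filter keys land on the same shard (use hash tags, for example {user:123}:filter)
Avoid letting a single filter grow so large that it unbalances the shards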
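To see why hash tags keep keys together, the slot calculation from the Redis Cluster specification can be reproduced in a few lines: the slot is CRC16 (XMODEM variant) of the key modulo 16384, and when the key contains a non-empty `{...}` section only that section is hashed. The class name `ClusterSlot` and the second key `{user:123}:cache` are made up for this example.

```java
import java.nio.charset.StandardCharsets;

/** Computes Redis Cluster hash slots, including hash tag handling. */
final class ClusterSlot {
    /** CRC16, XMODEM variant: polynomial 0x1021, initial value 0, no reflection. */
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int bit = 0; bit < 8; bit++) {
                crc = (crc & 0x8000) != 0 ? (crc << 1) ^ 0x1021 : crc << 1;
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    /** Slot for a key: only the text inside the first non-empty {...} is hashed, if present. */
    static int slot(String key) {
        int open = key.indexOf('{');
        if (open >= 0) {
            int close = key.indexOf('}', open + 1);
            if (close > open + 1) {
                key = key.substring(open + 1, close);
            }
        }
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % 16384;
    }

    public static void main(String[] args) {
        // Both keys share the hash tag "user:123", so they map to the same slot
        System.out.println(slot("{user:123}:filter") == slot("{user:123}:cache")); // true
        // The same slot as the tag on its own
        System.out.println(slot("{user:123}:filter") == slot("user:123")); // true
    }
}
```

This matters for the rebuild-and-rename approach as well: a temporary key only lands on the same shard as the original if both share a hash tag.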

Monitoring and maintenance

Check filter state regularly so that problems are caught early:

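// Intended as a method of a class that holds a JedisPool named jedisPool.
// Needs java.util.List, java.nio.charset.StandardCharsets and
// redis.clients.jedis.commands.ProtocolCommand.
// Field names follow the RedisBloom documentation; verify them against your version.
public void monitorFilter(String key, boolean isBloom) {
    String name = isBloom ? "BF.INFO" : "CF.INFO";
    // Module commands are not in Jedis's Protocol.Command enum, so wrap the name directly
    ProtocolCommand infoCommand = () -> name.getBytes(StandardCharsets.UTF_8);
    try (Jedis jedis = jedisPool.getResource()) {
        Object result = jedis.sendCommand(infoCommand, key);

        // Under RESP2 the reply is a flat list (name, value, name, value, ...), not a Map
        if (!(result instanceof List)) {
            System.err.println("Failed to read filter info: unexpected reply type");
            return;
        }
        List<?> info = (List<?>) result;

        if (isBloom) {
            // Compare inserted items with capacity. "Size" is the memory footprint
            // in bytes, so it is not a measure of how full the filter is.
            long capacity = infoField(info, "Capacity");
            long inserted = infoField(info, "Number of items inserted");
            double fillRatio = inserted / (double) capacity;

            if (fillRatio > 0.75) {
                // Nearly full: consider building a larger filter
                System.out.println("Bloom filter " + key + " fill ratio: " + fillRatio);
            }
        } else {
            long inserted = infoField(info, "Number of items inserted");
            long deleted = infoField(info, "Number of items deleted");

            // Many deletions in a cuckoo filter can degrade performance
            if (inserted > 0 && deleted > inserted * 0.3) {
                System.out.println("Cuckoo filter " + key + " has a high delete ratio: "
                        + (deleted / (double) inserted));
            }
        }
    } catch (Exception e) {
        System.err.println("Failed to monitor filter state: " + e.getMessage());
    }
}

// Finds a named numeric field in a flat name/value reply
private static long infoField(List<?> info, String field) {
    for (int i = 0; i + 1 < info.size(); i += 2) {
        if (field.equalsIgnoreCase(text(info.get(i)))) {
            return Long.parseLong(text(info.get(i + 1)));
        }
    }
    throw new IllegalArgumentException("Field not found in INFO reply: " + field);
}

// sendCommand returns strings as byte[] and integers as Long
private static String text(Object value) {
    return value instanceof byte[]
            ? new String((byte[]) value, StandardCharsets.UTF_8)
            : String.valueOf(value);
}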

Putting it together in a Java project

Here is a Spring Boot configuration that wires up both the Bloom filter and the cuckoo filter:

import org.redisson.Redisson;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;

@SpringBootApplication
public class FilterDemoApplication {
    public static void main(String[] args) {
        SpringApplication.run(FilterDemoApplication.class, args);
    }

    @Bean
    public JedisPool jedisPool(
            @Value("${spring.redis.host}") String host,
            @Value("${spring.redis.port}") int port,
            @Value("${spring.redis.password:}") String password,
            @Value("${spring.redis.jedis.pool.max-active}") int maxActive,
            @Value("${spring.redis.jedis.pool.max-idle}") int maxIdle,
            @Value("${spring.redis.jedis.pool.min-idle}") int minIdle) {

        JedisPoolConfig config = new JedisPoolConfig();
        config.setMaxTotal(maxActive); // the pool calls "max-active" maxTotal
        config.setMaxIdle(maxIdle);
        config.setMinIdle(minIdle);
        config.setTestOnBorrow(true);
        config.setTestOnReturn(true);

        if (password != null && !password.isEmpty()) {
            return new JedisPool(config, host, port, 2000, password);
        } else {
            return new JedisPool(config, host, port, 2000);
        }
    }

    // These two beans assume the filter classes have a constructor that accepts
    // a JedisPool, so that both share the pool configured above.
    @Bean
    public BloomFilterExample bloomFilterExample(JedisPool jedisPool) {
        return new BloomFilterExample(jedisPool);
    }

    @Bean
    public CuckooFilterExample cuckooFilterExample(JedisPool jedisPool) {
        return new CuckooFilterExample(jedisPool);
    }

    @Bean
    public RedissonClient redissonClient(
            @Value("${spring.redis.host}") String host,
            @Value("${spring.redis.port}") int port,
            @Value("${spring.redis.password:}") String password) {
        Config config = new Config();
        String address = "redis://" + host + ":" + port;
        config.useSingleServer()
              .setAddress(address)
              .setPassword(password == null || password.isEmpty() ? null : password);
        return Redisson.create(config);
    }
}

Configure the Redis connection in application.yml:

spring:
  redis:
    host: localhost
    port: 6379
    password:
    jedis:
      pool:
        max-active: 8
        max-idle: 8
        min-idle: 0

Choosing a filter

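flowchart TD
    A[Choose a filter type] --> B{Need to delete elements?}
    B -->|Yes| C[Cuckoo filter]
    B -->|No| D{Extremely tight memory budget?}
    D -->|Yes| E[Bloom filter]
    D -->|No| F{Want an error rate that stays stable as the filter fills?}
    F -->|Yes| C
    F -->|No| E
    E --> G[Typical uses:\nCache penetration\nCrawler URL deduplication\nSpam filtering]
    C --> H[Typical uses:\nData that must be deleted\nStable error rate\nDynamic data sets]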
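The same decision order can be written as a small function. This is only a restatement of the guidance as code: the names `FilterChoice` and `choose` and the three boolean inputs are invented for this example, and real decisions usually involve measuring against your own data.

```java
/** Encodes the selection guidance as a function (illustrative names). */
final class FilterChoice {
    enum Filter { BLOOM, CUCKOO }

    static Filter choose(boolean needsDeletion, boolean memoryVeryTight, boolean wantsStableErrorRate) {
        if (needsDeletion) {
            return Filter.CUCKOO; // only the cuckoo filter supports deletes
        }
        if (memoryVeryTight) {
            return Filter.BLOOM;  // smallest footprint for a given error rate
        }
        return wantsStableErrorRate ? Filter.CUCKOO : Filter.BLOOM;
    }

    public static void main(String[] args) {
        System.out.println(choose(true, true, false));   // CUCKOO
        System.out.println(choose(false, true, true));   // BLOOM
        System.out.println(choose(false, false, true));  // CUCKOO
        System.out.println(choose(false, false, false)); // BLOOM
    }
}
```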

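Summary

Feature	Bloom filter	Cuckoo filter
Deletion	❌	✅
Insert performance	Very fast, O(k)	Fast, but can slow down at high load
Query performance	Very fast, O(k)	Very fast, O(1)
Memory use	About 9.6 bits/element (at a false positive rate of 0.01)	About 12 bits/element (roughly 25% more than Bloom; varies with bucket size and fill level)
False positive rate	Rises as the load grows	Relatively stable (about 0.8% to 1.6% with 8-bit fingerprints and bucket sizes 1 to 2)
Implementation complexity	Simple	More complex
Redis support	RedisBloom	RedisBloom
Best suited for	Large data sets, tight memory, static data	Deletion, a stable error rate, dynamic data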