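Rust Workbook 67: Custom Sets and Data Structure Implementation

In computer science, a set is a fundamental data structure that stores unique elements and supports operations such as union, intersection, and difference. In Exercism's "custom-set" exercise, we implement a custom set data structure from scratch. Doing so gives a closer look at how sets work internally, and it is good practice for generic programming, trait bounds, and data structure design in Rust.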

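What Is a Set?

A set is an abstract data type with the following properties:

- Uniqueness: every element in a set is unique; duplicates are not allowed
- Unordered: the elements of a set have no particular order
- Well-defined membership: for any element, it can be decided whether it belongs to the set (how fast that check runs depends on the underlying implementation)
- Set operations: union, intersection, difference, and similar operations are supported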
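The uniqueness and unordered properties can be seen directly with the standard library's `HashSet`. The following is a minimal sketch; the helper names `distinct_count` and `same_set` are made up for illustration and are not part of the exercise.

```rust
use std::collections::HashSet;

/// Hypothetical helper: counts the distinct values in a slice by
/// collecting them into a set, which drops duplicates.
pub fn distinct_count(values: &[i32]) -> usize {
    let set: HashSet<i32> = values.iter().copied().collect();
    set.len()
}

/// Hypothetical helper: two slices describe the same set when they hold
/// the same elements, regardless of order or repetition.
pub fn same_set(a: &[i32], b: &[i32]) -> bool {
    let left: HashSet<i32> = a.iter().copied().collect();
    let right: HashSet<i32> = b.iter().copied().collect();
    left == right
}
```

Under these definitions, `[1, 2, 2, 3, 3, 3]` has three distinct values, and `[1, 2, 3]` and `[3, 1, 2, 2]` describe the same set.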

Let's first look at the struct and function signatures the exercise provides:

use std::marker::PhantomData;

#[derive(Debug, PartialEq)]
pub struct CustomSet<T> {
    // This field is here to make the template compile and not to
    // complain about unused type parameter 'T'. Once you start
    // solving the exercise, delete this field and the 'std::marker::PhantomData'
    // import.
    phantom: PhantomData<T>,
}

impl<T> CustomSet<T> {
    pub fn new(_input: &[T]) -> Self {
        unimplemented!();
    }

    pub fn contains(&self, _element: &T) -> bool {
        unimplemented!();
    }

    pub fn add(&mut self, _element: T) {
        unimplemented!();
    }

    pub fn is_subset(&self, _other: &Self) -> bool {
        unimplemented!();
    }

    pub fn is_empty(&self) -> bool {
        unimplemented!();
    }

    pub fn is_disjoint(&self, _other: &Self) -> bool {
        unimplemented!();
    }

    pub fn intersection(&self, _other: &Self) -> Self {
        unimplemented!();
    }

    pub fn difference(&self, _other: &Self) -> Self {
        unimplemented!();
    }

    pub fn union(&self, _other: &Self) -> Self {
        unimplemented!();
    }
}

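We need to implement this generic set struct. It should support:

- Creating a set
- Checking whether an element is present
- Adding an element
- Set relations and operations (subset, disjoint, intersection, difference, union)
- Checking whether the set is empty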
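Each operation in this list asks something different of the element type. As a rough sketch (the Vec-backed layout and the split into three `impl` blocks are assumptions made for illustration, not the exercise's required design), trait bounds can be attached only where they are actually needed:

```rust
/// Illustrative Vec-backed set; the field layout is an assumption.
pub struct CustomSet<T> {
    items: Vec<T>,
}

// No bounds: this only looks at the container, never at the elements.
impl<T> CustomSet<T> {
    pub fn is_empty(&self) -> bool {
        self.items.is_empty()
    }
}

// Membership and insertion compare elements, so they need `PartialEq`.
impl<T: PartialEq> CustomSet<T> {
    pub fn contains(&self, element: &T) -> bool {
        self.items.iter().any(|x| x == element)
    }

    pub fn add(&mut self, element: T) {
        if !self.contains(&element) {
            self.items.push(element);
        }
    }
}

// Building from a borrowed slice copies elements, so it also needs `Clone`.
impl<T: PartialEq + Clone> CustomSet<T> {
    pub fn new(input: &[T]) -> Self {
        let mut set = CustomSet { items: Vec::new() };
        for item in input {
            set.add(item.clone());
        }
        set
    }
}
```

A hash-based backing store would additionally require `Hash`, and a tree-based one `Ord`, which is why the bounds differ between the implementations in this article.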

Algorithm Design

1. Basic implementation (using the standard library's HashSet)

use std::collections::HashSet;

#[derive(Debug)]
pub struct CustomSet<T: Eq + std::hash::Hash + Clone> {
    items: HashSet<T>,
}

impl<T: Eq + std::hash::Hash + Clone> PartialEq for CustomSet<T> {
    fn eq(&self, other: &Self) -> bool {
        self.items == other.items
    }
}

impl<T: Eq + std::hash::Hash + Clone> CustomSet<T> {
    pub fn new(input: &[T]) -> Self {
        let mut set = HashSet::new();
        for item in input {
            set.insert(item.clone());
        }
        CustomSet { items: set }
    }

    pub fn contains(&self, element: &T) -> bool {
        self.items.contains(element)
    }

    pub fn add(&mut self, element: T) {
        self.items.insert(element);
    }

    pub fn is_subset(&self, other: &Self) -> bool {
        self.items.is_subset(&other.items)
    }

    pub fn is_empty(&self) -> bool {
        self.items.is_empty()
    }

    pub fn is_disjoint(&self, other: &Self) -> bool {
        self.items.is_disjoint(&other.items)
    }

    pub fn intersection(&self, other: &Self) -> Self {
        CustomSet {
            items: self.items.intersection(&other.items).cloned().collect(),
        }
    }

    pub fn difference(&self, other: &Self) -> Self {
        CustomSet {
            items: self.items.difference(&other.items).cloned().collect(),
        }
    }

    pub fn union(&self, other: &Self) -> Self {
        CustomSet {
            items: self.items.union(&other.items).cloned().collect(),
        }
    }
}

2. Implementing from scratch (without the standard library's HashSet)

use std::vec::Vec;

#[derive(Debug)]
pub struct CustomSet<T: Eq + Clone> {
    items: Vec<T>,
}

impl<T: Eq + Clone> PartialEq for CustomSet<T> {
    fn eq(&self, other: &Self) -> bool {
        if self.items.len() != other.items.len() {
            return false;
        }
        
        // Check that every element is also in the other set
        for item in &self.items {
            if !other.contains(item) {
                return false;
            }
        }
        
        for item in &other.items {
            if !self.contains(item) {
                return false;
            }
        }
        
        true
    }
}

impl<T: Eq + Clone> CustomSet<T> {
    pub fn new(input: &[T]) -> Self {
        let mut set = CustomSet { items: Vec::new() };
        for item in input {
            set.add(item.clone());
        }
        set
    }

    pub fn contains(&self, element: &T) -> bool {
        self.items.iter().any(|x| x == element)
    }

    pub fn add(&mut self, element: T) {
        if !self.contains(&element) {
            self.items.push(element);
        }
    }

    pub fn is_subset(&self, other: &Self) -> bool {
        self.items.iter().all(|item| other.contains(item))
    }

    pub fn is_empty(&self) -> bool {
        self.items.is_empty()
    }

    pub fn is_disjoint(&self, other: &Self) -> bool {
        self.items.iter().all(|item| !other.contains(item))
    }

    pub fn intersection(&self, other: &Self) -> Self {
        let mut result = CustomSet::new(&[]);
        for item in &self.items {
            if other.contains(item) {
                result.add(item.clone());
            }
        }
        result
    }

    pub fn difference(&self, other: &Self) -> Self {
        let mut result = CustomSet::new(&[]);
        for item in &self.items {
            if !other.contains(item) {
                result.add(item.clone());
            }
        }
        result
    }

    pub fn union(&self, other: &Self) -> Self {
        let mut result = CustomSet::new(&[]);
        for item in &self.items {
            result.add(item.clone());
        }
        for item in &other.items {
            result.add(item.clone());
        }
        result
    }
}

Test Case Analysis

Looking at the test cases helps clarify the requirements:

#[test]
fn sets_with_no_elements_are_empty() {
    let set: CustomSet<()> = CustomSet::new(&[]);
    assert!(set.is_empty());
}

An empty set should be reported as empty.

#[test]
fn true_when_the_element_is_in_the_set() {
    let set = CustomSet::new(&[1, 2, 3]);
    assert!(set.contains(&1));
}

The set should correctly recognize the elements it contains.

#[test]
fn sets_with_same_elements_are_subsets() {
    let set1 = CustomSet::new(&[1, 2, 3]);
    let set2 = CustomSet::new(&[1, 2, 3]);
    assert!(set1.is_subset(&set2));
    assert!(set2.is_subset(&set1));
}

Sets with the same elements are subsets of each other.

#[test]
fn sets_with_one_element_in_common_are_not_disjoint() {
    let set1 = CustomSet::new(&[1, 2]);
    let set2 = CustomSet::new(&[2, 3]);
    assert!(!set1.is_disjoint(&set2));
    assert!(!set2.is_disjoint(&set1));
}

Sets that share an element are not disjoint.

#[test]
fn sets_with_the_same_elements_are_equal() {
    let set1 = CustomSet::new(&[1, 2]);
    let set2 = CustomSet::new(&[2, 1]);
    assert_eq!(set1, set2);
}

Sets with the same elements should be equal, regardless of order.

#[test]
fn intersection_of_two_sets_with_shared_elements_is_a_set_of_the_shared_elements() {
    let set1 = CustomSet::new(&[1, 2, 3, 4]);
    let set2 = CustomSet::new(&[3, 2, 5]);
    assert_eq!(set1.intersection(&set2), CustomSet::new(&[2, 3]));
    assert_eq!(set2.intersection(&set1), CustomSet::new(&[2, 3]));
}

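The intersection should contain exactly the elements common to both sets.

Complete Implementation

A complete Vec-based implementation that handles the edge cases exercised by the tests: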

use std::vec::Vec;

#[derive(Debug)]
pub struct CustomSet<T: Eq + Clone> {
    items: Vec<T>,
}

impl<T: Eq + Clone> PartialEq for CustomSet<T> {
    fn eq(&self, other: &Self) -> bool {
        if self.items.len() != other.items.len() {
            return false;
        }
        
        // Check that every element is in the other set (order-independent)
        self.items.iter().all(|item| other.contains(item)) &&
        other.items.iter().all(|item| self.contains(item))
    }
}

impl<T: Eq + Clone> CustomSet<T> {
    pub fn new(input: &[T]) -> Self {
        let mut set = CustomSet { items: Vec::new() };
        for item in input {
            set.add(item.clone());
        }
        set
    }

    pub fn contains(&self, element: &T) -> bool {
        self.items.iter().any(|x| x == element)
    }

    pub fn add(&mut self, element: T) {
        if !self.contains(&element) {
            self.items.push(element);
        }
    }

    pub fn is_subset(&self, other: &Self) -> bool {
        self.items.iter().all(|item| other.contains(item))
    }

    pub fn is_empty(&self) -> bool {
        self.items.is_empty()
    }

    pub fn is_disjoint(&self, other: &Self) -> bool {
        self.items.iter().all(|item| !other.contains(item))
    }

    pub fn intersection(&self, other: &Self) -> Self {
        let mut result = CustomSet::new(&[]);
        for item in &self.items {
            if other.contains(item) {
                result.add(item.clone());
            }
        }
        result
    }

    pub fn difference(&self, other: &Self) -> Self {
        let mut result = CustomSet::new(&[]);
        for item in &self.items {
            if !other.contains(item) {
                result.add(item.clone());
            }
        }
        result
    }

    pub fn union(&self, other: &Self) -> Self {
        let mut result = CustomSet::new(&[]);
        for item in &self.items {
            result.add(item.clone());
        }
        for item in &other.items {
            result.add(item.clone());
        }
        result
    }
}

Performance-Optimized Version

An implementation optimized for performance, backed by HashSet:

use std::collections::HashSet;
use std::hash::Hash;

#[derive(Debug)]
pub struct CustomSet<T: Eq + Hash + Clone> {
    items: HashSet<T>,
}

impl<T: Eq + Hash + Clone> PartialEq for CustomSet<T> {
    fn eq(&self, other: &Self) -> bool {
        self.items == other.items
    }
}

impl<T: Eq + Hash + Clone> CustomSet<T> {
    pub fn new(input: &[T]) -> Self {
        let mut items = HashSet::new();
        for item in input {
            items.insert(item.clone());
        }
        CustomSet { items }
    }

    pub fn contains(&self, element: &T) -> bool {
        self.items.contains(element)
    }

    pub fn add(&mut self, element: T) {
        self.items.insert(element);
    }

    pub fn is_subset(&self, other: &Self) -> bool {
        self.items.is_subset(&other.items)
    }

    pub fn is_empty(&self) -> bool {
        self.items.is_empty()
    }

    pub fn is_disjoint(&self, other: &Self) -> bool {
        self.items.is_disjoint(&other.items)
    }

    pub fn intersection(&self, other: &Self) -> Self {
        CustomSet {
            items: self.items.intersection(&other.items).cloned().collect(),
        }
    }

    pub fn difference(&self, other: &Self) -> Self {
        CustomSet {
            items: self.items.difference(&other.items).cloned().collect(),
        }
    }

    pub fn union(&self, other: &Self) -> Self {
        CustomSet {
            items: self.items.union(&other.items).cloned().collect(),
        }
    }
    
    // A few extra utility methods
    pub fn len(&self) -> usize {
        self.items.len()
    }
    
    pub fn clear(&mut self) {
        self.items.clear();
    }
    
    pub fn remove(&mut self, element: &T) -> bool {
        self.items.remove(element)
    }
}

Error Handling and Edge Cases

An implementation that spells out more of the edge cases:

use std::vec::Vec;

#[derive(Debug)]
pub struct CustomSet<T: Eq + Clone> {
    items: Vec<T>,
}

impl<T: Eq + Clone> PartialEq for CustomSet<T> {
    fn eq(&self, other: &Self) -> bool {
        if self.items.len() != other.items.len() {
            return false;
        }
        
        self.items.iter().all(|item| other.contains(item)) &&
        other.items.iter().all(|item| self.contains(item))
    }
}

impl<T: Eq + Clone> CustomSet<T> {
    pub fn new(input: &[T]) -> Self {
        let mut set = CustomSet { items: Vec::new() };
        
        // Duplicates in the input are dropped by `add`
        for item in input {
            set.add(item.clone());
        }
        
        set
    }

    pub fn contains(&self, element: &T) -> bool {
        self.items.iter().any(|x| x == element)
    }

    pub fn add(&mut self, element: T) {
        // Only add the element if it is not already present
        if !self.contains(&element) {
            self.items.push(element);
        }
    }

    pub fn is_subset(&self, other: &Self) -> bool {
        // The empty set is a subset of every set
        if self.is_empty() {
            return true;
        }
        
        self.items.iter().all(|item| other.contains(item))
    }

    pub fn is_empty(&self) -> bool {
        self.items.is_empty()
    }

    pub fn is_disjoint(&self, other: &Self) -> bool {
        // If either set is empty, the two are disjoint
        if self.is_empty() || other.is_empty() {
            return true;
        }
        
        self.items.iter().all(|item| !other.contains(item))
    }

    pub fn intersection(&self, other: &Self) -> Self {
        let mut result = CustomSet::new(&[]);
        
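        // Optimization: iterate over the smaller set (with a Vec-backed set the
        // cost is still O(n×m); the trick mainly pays off with O(1) lookups)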
        let (smaller, larger) = if self.items.len() <= other.items.len() {
            (self, other)
        } else {
            (other, self)
        };
        
        for item in &smaller.items {
            if larger.contains(item) {
                result.add(item.clone());
            }
        }
        
        result
    }

    pub fn difference(&self, other: &Self) -> Self {
        let mut result = CustomSet::new(&[]);
        
        for item in &self.items {
            if !other.contains(item) {
                result.add(item.clone());
            }
        }
        
        result
    }

    pub fn union(&self, other: &Self) -> Self {
        let mut result = CustomSet::new(&[]);
        
        // First add all elements of self
        for item in &self.items {
            result.add(item.clone());
        }
        
        // Then add the elements of other; `add` skips those already present
        for item in &other.items {
            result.add(item.clone());
        }
        
        result
    }
}
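The empty-set rules above do not strictly need their own branches: `Iterator::all` returns `true` for an empty iterator, so the subset and disjoint checks already give the right answer when `self` is empty. The sketch below shows this on plain slices; the function names `is_subset_of` and `is_disjoint_from` are hypothetical stand-ins for the methods above.

```rust
/// Hypothetical slice-based subset check with no special case for the
/// empty set: `all` over zero elements is vacuously true.
pub fn is_subset_of(a: &[i32], b: &[i32]) -> bool {
    a.iter().all(|x| b.contains(x))
}

/// Hypothetical slice-based disjointness check. If `a` is empty, `all`
/// is vacuously true; if `b` is empty, `b.contains` is always false,
/// so every element of `a` passes the test.
pub fn is_disjoint_from(a: &[i32], b: &[i32]) -> bool {
    a.iter().all(|x| !b.contains(x))
}
```

Keeping the explicit early returns is still a reasonable choice when the goal is to document the edge cases for the reader.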

Extended Functionality

Building on the basic implementation, we can add more features:

use std::vec::Vec;

#[derive(Debug)]
pub struct CustomSet<T: Eq + Clone> {
    items: Vec<T>,
}

impl<T: Eq + Clone> PartialEq for CustomSet<T> {
    fn eq(&self, other: &Self) -> bool {
        if self.items.len() != other.items.len() {
            return false;
        }
        
        self.items.iter().all(|item| other.contains(item)) &&
        other.items.iter().all(|item| self.contains(item))
    }
}

impl<T: Eq + Clone> CustomSet<T> {
    pub fn new(input: &[T]) -> Self {
        let mut set = CustomSet { items: Vec::new() };
        for item in input {
            set.add(item.clone());
        }
        set
    }

    pub fn contains(&self, element: &T) -> bool {
        self.items.iter().any(|x| x == element)
    }

    pub fn add(&mut self, element: T) {
        if !self.contains(&element) {
            self.items.push(element);
        }
    }

    pub fn is_subset(&self, other: &Self) -> bool {
        self.items.iter().all(|item| other.contains(item))
    }

    pub fn is_empty(&self) -> bool {
        self.items.is_empty()
    }

    pub fn is_disjoint(&self, other: &Self) -> bool {
        self.items.iter().all(|item| !other.contains(item))
    }

    pub fn intersection(&self, other: &Self) -> Self {
        let mut result = CustomSet::new(&[]);
        for item in &self.items {
            if other.contains(item) {
                result.add(item.clone());
            }
        }
        result
    }

    pub fn difference(&self, other: &Self) -> Self {
        let mut result = CustomSet::new(&[]);
        for item in &self.items {
            if !other.contains(item) {
                result.add(item.clone());
            }
        }
        result
    }

    pub fn union(&self, other: &Self) -> Self {
        let mut result = CustomSet::new(&[]);
        for item in &self.items {
            result.add(item.clone());
        }
        for item in &other.items {
            result.add(item.clone());
        }
        result
    }
    
    // Extended functionality
    pub fn is_superset(&self, other: &Self) -> bool {
        other.is_subset(self)
    }
    
    pub fn len(&self) -> usize {
        self.items.len()
    }
    
    pub fn clear(&mut self) {
        self.items.clear();
    }
    
    pub fn iter(&self) -> std::slice::Iter<'_, T> {
        self.items.iter()
    }
    
    pub fn into_iter(self) -> std::vec::IntoIter<T> {
        self.items.into_iter()
    }
    
    // Symmetric difference
    pub fn symmetric_difference(&self, other: &Self) -> Self {
        self.difference(other).union(&other.difference(self))
    }
}
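An alternative to inherent `iter` and `into_iter` methods is to implement the standard iteration traits, so that `for` loops and `.collect()` work with the set directly. The sketch below uses a trimmed-down Vec-backed stand-in for `CustomSet` (only `add` and `len` are kept, and the bound is reduced to `Eq`); it shows one possible shape, not the exercise's required API.

```rust
use std::iter::FromIterator;

/// Minimal Vec-backed stand-in for the article's `CustomSet`,
/// trimmed to what this sketch needs.
#[derive(Debug)]
pub struct CustomSet<T: Eq> {
    items: Vec<T>,
}

impl<T: Eq> CustomSet<T> {
    pub fn add(&mut self, element: T) {
        if !self.items.contains(&element) {
            self.items.push(element);
        }
    }

    pub fn len(&self) -> usize {
        self.items.len()
    }
}

/// Enables `.collect::<CustomSet<_>>()`; duplicates are dropped by `add`.
impl<T: Eq> FromIterator<T> for CustomSet<T> {
    fn from_iter<I: IntoIterator<Item = T>>(iter: I) -> Self {
        let mut set = CustomSet { items: Vec::new() };
        for item in iter {
            set.add(item);
        }
        set
    }
}

/// Consuming iteration: `for x in set`.
impl<T: Eq> IntoIterator for CustomSet<T> {
    type Item = T;
    type IntoIter = std::vec::IntoIter<T>;

    fn into_iter(self) -> Self::IntoIter {
        self.items.into_iter()
    }
}

/// Borrowing iteration: `for x in &set`.
impl<'a, T: Eq> IntoIterator for &'a CustomSet<T> {
    type Item = &'a T;
    type IntoIter = std::slice::Iter<'a, T>;

    fn into_iter(self) -> Self::IntoIter {
        self.items.iter()
    }
}
```

With these impls in place, a set can be built from any iterator and passed to generic code that expects an `IntoIterator`.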

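Real-World Use Cases

Custom sets are useful in practice in areas such as:

- Database queries: implementing set operations such as UNION, INTERSECT, and EXCEPT
- Permission management: managing sets of user permissions
- Tagging systems: managing tags on articles or products
- Deduplication: removing duplicates during data processing
- Graph algorithms: recording visited nodes during traversal
- Cache systems: managing cache keys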
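Two of these use cases can be sketched in a few lines with the standard library's `HashSet`. The function names and the string-based permission model below are assumptions chosen for illustration only.

```rust
use std::collections::HashSet;

/// Hypothetical helper: removes duplicates while keeping first-seen order.
/// `HashSet::insert` returns `true` only when the value was not yet present.
pub fn dedup_preserving_order(input: &[&str]) -> Vec<String> {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut unique = Vec::new();
    for &item in input {
        if seen.insert(item) {
            unique.push(item.to_string());
        }
    }
    unique
}

/// Hypothetical permission check: access is granted when every required
/// permission is contained in the set of granted ones.
pub fn has_all_permissions<'a>(
    granted: &HashSet<&'a str>,
    required: &HashSet<&'a str>,
) -> bool {
    required.is_subset(granted)
}
```

The permission check is simply the subset relation from the exercise, applied with the required permissions on the left-hand side.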

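Algorithm Complexity Analysis

Vec-based implementation (n = size of self, m = size of other):
- contains: O(n)
- add: O(n)
- is_subset: O(n×m)
- intersection: O(n×m)
- difference: O(n×(n+m)), because each kept element is also checked against the growing result by add
- union: O((n+m)²), for the same reason

HashSet-based implementation (average case):
- contains: O(1)
- add: O(1)
- is_subset: O(n)
- intersection: O(min(n,m))
- difference: O(n)
- union: O(n+m)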
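To make the Vec-based figures concrete, the sketch below counts element comparisons instead of measuring time. The functions `contains_counting` and `subset_comparisons` are hypothetical instrumented versions of the linear scans used above, written over plain slices.

```rust
/// Hypothetical instrumented linear scan: returns whether `target` was
/// found and how many element comparisons were performed.
pub fn contains_counting(items: &[u32], target: u32) -> (bool, usize) {
    let mut comparisons = 0;
    for &item in items {
        comparisons += 1;
        if item == target {
            return (true, comparisons);
        }
    }
    (false, comparisons)
}

/// Hypothetical instrumented subset check: one linear scan of `b` per
/// element of `a`, which is where the O(n×m) bound comes from.
pub fn subset_comparisons(a: &[u32], b: &[u32]) -> (bool, usize) {
    let mut total = 0;
    for &x in a {
        let (found, steps) = contains_counting(b, x);
        total += steps;
        if !found {
            return (false, total);
        }
    }
    (true, total)
}
```

Searching 1000 elements for a missing value costs 1000 comparisons, while a hash-based lookup would on average inspect a small constant number of entries.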

Comparison with Other Implementations

// Implementation using BTreeSet (ordered)
use std::collections::BTreeSet;

#[derive(Debug)]
pub struct CustomSet<T: Eq + Ord + Clone> {
    items: BTreeSet<T>,
}

impl<T: Eq + Ord + Clone> PartialEq for CustomSet<T> {
    fn eq(&self, other: &Self) -> bool {
        self.items == other.items
    }
}

impl<T: Eq + Ord + Clone> CustomSet<T> {
    pub fn new(input: &[T]) -> Self {
        let mut items = BTreeSet::new();
        for item in input {
            items.insert(item.clone());
        }
        CustomSet { items }
    }

    pub fn contains(&self, element: &T) -> bool {
        self.items.contains(element)
    }

    pub fn add(&mut self, element: T) {
        self.items.insert(element);
    }

    pub fn is_subset(&self, other: &Self) -> bool {
        self.items.is_subset(&other.items)
    }

    pub fn is_empty(&self) -> bool {
        self.items.is_empty()
    }

    pub fn is_disjoint(&self, other: &Self) -> bool {
        self.items.is_disjoint(&other.items)
    }

    pub fn intersection(&self, other: &Self) -> Self {
        CustomSet {
            items: self.items.intersection(&other.items).cloned().collect(),
        }
    }

    pub fn difference(&self, other: &Self) -> Self {
        CustomSet {
            items: self.items.difference(&other.items).cloned().collect(),
        }
    }

    pub fn union(&self, other: &Self) -> Self {
        CustomSet {
            items: self.items.union(&other.items).cloned().collect(),
        }
    }
}
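What a `BTreeSet` backing store buys over `HashSet` is ordering: iteration yields elements in sorted order, and range queries become possible, at the cost of O(log n) lookups and an `Ord` bound. A small sketch using the standard library directly (the helper names `sorted_unique` and `in_range` are hypothetical):

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: deduplicates and sorts in one step, relying on
/// `BTreeSet` iterating in ascending order.
pub fn sorted_unique(values: &[i32]) -> Vec<i32> {
    let set: BTreeSet<i32> = values.iter().copied().collect();
    set.into_iter().collect()
}

/// Hypothetical helper: returns the distinct values in `low..high`
/// (upper bound exclusive), using `BTreeSet::range`.
pub fn in_range(values: &[i32], low: i32, high: i32) -> Vec<i32> {
    let set: BTreeSet<i32> = values.iter().copied().collect();
    set.range(low..high).copied().collect()
}
```

A `HashSet`-backed or Vec-backed `CustomSet` cannot offer either operation without sorting first.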

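Summary

Through the custom-set exercise, we covered:

- Data structure design: how a set data structure is implemented
- Generic programming: how to implement a generic data structure
- Trait bounds: how Rust's trait bound mechanism works
- Set operations: implementing the various set operation algorithms
- Performance: the performance characteristics of different underlying implementations
- Edge cases: handling the various boundary conditions

These skills are useful in practice, especially when implementing custom data structures, working with set operations, and tuning performance. A custom set is a basic data structure, but it touches on generic programming, algorithm implementation, and performance optimization, which makes it a good starting point for learning to implement data structures in Rust.

The exercise also shows the strength of Rust's generics and trait system, and how a data structure can be implemented in a way that is both safe and efficient. That combination of safety and performance is a large part of Rust's appeal.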