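In computer science, a set is a fundamental data structure that stores distinct elements and supports operations such as union, intersection, and difference. In Exercism's "custom-set" exercise we implement a set type of our own from scratch. Doing so clarifies how sets work internally and gives practice with generics, trait bounds, and data structure design in Rust.
What is a set?
A set is an abstract data type with the following properties:
- Uniqueness: every element appears at most once; duplicates are not allowed
- No ordering: the elements have no defined order
- Definite membership: any given value either belongs to the set or does not, and the set can answer that question (how quickly depends on the underlying storage)
- Set operations: union, intersection, difference, and related operations are supported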
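These properties can be observed with the standard library's `HashSet`, used here only as a reference point before building our own type. This is a minimal sketch; the helper names `distinct_count` and `order_is_irrelevant` are made up for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical helper: counts the distinct values in a slice by
/// collecting them into a set, which drops duplicates.
pub fn distinct_count(values: &[i32]) -> usize {
    let set: HashSet<i32> = values.iter().copied().collect();
    set.len()
}

/// Hypothetical helper: builds two sets from the same values in a
/// different order and reports whether they compare equal.
pub fn order_is_irrelevant() -> bool {
    let a: HashSet<i32> = [1, 2, 3].iter().copied().collect();
    let b: HashSet<i32> = [3, 1, 2].iter().copied().collect();
    a == b
}
```

Duplicates vanish on insertion, and equality depends only on which elements are present.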
Let's first look at the struct and function signatures the exercise provides:
rust
use std::marker::PhantomData;

#[derive(Debug, PartialEq)]
pub struct CustomSet<T> {
    // This field is here to make the template compile and not to
    // complain about unused type parameter 'T'. Once you start
    // solving the exercise, delete this field and the 'std::marker::PhantomData'
    // import.
    phantom: PhantomData<T>,
}

impl<T> CustomSet<T> {
    pub fn new(_input: &[T]) -> Self {
        unimplemented!();
    }

    pub fn contains(&self, _element: &T) -> bool {
        unimplemented!();
    }

    pub fn add(&mut self, _element: T) {
        unimplemented!();
    }

    pub fn is_subset(&self, _other: &Self) -> bool {
        unimplemented!();
    }

    pub fn is_empty(&self) -> bool {
        unimplemented!();
    }

    pub fn is_disjoint(&self, _other: &Self) -> bool {
        unimplemented!();
    }

    pub fn intersection(&self, _other: &Self) -> Self {
        unimplemented!();
    }

    pub fn difference(&self, _other: &Self) -> Self {
        unimplemented!();
    }

    pub fn union(&self, _other: &Self) -> Self {
        unimplemented!();
    }
}
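The `PhantomData<T>` field in the template exists only because Rust rejects a struct that declares a type parameter it never uses (error E0392). The sketch below uses a made-up `Placeholder` type to show that such a marker field takes up no space, which is why it can simply be replaced once a real field holding `T` values is added.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

// Hypothetical stand-in for the template struct. Removing the
// `PhantomData<T>` field would leave `T` unused and fail to compile.
pub struct Placeholder<T> {
    _marker: PhantomData<T>,
}

impl<T> Placeholder<T> {
    pub fn new() -> Self {
        Placeholder { _marker: PhantomData }
    }
}

/// Size in bytes of `Placeholder<T>`; `PhantomData` is zero-sized,
/// so this is 0 regardless of `T`.
pub fn placeholder_size<T>() -> usize {
    size_of::<Placeholder<T>>()
}
```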
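We need to implement this generic set struct so that it supports:
- Creating a set
- Checking whether an element is present
- Adding an element
- Set relations and operations (subset, disjoint, intersection, difference, union)
- Checking whether the set is empty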
Algorithm design
1. Basic implementation (using HashSet)
rust
use std::collections::HashSet;

#[derive(Debug)]
pub struct CustomSet<T: Eq + std::hash::Hash + Clone> {
    items: HashSet<T>,
}

impl<T: Eq + std::hash::Hash + Clone> PartialEq for CustomSet<T> {
    fn eq(&self, other: &Self) -> bool {
        self.items == other.items
    }
}

impl<T: Eq + std::hash::Hash + Clone> CustomSet<T> {
    pub fn new(input: &[T]) -> Self {
        let mut set = HashSet::new();
        for item in input {
            set.insert(item.clone());
        }
        CustomSet { items: set }
    }

    pub fn contains(&self, element: &T) -> bool {
        self.items.contains(element)
    }

    pub fn add(&mut self, element: T) {
        self.items.insert(element);
    }

    pub fn is_subset(&self, other: &Self) -> bool {
        self.items.is_subset(&other.items)
    }

    pub fn is_empty(&self) -> bool {
        self.items.is_empty()
    }

    pub fn is_disjoint(&self, other: &Self) -> bool {
        self.items.is_disjoint(&other.items)
    }

    pub fn intersection(&self, other: &Self) -> Self {
        CustomSet {
            items: self.items.intersection(&other.items).cloned().collect(),
        }
    }

    pub fn difference(&self, other: &Self) -> Self {
        CustomSet {
            items: self.items.difference(&other.items).cloned().collect(),
        }
    }

    pub fn union(&self, other: &Self) -> Self {
        CustomSet {
            items: self.items.union(&other.items).cloned().collect(),
        }
    }
}
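The insertion loop in `new` can also be expressed with iterator adapters, and because `HashSet` equality already ignores order, `PartialEq` can be derived rather than written by hand. The following is a reduced sketch of that variant, not the exercise's reference solution. The type is named `CollectedSet` (a hypothetical name) and `len` is added only so the effect is observable.

```rust
use std::collections::HashSet;
use std::hash::Hash;

// Hypothetical variant of the HashSet-backed set. Deriving `PartialEq`
// delegates to `HashSet`'s own order-independent comparison.
#[derive(Debug, PartialEq)]
pub struct CollectedSet<T: Eq + Hash> {
    items: HashSet<T>,
}

impl<T: Eq + Hash + Clone> CollectedSet<T> {
    /// Clones every element of `input` into the set; duplicates collapse.
    pub fn new(input: &[T]) -> Self {
        CollectedSet {
            items: input.iter().cloned().collect(),
        }
    }

    /// Number of distinct elements stored.
    pub fn len(&self) -> usize {
        self.items.len()
    }
}
```

Whether to loop or to `collect` is a matter of taste; both do one insertion per input element.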
2. From-scratch implementation (without the standard library's HashSet)
rust
// `Vec` is part of the prelude, so no import is needed.
#[derive(Debug)]
pub struct CustomSet<T: Eq + Clone> {
    items: Vec<T>,
}

impl<T: Eq + Clone> PartialEq for CustomSet<T> {
    fn eq(&self, other: &Self) -> bool {
        if self.items.len() != other.items.len() {
            return false;
        }
        // Check that every element of each set is also in the other one
        for item in &self.items {
            if !other.contains(item) {
                return false;
            }
        }
        for item in &other.items {
            if !self.contains(item) {
                return false;
            }
        }
        true
    }
}

impl<T: Eq + Clone> CustomSet<T> {
    pub fn new(input: &[T]) -> Self {
        let mut set = CustomSet { items: Vec::new() };
        for item in input {
            set.add(item.clone());
        }
        set
    }

    pub fn contains(&self, element: &T) -> bool {
        self.items.iter().any(|x| x == element)
    }

    pub fn add(&mut self, element: T) {
        if !self.contains(&element) {
            self.items.push(element);
        }
    }

    pub fn is_subset(&self, other: &Self) -> bool {
        self.items.iter().all(|item| other.contains(item))
    }

    pub fn is_empty(&self) -> bool {
        self.items.is_empty()
    }

    pub fn is_disjoint(&self, other: &Self) -> bool {
        self.items.iter().all(|item| !other.contains(item))
    }

    pub fn intersection(&self, other: &Self) -> Self {
        let mut result = CustomSet::new(&[]);
        for item in &self.items {
            if other.contains(item) {
                result.add(item.clone());
            }
        }
        result
    }

    pub fn difference(&self, other: &Self) -> Self {
        let mut result = CustomSet::new(&[]);
        for item in &self.items {
            if !other.contains(item) {
                result.add(item.clone());
            }
        }
        result
    }

    pub fn union(&self, other: &Self) -> Self {
        let mut result = CustomSet::new(&[]);
        for item in &self.items {
            result.add(item.clone());
        }
        for item in &other.items {
            result.add(item.clone());
        }
        result
    }
}
Test case analysis
Looking at the test cases gives a clearer picture of the requirements:
rust
#[test]
fn sets_with_no_elements_are_empty() {
    let set: CustomSet<()> = CustomSet::new(&[]);
    assert!(set.is_empty());
}
A set with no elements should report itself as empty.
rust
#[test]
fn true_when_the_element_is_in_the_set() {
    let set = CustomSet::new(&[1, 2, 3]);
    assert!(set.contains(&1));
}
A set should correctly recognize the elements it contains.
rust
#[test]
fn sets_with_same_elements_are_subsets() {
    let set1 = CustomSet::new(&[1, 2, 3]);
    let set2 = CustomSet::new(&[1, 2, 3]);
    assert!(set1.is_subset(&set2));
    assert!(set2.is_subset(&set1));
}
Sets with the same elements are subsets of each other.
rust
#[test]
fn sets_with_one_element_in_common_are_not_disjoint() {
    let set1 = CustomSet::new(&[1, 2]);
    let set2 = CustomSet::new(&[2, 3]);
    assert!(!set1.is_disjoint(&set2));
    assert!(!set2.is_disjoint(&set1));
}
Sets that share at least one element are not disjoint.
rust
#[test]
fn sets_with_the_same_elements_are_equal() {
    let set1 = CustomSet::new(&[1, 2]);
    let set2 = CustomSet::new(&[2, 1]);
    assert_eq!(set1, set2);
}
Sets with the same elements should compare equal, regardless of insertion order.
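This test is the reason a `Vec`-backed set cannot simply derive `PartialEq`: vector equality is positional. The sketch below contrasts the two comparisons on plain slices. Both function names are hypothetical, and `set_equal` assumes its inputs contain no duplicates, which is what lets a length check plus a one-directional containment check suffice.

```rust
/// Order-sensitive comparison, which is what a derived `PartialEq`
/// on a `Vec` field would perform.
pub fn vec_equal(a: &[i32], b: &[i32]) -> bool {
    a == b
}

/// Order-insensitive comparison. Assumes neither slice contains
/// duplicates: equal lengths plus "every element of `a` is in `b`"
/// then implies the two hold exactly the same elements.
pub fn set_equal(a: &[i32], b: &[i32]) -> bool {
    a.len() == b.len() && a.iter().all(|x| b.contains(x))
}
```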
rust
#[test]
fn intersection_of_two_sets_with_shared_elements_is_a_set_of_the_shared_elements() {
    let set1 = CustomSet::new(&[1, 2, 3, 4]);
    let set2 = CustomSet::new(&[3, 2, 5]);
    assert_eq!(set1.intersection(&set2), CustomSet::new(&[2, 3]));
    assert_eq!(set2.intersection(&set1), CustomSet::new(&[2, 3]));
}
The intersection should contain exactly the elements the two sets have in common.
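Difference and union can be reasoned about in the same way. As a rough reference for what those operations should return, here is a sketch built on the standard library's `HashSet`. The helper names `reference_difference` and `reference_union` are hypothetical, and the sample values used with them are illustrative rather than taken from the exercise's test suite.

```rust
use std::collections::HashSet;

/// Hypothetical helper: elements of `a` that are not in `b`,
/// sorted so the result is easy to compare.
pub fn reference_difference(a: &[i32], b: &[i32]) -> Vec<i32> {
    let a: HashSet<i32> = a.iter().copied().collect();
    let b: HashSet<i32> = b.iter().copied().collect();
    let mut out: Vec<i32> = a.difference(&b).copied().collect();
    out.sort();
    out
}

/// Hypothetical helper: elements that are in `a`, in `b`, or in both,
/// sorted so the result is easy to compare.
pub fn reference_union(a: &[i32], b: &[i32]) -> Vec<i32> {
    let a: HashSet<i32> = a.iter().copied().collect();
    let b: HashSet<i32> = b.iter().copied().collect();
    let mut out: Vec<i32> = a.union(&b).copied().collect();
    out.sort();
    out
}
```

Note that difference is not symmetric: swapping the arguments generally gives a different result, whereas union does not care about argument order.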
Complete implementation
A complete implementation that takes the edge cases into account:
rust
// `Vec` is part of the prelude, so no import is needed.
#[derive(Debug)]
pub struct CustomSet<T: Eq + Clone> {
    items: Vec<T>,
}

impl<T: Eq + Clone> PartialEq for CustomSet<T> {
    fn eq(&self, other: &Self) -> bool {
        if self.items.len() != other.items.len() {
            return false;
        }
        // Check that every element is also in the other set (order-independent)
        self.items.iter().all(|item| other.contains(item))
            && other.items.iter().all(|item| self.contains(item))
    }
}

impl<T: Eq + Clone> CustomSet<T> {
    pub fn new(input: &[T]) -> Self {
        let mut set = CustomSet { items: Vec::new() };
        for item in input {
            set.add(item.clone());
        }
        set
    }

    pub fn contains(&self, element: &T) -> bool {
        self.items.iter().any(|x| x == element)
    }

    pub fn add(&mut self, element: T) {
        if !self.contains(&element) {
            self.items.push(element);
        }
    }

    pub fn is_subset(&self, other: &Self) -> bool {
        self.items.iter().all(|item| other.contains(item))
    }

    pub fn is_empty(&self) -> bool {
        self.items.is_empty()
    }

    pub fn is_disjoint(&self, other: &Self) -> bool {
        self.items.iter().all(|item| !other.contains(item))
    }

    pub fn intersection(&self, other: &Self) -> Self {
        let mut result = CustomSet::new(&[]);
        for item in &self.items {
            if other.contains(item) {
                result.add(item.clone());
            }
        }
        result
    }

    pub fn difference(&self, other: &Self) -> Self {
        let mut result = CustomSet::new(&[]);
        for item in &self.items {
            if !other.contains(item) {
                result.add(item.clone());
            }
        }
        result
    }

    pub fn union(&self, other: &Self) -> Self {
        let mut result = CustomSet::new(&[]);
        for item in &self.items {
            result.add(item.clone());
        }
        for item in &other.items {
            result.add(item.clone());
        }
        result
    }
}
Performance-optimized version
An implementation optimized for performance:
rust
use std::collections::HashSet;
use std::hash::Hash;

#[derive(Debug)]
pub struct CustomSet<T: Eq + Hash + Clone> {
    items: HashSet<T>,
}

impl<T: Eq + Hash + Clone> PartialEq for CustomSet<T> {
    fn eq(&self, other: &Self) -> bool {
        self.items == other.items
    }
}

impl<T: Eq + Hash + Clone> CustomSet<T> {
    pub fn new(input: &[T]) -> Self {
        let mut items = HashSet::new();
        for item in input {
            items.insert(item.clone());
        }
        CustomSet { items }
    }

    pub fn contains(&self, element: &T) -> bool {
        self.items.contains(element)
    }

    pub fn add(&mut self, element: T) {
        self.items.insert(element);
    }

    pub fn is_subset(&self, other: &Self) -> bool {
        self.items.is_subset(&other.items)
    }

    pub fn is_empty(&self) -> bool {
        self.items.is_empty()
    }

    pub fn is_disjoint(&self, other: &Self) -> bool {
        self.items.is_disjoint(&other.items)
    }

    pub fn intersection(&self, other: &Self) -> Self {
        CustomSet {
            items: self.items.intersection(&other.items).cloned().collect(),
        }
    }

    pub fn difference(&self, other: &Self) -> Self {
        CustomSet {
            items: self.items.difference(&other.items).cloned().collect(),
        }
    }

    pub fn union(&self, other: &Self) -> Self {
        CustomSet {
            items: self.items.union(&other.items).cloned().collect(),
        }
    }

    // A few additional utility methods
    pub fn len(&self) -> usize {
        self.items.len()
    }

    pub fn clear(&mut self) {
        self.items.clear();
    }

    // Returns true if the element was present and has been removed
    pub fn remove(&mut self, element: &T) -> bool {
        self.items.remove(element)
    }
}
Error handling and edge cases
An implementation that spells out more of the edge cases:
rust
// `Vec` is part of the prelude, so no import is needed.
#[derive(Debug)]
pub struct CustomSet<T: Eq + Clone> {
    items: Vec<T>,
}

impl<T: Eq + Clone> PartialEq for CustomSet<T> {
    fn eq(&self, other: &Self) -> bool {
        if self.items.len() != other.items.len() {
            return false;
        }
        self.items.iter().all(|item| other.contains(item))
            && other.items.iter().all(|item| self.contains(item))
    }
}

impl<T: Eq + Clone> CustomSet<T> {
    pub fn new(input: &[T]) -> Self {
        let mut set = CustomSet { items: Vec::new() };
        // Duplicates in the input are dropped by `add`
        for item in input {
            set.add(item.clone());
        }
        set
    }

    pub fn contains(&self, element: &T) -> bool {
        self.items.iter().any(|x| x == element)
    }

    pub fn add(&mut self, element: T) {
        // Only add the element if it is not already present
        if !self.contains(&element) {
            self.items.push(element);
        }
    }

    pub fn is_subset(&self, other: &Self) -> bool {
        // The empty set is a subset of every set. `all` on an empty
        // iterator already returns true; the early return only makes
        // the rule explicit.
        if self.is_empty() {
            return true;
        }
        self.items.iter().all(|item| other.contains(item))
    }

    pub fn is_empty(&self) -> bool {
        self.items.is_empty()
    }

    pub fn is_disjoint(&self, other: &Self) -> bool {
        // If either set is empty, the two sets are disjoint
        if self.is_empty() || other.is_empty() {
            return true;
        }
        self.items.iter().all(|item| !other.contains(item))
    }

    pub fn intersection(&self, other: &Self) -> Self {
        let mut result = CustomSet::new(&[]);
        // Iterate over the smaller set. With a Vec-backed `contains` the
        // total work is still proportional to n×m, so this mainly pays
        // off with hash-based storage.
        let (smaller, larger) = if self.items.len() <= other.items.len() {
            (self, other)
        } else {
            (other, self)
        };
        for item in &smaller.items {
            if larger.contains(item) {
                result.add(item.clone());
            }
        }
        result
    }

    pub fn difference(&self, other: &Self) -> Self {
        let mut result = CustomSet::new(&[]);
        for item in &self.items {
            if !other.contains(item) {
                result.add(item.clone());
            }
        }
        result
    }

    pub fn union(&self, other: &Self) -> Self {
        let mut result = CustomSet::new(&[]);
        // First add every element of self
        for item in &self.items {
            result.add(item.clone());
        }
        // Then add the elements of other; `add` skips those already present
        for item in &other.items {
            result.add(item.clone());
        }
        result
    }
}
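The explicit empty-set checks in `is_subset` and `is_disjoint` are a readability choice rather than a necessity, because `Iterator::all` returns `true` for an empty iterator. A small sketch on plain slices (the free functions here are illustrative stand-ins for the methods above) shows the empty cases working without any special handling:

```rust
/// Subset check with no special case for empty input.
pub fn is_subset(a: &[i32], b: &[i32]) -> bool {
    a.iter().all(|x| b.contains(x))
}

/// Disjointness check with no special case for empty input.
pub fn is_disjoint(a: &[i32], b: &[i32]) -> bool {
    a.iter().all(|x| !b.contains(x))
}
```

When `a` is empty, `all` has nothing to test and yields `true`. When only `b` is empty, `contains` is `false` for every element, so the subset check fails and the disjointness check succeeds.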
Extended functionality
Building on the basic implementation, we can add more features:
rust
// `Vec` is part of the prelude, so no import is needed.
#[derive(Debug)]
pub struct CustomSet<T: Eq + Clone> {
    items: Vec<T>,
}

impl<T: Eq + Clone> PartialEq for CustomSet<T> {
    fn eq(&self, other: &Self) -> bool {
        if self.items.len() != other.items.len() {
            return false;
        }
        self.items.iter().all(|item| other.contains(item))
            && other.items.iter().all(|item| self.contains(item))
    }
}

impl<T: Eq + Clone> CustomSet<T> {
    pub fn new(input: &[T]) -> Self {
        let mut set = CustomSet { items: Vec::new() };
        for item in input {
            set.add(item.clone());
        }
        set
    }

    pub fn contains(&self, element: &T) -> bool {
        self.items.iter().any(|x| x == element)
    }

    pub fn add(&mut self, element: T) {
        if !self.contains(&element) {
            self.items.push(element);
        }
    }

    pub fn is_subset(&self, other: &Self) -> bool {
        self.items.iter().all(|item| other.contains(item))
    }

    pub fn is_empty(&self) -> bool {
        self.items.is_empty()
    }

    pub fn is_disjoint(&self, other: &Self) -> bool {
        self.items.iter().all(|item| !other.contains(item))
    }

    pub fn intersection(&self, other: &Self) -> Self {
        let mut result = CustomSet::new(&[]);
        for item in &self.items {
            if other.contains(item) {
                result.add(item.clone());
            }
        }
        result
    }

    pub fn difference(&self, other: &Self) -> Self {
        let mut result = CustomSet::new(&[]);
        for item in &self.items {
            if !other.contains(item) {
                result.add(item.clone());
            }
        }
        result
    }

    pub fn union(&self, other: &Self) -> Self {
        let mut result = CustomSet::new(&[]);
        for item in &self.items {
            result.add(item.clone());
        }
        for item in &other.items {
            result.add(item.clone());
        }
        result
    }

    // Extended functionality
    pub fn is_superset(&self, other: &Self) -> bool {
        other.is_subset(self)
    }

    pub fn len(&self) -> usize {
        self.items.len()
    }

    pub fn clear(&mut self) {
        self.items.clear();
    }

    // Borrowing iterator; the lifetime is tied to `&self`
    pub fn iter(&self) -> std::slice::Iter<'_, T> {
        self.items.iter()
    }

    // Consuming iterator as an inherent method; it does not by itself
    // make `for x in set` work, which needs the `IntoIterator` trait
    pub fn into_iter(self) -> std::vec::IntoIter<T> {
        self.items.into_iter()
    }

    // Symmetric difference: elements in exactly one of the two sets
    pub fn symmetric_difference(&self, other: &Self) -> Self {
        self.difference(other).union(&other.difference(self))
    }
}
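The inherent `iter` and `into_iter` methods can be complemented by the standard iterator traits, which is what lets a type work with `for` loops and `collect`. The sketch below shows one way to do that on a reduced Vec-backed type. The name `VecSet` is hypothetical and the type is deliberately separate from the listing above.

```rust
// Hypothetical, reduced Vec-backed set used only to illustrate the traits.
#[derive(Debug)]
pub struct VecSet<T: PartialEq> {
    items: Vec<T>,
}

// Enables `iterator.collect::<VecSet<_>>()`; duplicates are skipped.
impl<T: PartialEq> std::iter::FromIterator<T> for VecSet<T> {
    fn from_iter<I: IntoIterator<Item = T>>(iter: I) -> Self {
        let mut items = Vec::new();
        for item in iter {
            if !items.contains(&item) {
                items.push(item);
            }
        }
        VecSet { items }
    }
}

// Enables `for x in set`, consuming the set.
impl<T: PartialEq> IntoIterator for VecSet<T> {
    type Item = T;
    type IntoIter = std::vec::IntoIter<T>;

    fn into_iter(self) -> Self::IntoIter {
        self.items.into_iter()
    }
}

// Enables `for x in &set`, borrowing the elements.
impl<'a, T: PartialEq> IntoIterator for &'a VecSet<T> {
    type Item = &'a T;
    type IntoIter = std::slice::Iter<'a, T>;

    fn into_iter(self) -> Self::IntoIter {
        self.items.iter()
    }
}
```

Because this sketch stores elements in a `Vec`, iteration follows insertion order. A set makes no ordering promise, so callers should not rely on that.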
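Real-world use cases
Sets show up in everyday development in areas such as:
- Database queries: implementing set operations such as UNION, INTERSECT, and EXCEPT
- Permission management: managing the set of permissions a user holds
- Tagging systems: managing tags on articles or products
- Deduplication: removing duplicate items during data processing
- Graph algorithms: recording which nodes have been visited during traversal
- Caching: managing cache keys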
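Two of these use cases fit in a few lines. The sketch below uses the standard library's `HashSet` for brevity; the function names and the permission strings used with them are invented for illustration.

```rust
use std::collections::HashSet;

/// Permission check: true when every required permission has been granted.
pub fn has_all_permissions(granted: &[&str], required: &[&str]) -> bool {
    let granted: HashSet<&str> = granted.iter().copied().collect();
    let required: HashSet<&str> = required.iter().copied().collect();
    required.is_subset(&granted)
}

/// Deduplication: removes repeated values while keeping the order in
/// which each value first appears.
pub fn dedup_keep_order(values: &[i32]) -> Vec<i32> {
    let mut seen = HashSet::new();
    // `insert` returns false for a value that is already in the set,
    // so the filter keeps only first occurrences.
    values.iter().copied().filter(|v| seen.insert(*v)).collect()
}
```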
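Algorithmic complexity analysis
Here n is the number of elements in self and m the number in other.
- Vec-based implementation:
  - contains: O(n)
  - add: O(n)
  - is_subset: O(n×m)
  - intersection: O(n×m)
  - difference: O(n×(n+m)), since each `add` also scans the result built so far
  - union: O((n+m)²) in the worst case, for the same reason
- HashSet-based implementation (average case):
  - contains: O(1)
  - add: O(1)
  - is_subset: O(n)
  - intersection: O(min(n,m))
  - difference: O(n)
  - union: O(n+m)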
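The n×m terms for the Vec-based version come directly from nesting a linear scan inside a loop. One way to make that visible without timing anything is to count equality comparisons. The following is a sketch with hypothetical helper names, operating on plain slices.

```rust
/// Linear membership test that also reports how many equality
/// comparisons it performed.
pub fn contains_counting(items: &[u32], target: u32) -> (bool, usize) {
    let mut comparisons = 0;
    for item in items {
        comparisons += 1;
        if *item == target {
            return (true, comparisons);
        }
    }
    (false, comparisons)
}

/// Total comparisons needed to look up every element of `a` in `b`,
/// which is the core loop of a Vec-based intersection or difference.
pub fn membership_comparisons(a: &[u32], b: &[u32]) -> usize {
    a.iter().map(|x| contains_counting(b, *x).1).sum()
}
```

When none of the n elements of `a` occur in `b`, every lookup scans all m elements of `b`, giving exactly n×m comparisons. A hash-based lookup avoids that inner scan on average.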
Comparison with other implementation approaches
rust
// BTreeSet-based implementation (keeps elements in sorted order)
use std::collections::BTreeSet;

// `Ord` already implies `Eq`, so `Ord + Clone` is a sufficient bound.
#[derive(Debug)]
pub struct CustomSet<T: Ord + Clone> {
    items: BTreeSet<T>,
}

impl<T: Ord + Clone> PartialEq for CustomSet<T> {
    fn eq(&self, other: &Self) -> bool {
        self.items == other.items
    }
}

impl<T: Ord + Clone> CustomSet<T> {
    pub fn new(input: &[T]) -> Self {
        let mut items = BTreeSet::new();
        for item in input {
            items.insert(item.clone());
        }
        CustomSet { items }
    }

    pub fn contains(&self, element: &T) -> bool {
        self.items.contains(element)
    }

    pub fn add(&mut self, element: T) {
        self.items.insert(element);
    }

    pub fn is_subset(&self, other: &Self) -> bool {
        self.items.is_subset(&other.items)
    }

    pub fn is_empty(&self) -> bool {
        self.items.is_empty()
    }

    pub fn is_disjoint(&self, other: &Self) -> bool {
        self.items.is_disjoint(&other.items)
    }

    pub fn intersection(&self, other: &Self) -> Self {
        CustomSet {
            items: self.items.intersection(&other.items).cloned().collect(),
        }
    }

    pub fn difference(&self, other: &Self) -> Self {
        CustomSet {
            items: self.items.difference(&other.items).cloned().collect(),
        }
    }

    pub fn union(&self, other: &Self) -> Self {
        CustomSet {
            items: self.items.union(&other.items).cloned().collect(),
        }
    }
}
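What the `Ord` bound buys, compared with the hash-based version, is a deterministic sorted iteration order. A minimal sketch, with a hypothetical helper name:

```rust
use std::collections::BTreeSet;

/// Collects the values into a `BTreeSet` and returns them in the set's
/// iteration order, which is ascending and free of duplicates.
pub fn sorted_unique(values: &[i32]) -> Vec<i32> {
    let set: BTreeSet<i32> = values.iter().copied().collect();
    set.into_iter().collect()
}
```

The trade-off is that lookups and insertions take O(log n) rather than the average O(1) of a hash set, and element types must implement `Ord` instead of `Hash`.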
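Summary
Working through the custom-set exercise covers:
- Data structure design: how a set can be implemented on top of different containers
- Generic programming: how to write a generic data structure
- Trait bounds: how bounds such as Eq, Hash, Ord, and Clone determine what a generic type can do
- Set operations: implementing subset, disjointness, intersection, difference, and union
- Performance: how the choice of backing store affects the cost of each operation
- Edge cases: handling empty sets and duplicate input
These skills carry over to real projects, particularly when implementing custom data structures, working with set operations, or tuning performance. A custom set is a basic data structure, but it touches generics, algorithm implementation, and performance trade-offs, which makes it a good starting point for learning to build data structures in Rust.
The exercise also shows how Rust's generics and trait system make it possible to build such data structures in a way that is both safe and efficient.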