Summary of C# Array Deduplication Methods

Table of contents

Arrays
- 1. Using LINQ's Distinct() method (most common)
- 2. Using HashSet (duplicates are dropped automatically)
- 3. Using LINQ's GroupBy method
- 4. A custom deduplication method (no LINQ required)
- 5. Deduplicating complex objects
- 6. Deduplication that preserves the original order
- Performance comparison
- Simple example
List<T>
- 1. Using LINQ's Distinct() method (recommended)
- 2. Using the HashSet constructor (efficient deduplication)
- 3. Using a foreach loop with a Contains check
- 4. Using LINQ's GroupBy method
- 5. Extension methods (encapsulated for reuse)
- 6. Handling letter case when deduplicating
- 7. Performance-optimized deduplication methods
- 8. Complete example
- Performance comparison and recommendations

Arrays

C# offers several ways to remove duplicates from an array. The most common ones are listed below.

1. Using LINQ's Distinct() method (most common)

using System;
using System.Linq;

int[] numbers = { 1, 2, 2, 3, 4, 4, 5 };
string[] fruits = { "apple", "orange", "apple", "banana" };

// Deduplicate the integer array
int[] uniqueNumbers = numbers.Distinct().ToArray();

// Deduplicate the string array
string[] uniqueFruits = fruits.Distinct().ToArray();

// Print the results
Console.WriteLine(string.Join(", ", uniqueNumbers)); // 1, 2, 3, 4, 5
Console.WriteLine(string.Join(", ", uniqueFruits));  // apple, orange, banana

2. Using HashSet (duplicates are dropped automatically)

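using System;
using System.Collections.Generic;
using System.Linq; // required for the ToArray() extension method on HashSet<T>

int[] numbers = { 1, 2, 2, 3, 4, 4, 5 };

// Approach 1: build a HashSet directly from the array
// (the enumeration order of a HashSet<T> is not guaranteed)
HashSet<int> hashSet = new HashSet<int>(numbers);
int[] uniqueNumbers = hashSet.ToArray();

// Approach 2: use a HashSet to track seen elements while collecting them in order
HashSet<int> set = new HashSet<int>();
List<int> result = new List<int>();

foreach (int num in numbers)
{
    if (set.Add(num)) // Add returns true only if the element was not already present
    {
        result.Add(num);
    }
}

int[] uniqueArray = result.ToArray();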

3. Using LINQ's GroupBy method

using System;
using System.Linq;

int[] numbers = { 1, 2, 2, 3, 4, 4, 5 };

int[] uniqueNumbers = numbers
    .GroupBy(x => x)
    .Select(g => g.Key)
    .ToArray();

4. A custom deduplication method (no LINQ required)

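using System;
using System.Collections.Generic;

// Usage example
int[] numbers = { 1, 2, 2, 3, 4, 4, 5 };
int[] uniqueNumbers = ArrayDeduplicator.RemoveDuplicates(numbers);

// In a top-level program, type declarations must follow the statements.
// The wrapper class name is illustrative; a method cannot live outside a type.
public static class ArrayDeduplicator
{
    public static T[] RemoveDuplicates<T>(T[] array)
    {
        List<T> result = new List<T>();
        HashSet<T> seen = new HashSet<T>();

        foreach (T item in array)
        {
            if (seen.Add(item)) // true only the first time an element appears
            {
                result.Add(item);
            }
        }

        return result.ToArray();
    }
}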

5. Deduplicating complex objects

using System;
using System.Linq;

Person[] people =
{
    new Person { Id = 1, Name = "Alice" },
    new Person { Id = 2, Name = "Bob" },
    new Person { Id = 1, Name = "Alice" } // duplicate
};

// Deduplicate by Id, keeping the first element of each group
Person[] uniquePeople = people
    .GroupBy(p => p.Id)
    .Select(g => g.First())
    .ToArray();

// Deduplicate by several properties, using an anonymous type as the key
Person[] uniquePeople2 = people
    .GroupBy(p => new { p.Id, p.Name })
    .Select(g => g.First())
    .ToArray();

// In a top-level program, type declarations must follow the statements
public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
}
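
On .NET 6 and later, LINQ also provides DistinctBy, which expresses the "group by a key and take the first element" pattern in one call. The following is a minimal sketch under that assumption; the sample data mirrors the Person example above, and the variable names are illustrative only.

```csharp
using System;
using System.Linq;

// Sample data mirroring the Person example (illustrative values)
Person[] people =
{
    new Person { Id = 1, Name = "Alice" },
    new Person { Id = 2, Name = "Bob" },
    new Person { Id = 1, Name = "Alice" }
};

// Deduplicate by a single key (requires .NET 6 or later)
Person[] uniqueById = people.DistinctBy(p => p.Id).ToArray();

// Deduplicate by a composite key; a value tuple compares member by member
Person[] uniqueByIdAndName = people.DistinctBy(p => (p.Id, p.Name)).ToArray();

Console.WriteLine(uniqueById.Length);        // 2
Console.WriteLine(uniqueByIdAndName.Length); // 2

public class Person
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}
```

On older target frameworks DistinctBy is not available, so the GroupBy form remains the portable option.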

6. Deduplication that preserves the original order

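using System;
using System.Collections.Generic;

// The wrapper class name is illustrative; a method cannot live outside a type
public static class OrderedDeduplicator
{
    public static T[] RemoveDuplicatesKeepOrder<T>(T[] array)
    {
        var seen = new HashSet<T>();
        var result = new List<T>();

        foreach (var item in array)
        {
            // Add returns false for items already seen, so only the first
            // occurrence of each value is appended, in input order
            if (seen.Add(item))
            {
                result.Add(item);
            }
        }

        return result.ToArray();
    }
}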

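Performance comparison

Simple data types: Distinct().ToArray() is usually the best choice; the code is concise and performs well.
Large arrays: an explicit HashSet loop can be somewhat faster because it avoids LINQ iterator overhead. Both it and Distinct() are hash-based and run in O(n) expected time.
Preserving order: use the custom method from section 6. Distinct() also yields elements in first-occurrence order in current .NET implementations, although its documentation describes the result as unordered.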
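
To see how the two approaches compare on your own machine, a rough timing harness such as the one below may help. It is only a sketch: the data size, the random seed, and the variable names are arbitrary assumptions, and a single Stopwatch measurement is noisy, so treat the numbers as indicative rather than as a benchmark.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical test data: one million integers drawn from 100,000 possible values
var random = new Random(42);
int[] data = new int[1_000_000];
for (int i = 0; i < data.Length; i++)
{
    data[i] = random.Next(100_000);
}

// Time LINQ Distinct()
var stopwatch = Stopwatch.StartNew();
int[] viaDistinct = data.Distinct().ToArray();
stopwatch.Stop();
Console.WriteLine($"Distinct():   {stopwatch.ElapsedMilliseconds} ms, {viaDistinct.Length} unique");

// Time an explicit HashSet loop that keeps first occurrences in order
stopwatch.Restart();
var seen = new HashSet<int>();
var ordered = new List<int>();
foreach (int value in data)
{
    if (seen.Add(value))
    {
        ordered.Add(value);
    }
}
int[] viaHashSet = ordered.ToArray();
stopwatch.Stop();
Console.WriteLine($"HashSet loop: {stopwatch.ElapsedMilliseconds} ms, {viaHashSet.Length} unique");
```

For measurements you intend to rely on, a dedicated benchmarking tool with warm-up and repeated runs is a better fit than a one-off timer.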

Simple example

using System;
using System.Linq;

class Program
{
    static void Main()
    {
        // Example 1: basic deduplication
        int[] numbers = { 1, 2, 3, 2, 4, 3, 5 };
        int[] unique = numbers.Distinct().ToArray();
        
        // Example 2: deduplicating strings
        string[] words = { "hello", "world", "hello", "c#" };
        string[] uniqueWords = words.Distinct().ToArray();
        
        // Example 3: using a case-insensitive comparer
        string[] caseInsensitive = { "Hello", "hello", "WORLD", "world" };
        string[] uniqueCaseInsensitive = caseInsensitive
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .ToArray();
    }
}

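Recommendation: in most cases Distinct().ToArray() is the simplest and most direct approach. Consider the other methods only when you need custom comparison logic or have specific performance requirements.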
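
When the comparison logic is genuinely custom, one option is to implement IEqualityComparer<T> and pass it to Distinct(). The sketch below assumes that two Person objects count as duplicates when their Id values match; PersonIdComparer and the sample data are hypothetical names and values chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative data: two entries share Id 1 but differ in Name
Person[] people =
{
    new Person { Id = 1, Name = "Alice" },
    new Person { Id = 2, Name = "Bob" },
    new Person { Id = 1, Name = "Alicia" }
};

// Distinct() uses the comparer's GetHashCode and Equals to detect duplicates
Person[] unique = people.Distinct(new PersonIdComparer()).ToArray();
Console.WriteLine(unique.Length); // 2

public class Person
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

// Hypothetical comparer: two Person objects are equal when their Ids match
public sealed class PersonIdComparer : IEqualityComparer<Person>
{
    public bool Equals(Person? x, Person? y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.Id == y.Id;
    }

    // Must be consistent with Equals: equal Ids produce equal hash codes
    public int GetHashCode(Person obj) => obj.Id.GetHashCode();
}
```

The same comparer instance can also be passed to the HashSet<T> constructor, so the HashSet-based methods can reuse it.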

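List<T>

Deduplicating a List<string> works much like deduplicating an array, but offers a little more flexibility. The most common approaches are listed below.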

1. Using LINQ's Distinct() method (recommended)

using System;
using System.Collections.Generic;
using System.Linq;

List<string> fruits = new List<string> { "apple", "orange", "apple", "banana", "orange" };

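// Approach 1: collect the distinct elements into a new List
List<string> uniqueFruits1 = fruits.Distinct().ToList();

// Approach 2: reassign the variable to the deduplicated list
// (this creates a new List; the original instance is not modified in place)
fruits = fruits.Distinct().ToList();

// Print the result
Console.WriteLine(string.Join(", ", uniqueFruits1)); // apple, orange, banana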

2. Using the HashSet constructor (efficient deduplication)

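using System;
using System.Collections.Generic;
using System.Linq; // required for ToList() on HashSet<T>

List<string> fruits = new List<string> { "apple", "orange", "apple", "banana", "orange" };

// Approach 1: build a HashSet, then convert it back to a List
// (the enumeration order of a HashSet<T> is not guaranteed)
HashSet<string> hashSet = new HashSet<string>(fruits);
List<string> uniqueFruits = hashSet.ToList();

// Approach 2: pass a comparer to the HashSet constructor (here, ignoring case)
HashSet<string> caseInsensitiveSet = new HashSet<string>(
    fruits, 
    StringComparer.OrdinalIgnoreCase
);
List<string> uniqueCaseInsensitive = caseInsensitiveSet.ToList();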

3. Using a foreach loop with a Contains check

using System.Collections.Generic;

List<string> fruits = new List<string> { "apple", "orange", "apple", "banana", "orange" };
List<string> uniqueList = new List<string>();

foreach (string fruit in fruits)
{
    if (!uniqueList.Contains(fruit))
    {
        uniqueList.Add(fruit);
    }
}

// Alternatively, use the List<T>.ForEach method
List<string> uniqueList2 = new List<string>();
fruits.ForEach(fruit => 
{
    if (!uniqueList2.Contains(fruit))
        uniqueList2.Add(fruit);
});

4. Using LINQ's GroupBy method

using System.Collections.Generic;
using System.Linq;

List<string> fruits = new List<string> { "apple", "orange", "apple", "banana", "orange" };

List<string> uniqueFruits = fruits
    .GroupBy(f => f)
    .Select(g => g.Key)
    .ToList();

5. Extension methods (encapsulated for reuse)

using System;
using System.Collections.Generic;
using System.Linq;

// Usage example
List<string> fruits = new List<string> { "apple", "orange", "apple", "banana", "orange" };

// Call the extension methods
List<string> unique1 = fruits.RemoveDuplicates();
fruits.RemoveDuplicatesInPlace();

// Use a comparer that ignores case
List<string> mixedCase = new List<string> { "Apple", "apple", "ORANGE", "Orange" };
List<string> unique2 = mixedCase.RemoveDuplicates(StringComparer.OrdinalIgnoreCase);

// In a top-level program, type declarations must follow the statements
public static class ListExtensions
{
    // Extension method: remove duplicates and return a new List
    public static List<T> RemoveDuplicates<T>(this List<T> list)
    {
        return list.Distinct().ToList();
    }
    
    // Extension method: remove duplicates from the same List instance
    // (the distinct items are copied to a temporary list first)
    public static void RemoveDuplicatesInPlace<T>(this List<T> list)
    {
        var uniqueItems = list.Distinct().ToList();
        list.Clear();
        list.AddRange(uniqueItems);
    }
    
    // Extension method: remove duplicates using a caller-supplied comparer
    public static List<T> RemoveDuplicates<T>(
        this List<T> list, 
        IEqualityComparer<T> comparer)
    {
        return list.Distinct(comparer).ToList();
    }
}
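
RemoveDuplicatesInPlace builds a temporary list before refilling the original. An alternative that avoids the temporary copy combines List<T>.RemoveAll with a HashSet. This is a sketch only: it relies on RemoveAll invoking the predicate once per element in index order, which matches the current implementation but is not spelled out as a contract in the documentation.

```csharp
using System;
using System.Collections.Generic;

var fruits = new List<string> { "apple", "orange", "apple", "banana", "orange" };

// HashSet<T>.Add returns false for values already seen, so the predicate
// is true exactly for the second and later occurrences of each value
var seen = new HashSet<string>();
int removed = fruits.RemoveAll(item => !seen.Add(item));

Console.WriteLine(removed);                   // 2
Console.WriteLine(string.Join(", ", fruits)); // apple, orange, banana
```

Because the predicate has a side effect (it mutates `seen`), some teams prefer the explicit loop for readability; which form to use is a matter of taste.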

6. Handling letter case when deduplicating

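using System;
using System.Collections.Generic;
using System.Linq;

List<string> mixedCase = new List<string> { "Apple", "apple", "ORANGE", "Orange", "banana", "BANANA" };

// Approach 1: normalize everything to lower case, then deduplicate
// (the original casing is lost; ToLowerInvariant avoids culture-specific rules)
List<string> uniqueLower = mixedCase
    .Select(s => s.ToLowerInvariant())
    .Distinct()
    .ToList();

// Approach 2: use a comparer that ignores case
List<string> uniqueCaseInsensitive = mixedCase
    .Distinct(StringComparer.OrdinalIgnoreCase)
    .ToList();

// Approach 3: group case-insensitively and keep the first spelling that appeared
List<string> uniquePreserveCase = mixedCase
    .GroupBy(s => s, StringComparer.OrdinalIgnoreCase)
    .Select(g => g.First())
    .ToList();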

7. Performance-optimized deduplication methods

using System.Collections.Generic;

// The wrapper class name is illustrative; a method cannot live outside a type
public static class ListDeduplicator
{
    public static List<string> RemoveDuplicatesFast(List<string> list)
    {
        if (list == null || list.Count == 0)
            return new List<string>();

        HashSet<string> seen = new HashSet<string>();
        List<string> result = new List<string>();

        foreach (string item in list)
        {
            if (seen.Add(item)) // true only when the item was not already in the set
            {
                result.Add(item);
            }
        }

        return result;
    }

    // Variant that pre-sizes both collections to avoid repeated resizing
    public static List<string> RemoveDuplicatesOptimized(List<string> list)
    {
        if (list == null || list.Count == 0)
            return new List<string>();

        HashSet<string> seen = new HashSet<string>(list.Count); // preallocate capacity
        List<string> result = new List<string>(list.Count);     // preallocate capacity

        foreach (string item in list)
        {
            if (seen.Add(item))
            {
                result.Add(item);
            }
        }

        result.TrimExcess(); // release unused capacity
        return result;
    }
}

8. Complete example

using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        // Create sample data
        List<string> fruits = new List<string>
        {
            "apple", "orange", "apple", "banana", "orange", "grape", "banana"
        };
        
        Console.WriteLine("Original list: " + string.Join(", ", fruits));
        
        // Approach 1: Distinct
        List<string> method1 = fruits.Distinct().ToList();
        Console.WriteLine("Distinct: " + string.Join(", ", method1));
        
        // Approach 2: HashSet (enumeration order is not guaranteed)
        List<string> method2 = new HashSet<string>(fruits).ToList();
        Console.WriteLine("HashSet: " + string.Join(", ", method2));
        
        // Approach 3: reassign the variable to the deduplicated list
        fruits = fruits.Distinct().ToList();
        Console.WriteLine("After reassignment: " + string.Join(", ", fruits));
        
        // Handling values that differ only in letter case
        List<string> caseSensitive = new List<string>
        {
            "Apple", "apple", "APPLE", "Orange", "orange"
        };
        
        Console.WriteLine("\nMixed-case list: " + string.Join(", ", caseSensitive));
        
        // Deduplicate while ignoring case
        List<string> caseInsensitive = caseSensitive
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .ToList();
        
        Console.WriteLine("Case-insensitive result: " + string.Join(", ", caseInsensitive));
    }
}

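Performance comparison and recommendations

Distinct().ToList(): the most concise option; suitable for most scenarios
HashSet<T> constructor: hash-based, O(n) expected time, and typically the fastest on large inputs, but the resulting order is not guaranteed
foreach + Contains: the easiest to understand, but the slowest, O(n²) in the worst case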
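
The gap between O(n²) and O(n) can be made concrete by counting equality comparisons instead of measuring time. The sketch below is an illustration under stated assumptions: CountingComparer is a hypothetical helper, the linear scan is written out by hand to mimic what Contains does, and the input is 1,000 distinct integers, which is the worst case for the scan.

```csharp
using System;
using System.Collections.Generic;

const int n = 1000;
var items = new List<int>();
for (int i = 0; i < n; i++)
{
    items.Add(i); // all values distinct: every scan runs to the end
}

// Approach A: scan the result list for every item (what a Contains check does)
var linearComparer = new CountingComparer();
var uniqueLinear = new List<int>();
foreach (int item in items)
{
    bool found = false;
    foreach (int existing in uniqueLinear)
    {
        if (linearComparer.Equals(existing, item))
        {
            found = true;
            break;
        }
    }
    if (!found)
    {
        uniqueLinear.Add(item);
    }
}

// Approach B: HashSet lookup, which compares only within a matching hash bucket
var hashComparer = new CountingComparer();
var seen = new HashSet<int>(hashComparer);
var uniqueHashed = new List<int>();
foreach (int item in items)
{
    if (seen.Add(item))
    {
        uniqueHashed.Add(item);
    }
}

Console.WriteLine($"Linear scan: {linearComparer.EqualsCalls} comparisons"); // 499500 = n(n-1)/2
Console.WriteLine($"HashSet:     {hashComparer.EqualsCalls} comparisons");

// Hypothetical helper that counts how often Equals is invoked
public sealed class CountingComparer : IEqualityComparer<int>
{
    public long EqualsCalls { get; private set; }

    public bool Equals(int x, int y)
    {
        EqualsCalls++;
        return x == y;
    }

    public int GetHashCode(int value) => value.GetHashCode();
}
```

The linear scan performs n(n-1)/2 comparisons on distinct input, while the HashSet version needs far fewer because most lookups are resolved by hash code alone.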

Recommended choices:

General use: fruits.Distinct().ToList()
Performance-sensitive code: new HashSet<string>(fruits).ToList()
Custom comparison: Distinct(StringComparer.OrdinalIgnoreCase).ToList()