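07-C# - Tech Stack

C#/.NET Data Structures: Study Notes

1. Overview of Data Structures

A data structure is the way data is organized and stored at the lowest level. There are four broad categories:

Set: a plain container; unordered, with unique elements
Linear structure: one-to-one relationships, e.g. arrays, linked lists, queues, stacks
Tree structure: one-to-many relationships, e.g. binary trees, expression trees, menu hierarchies
Graph structure: many-to-many relationships, e.g. topology graphs, map networks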

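2. Linear Structures

Contiguous storage (array-based)

Array

Memory is allocated contiguously, all elements share one type, and the length is fixed. Index access makes reads fast; inserting or removing an element means shifting the elements after it, which is slow.

csharp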

int[] intArray = new int[3];
intArray[0] = 123;
string[] stringArray = new string[] { "123", "234" };

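ArrayList

Grows dynamically. Elements are stored as object, so value types are boxed on the way in and must be unboxed with a cast on the way out. It is non-generic and slow, and is essentially unused in modern projects.

csharp

ArrayList arrayList = new ArrayList();
arrayList.Add("Richard");
arrayList.Add(32); // value type is boxed
var value = (int)arrayList[1]; // unbox + cast (32 sits at index 1)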

List<T>

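Also backed by an array, but generic and type-safe, so value types are not boxed and it outperforms ArrayList. Reads are fast; inserting or removing anywhere but the end is slow.

csharp

List<int> intList = new List<int>() { 1, 2, 3, 4 };
intList.Add(123);
int val = intList[0]; // direct access, no cast needed

Performance ranking: Array ≈ List<T> > ArrayList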

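Non-contiguous storage (linked-list-based)

LinkedList<T>

A doubly linked list: each node holds references to its previous and next nodes, and the nodes are not contiguous in memory. There is no index access, so a lookup has to traverse the list (slow, O(n)); inserting or removing at a known node only rewires references (fast, O(1)).

csharp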

LinkedList<int> linkedList = new LinkedList<int>();
linkedList.AddFirst(123);
linkedList.AddLast(456);
LinkedListNode<int> node = linkedList.Find(123);
linkedList.AddBefore(node, 9);
linkedList.AddAfter(node, 9);
linkedList.Remove(node);

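Queue<T>

First in, first out (FIFO), like a bottle with no bottom. Note that .NET implements Queue<T> on a circular array, not a linked list.

csharp

Queue<string> queue = new Queue<string>();
queue.Enqueue("one");          // enqueue
queue.Enqueue("two");
string peek = queue.Peek();    // look at the head without removing it: "one"
string item = queue.Dequeue(); // dequeue and remove: "one"

Typical uses: task queues, message queues, asynchronous log processing.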

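Stack<T>

Last in, first out (LIFO), like a bottle with a bottom. In .NET, Stack<T> is backed by an array rather than a linked list.

csharp

Stack<string> stack = new Stack<string>();
stack.Push("one");          // push
string peek = stack.Peek(); // look at the top without removing it
string item = stack.Pop();  // pop and remove

Typical uses: expression evaluation, undo operations, parsing expression trees.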
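
To make the expression-evaluation use case concrete, here is a minimal sketch that evaluates a postfix (reverse Polish) expression with Stack<T>. The EvaluatePostfix helper and the space-separated token format are assumptions made for this example; they are not part of the original notes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: evaluates a space-separated postfix expression such as "3 4 + 2 *".
// Supports + - * / on integers only.
static int EvaluatePostfix(string expression)
{
    var operands = new Stack<int>();
    foreach (string token in expression.Split(' ', StringSplitOptions.RemoveEmptyEntries))
    {
        if (int.TryParse(token, out int number))
        {
            operands.Push(number); // operands wait on the stack
            continue;
        }

        // An operator consumes the two most recently pushed operands (last in, first out).
        int right = operands.Pop();
        int left = operands.Pop();
        operands.Push(token switch
        {
            "+" => left + right,
            "-" => left - right,
            "*" => left * right,
            "/" => left / right,
            _ => throw new ArgumentException($"Unknown token: {token}")
        });
    }
    return operands.Pop(); // the single remaining value is the result
}

Console.WriteLine(EvaluatePostfix("3 4 + 2 *")); // (3 + 4) * 2 = 14
```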

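3. Set Collections (deduplicating containers)

HashSet<T>

Hash-based, deduplicates automatically, and stores elements in no guaranteed order. For reference types, duplicates are detected by reference unless the type defines its own equality (as string does); to deduplicate your own classes by value, override Equals and GetHashCode.

csharp

HashSet<string> hashSet = new HashSet<string>();
hashSet.Add("123");
hashSet.Add("123"); // duplicate, not added
Console.WriteLine(hashSet.Count); // 1

// Set operations (each call modifies the receiver in place, so work on copies)
HashSet<string> set1 = new HashSet<string>() { "A", "B", "C" };
HashSet<string> set2 = new HashSet<string>() { "B", "C", "D" };
var union = new HashSet<string>(set1); union.UnionWith(set2);                   // union: A, B, C, D
var intersection = new HashSet<string>(set1); intersection.IntersectWith(set2); // intersection: B, C
var difference = new HashSet<string>(set1); difference.ExceptWith(set2);        // difference: A
var symmetric = new HashSet<string>(set1); symmetric.SymmetricExceptWith(set2); // symmetric difference: A, D

Typical uses: deduplicating likes, counting distinct IPs, friend recommendations (a set difference finds "people they know that I don't").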
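
The difference between reference-based and value-based deduplication is easiest to see side by side. The sketch below uses two hypothetical classes, PlainUser and User, invented for this example; only User overrides Equals and GetHashCode.

```csharp
using System;
using System.Collections.Generic;

var byReference = new HashSet<PlainUser>
{
    new PlainUser { Id = 1 },
    new PlainUser { Id = 1 } // a different object, so it counts as a new element
};
Console.WriteLine(byReference.Count); // 2

var byValue = new HashSet<User>
{
    new User { Id = 1 },
    new User { Id = 1 } // equal by Id, so it is rejected as a duplicate
};
Console.WriteLine(byValue.Count); // 1

// Hypothetical type that keeps the default (reference) equality.
public class PlainUser
{
    public int Id { get; set; }
}

// Hypothetical type that defines equality by Id.
public class User
{
    public int Id { get; set; }

    public override bool Equals(object? obj) => obj is User other && other.Id == Id;

    // Equal objects must return equal hash codes, or HashSet will miss duplicates.
    public override int GetHashCode() => Id.GetHashCode();
}
```

Both overrides are needed together: HashSet<T> first uses the hash code to pick a bucket and only then calls Equals on the candidates in it.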

SortedSet<T>

Deduplicates and sorts automatically. The sort order can be customized with an IComparer<T>.

csharp

SortedSet<string> sortedSet = new SortedSet<string>();
sortedSet.Add("689");
sortedSet.Add("123");
sortedSet.Add("456");
// Enumeration order: 123, 456, 689

Typical use: real-time leaderboards.
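
As a sketch of the custom-ordering point and the leaderboard use case, the example below builds a comparer with Comparer<T>.Create instead of writing a full IComparer<T> class. The player names and scores are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

// Order by score descending, then by name so that equal scores are not treated as duplicates.
var byScoreDescending = Comparer<(string Name, int Score)>.Create((x, y) =>
{
    int byScore = y.Score.CompareTo(x.Score);
    return byScore != 0 ? byScore : string.CompareOrdinal(x.Name, y.Name);
});

var leaderboard = new SortedSet<(string Name, int Score)>(byScoreDescending)
{
    ("Alice", 70),
    ("Bob", 95),
    ("Carol", 82),
    ("Bob", 95) // compares equal to the existing entry, so it is ignored
};

foreach (var (name, score) in leaderboard)
{
    Console.WriteLine($"{name}: {score}");
}
// Bob: 95
// Carol: 82
// Alice: 70

// Min is the first element under the comparer, which here means the highest score.
var top = leaderboard.Min;
```

Note that SortedSet<T> decides what a duplicate is through the comparer: two items for which it returns 0 are considered the same element.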

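4. Key-Value Structures

The core idea of hashing: compute a hash from the key and map it to an array index, which gives O(1) add, remove, lookup, and update on average. The cost is trading space for speed; when the data set grows very large and hash collisions pile up, performance degrades.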
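
The key-to-index mapping can be sketched with a toy hash table that uses separate chaining. This is a simplified illustration, not how Dictionary<TKey, TValue> is actually implemented; BucketCount, BucketIndex, Put, and Get are names invented for the example.

```csharp
using System;
using System.Collections.Generic;

const int BucketCount = 8;
// Each bucket is a small list; colliding keys share a bucket (separate chaining).
var buckets = new List<KeyValuePair<int, string>>[BucketCount];

// Map a key to a bucket: hash code, sign bit cleared, modulo the array length.
int BucketIndex(int key) => (key.GetHashCode() & 0x7FFFFFFF) % BucketCount;

void Put(int key, string value)
{
    var bucket = buckets[BucketIndex(key)] ??= new List<KeyValuePair<int, string>>();
    int existing = bucket.FindIndex(pair => pair.Key == key);
    var entry = new KeyValuePair<int, string>(key, value);
    if (existing >= 0) bucket[existing] = entry; // overwrite an existing key
    else bucket.Add(entry);
}

string? Get(int key)
{
    var bucket = buckets[BucketIndex(key)];
    if (bucket == null) return null;
    foreach (var pair in bucket)
    {
        if (pair.Key == key) return pair.Value; // only this one bucket is scanned
    }
    return null;
}

Put(1, "one");
Put(9, "nine"); // 9 % 8 == 1: collides with key 1 and lands in the same bucket
Put(2, "two");

Console.WriteLine(BucketIndex(1)); // 1
Console.WriteLine(BucketIndex(9)); // 1
Console.WriteLine(Get(9));         // nine
```

The more keys end up in the same bucket, the longer the scan inside Get becomes, which is why heavy collisions erode the O(1) behavior.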

Hashtable

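Non-generic: keys and values are both object, so value types incur boxing and unboxing. A thread-safe wrapper is available.

csharp

Hashtable table = new Hashtable();
table.Add("key1", "value1");
table["key2"] = "value2"; // indexer assignment: adds the key if missing, overwrites if present
Hashtable syncTable = Hashtable.Synchronized(table); // thread-safe wrapper; a plain Hashtable is only safe for one writer with multiple readers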

Dictionary<TKey, TValue>

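The generic hash table: type-safe and fast. Enumeration often follows insertion order in practice, but the order is not guaranteed and must not be relied on. It is not thread-safe; use ConcurrentDictionary<TKey, TValue> in multithreaded scenarios.

csharp

Dictionary<int, string> dic = new Dictionary<int, string>();
dic.Add(1, "HaHa");
dic[4] = "HuHu"; // adds the key if missing, overwrites if present
// dic.Add(4, "HuHu"); // throws ArgumentException because the key already exists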
foreach (var item in dic)
{
    Console.WriteLine($"Key:{item.Key}, Value:{item.Value}");
}

SortedDictionary<TKey, TValue>

Keeps entries sorted by key. It is tree-based, so operations are O(log n), which is slower than Dictionary.

csharp

SortedDictionary<int, string> dic = new SortedDictionary<int, string>();
dic.Add(5, "HoHo");
dic.Add(1, "HaHa");
dic.Add(3, "HeHe");
// Enumeration order: 1, 3, 5

SortedList

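Non-generic, sorted by key, supports access by index, and uses less memory than SortedDictionary. TrimToSize() shrinks the capacity to the current element count to minimize memory overhead.

csharp

SortedList sortedList = new SortedList();
sortedList.Add("First", "Hello");
sortedList["Third"] = "~~";
sortedList.TrimToSize(); // release unused capacity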
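
The snippet above does not show the index access that sets SortedList apart. A short sketch using GetKey, GetByIndex, and IndexOfKey follows; the "Second" entry is added here purely for illustration.

```csharp
using System;
using System.Collections;

var sortedList = new SortedList();
sortedList.Add("First", "Hello");
sortedList.Add("Third", "~~");
sortedList.Add("Second", "World");

// Entries are kept sorted by key, so positions follow key order, not insertion order.
for (int i = 0; i < sortedList.Count; i++)
{
    Console.WriteLine($"{i}: {sortedList.GetKey(i)} = {sortedList.GetByIndex(i)}");
}
// 0: First = Hello
// 1: Second = World
// 2: Third = ~~

int position = sortedList.IndexOfKey("Second"); // 1
```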

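5. The Interface Hierarchy

IEnumerable: implemented by every collection; provides a uniform way to iterate (foreach). Its core method is GetEnumerator(), which returns an iterator
ICollection<T>: extends IEnumerable<T> with Count, Contains, and CopyTo, plus Add and Remove (the non-generic ICollection has no add/remove members)
IList<T>: extends ICollection<T> with index access (this[int index]) and Insert, RemoveAt
IQueryable: used for deferred queries (LINQ, EF); it is built on expression trees, and the query only runs when it is enumerated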
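
One way to see the hierarchy is to view a single List<int> through each interface in turn. The sketch below also uses AsQueryable on an in-memory list as a stand-in for a real LINQ provider such as EF, which is an assumption made to keep the example self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var list = new List<int> { 1, 2, 3 };

IEnumerable<int> sequence = list;   // iteration only
ICollection<int> collection = list; // adds Count, Contains, Add, Remove
IList<int> indexed = list;          // adds the indexer, Insert, RemoveAt

collection.Add(4);
indexed.Insert(0, 0);
int second = indexed[1];

foreach (int n in sequence)
{
    Console.Write(n + " "); // 0 1 2 3 4
}
Console.WriteLine();

// IQueryable: the query is only a description (an expression tree) until it is enumerated.
IQueryable<int> query = list.AsQueryable().Where(n => n > 2);
list.Add(5);                                 // added after the query was defined
Console.WriteLine(string.Join(", ", query)); // 3, 4, 5
```

All three interface variables refer to the same list object; each interface only widens or narrows what the caller is allowed to do with it.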

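6. The Iterator Pattern

Takeaway: the iterator pattern gives different data structures one uniform access interface, so client code does not need to know whether the storage underneath is an array or a linked list.

Every collection in C# implements IEnumerable. A foreach loop essentially calls GetEnumerator() to obtain an iterator, then repeatedly calls MoveNext() and reads Current.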
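
The expansion of foreach can be written out by hand. The sketch below runs the same loop both ways over a small list; it approximates what the compiler emits rather than reproducing it exactly.

```csharp
using System;
using System.Collections.Generic;

var numbers = new List<int> { 10, 20, 30 };

int sumForeach = 0;
foreach (int n in numbers)
{
    sumForeach += n;
}

// Roughly what the compiler emits for the loop above.
int sumManual = 0;
using (IEnumerator<int> enumerator = numbers.GetEnumerator())
{
    while (enumerator.MoveNext())   // advance; false once the sequence is exhausted
    {
        int n = enumerator.Current; // read the element at the current position
        sumManual += n;
    }
} // Dispose runs here, as it would at the end of foreach

Console.WriteLine($"{sumForeach} {sumManual}"); // 60 60
```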

Custom iterators

There are two ways to implement a custom iterator:

Option 1: use yield (recommended, concise)

csharp

public class EnumeratorIterator<TSource> : IEnumerable<TSource>
{
    private readonly IEnumerable<TSource> source;
    private readonly Func<TSource, bool> predicate;

    public EnumeratorIterator(IEnumerable<TSource> source, Func<TSource, bool> predicate)
    {
        this.source = source;
        this.predicate = predicate;
    }

    public IEnumerator<TSource> GetEnumerator()
    {
        foreach (var item in source)
        {
            if (predicate(item))
                yield return item; // the compiler generates a state machine
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

Option 2: implement IEnumerator by hand (useful for understanding the internals)

csharp

public class ManualIterator<T> : IEnumerator<T>
{
    private readonly T[] _data;
    private int _index = -1;

    public ManualIterator(T[] data) { _data = data; }

    public T Current => _data[_index];
    object IEnumerator.Current => Current;

    public bool MoveNext()
    {
        _index++;
        return _index < _data.Length;
    }

    public void Reset() { _index = -1; }
    public void Dispose() { }
}

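Option 2 shows roughly the logic the compiler generates behind yield: it keeps an _index as state, each MoveNext() call advances one step, and Current returns the value at the current position.

Iterator pattern in practice: uniform access to different menus

The KFC menu is stored in an array and the McDonald's menu in a List; both are accessed through the same IIterator<Food> interface:

csharp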

IIterator<Food> iterator = kfcMenu.GetEnumerator();
while (iterator.MoveNext())
{
    Food food = iterator.Current;
    Console.WriteLine(food.Name);
}
// Switch to the McDonald's menu: the calling code is exactly the same
IIterator<Food> iterator1 = macDonaldMenu.GetEnumerator();
while (iterator1.MoveNext()) { ... }
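
The snippet above leaves IIterator<Food>, Food, and the two menu classes undefined. One possible shape for them is sketched below; the class names KfcMenu and MacDonaldMenu, the nested iterator classes, the ReadNames helper, and the menu items are all assumptions made for this example, not the original author's code.

```csharp
using System;
using System.Collections.Generic;

var kfcMenu = new KfcMenu();
var macDonaldMenu = new MacDonaldMenu();

// The caller depends only on IIterator<Food>, never on the storage behind it.
static List<string> ReadNames(IIterator<Food> iterator)
{
    var names = new List<string>();
    while (iterator.MoveNext())
    {
        names.Add(iterator.Current.Name);
    }
    return names;
}

Console.WriteLine(string.Join(", ", ReadNames(kfcMenu.GetEnumerator())));       // Fried chicken, Egg tart
Console.WriteLine(string.Join(", ", ReadNames(macDonaldMenu.GetEnumerator()))); // Big Mac, Fries

public interface IIterator<T>
{
    T Current { get; }
    bool MoveNext();
}

public class Food
{
    public string Name { get; set; } = "";
}

// Array-backed menu.
public class KfcMenu
{
    private readonly Food[] _foods = { new Food { Name = "Fried chicken" }, new Food { Name = "Egg tart" } };
    public IIterator<Food> GetEnumerator() => new ArrayIterator(_foods);

    private sealed class ArrayIterator : IIterator<Food>
    {
        private readonly Food[] _foods;
        private int _index = -1;
        public ArrayIterator(Food[] foods) => _foods = foods;
        public Food Current => _foods[_index];
        public bool MoveNext() => ++_index < _foods.Length;
    }
}

// List-backed menu.
public class MacDonaldMenu
{
    private readonly List<Food> _foods = new() { new Food { Name = "Big Mac" }, new Food { Name = "Fries" } };
    public IIterator<Food> GetEnumerator() => new ListIterator(_foods);

    private sealed class ListIterator : IIterator<Food>
    {
        private readonly List<Food> _foods;
        private int _index = -1;
        public ListIterator(List<Food> foods) => _foods = foods;
        public Food Current => _foods[_index];
        public bool MoveNext() => ++_index < _foods.Count;
    }
}
```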

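7. The yield Keyword

Takeaway: yield is syntactic sugar. The compiler generates a state machine (the iterator) that implements MoveNext and Current, which makes returning data on demand very concise. The generated Reset is not usable: it throws NotSupportedException.

yield return: produce values on demand

csharp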

public IEnumerable<int> Power()
{
    for (int i = 0; i < 10; i++)
    {
        yield return Get(i); // runs only when MoveNext is called
        Console.WriteLine("resumed after yield");
    }
}

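Calling Power() runs none of the method body. Only when a foreach (or any other consumer) iterates does each MoveNext() call advance execution to the next yield return.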
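
The deferred behavior can be observed by logging when each step actually runs. In this sketch, Get is a stand-in for the undefined Get(i) used above (it simply squares its argument), and the log list exists only to record the order of events.

```csharp
using System;
using System.Collections.Generic;

var log = new List<string>();

// Stand-in for Get(i); it records the moment it actually runs.
int Get(int i)
{
    log.Add($"Get({i})");
    return i * i;
}

IEnumerable<int> Power()
{
    for (int i = 0; i < 3; i++)
    {
        yield return Get(i);
    }
}

IEnumerable<int> sequence = Power();
log.Add("Power() returned"); // nothing inside Power has run yet

foreach (int value in sequence)
{
    log.Add($"consumed {value}");
}

Console.WriteLine(string.Join(" | ", log));
// Power() returned | Get(0) | consumed 0 | Get(1) | consumed 1 | Get(2) | consumed 4
```

Production and consumption interleave: each value is computed only at the moment the loop asks for it.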

yield break: stop early

csharp

public IEnumerable<int> CreateEnumerable()
{
    for (int i = 0; i < 5; i++)
    {
        yield return i;
        if (i == 4)
            yield break; // end the iteration; nothing after this runs
    }
    yield return -1; // never reached
}

yield and finally

In a method that contains yield, a finally block runs when the iterator is disposed (Dispose), so it fires even if the consumer breaks out partway through:

csharp

public IEnumerable<int> CreateEnumerable()
{
    try
    {
        for (int i = 0; i < 5; i++)
        {
            yield return i;
        }
    }
    finally
    {
        Console.WriteLine("Iteration stopped!"); // also runs after a break in foreach
    }
}
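
To confirm that breaking out of foreach triggers the finally block, the following sketch records events in a list instead of printing them directly. The log list and the break condition are additions made for this example.

```csharp
using System;
using System.Collections.Generic;

var log = new List<string>();

IEnumerable<int> CreateEnumerable()
{
    try
    {
        for (int i = 0; i < 5; i++)
        {
            yield return i;
        }
    }
    finally
    {
        log.Add("finally"); // runs when the enumerator is disposed
    }
}

foreach (int value in CreateEnumerable())
{
    log.Add($"value {value}");
    if (value == 1)
    {
        break; // leaving the loop disposes the enumerator, which triggers finally
    }
}
log.Add("after loop");

Console.WriteLine(string.Join(" | ", log));
// value 0 | value 1 | finally | after loop
```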

yield vs. a regular method

csharp

// yield: values are fetched on demand; execution is deferred and memory use stays low
public IEnumerable<int> Power()
{
    for (int i = 0; i < 10; i++)
        yield return Get(i); // one value per request
}

// Regular method: everything is computed up front and held in memory
public IEnumerable<int> Common()
{
    List<int> list = new List<int>();
    for (int i = 0; i < 10; i++)
        list.Add(Get(i)); // compute all values first
    return list;
}

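A custom LINQ extension method

csharp

// Extension methods must be declared in a static class; only the method is shown here.
public static IEnumerable<T> ElevenWhere<T>(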
    this IEnumerable<T> source,
    Func<T, bool> func)
{
    foreach (var item in source)
    {
        if (func.Invoke(item))
            yield return item; // deferred: filtering happens during enumeration
    }
}

// Usage
var result = studentList.ElevenWhere(s => s.Age < 30);
foreach (var item in result) // the filter logic actually runs here, during enumeration
{
    Console.WriteLine(item.Name);
}

Where yield fits: paged loading of large data sets, generating infinite sequences, and deferred queries such as LINQ's Where/Select.
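
The infinite-sequence case is worth a short sketch, since it only works because of deferred execution. The Fibonacci generator below is an illustrative choice, not something taken from the original notes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// An endless sequence: safe only because each value is produced on demand.
static IEnumerable<long> Fibonacci()
{
    long current = 0, next = 1;
    while (true)
    {
        yield return current;
        (current, next) = (next, current + next);
    }
}

// Take(10) stops pulling after ten items, so the infinite loop never runs away.
List<long> firstTen = Fibonacci().Take(10).ToList();
Console.WriteLine(string.Join(", ", firstTen));
// 0, 1, 1, 2, 3, 5, 8, 13, 21, 34
```

Calling ToList() directly on Fibonacci(), without Take, would never finish, which is the flip side of the same mechanism.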

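8. The dynamic Keyword

Takeaway: dynamic, introduced in C# 4.0, brings dynamic typing to C#: type checking for dynamic expressions is deferred until run time.

csharp

// Static typing: checked at compile time
string s = "abcd";
// int i = (int)s; // compile error

// dynamic: checked at run time
dynamic d = "abcd";
int i = (int)d;  // compiles, throws RuntimeBinderException at run time
d.Hello();       // compiles, would also throw at run time (string has no Hello method)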
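
Since the first failing line would stop the snippet above before the second one is reached, it can help to catch each failure separately. The sketch below does that with RuntimeBinderException; the errors list is added only to collect the messages.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.CSharp.RuntimeBinder;

dynamic d = "abcd";
var errors = new List<string>();

try
{
    int i = (int)d; // no string-to-int conversion exists; discovered only at run time
}
catch (RuntimeBinderException ex)
{
    errors.Add(ex.Message);
}

try
{
    d.Hello(); // string has no Hello method
}
catch (RuntimeBinderException ex)
{
    errors.Add(ex.Message);
}

int length = d.Length; // a member that does exist binds and runs normally
Console.WriteLine(length);       // 4
Console.WriteLine(errors.Count); // 2
```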

Most expressions that involve a dynamic operand are themselves typed as dynamic:

csharp

dynamic str = "abcd";
Console.WriteLine(str.Length);      // dynamic
Console.WriteLine(str.Substring(1)); // dynamic

Three main uses of dynamic:

Replacing reflection, usually with better performance on repeated calls:

csharp

object obj = new YieldDemo();

// Reflection
Type type = obj.GetType();
type.GetMethod("Power").Invoke(obj, null);

// dynamic (more concise, and usually faster)
dynamic dObj = obj;
dObj.Power();

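Simplifying data binding, with no casts needed
Easier interop with COM/C++

Note: dynamic gives up compile-time type checking, so errors surface only at run time. Use it with care.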

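9. Thread-Safe Collections

The System.Collections.Concurrent namespace provides thread-safe collections:

ConcurrentQueue<T>: thread-safe queue (FIFO)
ConcurrentStack<T>: thread-safe stack (LIFO)
ConcurrentBag<T>: thread-safe unordered collection
ConcurrentDictionary<TKey, TValue>: thread-safe dictionary
BlockingCollection<T>: a collection with blocking and bounding support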
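
A brief sketch of two of these types under concurrent writes follows. The even/odd counting workload is invented for the example; Parallel.For is used only as a convenient way to get several threads writing at once.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

var counts = new ConcurrentDictionary<string, int>();
var processed = new ConcurrentQueue<int>();

// Many threads update the same collections without explicit locks.
Parallel.For(0, 1000, i =>
{
    string key = i % 2 == 0 ? "even" : "odd";
    counts.AddOrUpdate(key, 1, (_, current) => current + 1); // atomic add-or-increment
    processed.Enqueue(i);
});

Console.WriteLine(counts["even"]);  // 500
Console.WriteLine(counts["odd"]);   // 500
Console.WriteLine(processed.Count); // 1000 (the order of items depends on thread scheduling)
```

Doing the same with a plain Dictionary<TKey, TValue> and Queue<T> would be a data race and could lose updates or corrupt the collection.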

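10. Performance Comparison and Selection

Data structure	Lookup	Add	Remove	Characteristics
Array	O(1)	O(n)	O(n)	Fixed length, contiguous memory
List<T>	O(1)	O(1)/O(n)*	O(n)	Dynamic length, contiguous memory
LinkedList<T>	O(n)	O(1)	O(1)	Linked list, non-contiguous; O(1) applies once the node is known
HashSet<T>	O(1)	O(1)	O(1)	Deduplicating, unordered; average case
Dictionary<K,V>	O(1)	O(1)	O(1)	Key-value pairs, hashing; average case

Lookup for Array and List<T> means access by index; searching by value is O(n).
*For List<T>, appending at the end is O(1) when no resize is needed; a resize or an insertion in the middle is O(n).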
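
The resize cost mentioned in the footnote can be observed through List<T>.Capacity. The sketch below records each capacity change while appending 1000 items; the exact growth sequence is an implementation detail of the runtime, so no specific numbers are assumed here.

```csharp
using System;
using System.Collections.Generic;

var list = new List<int>();
var capacities = new List<int> { list.Capacity };

for (int i = 0; i < 1000; i++)
{
    list.Add(i);
    if (list.Capacity != capacities[^1])
    {
        // A resize happened: the backing array was reallocated and its contents copied.
        capacities.Add(list.Capacity);
    }
}

// Only a handful of the 1000 appends triggered a resize.
Console.WriteLine($"Resizes: {capacities.Count - 1}");
Console.WriteLine(string.Join(" -> ", capacities));
```

When the final size is known in advance, passing it to the constructor (new List<int>(1000)) avoids the intermediate resizes altogether.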

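Selection guidelines:

Frequent random access, few inserts or removals → Array or List<T>
Frequent inserts or removals at the head or in the middle → LinkedList<T>
Deduplication needed → HashSet<T>
Key-to-value mapping with fast add, remove, lookup, and update → Dictionary<TKey, TValue>
Automatic sorting needed → SortedSet<T> or SortedDictionary<TKey, TValue>
Multithreaded scenarios → the Concurrent family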