C++: Building unordered_map / unordered_set on Top of a Hash Table

🍁 Preface

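In C++, unordered_map / unordered_set provide average O(1) operations and are widely used for lookup, deduplication, counting, caching, and similar tasks. Whether for interviews, programming contests, or engineering projects, understanding their underlying implementation is a key step in going from "knowing how to use them" to "mastering them".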

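Yet most learners stop at calling the interface and have only a partial grasp of the underlying hash table, hash functions, collision resolution, growth mechanism, and iterator design.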

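This article will walk you through:

Understanding the hash table encapsulation idea from the SGI-STL source
Hand-writing an efficient hash table based on separate chaining
Implementing iterator + const iterator, with range-based for support
Wrapping myunordered_map + myunordered_set
Implementing the standard-style interfaces operator[], insert, find, and erase
Working through common pain points: non-modifiable keys, const correctness, nested templates, and typename
Providing complete code that can be compiled and run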

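I. Foundations: Hash Tables and the STL Design Philosophy

1.1 Why do we need a hash table?

An array is fast to index but slow to insert into; a linked list is fast to insert into but slow to search. A hash table = array + linked lists, combining the strengths of both:

A hash function maps the key to an array index → O(1) positioning
On a collision, elements are hung on a linked list → the collision is resolved

Average: insert / delete / find in O(1)
Worst case: O(N) (rare in practice; it happens when many keys land in the same bucket)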

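1.2 The essence of the SGI-STL design: reuse through adapters

In SGI-STL:

hash_map / hash_set are not implemented separately
Instead, they reuse one and the same hash table, hashtable
The upper layer only wraps the interface; the underlying logic is fully shared

This is the adapter pattern:

hashtable: the general-purpose underlying structure
unordered_map / unordered_set: the adapters on top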

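II. Core Hash Table Theory (Must Understand)

2.1 Hash functions

Convert an arbitrary key to an integer, then take it modulo the table length to get the bucket index.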

hash_index = hash_func(key) % table_size

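Common hashing algorithms:

Division method (take the remainder)
BKDRHash (simple and widely used for strings)
MurmurHash (common in production code)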
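
As a minimal sketch of the formula above, the two helpers below compute a bucket index with the division method and a BKDR-style string hash. The names bucket_of_int and bkdr_hash are made up for this illustration, and 131 is the multiplier conventionally used with BKDR.

```cpp
#include <cstddef>
#include <string>

// Division (modulo) method: an integer key maps to key % table_size.
std::size_t bucket_of_int(std::size_t key, std::size_t table_size)
{
    return key % table_size;
}

// BKDR-style string hash: multiply the running value by 131, then add each byte.
// Overflow simply wraps around, which is well defined for unsigned types.
std::size_t bkdr_hash(const std::string& s)
{
    std::size_t h = 0;
    for (unsigned char ch : s)
        h = h * 131 + ch;
    return h;
}
```

A string key would then land in bucket bkdr_hash(key) % table_size, exactly as in the formula.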

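2.2 Hash collisions

A collision occurs when different keys compute the same index. Ways to resolve it:

Open addressing (linear probing, quadratic probing)
Separate chaining → the choice made by STL implementations
Rehashing (trying another hash function)
Common overflow area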

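2.3 How separate chaining (hash buckets) works

Hash table = array of pointers (each slot is called a bucket)
Each bucket is a linked list
Colliding elements are simply inserted into the list
The table grows when the load factor (element count / bucket count) reaches 1.0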
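
To make the bucket idea concrete, here is a deliberately tiny sketch of separate chaining for int keys. It is not the HashTable built later in this article: TinyChainedSet is a hypothetical name, it borrows std::forward_list for the chains, and it never grows.

```cpp
#include <cstddef>
#include <forward_list>
#include <vector>

// Hypothetical minimal chained set of ints (fixed bucket count), for illustration only.
struct TinyChainedSet
{
    std::vector<std::forward_list<int>> buckets;  // one chain per bucket
    std::size_t count = 0;                        // number of stored elements

    explicit TinyChainedSet(std::size_t bucket_count) : buckets(bucket_count) {}

    std::size_t index_of(int key) const
    {
        return static_cast<std::size_t>(key) % buckets.size();
    }

    bool contains(int key) const
    {
        for (int x : buckets[index_of(key)])
            if (x == key) return true;
        return false;
    }

    // Returns false if the key is already present.
    bool insert(int key)
    {
        if (contains(key)) return false;
        buckets[index_of(key)].push_front(key);  // head insertion into the chain
        ++count;
        return true;
    }

    double load_factor() const
    {
        return static_cast<double>(count) / buckets.size();
    }
};
```

With 53 buckets, the keys 1 and 54 collide (both map to bucket 1) and simply share one chain.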

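2.4 Growth rules

Table lengths are primes taken from a fixed list (as in SGI-STL), which lowers the collision rate when keys follow regular patterns
After growing, every element is remapped to its new bucket
Nodes are moved rather than copied, which avoids reallocating them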
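
The standard containers expose the same mechanics through their bucket interface, so growth can be observed directly. The sketch below uses std::unordered_set rather than the hand-written table; GrowthReport and observe_growth are hypothetical names. The exact bucket counts are implementation-defined, so the only property relied on here is that load_factor() stays at or below max_load_factor(), which is 1.0 by default.

```cpp
#include <cstddef>
#include <unordered_set>

struct GrowthReport
{
    std::size_t rehashes;      // how many times bucket_count() changed
    std::size_t bucket_count;  // bucket count after the last insert
    float load_factor;         // size() / bucket_count() after the last insert
};

// Hypothetical helper: insert 0..n-1 and watch the bucket array grow.
GrowthReport observe_growth(int n)
{
    std::unordered_set<int> s;
    std::size_t last = s.bucket_count();
    std::size_t rehashes = 0;
    for (int i = 0; i < n; ++i)
    {
        s.insert(i);
        if (s.bucket_count() != last)  // the table was rebuilt with more buckets
        {
            ++rehashes;
            last = s.bucket_count();
        }
    }
    return {rehashes, s.bucket_count(), s.load_factor()};
}
```

Because the number of rebuilds grows far more slowly than the number of inserts, the amortized cost of insert stays O(1) on average.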

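III. Overall Architecture (the Core of the Design)

3.1 Three-layer architecture

HashTable: the underlying hash table (generic)
KeyOfT functor: extracts the key from the stored data
unordered_map / unordered_set: the adapters on top

3.2 The KeyOfT functor (the heart of the design)

The hash table does not know whether it stores a bare key or a pair<K,V>. A functor extracts the key uniformly:

set: SetKeyOfT → returns the key
map: MapKeyOfT → returns pair.first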

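// Shown stand-alone here; in the wrappers later in this article they are nested
// inside the class templates, so K and V come from the enclosing class.

// For set: the stored value is the key itself
template<class K>
struct SetKeyOfT {
    const K& operator()(const K& key) const {
        return key;
    }
};

// For map: the key is the first member of the stored pair
template<class K, class V>
struct MapKeyOfT {
    const K& operator()(const std::pair<const K, V>& kv) const {
        return kv.first;
    }
};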
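
The following sketch shows why this indirection is enough for one generic core to serve both containers. The function same_key only knows a stored type T and a key extractor, which is the position the hash table is in. IdentityKey, FirstOfPair, and same_key are names invented for this example.

```cpp
#include <string>
#include <utility>

// Hypothetical extractor for set-like storage: the value is the key.
template<class K>
struct IdentityKey
{
    const K& operator()(const K& key) const { return key; }
};

// Hypothetical extractor for map-like storage: the key is pair.first.
template<class K, class V>
struct FirstOfPair
{
    const K& operator()(const std::pair<const K, V>& kv) const { return kv.first; }
};

// Generic code that never inspects T directly; it always goes through KeyOfT.
template<class T, class KeyOfT>
bool same_key(const T& a, const T& b)
{
    KeyOfT kot;
    return kot(a) == kot(b);
}
```

For a map entry, two pairs with the same first member compare as "same key" even when their mapped values differ, which is the comparison Find and Erase need.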

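3.3 Keys must not be modified

unordered_set: stores const K
unordered_map: stores pair<const K, V>
Changing a key would leave the element in a bucket that no longer matches its hash, so the key must be const.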
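
A small sketch of what the const buys us, using std::pair directly. Entry and bump are hypothetical names; the point is that the compiler rejects any write to the key while the mapped value stays writable.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// The element type a map-like container stores.
using Entry = std::pair<const std::string, int>;

static_assert(std::is_const<Entry::first_type>::value, "the key type is const");
static_assert(!std::is_const<Entry::second_type>::value, "the mapped type is not const");

int bump(Entry& e)
{
    // e.first = "other";  // would not compile: first is a const std::string
    e.second += 1;         // fine: only the key is frozen
    return e.second;
}
```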

IV. Hand-Written Hash Table: the Complete HashTable Implementation

4.1 The hash node: HashNode

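#pragma once
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// The code in this article uses these names unqualified.
using std::pair; using std::size_t; using std::string; using std::vector;

// One node of a bucket's singly linked list
template<class T>
struct HashNode
{
    T _data;
    HashNode<T>* _next;

    HashNode(const T& data)
        :_data(data)
        ,_next(nullptr)
    {}
};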

4.2 Hash functions (with string support)

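// Default hash: for integer-like keys that can be cast to size_t
template<class K>
struct HashFunc
{
    size_t operator()(const K& key) const
    {
        return (size_t)key;
    }
};

// Specialization for strings: BKDR-style hash with multiplier 131
template<>
struct HashFunc<std::string>
{
    size_t operator()(const std::string& s) const
    {
        size_t hash = 0;
        for (unsigned char ch : s)
        {
            hash = hash * 131 + ch;
        }
        return hash;
    }
};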
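
Any other key type needs its own hash functor, supplied through the Hash template parameter. Below is a minimal sketch for a two-field struct. Point, PointHash, and count_distinct_points are hypothetical, and std::unordered_set stands in for the hand-written container so that the example is self-contained; the key type also needs operator== because lookups compare keys within a bucket.

```cpp
#include <cstddef>
#include <unordered_set>

struct Point
{
    int x;
    int y;
    bool operator==(const Point& other) const { return x == other.x && y == other.y; }
};

// Hypothetical hash functor: combines both coordinates with the multiplier 131.
struct PointHash
{
    std::size_t operator()(const Point& p) const
    {
        std::size_t h = static_cast<std::size_t>(p.x);
        h = h * 131 + static_cast<std::size_t>(p.y);
        return h;
    }
};

std::size_t count_distinct_points()
{
    std::unordered_set<Point, PointHash> s;
    s.insert(Point{1, 2});
    s.insert(Point{1, 2});  // duplicate, rejected
    s.insert(Point{2, 1});  // different point, different hash
    return s.size();
}
```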

4.3 Prime table (used when growing)

inline unsigned long _stl_next_prime(unsigned long n)
{
    static const int __stl_num_primes = 28;
    static const unsigned long __stl_prime_list[__stl_num_primes] =
    {
        53, 97, 193, 389, 769,
        1543, 3079, 6151, 12289, 24593,
        49157, 98317, 196613, 393241, 786433,
        1572869, 3145739, 6291469, 12582917, 25165843,
        50331653, 100663319, 201326611, 402653189, 805306457,
        1610612741, 3221225473, 4294967291
    };
    for (int i = 0; i < __stl_num_primes; ++i)
    {
        if (__stl_prime_list[i] > n)
            return __stl_prime_list[i];
    }
    return __stl_prime_list[__stl_num_primes - 1];
}
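
Since the prime list is sorted, the linear scan can also be written as a binary search. The sketch below is one possible variant, not the code used in the rest of this article: next_prime_after is a hypothetical name and the list is shortened to eight entries to keep the example small. std::upper_bound returns the first element strictly greater than n, which matches the behavior of the loop above.

```cpp
#include <algorithm>
#include <iterator>

// Hypothetical variant of the lookup with a shortened prime list.
unsigned long next_prime_after(unsigned long n)
{
    static const unsigned long primes[] = {53, 97, 193, 389, 769, 1543, 3079, 6151};
    const unsigned long* first = std::begin(primes);
    const unsigned long* last = std::end(primes);
    const unsigned long* pos = std::upper_bound(first, last, n);  // first element > n
    return pos == last ? *(last - 1) : *pos;  // clamp to the largest prime
}
```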

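4.4 Iterator implementation (the hardest part)

4.4.1 Why does the iterator store a pointer to the hash table?

Because operator++ has to traverse across buckets:

First walk the linked list of the current bucket
When the list is exhausted → compute the bucket index → scan forward for the next non-empty bucket
Without a pointer to the hash table, the bucket array cannot be reached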

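// Forward declaration: the iterator refers to HashTable before it is defined.
template<class K, class T, class KeyOfT, class Hash>
class HashTable;

template<class K, class T, class Ref, class Ptr, class KeyOfT, class Hash>
struct HTIterator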
{
    typedef HashNode<T> Node;
    typedef HashTable<K, T, KeyOfT, Hash> HT;
    typedef HTIterator<K, T, Ref, Ptr, KeyOfT, Hash> Self;

    Node* _node;
    const HT* _ht;

    HTIterator(Node* node, const HT* ht)
        :_node(node)
        ,_ht(ht)
    {}

4.4.2 Dereference and arrow

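    // Ref / Ptr are T& / T* for Iterator and const T& / const T* for ConstIterator
    Ref operator*() const
    {
        return _node->_data;
    }

    Ptr operator->() const
    {
        return &_node->_data;
    }

    // Two iterators are equal when they point at the same node
    bool operator!=(const Self& s) const
    {
        return _node != s._node;
    }

    bool operator==(const Self& s) const
    {
        return _node == s._node;
    }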

4.4.3 Iterator increment (the cross-bucket core)

    Self& operator++()
    {
        if (_node->_next)
        {
            _node = _node->_next;
        }
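        else
        {
            // The current chain is exhausted: recompute this node's bucket index,
            // then scan forward for the next non-empty bucket.
            KeyOfT kot;
            Hash hash;
            size_t hashi = hash(kot(_node->_data)) % _ht->_tables.size();
            hashi++;

            while (hashi < _ht->_tables.size() && _ht->_tables[hashi] == nullptr)
            {
                hashi++;
            }

            if (hashi == _ht->_tables.size())
            {
                // No elements left: become the end iterator.
                _node = nullptr;
            }
            else
            {
                _node = _ht->_tables[hashi];
            }
        }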
        return *this;
    }
};
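
The scanning step at the heart of operator++ can be looked at in isolation. The sketch below is a simplified stand-alone version with a hypothetical Node type and a hypothetical helper next_non_empty; it is the same loop as above, minus the templates.

```cpp
#include <cstddef>
#include <vector>

// Simplified node type for this illustration only.
struct Node
{
    int value;
    Node* next;
};

// Returns the index of the first non-empty bucket at or after `start`,
// or buckets.size() if every remaining bucket is empty.
std::size_t next_non_empty(const std::vector<Node*>& buckets, std::size_t start)
{
    while (start < buckets.size() && buckets[start] == nullptr)
        ++start;
    return start;
}
```

A return value equal to buckets.size() corresponds to the iterator setting _node to nullptr, that is, becoming the end iterator.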

4.5 Complete HashTable implementation

template<class K, class T, class KeyOfT, class Hash = HashFunc<K>>
class HashTable
{
    template<class K1, class T1, class Ref, class Ptr, class KeyOfT1, class Hash1>
    friend struct HTIterator;

    typedef HashNode<T> Node;

public:
    typedef HTIterator<K, T, T&, T*, KeyOfT, Hash> Iterator;
    typedef HTIterator<K, T, const T&, const T*, KeyOfT, Hash> ConstIterator;

    // begin / end
    Iterator Begin()
    {
        for (size_t i = 0; i < _tables.size(); ++i)
        {
            if (_tables[i])
                return Iterator(_tables[i], this);
        }
        return End();
    }

    Iterator End()
    {
        return Iterator(nullptr, this);
    }

    ConstIterator Begin() const
    {
        for (size_t i = 0; i < _tables.size(); ++i)
        {
            if (_tables[i])
                return ConstIterator(_tables[i], this);
        }
        return End();
    }

    ConstIterator End() const
    {
        return ConstIterator(nullptr, this);
    }

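    // Construction and destruction
    HashTable()
        :_tables(_stl_next_prime(0))
        ,_n(0)
    {}

    // Copying is not implemented; disable it so a shallow copy cannot double-free nodes.
    HashTable(const HashTable&) = delete;
    HashTable& operator=(const HashTable&) = delete;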

    ~HashTable()
    {
        for (size_t i = 0; i < _tables.size(); ++i)
        {
            Node* cur = _tables[i];
            while (cur)
            {
                Node* next = cur->_next;
                delete cur;
                cur = next;
            }
            _tables[i] = nullptr;
        }
    }

    // Insert (returns an iterator plus a flag telling whether the insertion happened)
    pair<Iterator, bool> Insert(const T& data)
    {
        KeyOfT kot;
        Hash hash;
        Iterator it = Find(kot(data));
        if (it != End())
        {
            return { it, false };
        }

        if (_n == _tables.size())
        {
            vector<Node*> newTable(_stl_next_prime(_tables.size()));
            for (size_t i = 0; i < _tables.size(); ++i)
            {
                Node* cur = _tables[i];
                while (cur)
                {
                    Node* next = cur->_next;
                    size_t hashi = hash(kot(cur->_data)) % newTable.size();
                    cur->_next = newTable[hashi];
                    newTable[hashi] = cur;
                    cur = next;
                }
                _tables[i] = nullptr;
            }
            _tables.swap(newTable);
        }

        size_t hashi = hash(kot(data)) % _tables.size();
        Node* newnode = new Node(data);
        newnode->_next = _tables[hashi];
        _tables[hashi] = newnode;
        ++_n;

        return { Iterator(newnode, this), true };
    }

    // Find
    Iterator Find(const K& key)
    {
        KeyOfT kot;
        Hash hash;
        size_t hashi = hash(key) % _tables.size();
        Node* cur = _tables[hashi];

        while (cur)
        {
            if (kot(cur->_data) == key)
            {
                return Iterator(cur, this);
            }
            cur = cur->_next;
        }
        return End();
    }

    // Erase
    bool Erase(const K& key)
    {
        KeyOfT kot;
        Hash hash;
        size_t hashi = hash(key) % _tables.size();
        Node* prev = nullptr;
        Node* cur = _tables[hashi];

        while (cur)
        {
            if (kot(cur->_data) == key)
            {
                if (prev == nullptr)
                {
                    _tables[hashi] = cur->_next;
                }
                else
                {
                    prev->_next = cur->_next;
                }
                delete cur;
                --_n;
                return true;
            }
            prev = cur;
            cur = cur->_next;
        }
        return false;
    }

private:
    vector<Node*> _tables;
    size_t _n = 0;
};

V. Wrapping myunordered_set (read-only iterators)

#pragma once
#include "HashTable.h"

namespace mySTL
{
    template<class K, class Hash = HashFunc<K>>
    class unordered_set
    {
        struct SetKeyOfT
        {
            const K& operator()(const K& key)
            {
                return key;
            }
        };

    public:
        typedef typename HashTable<K, const K, SetKeyOfT, Hash>::Iterator iterator;
        typedef typename HashTable<K, const K, SetKeyOfT, Hash>::ConstIterator const_iterator;

        iterator begin()
        {
            return _ht.Begin();
        }

        iterator end()
        {
            return _ht.End();
        }

        const_iterator begin() const
        {
            return _ht.Begin();
        }

        const_iterator end() const
        {
            return _ht.End();
        }

        pair<iterator, bool> insert(const K& key)
        {
            return _ht.Insert(key);
        }

        iterator find(const K& key)
        {
            return _ht.Find(key);
        }

        bool erase(const K& key)
        {
            return _ht.Erase(key);
        }

    private:
        HashTable<K, const K, SetKeyOfT, Hash> _ht;
    };
}

VI. Wrapping myunordered_map (with operator[] support)

#pragma once
#include "HashTable.h"

namespace mySTL
{
    template<class K, class V, class Hash = HashFunc<K>>
    class unordered_map
    {
        struct MapKeyOfT
        {
            const K& operator()(const pair<const K, V>& kv)
            {
                return kv.first;
            }
        };

    public:
        typedef typename HashTable<K, pair<const K, V>, MapKeyOfT, Hash>::Iterator iterator;
        typedef typename HashTable<K, pair<const K, V>, MapKeyOfT, Hash>::ConstIterator const_iterator;

        iterator begin()
        {
            return _ht.Begin();
        }

        iterator end()
        {
            return _ht.End();
        }

        const_iterator begin() const
        {
            return _ht.Begin();
        }

        const_iterator end() const
        {
            return _ht.End();
        }

        // Takes the stored element type directly, so no intermediate pair is built
        std::pair<iterator, bool> insert(const std::pair<const K, V>& kv)
        {
            return _ht.Insert(kv);
        }

        iterator find(const K& key)
        {
            return _ht.Find(key);
        }

        bool erase(const K& key)
        {
            return _ht.Erase(key);
        }

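        // operator[]: insert {key, V()} if the key is absent,
        // then return a reference to the mapped value either way
        V& operator[](const K& key)
        {
            std::pair<iterator, bool> ret = _ht.Insert(std::pair<const K, V>(key, V()));
            return ret.first->second;
        }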

    private:
        HashTable<K, pair<const K, V>, MapKeyOfT, Hash> _ht;
    };
}

VII. Test Code (ready to run)

#include <iostream>
#include <string>
#include "myUnorderedMap.h"
#include "myUnorderedSet.h"

using namespace std;
using namespace mySTL;

void test_unordered_set()
{
    // Qualified so the name cannot clash with std::unordered_set
    mySTL::unordered_set<int> s;
    s.insert(1);
    s.insert(2);
    s.insert(3);
    s.insert(2);

    for (auto x : s)
    {
        cout << x << " ";
    }
    cout << endl;
}

void test_unordered_map()
{
    // Qualified so the name cannot clash with std::unordered_map
    mySTL::unordered_map<string, string> dict;
    dict["hello"] = "你好";
    dict["hash"] = "哈希";
    dict["map"] = "映射";

    for (auto& kv : dict)
    {
        cout << kv.first << " : " << kv.second << endl;
    }
}

int main()
{
    test_unordered_set();
    test_unordered_map();
    return 0;
}

VIII. Common Interview Questions (All Covered in This Article)

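1. What is the underlying structure of unordered_map / unordered_set?

A hash table (separate chaining).

2. Why does the iterator store a pointer to the hash table?

To let operator++ move across buckets.

3. Why does map store pair<const K,V>?

The key must not be modified; otherwise the element would sit in the wrong bucket and lookups would break.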

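4. Why does the hash table use prime capacities?

With a prime modulus, keys that share a common stride or factor are spread more evenly across the buckets, which reduces collisions when the hash function is weak.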

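5. Why can unordered_map and unordered_set share one implementation?

The KeyOfT functor extracts the key uniformly.

6. Why must elements be rehashed when the table grows?

The table length changes, so every element's index has to be recomputed.

7. Why implement const_iterator?

So that const objects can be traversed while their data cannot be modified.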

8. Why does insert return pair<iterator, bool>?

iterator: points to the element
bool: whether the insertion succeeded
This is the basis for implementing operator[].
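
To see why that return type matters, here is a word-count sketch that spells out what ++counts[w] does using insert alone. It uses std::unordered_map so that it stands on its own; count_words is a hypothetical helper.

```cpp
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Equivalent of ++counts[w], written with insert to show what operator[] relies on.
std::unordered_map<std::string, int> count_words(const std::vector<std::string>& words)
{
    typedef std::unordered_map<std::string, int> Counts;
    Counts counts;
    for (const std::string& w : words)
    {
        // Inserts {w, 0} if w is absent; otherwise leaves the map unchanged.
        // Either way, ret.first points at the element for w.
        std::pair<Counts::iterator, bool> ret = counts.insert(std::make_pair(w, 0));
        ++ret.first->second;
    }
    return counts;
}
```

The bool is ignored here, just as in operator[]: only the iterator is needed to reach the mapped value, whether or not the element was newly created.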