手撕哈希表（Hash Table）：从原理到C++完整实现

哈希表作为O(1)级别查找 的数据结构，是面试与工程开发中的高频考点。本文从哈希核心概念讲起，深入哈希函数、哈希冲突、两种冲突解决方案，并提供可直接运行的C++完整代码，带你彻底吃透哈希表。

文章目录

[手撕哈希表（Hash Table）：从原理到C++完整实现](#手撕哈希表（Hash Table）：从原理到C++完整实现)
- 一、哈希表核心概念
- - [1.1 什么是哈希](#1.1 什么是哈希)
  - [1.2 直接定址法](#1.2 直接定址法)
  - [1.3 哈希冲突](#1.3 哈希冲突)
  - [1.4 负载因子](#1.4 负载因子)
  - [1.5 关键字转整数](#1.5 关键字转整数)
- 二、哈希函数设计
- - [2.1 除法散列法（最常用）](#2.1 除法散列法（最常用）)
  - [2.2 乘法散列法](#2.2 乘法散列法)
  - [2.3 全域散列法](#2.3 全域散列法)
- 三、哈希冲突解决方案
- - [3.1 开放定址法](#3.1 开放定址法)
  - - [3.1.1 线性探测](#3.1.1 线性探测)
    - [3.1.2 二次探测](#3.1.2 二次探测)
    - [3.1.3 双重散列](#3.1.3 双重散列)
  - [3.2 链地址法（哈希桶，工程首选）](#3.2 链地址法（哈希桶，工程首选）)
- 四、C++完整实现
- - [4.1 通用哈希仿函数](#4.1 通用哈希仿函数)
  - [4.2 开放定址法（线性探测）完整代码](#4.2 开放定址法（线性探测）完整代码)
  - [4.3 链地址法（哈希桶）完整代码](#4.3 链地址法（哈希桶）完整代码)
- 五、测试代码
- 六、总结

一、哈希表核心概念

1.1 什么是哈希

哈希（Hash）又称散列，通过哈希函数建立关键字Key与存储位置的映射关系 ，实现数据的快速插入、查找、删除，理想时间复杂度为O(1)。

1.2 直接定址法

关键字范围集中时的极简哈希方式：

关键字为[0,99]整数：直接用关键字作为数组下标
关键字为小写字母：下标 = 字符ASCII码 - 'a'的ASCII码

示例：LeetCode 387. 字符串中的第一个唯一字符

cpp 复制代码

class Solution {
public:
    int firstUniqChar(string s) {
        // 用字母相对ASCII码作为下标，统计字符出现次数
        int count[26] = {0};
        for (auto ch : s) {
            count[ch - 'a']++;
        }
        // 遍历找第一个出现一次的字符
        for (size_t i = 0; i < s.size(); ++i) {
            if (count[s[i] - 'a'] == 1) {
                return i;
            }
        }
        return -1;
    }
};

1.3 哈希冲突

不同关键字通过哈希函数计算出相同存储位置，称为哈希冲突（哈希碰撞）。

冲突无法完全避免，只能通过优秀哈希函数减少冲突 ，并设计冲突解决方案。

1.4 负载因子

负载因子 = 哈希表中元素个数 / 哈希表大小

负载因子越大：冲突概率越高，空间利用率越高
负载因子越小：冲突概率越低，空间利用率越低

1.5 关键字转整数

非整数类型（如string、自定义类型）需先转为整数，再进行哈希计算。

二、哈希函数设计

哈希函数核心目标：让关键字均匀散列到哈希表中，减少冲突。

2.1 除法散列法（最常用）

公式：h(key) = key % M（M为哈希表大小）

建议：M取不接近2的整数次幂的质数，避免后几位固定导致大量冲突
Java HashMap优化：M取2的整数次幂，用位运算替代取模，同时让key所有位参与计算

2.2 乘法散列法

公式：h(key) = floor(M × ((A × key) % 1.0))

A取黄金分割比(√5-1)/2 ≈ 0.618，对表大小M无要求

2.3 全域散列法

随机选择哈希函数，防止恶意构造数据导致极端冲突，公式：
h_ab(key) = ((a × key + b) % P) % M

P为大质数，a∈ $1,P-1$ ，b∈ $0,P-1$

三、哈希冲突解决方案

3.1 开放定址法

所有元素存储在哈希表数组中，冲突时按规则寻找空位置，负载因子必须<1。

3.1.1 线性探测

冲突后依次向后探测，公式：hashi = (hash0 + i) % M（i=1,2,3...）

优点：实现简单
缺点：易产生数据堆积，查找效率下降

3.1.2 二次探测

冲突后按平方数跳跃探测，公式：hashi = (hash0 ± i²) % M

优点：缓解线性探测的堆积问题

3.1.3 双重散列

用第二个哈希函数计算偏移量，公式：hashi = (hash0 + i × h2(key)) % M

要求：h2(key)与M互质，保证遍历全表

3.2 链地址法（哈希桶，工程首选）

哈希表存储链表指针，冲突元素挂在对应位置的链表上，负载因子可>1。

优点：无堆积问题，实现简单，效率稳定
Java8 HashMap优化：链表长度>8转为红黑树，进一步提升效率

四、C++完整实现

4.1 通用哈希仿函数

支持int、string等类型转整数，string采用BKDR哈希（减少冲突）：

cpp 复制代码

// 通用哈希仿函数
template<class K>
struct HashFunc {
    size_t operator()(const K& key) {
        return (size_t)key;
    }
};

// string特化（BKDR哈希）
template<>
struct HashFunc<string> {
    size_t operator()(const string& key) {
        size_t hash = 0;
        for (auto ch : key) {
            hash = hash * 131 + ch; // 131为优质质数
        }
        return hash;
    }
};

// SGI STL质数表（扩容用）
inline unsigned long __stl_next_prime(unsigned long n) {
    static const int __stl_num_primes = 28;
    static const unsigned long __stl_prime_list[__stl_num_primes] = {
        53, 97, 193, 389, 769,
        1543, 3079, 6151, 12289, 24593,
        49157, 98317, 196613, 393241, 786433,
        1572869, 3145739, 6291469, 12582917, 25165843,
        50331653, 100663319, 201326611, 402653189, 805306457,
        1610612741, 3221225473, 4294967291
    };

    const unsigned long* first = __stl_prime_list;
    const unsigned long* last = __stl_prime_list + __stl_num_primes;
    const unsigned long* pos = lower_bound(first, last, n);
    return pos == last ? *(last - 1) : *pos;
}

4.2 开放定址法（线性探测）完整代码

cpp 复制代码

namespace open_address {
    // 节点状态：存在/空/已删除
    enum State {
        EXIST,
        EMPTY,
        DELETE
    };

    // 哈希节点
    template<class K, class V>
    struct HashData {
        pair<K, V> _kv;
        State _state = EMPTY;
    };

    // 开放定址哈希表
    template<class K, class V, class Hash = HashFunc<K>>
    class HashTable {
    public:
        HashTable() {
            _tables.resize(__stl_next_prime(0));
        }

        // 插入
        bool Insert(const pair<K, V>& kv) {
            if (Find(kv.first)) {
                return false;
            }

            // 负载因子>0.7扩容
            if (_n * 10 / _tables.size() >= 7) {
                HashTable<K, V, Hash> newHT;
                newHT._tables.resize(__stl_next_prime(_tables.size() + 1));
                // 旧数据重新映射到新表
                for (size_t i = 0; i < _tables.size(); ++i) {
                    if (_tables[i]._state == EXIST) {
                        newHT.Insert(_tables[i]._kv);
                    }
                }
                _tables.swap(newHT._tables);
            }

            Hash hash;
            size_t hash0 = hash(kv.first) % _tables.size();
            size_t hashi = hash0;
            size_t i = 1;
            // 线性探测找空位置
            while (_tables[hashi]._state == EXIST) {
                hashi = (hash0 + i) % _tables.size();
                ++i;
            }

            // 插入数据
            _tables[hashi]._kv = kv;
            _tables[hashi]._state = EXIST;
            ++_n;
            return true;
        }

        // 查找
        HashData<K, V>* Find(const K& key) {
            Hash hash;
            size_t hash0 = hash(key) % _tables.size();
            size_t hashi = hash0;
            size_t i = 1;
            // 遇到EMPTY停止查找
            while (_tables[hashi]._state != EMPTY) {
                if (_tables[hashi]._state == EXIST && _tables[hashi]._kv.first == key) {
                    return &_tables[hashi];
                }
                hashi = (hash0 + i) % _tables.size();
                ++i;
            }
            return nullptr;
        }

        // 删除（标记删除，不实际移除）
        bool Erase(const K& key) {
            HashData<K, V>* ret = Find(key);
            if (ret == nullptr) {
                return false;
            }
            ret->_state = DELETE;
            --_n;
            return true;
        }

    private:
        vector<HashData<K, V>> _tables;
        size_t _n = 0; // 有效元素个数
    };
}

4.3 链地址法（哈希桶）完整代码

cpp 复制代码

namespace hash_bucket {
    // 链表节点
    template<class K, class V>
    struct HashNode {
        pair<K, V> _kv;
        HashNode<K, V>* _next;

        HashNode(const pair<K, V>& kv) 
            : _kv(kv), _next(nullptr) 
        {}
    };

    // 链地址哈希表
    template<class K, class V, class Hash = HashFunc<K>>
    class HashTable {
        typedef HashNode<K, V> Node;
    public:
        HashTable() {
            _tables.resize(__stl_next_prime(0), nullptr);
        }

        // 析构：释放所有节点
        ~HashTable() {
            for (size_t i = 0; i < _tables.size(); ++i) {
                Node* cur = _tables[i];
                while (cur) {
                    Node* next = cur->_next;
                    delete cur;
                    cur = next;
                }
                _tables[i] = nullptr;
            }
        }

        // 插入（头插）
        bool Insert(const pair<K, V>& kv) {
            if (Find(kv.first)) {
                return false;
            }

            Hash hs;
            size_t hashi = hs(kv.first) % _tables.size();

            // 负载因子==1扩容
            if (_n == _tables.size()) {
                vector<Node*> newTables(__stl_next_prime(_tables.size() + 1), nullptr);
                // 旧节点直接移动到新表（高效）
                for (size_t i = 0; i < _tables.size(); ++i) {
                    Node* cur = _tables[i];
                    while (cur) {
                        Node* next = cur->_next;
                        // 重新计算哈希位置
                        size_t newHash = hs(cur->_kv.first) % newTables.size();
                        // 头插新表
                        cur->_next = newTables[newHash];
                        newTables[newHash] = cur;
                        cur = next;
                    }
                    _tables[i] = nullptr;
                }
                _tables.swap(newTables);
                // 重新计算插入位置
                hashi = hs(kv.first) % _tables.size();
            }

            // 头插法插入新节点
            Node* newNode = new Node(kv);
            newNode->_next = _tables[hashi];
            _tables[hashi] = newNode;
            ++_n;
            return true;
        }

        // 查找
        Node* Find(const K& key) {
            Hash hs;
            size_t hashi = hs(key) % _tables.size();
            Node* cur = _tables[hashi];
            // 遍历对应链表
            while (cur) {
                if (cur->_kv.first == key) {
                    return cur;
                }
                cur = cur->_next;
            }
            return nullptr;
        }

        // 删除
        bool Erase(const K& key) {
            Hash hs;
            size_t hashi = hs(key) % _tables.size();
            Node* prev = nullptr;
            Node* cur = _tables[hashi];

            while (cur) {
                if (cur->_kv.first == key) {
                    // 头节点删除
                    if (prev == nullptr) {
                        _tables[hashi] = cur->_next;
                    } else {
                        // 中间/尾节点删除
                        prev->_next = cur->_next;
                    }
                    delete cur;
                    --_n;
                    return true;
                }
                prev = cur;
                cur = cur->_next;
            }
            return false;
        }

    private:
        vector<Node*> _tables; // 指针数组（哈希桶）
        size_t _n = 0; // 有效元素个数
    };
}

五、测试代码

cpp 复制代码

int main() {
    // 测试链地址法哈希表
    hash_bucket::HashTable<int, string> ht;
    ht.Insert(make_pair(1, "one"));
    ht.Insert(make_pair(2, "two"));
    ht.Insert(make_pair(3, "three"));

    // 查找测试
    auto node = ht.Find(2);
    if (node) {
        cout << "key:2 value:" << node->_kv.second << endl;
    }

    // 删除测试
    ht.Erase(2);
    node = ht.Find(2);
    if (!node) {
        cout << "key:2 删除成功" << endl;
    }

    // 测试string类型
    hash_bucket::HashTable<string, int> strHT;
    strHT.Insert(make_pair("hash", 100));
    strHT.Insert(make_pair("table", 200));
    auto strNode = strHT.Find("hash");
    if (strNode) {
        cout << "key:hash value:" << strNode->_kv.second << endl;
    }

    return 0;
}

六、总结

哈希表核心：哈希函数+冲突解决，目标O(1)查找
哈希函数：除法散列法最常用，string推荐BKDR哈希
冲突解决 ：
- 开放定址法：实现简单，易堆积，负载因子<1
- 链地址法：工程首选，无堆积，支持负载因子>1
扩容：哈希表大小取质数，负载因子达到阈值自动扩容

本文实现的哈希表可直接用于学习与面试，理解原理后，可轻松掌握unordered_map/unordered_set底层逻辑。