【探寻C++之旅】第十六章：unordered系列的认识与模拟实现

请君浏览

- 前言
- [1. 了解unordered系列](#1. 了解unordered系列)
- - [1.1 初步认识](#1.1 初步认识)
  - [1.2 核心差异](#1.2 核心差异)
  - [1.3 使用示例](#1.3 使用示例)
  - [1.4 与普通 map/set的区别](#1.4 与普通 map/set的区别)
- [2. 模拟实现unordered系列](#2. 模拟实现unordered系列)
- - [2.1 迭代器](#2.1 迭代器)
- 尾声

前言

今天，我们继续踏入追寻C++的冒险历程。上一章我们讲解了数据结构哈希表，那么本章让我们用哈希表来模拟实现一下C++STL中的unordered_map和unordered_set。下面让我们一起来进入本章的学习。

1. 了解unordered系列

在 C++ STL（标准模板库）中，unordered_map 和 unordered_set 是两种基于哈希表（Hash Table） 实现的关联式容器，核心特点是无序存储、平均时间复杂度 O (1) 的增删查操作，但需注意哈希冲突对性能的影响。二者设计目标不同（键值对存储 vs 单一值存储），但底层原理和核心特性高度相似。

1.1 初步认识

unordered_set：

unordered_set的声明如下，Key就是unordered_set底层关键字的类型

cpp 复制代码

template <  class Key, 					   // unordered_set::key_type/value_type
            class Hash = hash<Key>, 		// unordered_set::hasher
            class Pred = equal_to<Key>, 	// unordered_set::key_equal
            class Alloc = allocator<Key> 	// unordered_set::allocator_type
> class unordered_set;

unordered_set默认要求Key⽀持转换为整形，如果不⽀持或者想按⾃⼰的需求⾛可以⾃⾏实现⽀持将Key转成整形的仿函数传给第⼆个模板参数
unordered_set默认要求Key⽀持⽐较相等，如果不⽀持或者想按⾃⼰的需求⾛可以⾃⾏实现⽀持将Key⽐较相等的仿函数传给第三个模板参数
unordered_set底层存储数据的内存是从空间配置器申请的，如果需要可以⾃⼰实现内存池，传给第四个参数。

⼀般情况下，我们都不需要传后三个模板参数。unordered_set迭代器遍历不再有序，为了跟set区分，所以取名unordered_set。前⾯部分我们已经学习了set容器的使⽤，set和unordered_set的功能⾼度相似，只是底层结构不同，有⼀些性能和使⽤的差异，下面会讲他们的差异部分。

unordered_map：

unoredered_map的声明如下：

cpp 复制代码

template < class Key,                                    // unordered_map::key_type
           class T,                                      // unordered_map::mapped_type
           class Hash = hash<Key>,                       // unordered_map::hasher
           class Pred = equal_to<Key>,                   // unordered_map::key_equal
> class unordered_map;

unordered_map 和 unordered_set 共享相同的底层实现逻辑 ------链式哈希表（哈希桶），这决定了它们的共同特性：

存储结构：哈希桶（Hash Bucket）

底层由「数组 + 链表 / 红黑树」组成：

哈希数组（桶数组）：数组的每个元素称为一个「桶（Bucket）」，桶的索引由「键的哈希值 % 桶数组大小」计算得出；
解决哈希冲突：当两个不同的键计算出相同的桶索引时（哈希冲突），桶内会通过「链表」存储冲突元素（C++11 后部分实现会在链表长度超过阈值时转为红黑树，进一步优化查询效率）。

这种结构的核心优势是：平均情况下，增删查操作的时间复杂度为 O (1)（直接通过哈希值定位桶，再遍历桶内短链表）；最坏情况（所有键哈希冲突）下复杂度为 O (n)（退化为单链表遍历）。

无序性：

与 map/set（基于红黑树实现，有序存储）不同，unordered_map/unordered_set 的元素不按键的大小排序，存储顺序完全由键的哈希值决定，因此遍历结果是 "随机无序" 的。

键的唯一性：

二者均要求「键（Key）不可重复」：

插入已存在的键时，insert 函数会返回 pair<iterator, bool>，其中 bool 为 false，表示插入失败，迭代器指向已存在的旧键；
无法通过任何接口修改已插入的键（unordered_map 的键被封装为 const K，unordered_set 的值本身就是键），只能删除后重新插入。

哈希函数与自定义键：

默认情况下，二者使用 STL 提供的 std::hash 模板计算键的哈希值（支持 int、string、double 等基础类型）；若存储自定义类型（如自定义结构体） ，需手动提供哈希函数（通过模板参数传入，如 unordered_map<MyStruct, int, MyHashFunc>），同时确保哈希函数满足：

相同的键必须生成相同的哈希值；
不同的键尽量生成不同的哈希值（减少冲突）。

1.2 核心差异

存储内容与使用场景：

unordered_map 和 unordered_set 的本质区别在于存储的元素类型，这直接决定了它们的使用场景：

特性	unordered_map	unordered_set
存储元素	键值对（`std::pair<const K, V>`）	单一值（`const K`，值即键）
核心功能	通过键快速查找对应的值（键值映射）	快速判断一个值是否存在（去重、判重）
关键接口差异	支持 `operator[]`（通过键访问 / 修改值）	无 `operator[]`（仅需判断值是否存在）
迭代器解引用结果	指向键值对（`pair<const K, V>`），可通过 `->first` 访问键、`->second` 访问值	指向单一值（`const K`），解引用直接得到键
典型使用场景	字典（如单词 - 翻译映射）、缓存（如 ID - 用户信息映射）	去重（如统计数组中不重复元素）、快速判重（如判断用户是否在黑名单）

1.3 使用示例

unordered_map 常用接口：

示例：

cpp 复制代码

#include <iostream>
#include <unordered_map>
#include <string>
using namespace std;

int main() {
    // 1. 定义 unordered_map（键：string，值：int）
    unordered_map<string, int> score_map;

    // 2. 插入键值对（三种方式）
    score_map.insert({"Alice", 95});          // 列表初始化
    score_map.insert(pair<string, int>("Bob", 88)); // pair 构造
    score_map["Charlie"] = 92;                // operator[] 插入（不存在则新建，存在则修改值）

    // 3. 查找键（find 返回迭代器，未找到则返回 end()）
    auto it = score_map.find("Bob");
    if (it != score_map.end()) {
        cout << "Bob's score: " << it->second << endl; // 访问值：it->second
    }

    // 4. 修改值（通过 operator[] 或迭代器）
    score_map["Bob"] = 90;                    // operator[] 直接修改
    it->second = 90;                          // 迭代器修改（需确保迭代器有效）

    // 5. 遍历（无序）
    for (const auto& kv : score_map) {
        cout << kv.first << ": " << kv.second << endl; // kv.first 是键，kv.second 是值
    }

    // 6. 删除键
    score_map.erase("Charlie");

    return 0;
}

unordered_set 常用接口：

示例：

cpp 复制代码

#include <iostream>
#include <unordered_set>
#include <vector>
using namespace std;

int main() {
    // 1. 定义 unordered_set（存储 int 类型，值即键）
    unordered_set<int> num_set;

    // 2. 插入值（去重特性：重复值插入失败）
    vector<int> nums = {3, 1, 4, 1, 5, 9, 2, 6, 5};
    for (int num : nums) {
        auto ret = num_set.insert(num); // 返回 pair<iterator, bool>
        if (!ret.second) {
            cout << "值 " << num << " 已存在，插入失败" << endl;
        }
    }

    // 3. 查找值（判断是否存在）
    if (num_set.count(5)) { // count 返回 0（不存在）或 1（存在）
        cout << "值 5 存在于集合中" << endl;
    }

    // 4. 遍历（无序，去重后结果）
    for (int num : num_set) {
        cout << num << " "; // 直接访问值（即键）
    }
    cout << endl;

    // 5. 删除值
    num_set.erase(9);

    return 0;
}

1.4 与普通 map/set的区别

在 C++ STL 中，unordered_map/unordered_set（统称「unordered 系列」）与 map/set（统称「普通有序系列」）是两组功能相似但设计原理完全不同的关联式容器，他们的底层差异是他们差异的根源：

容器系列	底层数据结构	核心设计目标
unordered 系列	链式哈希表（哈希桶）	追求「快速增删查」，以空间换时间
普通有序系列	红黑树（自平衡二叉搜索树）	追求「有序存储」，兼顾效率与排序

那么我们该如何根据使用场景去选择正确的容器呢？

两组容器没有「绝对优劣」，选择的核心是匹配需求优先级，以下是典型场景的选择逻辑：

需求优先级	推荐容器	原因
追求增删查的效率（O(1)）	unordered_map/unordered_set	哈希表平均复杂度更低，适合数据量大、对效率敏感的场景
需要有序存储（如按键排序遍历）	map/set	红黑树天然有序，支持 `lower_bound`/`upper_bound` 等范围查询接口
存储自定义类型且不愿写哈希函数	map/set	仅需重载 `<` 运算符（红黑树排序用），无需自定义哈希函数
内存占用敏感	map/set	哈希表需预留桶数组空间（通常有负载因子，如 0.7），内存利用率略低

2. 模拟实现unordered系列

我们已经对unordered系列有了了解，那么接下来我们就像之前用红黑树模拟实现map/set一样，我们使用哈希表来模拟实现一下unordered系列。

其次跟map和set相⽐⽽⾔unordered系列的模拟实现类结构更复杂⼀点，但是⼤框架和思路是完全类似的。因为HashTable实现了泛型不知道T参数到底是K，还是pair<K, V>，那么insert内部进⾏插⼊时要⽤K对象转换成整形取模和K⽐较相等，因为pair的value不参与计算取模，且默认⽀持的是key和value⼀起⽐较相等，我们需要时的任何时候只需要⽐较K对象，所以我们在unordered_map和unordered_set层分别实现⼀个MapKeyOfT和SetKeyOfT的仿函数传给HashTable的KeyOfT，然后HashTable中通过KeyOfT仿函数取出T类型对象中的K对象，再转换成整形取模和K⽐较相等。这一点与前面的map和set是类似的。

其实大体的思路与之前的思路相同，只是在封装迭代器以及细节方面与之前有些差异。因此我们重点来看一看迭代器的实现。

2.1 迭代器

unordered系列的iterator实现的⼤框架跟list的iterator思路是⼀致的，⽤⼀个类型封装结点的指针，再通过重载运算符实现，迭代器像指针⼀样访问的⾏为，要注意的是哈希表的迭代器是单向迭代器。这⾥的难点是operator++的实现。iterator中有⼀个指向结点的指针，如果当前桶下⾯还有结点，则结点的指针指向下⼀个结点即可。如果当前桶⾛完了，则需要想办法计算找到下⼀个桶。这⾥的难点是反⽽是结构设计的问题，因此在iterator中除了有结点的指针，还需要有哈希表对象的指针，这样当前桶⾛完了，要计算下⼀个桶就相对容易多了，⽤key值计算出当前桶位置，依次往后找下⼀个不为空的桶即可。下面是代码实例：

cpp 复制代码

template<class K, class T, class Ref, class Ptr, class KeyOfT, class Hash>
struct HTIterator
{
	typedef HashNode<T> Node;
	typedef HTIterator<K, T, Ref, Ptr, KeyOfT, Hash> Self;

	Node* _node;
	const HashTable<K, T, KeyOfT, Hash>* _pht;	
    
    //...
    
    Self& operator++()
    {
        if (_node->_next)
        {
            _node = _node->_next;
        }
        else
        {
            KeyOfT kot;
            Hash hs;
            size_t hashi = hs(kot(_node->_data)) % _pht->_tables.size() + 1;

            while (hashi < _pht->_tables.size())
            {
                if (_pht->_tables[hashi])
                {
                    _node = _pht->_tables[hashi];
                    break;
                }
                hashi++;
            }

            if (hashi == _pht->_tables.size())
            {
                _node = nullptr;
            }
        }
        return *this;
    }
}

begin()返回第⼀个桶中第⼀个节点指针构造的迭代器，这⾥end()返回迭代器可以⽤空表⽰。

至于其他的小细节例如通过模板参数Ref和Ptr来实现iterator和const_iterator以及使用模板参数T来代表底层的数据类型，模板参数K来进行find和erase等等细节与我们使用红黑树实现map和set是几乎完全一样的，这里就不再赘述，下面直接给出完整的代码实现：

hash_bucket.h

cpp 复制代码

#pragma once

#include<iostream>
#include<string>
#include<vector>
using namespace std;

template<class K>
struct HashFunc
{
	size_t operator()(const K& key)
	{
		return (size_t)key;
	}
};

//特化
template<>
struct HashFunc<string>
{
	size_t operator()(const string& s)
	{
		size_t ret = 0;
		//这⾥我们使⽤BKDR哈希的思路，⽤上次的计算结果去乘以⼀个质数，这个质数⼀般取31, 131等效果会⽐较好
		for (auto e : s)
		{
			ret *= 31;
			ret += e;
		}
		return ret;
	}
};

namespace hash_bucket
{
	template<class T>
	struct HashNode
	{
		HashNode<T>* _next;
		T _data;

		HashNode(const T& data)
			: _data(data)
			, _next(nullptr)
		{}
	};

	template<class K, class T, class KeyOfT, class Hash>
	class HashTable;

	template<class K, class T, class Ref, class Ptr, class KeyOfT, class Hash>
	struct HTIterator
	{
		typedef HashNode<T> Node;
		typedef HTIterator<K, T, Ref, Ptr, KeyOfT, Hash> Self;

		Node* _node;
		const HashTable<K, T, KeyOfT, Hash>* _pht;

		HTIterator(Node* node, const HashTable<K, T, KeyOfT, Hash>* pht)
			: _node(node)
			, _pht(pht)
		{}

		Ref operator*()
		{
			return _node->_data;
		}

		Ptr operator->()
		{
			return &_node->_data;
		}

		//单向迭代器
		Self& operator++()
		{
			if (_node->_next)
			{
				_node = _node->_next;
			}
			else
			{
				KeyOfT kot;
				Hash hs;
				size_t hashi = hs(kot(_node->_data)) % _pht->_tables.size() + 1;

				while (hashi < _pht->_tables.size())
				{
					if (_pht->_tables[hashi])
					{
						_node = _pht->_tables[hashi];
						break;
					}
					hashi++;
				}

				if (hashi == _pht->_tables.size())
				{
					_node = nullptr;
				}
			}
			return *this;
		}

		bool operator!=(const Self& self)
		{
			return self._node != _node;
		}

		bool operator==(const Self& self)
		{
			return self._node == _node;
		}
	};

	template<class K, class T, class KeyOfT, class Hash>
	class HashTable
	{
		//友元声明
		template<class K, class T, class Ref, class Ptr, class KeyOfT, class Hash>
		friend struct HTIterator;

		typedef HashNode<T> Node;

		inline unsigned long __stl_next_prime(unsigned long n)
		{
			static const int __stl_num_primes = 28;
			static const unsigned long __stl_prime_list[__stl_num_primes] =
			{
				53,			97,			193,		389,		769,
				1543,		3079,		6151,		12289,		24593,
				49157,		98317,		196613,		393241,		786433,
				1572869,	3145739,	6291469,	12582917,	25165843,
				50331653,	100663319,	201326611,	402653189,	805306457,
				1610612741, 3221225473, 4294967291
			};
			const unsigned long* first = __stl_prime_list;
			const unsigned long* last = __stl_prime_list +__stl_num_primes;
			const unsigned long* pos = lower_bound(first, last, n);
			return pos == last ? *(last - 1) : *pos;
		}

	public:
		typedef HTIterator<K, T, T&, T*, KeyOfT, Hash> Iterator;
		typedef HTIterator<K, T, const T&, const T*, KeyOfT, Hash> ConstIterator;

		HashTable()
		{
			_tables.resize(__stl_next_prime(0), nullptr);
		}

		~HashTable()
		{
			for (size_t i = 0; i < _tables.size(); i++)
			{
				Node* cur = _tables[i];
				while (cur)
				{
					Node* tmp = cur;
					cur = cur->_next;
					delete tmp;
				}
				_tables[i] = nullptr;
			}
		}

		Iterator End()
		{
			return Iterator(nullptr, this);
		}

		Iterator Begin()
		{
			if (_n == 0)
			{
				return End();
			}

			for (int i = 0; i < _tables.size(); i++)
			{
				if (_tables[i])
				{
					return Iterator(_tables[i], this);
				}
			}
			return End();
		}

		

		ConstIterator End() const
		{
			return Iterator(nullptr, this);
		}

		ConstIterator Begin() const
		{
			if (_n == 0)
			{
				return End();
			}

			for (int i = 0; i < _tables.size(); i++)
			{
				if (_tables[i])
				{
					return Iterator(_tables[i], this);
				}
			}
			return End();
		}

		

		pair<Iterator, bool> Insert(const T& data)
		{
			KeyOfT kot;
			Hash hs;

			Iterator it = Find(kot(data));
			if (it != End())
			{
				return { it, false };
			}
			//扩容
			if ((double)_n / _tables.size() >= 0.75)
			{
				vector<Node*> newtables(__stl_next_prime(_tables.size() + 1), nullptr);

				for (int i = 0; i < _tables.size(); i++)
				{
					Node* cur = _tables[i];
					while (cur)
					{
						Node* next = cur->_next;

						size_t hashi = hs(kot(cur->_data)) % newtables.size();

						cur->_next = newtables[hashi];
						newtables[hashi] = cur;

						cur = next;
					}
				}

				_tables.swap(newtables);
			}
	
			

			size_t hash_i = hs(kot(data)) % _tables.size();

			Node* newnode = new Node(data);
			newnode->_next = _tables[hash_i];
			_tables[hash_i] = newnode;
			++_n;

			return { Iterator(newnode, this), true};
		}

		Iterator Find(const K& key)
		{
			Hash hs;
			KeyOfT kot;
			size_t hashi = hs(key) % _tables.size();
			
			Node* cur = _tables[hashi];
			while (cur)
			{
				if (kot(cur->_data) == key)
				{
					return Iterator(cur, this);
				}
				cur = cur->_next;
			}
			
			return End();
		}

		bool Erase(const K& key)
		{
			Hash hs;
			KeyOfT kot;
			size_t hashi = hs(key) % _tables.size();

			Node* cur = _tables[hashi];
			Node* prev = nullptr;
			while (cur)
			{
				if (kot(cur->_data) == key)
				{
					if (prev == nullptr)
					{
						_tables[hashi] = cur->_next;
					}
					else
					{
						prev->_next = cur->_next;
					}
					delete cur;
					--_n;
					return true;
				}
				prev = cur;
				cur = cur->_next;
			}
			return false;
		}

	private: 
		//指针数组
		vector<Node*> _tables; 
		//表中存储数据个数
		size_t _n = 0; 
	};
}

unordered_map

cpp 复制代码

#pragma once
#include"hash_bucket.h"

namespace hcy
{
	template<class K, class V, class Hash = HashFunc<K>>
	class unordered_map
	{
		struct MapKeyOfT
		{
			const K& operator()(const pair<K, V>& kv)
			{
				return kv.first;
			}
		};
	public:
		typedef typename hash_bucket::HashTable<K, pair<const K, V>, MapKeyOfT, Hash>::Iterator iterator;
		typedef typename hash_bucket::HashTable<K, pair<const K, V>, MapKeyOfT, Hash>::ConstIterator const_iterator;

		iterator begin()
		{
			return _ht.Begin();
		}

		iterator end()
		{
			return _ht.End();
		}

		const_iterator begin() const
		{
			return _ht.Begin();
		}

		const_iterator end() const
		{
			return _ht.End();
		}

		pair<iterator, bool> insert(const pair<K, V>& kv)
		{
			return _ht.Insert(kv);
		}

		iterator find(const K& key)
		{
			return _ht.Find();
		}

		bool erase(const K& key)
		{
			return _ht.Erase();
		}

		V& operator[](const K& key)
		{
			pair<iterator, bool> ret = _ht.Insert({ key, V() });
			return ret.first->second;
		}
	private:
		hash_bucket::HashTable<K, pair<const K, V>, MapKeyOfT, Hash> _ht;
	};
}

unordered_set

cpp 复制代码

#pragma once
#include"hash_bucket.h"

namespace hcy
{
	template<class K, class Hash = HashFunc<K>>
	class unordered_set
	{
		struct SetKeyOfT
		{
			const K& operator()(const K& key)
			{
				return key;
			}
		};
	public:
		typedef typename hash_bucket::HashTable<K, const K, SetKeyOfT, Hash>::Iterator iterator;
		typedef typename hash_bucket::HashTable<K, const K, SetKeyOfT, Hash>::ConstIterator const_iterator;

		iterator begin()
		{
			return _ht.Begin();
		}

		iterator end()
		{
			return _ht.End();
		}

		const_iterator begin() const
		{
			return _ht.Begin();
		}

		const_iterator end() const
		{
			return _ht.End();
		}

		pair<iterator, bool> insert(const K& key)
		{	
			return _ht.Insert(key);
		}

		iterator find(const K& key)
		{
			return _ht.Find();
		}

		bool erase(const K& key)
		{
			return _ht.Erase();
		}
	private:
		hash_bucket::HashTable<K, const K, SetKeyOfT, Hash> _ht;
	};
}

下面让我们通过一些示例来看一看我们自己模拟实现的unordered系列是否可以正常使用：

cpp 复制代码

#include"unordered_set.h"
#include"unordered_map.h"

void test_map()
{
	hcy::unordered_map<string, string> dict;
	dict.insert({ "sort", "排序" });
	dict.insert({ "left", "左边" });
	dict.insert({ "right", "右边" });
	dict["left"] = "左边";
	dict["insert"] = "插入";
	dict["string"];
	hcy::unordered_map<string, string>::iterator it = dict.begin();
	while (it != dict.end())
	{
		// 不能修改first，可以修改second
		//it->first += 'x';
		it->second += 'x';
		cout << it->first << ":" << it->second << endl;
		++it;
	}
	cout << endl;
}

void test_set()
{
	hcy::unordered_set<int> s;
	int a[] = { 4, 2, 6, 1, 3, 5, 15, 7, 16, 14, 3,3,15 };
	for (auto e : a)
	{
		s.insert(e);
	}
	for (auto e : s)
	{
		cout << e << " ";
	}
	cout << endl;
	hcy::unordered_set<int>::iterator it = s.begin();
	while (it != s.end())
	{
		// 不⽀持修改
		//*it += 1;
		cout << *it << " ";
		++it;
	}
	cout << endl;
}

int main()
{
	test_set();
	cout << endl;
	test_map();
	return 0;
}

我们来看一下运行的结果：

可以看出结果符合我们的预期。那么这就是unordered系列的模拟实现了，希望大家能有些收获。

尾声

若有纰漏或不足之处欢迎大家在评论区留言或者私信，同时也欢迎各位一起探讨学习。感谢您的观看！