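[C++] STL: Using and Implementing unordered_map and unordered_set

This article covers how to use unordered_map and unordered_set from the STL, and then walks through a simplified implementation of both.

1 Using unordered_map and unordered_set

unordered_map and unordered_set are used almost exactly like map and set, so this section only gives a brief tour of their interfaces. map and set are built on a red-black tree, while unordered_map and unordered_set are built on a hash table, so their performance characteristics differ. Later in this section we write a small benchmark to compare the two families of containers.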

unordered_set

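As with set and multiset, the <unordered_set> header provides two containers, unordered_set and unordered_multiset. The difference mirrors that between set and multiset: unordered_set rejects duplicate values, while unordered_multiset accepts them. Everything else works the same way, so to keep things short we only discuss unordered_set here.

Template parameters of unordered_set

unordered_set has four template parameters: Key, Hash, Pred and Alloc. Alloc is the allocator, which we ignore here. Key plays the same role as in set: it is the type of the values actually stored in the container. Hash and Pred are tied to the underlying hash table. Typical implementations map a key to a bucket with the division method (hash value modulo bucket count), and, as discussed in the hash table article, that requires the key to be convertible to a size_t. The Hash parameter therefore supplies a functor that turns a key into a size_t. Looking up an element in a hash table ultimately means comparing keys, so the Pred parameter supplies a functor that decides whether two keys are equal.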

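Interfaces of unordered_set

Constructors

The core constructors are the usual ones: the default constructor, the iterator-range constructor, the copy constructor and the initializer_list constructor. They are straightforward, so we do not demonstrate them here.

Capacity

There are only three capacity functions: empty reports whether the hash table holds no elements, size returns the number of elements, and max_size returns the maximum number of elements the container can hold.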

Iterators

Iterators work just as they do for the containers covered earlier, with one caveat: the iterators of unordered_set and unordered_map are only required to be forward iterators, whereas those of map and set are bidirectional iterators. As a result, unordered_set has no rbegin or rend.

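Insertion, deletion and lookup

These functions closely match those of set (the order-based lower_bound and upper_bound are absent, since a hash table keeps no ordering), but because the underlying structure differs, they are implemented very differently. In unordered_set, equal_range yields a range of at most one element, so it offers nothing beyond find and we normally just use find. In unordered_multiset it is far more useful: it returns the half-open iterator range [first, last) of all elements equal to the given key.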

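Hash table interfaces

The Buckets group exposes the underlying bucket structure. bucket_count returns the number of buckets in the table (in a vector-of-chains implementation like the one we build later, the size() of the underlying vector). max_bucket_count returns the maximum number of buckets the table could ever have. bucket_size(n) returns the number of elements in bucket n, that is, the length of the chain stored at index n. bucket(key) returns the index of the bucket that the key hashes to.

The Hash policy group controls the performance of the table. load_factor returns the current load factor, and max_load_factor returns the maximum load factor, the threshold beyond which the table grows. rehash and reserve both resize the table. The argument n of rehash is a number of buckets: when n > bucket_count, the table grows to n buckets or more (the exact number depends on the implementation) and every element is redistributed according to its hash; when n <= bucket_count, the bucket count may shrink, but usually stays unchanged. The argument n of reserve is a number of elements, that is, the number of chain nodes: when n > bucket_count * max_load_factor, the table grows and its elements are redistributed; when n <= bucket_count * max_load_factor, the call may do nothing.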


#include <iostream>
#include <unordered_set>

using namespace std;

int main()
{
	unordered_set<int> us = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
	cout << "bucket count: " << us.bucket_count() << endl; // number of buckets
	cout << "max bucket count: " << us.max_bucket_count() << endl;

	cout << "2 index: " << us.bucket(2) << endl;
	cout << "2 index size count: " << us.bucket_size(us.bucket(2)) << endl;

	cout << "load factor: " << us.load_factor() << endl; // current load factor
	cout << "max load factor: " << us.max_load_factor() << endl; // maximum load factor

	us.rehash(us.bucket_count() + 10);
	cout << "rehash bucket count: " << us.bucket_count() << endl;
	us.rehash(us.bucket_count() - 15);
	cout << "rehash bucket count: " << us.bucket_count() << endl;

	us.reserve(us.bucket_count() * us.max_load_factor() + 1); // more elements than the current buckets allow: grows
	cout << "reserve bucket count: " << us.bucket_count() << endl;
	us.reserve(us.bucket_count() * us.max_load_factor() - 10); // fits in the current buckets: no growth expected
	cout << "reserve bucket count: " << us.bucket_count() << endl;

	return 0;
}

Output (the exact bucket counts and indices depend on the standard library implementation):


bucket count: 64
max bucket count: 536870911
2 index: 55
2 index size count: 1
load factor: 0.15625
max load factor: 1
rehash bucket count: 128
rehash bucket count: 128
reserve bucket count: 256
reserve bucket count: 256

unordered_map

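As with map and multimap, the <unordered_map> header provides two containers, unordered_map and unordered_multimap, and the difference mirrors that between map and multimap: unordered_map does not allow two elements with the same key, while unordered_multimap does. Everything else works the same way, so to keep things short we only discuss unordered_map here.

Template parameters of unordered_map

The first two template parameters match those of map: Key is the key type and T is the mapped value type. Hash and Pred play the same roles as in unordered_set: Hash converts a key into a size_t, and Pred decides whether two keys are equal.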

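Interfaces of unordered_map

Constructors

The constructors match those of unordered_set: the default constructor, the iterator-range constructor, the copy constructor and the initializer_list constructor.

Capacity

The capacity functions are the same as those of unordered_set: empty checks whether the container is empty, and size returns the number of elements, that is, the number of nodes in the chains of the underlying hash table.

Iterators

The iterator of unordered_map is likewise a forward iterator.

Insertion, deletion and lookup

As with map, the most convenient insertion interface is operator[]. If the key does not exist, it inserts the key with a value-initialized value; if the key exists, nothing is inserted. In both cases it returns a reference to the value mapped to that key. Note that unordered_multimap has neither operator[] nor at: because it can hold several elements with the same key, a statement such as umm["left"] = "左边" would be ambiguous between inserting a new element and modifying an existing one, so operator[] is simply not provided. The remaining functions match those of unordered_set and are not repeated here.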

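Hash table interfaces

These interfaces were already covered under unordered_set, so we do not repeat them here.

Performance comparison: unordered_set vs set

The following program compares the performance of the two containers: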


#include <iostream>
#include <set>
#include <unordered_set>
#include <vector>
#include <cstdlib>
#include <ctime>
using namespace std;

const int TEST_SIZE = 1000000;  // test with one million values

// clock() counts ticks, so convert the difference to milliseconds explicitly
long long elapsed_ms(clock_t start, clock_t end)
{
    return static_cast<long long>(end - start) * 1000 / CLOCKS_PER_SEC;
}

int main()
{
    // Prepare random data. Where RAND_MAX is small (for example 32767),
    // many of these values are duplicates.
    vector<int> data;
    data.reserve(TEST_SIZE);
    for (int i = 0; i < TEST_SIZE; i++)
    {
        data.push_back(rand() % TEST_SIZE);
    }

    cout << "Performance comparison: set vs unordered_set" << endl;
    cout << "Element count: " << TEST_SIZE << endl;

    // Insert into set
    clock_t start, end;
    set<int> st;

    start = clock();
    for (size_t i = 0; i < data.size(); i++)
    {
        st.insert(data[i]);
    }
    end = clock();
    cout << "set insert time: " << elapsed_ms(start, end) << " ms" << endl;

    // Insert into unordered_set
    unordered_set<int> ust;
    start = clock();
    for (size_t i = 0; i < data.size(); i++)
    {
        ust.insert(data[i]);
    }
    end = clock();
    cout << "unordered_set insert time: " << elapsed_ms(start, end) << " ms" << endl << endl;

    // Look up in set
    start = clock();
    for (size_t i = 0; i < data.size(); i++)
    {
        st.count(data[i]);
    }
    end = clock();
    cout << "set lookup time: " << elapsed_ms(start, end) << " ms" << endl;

    // Look up in unordered_set
    start = clock();
    for (size_t i = 0; i < data.size(); i++)
    {
        ust.count(data[i]);
    }
    end = clock();
    cout << "unordered_set lookup time: " << elapsed_ms(start, end) << " ms" << endl << endl;

    // Traverse set
    long long sum = 0;
    start = clock();
    for (auto& x : st)
    {
        sum += x;
    }
    end = clock();
    cout << "set traversal time: " << elapsed_ms(start, end) << " ms" << endl;

    // Traverse unordered_set
    sum = 0;
    start = clock();
    for (auto& x : ust)
    {
        sum += x;
    }
    end = clock();
    cout << "unordered_set traversal time: " << elapsed_ms(start, end) << " ms" << endl;

    return 0;
}

Output (timings vary with machine, compiler and optimization level):

Performance comparison: set vs unordered_set
Element count: 1000000
set insert time: 2262 ms
unordered_set insert time: 845 ms

set lookup time: 1632 ms
unordered_set lookup time: 248 ms

set traversal time: 4 ms
unordered_set traversal time: 3 ms

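In this test unordered_set inserts and looks up elements several times faster than set, while traversal takes about the same time. So when all you need is fast lookup, prefer unordered_map and unordered_set; when you need sorted order as well as fast lookup, use map and set.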

2 Implementing unordered_map and unordered_set

STL source framework analysis


// stl_hash_set
template <class Value, class HashFcn = hash<Value>, class EqualKey = equal_to<Value>,
    class Alloc = alloc>
class hash_set
{
private:
    // hash_set is implemented by wrapping a hashtable
    typedef hashtable<Value, Value, HashFcn, identity<Value>, EqualKey, Alloc> ht;
    ht rep;
public:
    typedef typename ht::key_type key_type;
    typedef typename ht::value_type value_type;
    typedef typename ht::hasher hasher;
    typedef typename ht::key_equal key_equal;
    typedef typename ht::const_iterator iterator;
    typedef typename ht::const_iterator const_iterator;

    hasher hash_funct() const { return rep.hash_funct(); }
    key_equal key_eq() const { return rep.key_eq(); }
};

// stl_hash_map
template <class Key, class T, class HashFcn = hash<Key>, class EqualKey = equal_to<Key>,
    class Alloc = alloc>
class hash_map
{
private:
    // hash_map is implemented by wrapping a hashtable
    typedef hashtable<pair<const Key, T>, Key, HashFcn,
           select1st<pair<const Key, T> >, EqualKey, Alloc> ht;

    ht rep;
public:
    typedef typename ht::key_type key_type;
    typedef T data_type;
    typedef T mapped_type;
    typedef typename ht::value_type value_type;
    typedef typename ht::hasher hasher;
    typedef typename ht::key_equal key_equal;
    typedef typename ht::iterator iterator;
    typedef typename ht::const_iterator const_iterator;
};

// stl_hashtable.h
template <class Value>
struct __hashtable_node
{
    __hashtable_node* next;
    Value val;
};

template <class Value, class Key, class HashFcn, class ExtractKey, class EqualKey,
    class Alloc>
class hashtable 
{
public:
    typedef Key key_type;
    typedef Value value_type;
    typedef HashFcn hasher;
    typedef EqualKey key_equal;
private:
    hasher hash;
    key_equal equals;
    ExtractKey get_key;
    typedef __hashtable_node<Value> node;
    
    // Core members: the vector holds the buckets, num_elements is the element count
    vector<node*,Alloc> buckets;
    size_type num_elements;
public:
    typedef __hashtable_iterator<Value, Key, HashFcn, ExtractKey, EqualKey, Alloc> iterator;

    pair<iterator, bool> insert_unique(const value_type& obj);
    const_iterator find(const key_type& key) const;
};

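So the framework of unordered_map and unordered_set is almost identical to that of the map and set we implemented earlier. We only need to change the template parameters of the HashTable we wrote to K and T, where T is the type actually stored (K for unordered_set, pair<K, V> for unordered_map), and then wrap the HashTable.

Wrapping HashTable to implement unordered_map and unordered_set

Wrapping HashTable

Following the framework in the source, we change the template parameters of HashTable to K, T and Hash. Accordingly, HTNode takes a single template parameter T, the type of data a node actually stores. To avoid clashing with names in the standard library, we put everything inside a namespace: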


//MyHashTable.hpp
#pragma once
#include <iostream>
#include <vector>
#include <string>
#include <algorithm> // std::lower_bound

using namespace std;

namespace LTL
{
	// Prime table and lookup function used by the library's hashtable to choose bucket counts
	static const int __stl_num_primes = 28;
	static const unsigned long __stl_prime_list[__stl_num_primes] =
	{
	  53,         97,         193,       389,       769,
	  1543,       3079,       6151,      12289,     24593,
	  49157,      98317,      196613,    393241,    786433,
	  1572869,    3145739,    6291469,   12582917,  25165843,
	  50331653,   100663319,  201326611, 402653189, 805306457,
	  1610612741, 3221225473, 4294967291
	};

	inline unsigned long __stl_next_prime(unsigned long n)
	{
		const unsigned long* first = __stl_prime_list;
		const unsigned long* last = __stl_prime_list + __stl_num_primes;
		const unsigned long* pos = std::lower_bound(first, last, n);
		return pos == last ? *(last - 1) : *pos;
	}

	template<class K>
	struct HashFunc
	{
		size_t operator()(const K& key)
		{
			return (size_t)key;
		}
	};

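	// string keys are very common, so the template is specialized for string
	template<>
	struct HashFunc<string>
	{
		size_t operator()(const string& key)
		{
			// BKDR-style hash over the characters, which makes it less likely that different strings map to the same integer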
			size_t hash = 0;
			for (auto ch : key)
			{
				hash *= 131;
				hash += ch;
			}

			return hash;
		}
	};

	template<class T>
	struct HTNode
	{
		T _data;
		HTNode<T>* _next; // pointer to the next node in the bucket's chain

		HTNode(const T& data)
			:_data(data)
			, _next(nullptr)
		{}
	};


	template<class K, class T, class Hash = HashFunc<K>>
	class HashTable
	{
		// using can also declare a type alias, much like typedef
		using Node = HTNode<T>;

	public:
		HashTable(size_t size = __stl_next_prime(0))
			:_tables(size)
			, _n(0)
		{}

		bool Insert(const T& data)
		{
			Hash hs;
			// Duplicate keys are not allowed
			if (Find(data.first))
				return false;

			// 0. When the load factor reaches 1, grow the table
			if (_n == _tables.size())
			{
				// Relink every node of the old table into the new, larger table
				HashTable newht(__stl_next_prime(_tables.size() + 1));
				for (size_t i = 0; i < _tables.size(); i++)
				{
					Node* cur = _tables[i];
					Node* next = nullptr;
					while (cur)
					{
						next = cur->_next;
						size_t hashi = hs(cur->_data.first) % newht._tables.size();
						cur->_next = newht._tables[hashi];
						newht._tables[hashi] = cur;

						cur = next;
					}

					_tables[i] = nullptr;
				}

				// Swap the new table with the old one
				_tables.swap(newht._tables);
			}

			// 1. Use the hash function to compute the bucket index for this key
			size_t hashi = hs(data.first) % _tables.size();
			// 2. Create a node and push it to the front of that bucket's singly linked list
			Node* newnode = new Node(data);
			Node* cur = _tables[hashi];
			newnode->_next = cur;
			_tables[hashi] = newnode;
			++_n;

			return true;
		}

		bool Erase(const K& key)
		{
			if (!Find(key))
				return false;

			// Remove the node
			Hash hs;
			// 1. Use the hash function to compute the bucket index for this key
			size_t hashi = hs(key) % _tables.size();
			// 2. Walk the bucket's singly linked list
			Node* prev = nullptr;
			Node* cur = _tables[hashi];
			while (cur)
			{
				if (cur->_data.first == key)
					break;

				prev = cur;
				cur = cur->_next;
			}

			if (prev == nullptr)
			{
				// cur is the head of the chain
				_tables[hashi] = cur->_next;
			}
			else
			{
				prev->_next = cur->_next;
			}

			delete cur;
			--_n;

			return true;
		}

		Node* Find(const K& key)
		{
			Hash hs;
			// 1. Use the hash function to compute the bucket index for this key
			size_t hashi = hs(key) % _tables.size();
			// 2. Walk the bucket's singly linked list
			Node* cur = _tables[hashi];
			while (cur)
			{
				if (cur->_data.first == key)
					return cur;

				cur = cur->_next;
			}

			return nullptr;
		}

		~HashTable()
		{
			for (auto& ptr : _tables)
			{
				Node* cur = ptr;
				Node* next = nullptr;
				while (cur)
				{
					next = cur->_next;
					// Free the current node
					delete cur;
					cur = next;
				}

				ptr = nullptr;
			}
		}

	private:
		vector<Node*> _tables;
		size_t _n;
	};
}

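With HashTable in place, unordered_set and unordered_map are easy to implement: they only need to wrap a HashTable. Note that this first version of HashTable still reads data.first directly, so it only works when T is a pair; the KeyOfT parameter added in the next step removes that limitation: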


//Myunordered_set.hpp
#pragma once
#include "MyHashTable.hpp"

namespace LTL
{
	template<class K>
	class unordered_set
	{
		typedef HTNode<K> Node; // must match the node type of HashTable<K, K>

	public:
		bool insert(const K& key)
		{
			return _ht.Insert(key);
		}

		bool erase(const K& key)
		{
			return _ht.Erase(key);
		}

		Node* find(const K& key)
		{
			return _ht.Find(key);
		}

	private:
		HashTable<K, K> _ht;
	};
}

//Myunordered_map.hpp
#pragma once
#include "MyHashTable.hpp"

namespace LTL
{
	template<class K, class V>
	class unordered_map
	{
		typedef HTNode<pair<K, V>> Node;

	public:
		bool insert(const pair<K, V>& kv)
		{
			return _ht.Insert(kv);
		}

		bool erase(const K& key)
		{
			return _ht.Erase(key);
		}

		Node* find(const K& key)
		{
			return _ht.Find(key);
		}

	private:
		HashTable<K, pair<K, V>> _ht;
	};
}

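Adding the KeyOfT template parameter

As with map and set, we need to add a KeyOfT parameter to HashTable to extract the key from a value of type T. T may be the K of unordered_set or the pair<K, V> of unordered_map, so the table cannot simply read data.first; it needs a KeyOfT functor to pull out the key before hashing or comparing it: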


//MyHashTable.hpp
#pragma once
#include <iostream>
#include <vector>
#include <string>
#include <algorithm> // std::lower_bound

using namespace std;

namespace LTL
{
	// Prime table and lookup function used by the library's hashtable to choose bucket counts
	static const int __stl_num_primes = 28;
	static const unsigned long __stl_prime_list[__stl_num_primes] =
	{
	  53,         97,         193,       389,       769,
	  1543,       3079,       6151,      12289,     24593,
	  49157,      98317,      196613,    393241,    786433,
	  1572869,    3145739,    6291469,   12582917,  25165843,
	  50331653,   100663319,  201326611, 402653189, 805306457,
	  1610612741, 3221225473, 4294967291
	};

	inline unsigned long __stl_next_prime(unsigned long n)
	{
		const unsigned long* first = __stl_prime_list;
		const unsigned long* last = __stl_prime_list + __stl_num_primes;
		const unsigned long* pos = std::lower_bound(first, last, n);
		return pos == last ? *(last - 1) : *pos;
	}

	template<class K>
	struct HashFunc
	{
		size_t operator()(const K& key)
		{
			return (size_t)key;
		}
	};

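	// string keys are very common, so the template is specialized for string
	template<>
	struct HashFunc<string>
	{
		size_t operator()(const string& key)
		{
			// BKDR-style hash over the characters, which makes it less likely that different strings map to the same integer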
			size_t hash = 0;
			for (auto ch : key)
			{
				hash *= 131;
				hash += ch;
			}

			return hash;
		}
	};

	template<class T>
	struct HTNode
	{
		T _data;
		HTNode<T>* _next; // pointer to the next node in the bucket's chain

		HTNode(const T& data)
			:_data(data)
			, _next(nullptr)
		{}
	};


	template<class K, class T, class KeyOfT, class Hash = HashFunc<K>>
	class HashTable
	{
		// using can also declare a type alias, much like typedef
		using Node = HTNode<T>;

	public:
		HashTable(size_t size = __stl_next_prime(0))
			:_tables(size)
			, _n(0)
		{}

		bool Insert(const T& data)
		{
			KeyOfT kot;
			Hash hs;
			// Duplicate keys are not allowed
			if (Find(kot(data)))
				return false;

			// 0. When the load factor reaches 1, grow the table
			if (_n == _tables.size())
			{
				// Relink every node of the old table into the new, larger table
				HashTable newht(__stl_next_prime(_tables.size() + 1));
				for (size_t i = 0; i < _tables.size(); i++)
				{
					Node* cur = _tables[i];
					Node* next = nullptr;
					while (cur)
					{
						next = cur->_next;
						size_t hashi = hs(kot(cur->_data)) % newht._tables.size();
						cur->_next = newht._tables[hashi];
						newht._tables[hashi] = cur;

						cur = next;
					}

					_tables[i] = nullptr;
				}

				// Swap the new table with the old one
				_tables.swap(newht._tables);
			}

			// 1. Use the hash function to compute the bucket index for this key
			size_t hashi = hs(kot(data)) % _tables.size();
			// 2. Create a node and push it to the front of that bucket's singly linked list
			Node* newnode = new Node(data);
			Node* cur = _tables[hashi];
			newnode->_next = cur;
			_tables[hashi] = newnode;
			++_n;

			return true;
		}

		bool Erase(const K& key)
		{
			if (!Find(key))
				return false;

			// Remove the node
			KeyOfT kot;
			Hash hs;
			// 1. Use the hash function to compute the bucket index for this key
			size_t hashi = hs(key) % _tables.size();
			// 2. Walk the bucket's singly linked list
			Node* prev = nullptr;
			Node* cur = _tables[hashi];
			while (cur)
			{
				if (kot(cur->_data) == key)
					break;

				prev = cur;
				cur = cur->_next;
			}

			if (prev == nullptr)
			{
				// cur is the head of the chain
				_tables[hashi] = cur->_next;
			}
			else
			{
				prev->_next = cur->_next;
			}

			delete cur;
			--_n;

			return true;
		}

		Node* Find(const K& key)
		{
			KeyOfT kot;
			Hash hs;
			// 1. Use the hash function to compute the bucket index for this key
			size_t hashi = hs(key) % _tables.size();
			// 2. Walk the bucket's singly linked list
			Node* cur = _tables[hashi];
			while (cur)
			{
				if (kot(cur->_data) == key)
					return cur;

				cur = cur->_next;
			}

			return nullptr;
		}

		~HashTable()
		{
			for (auto& ptr : _tables)
			{
				Node* cur = ptr;
				Node* next = nullptr;
				while (cur)
				{
					next = cur->_next;
					// Free the current node
					delete cur;
					cur = next;
				}

				ptr = nullptr;
			}
		}

	private:
		vector<Node*> _tables;
		size_t _n;
	};
}

unordered_set and unordered_map each implement their own functor and pass it to HashTable:


//Myunordered_set.hpp
#pragma once
#include "MyHashTable.hpp"

namespace LTL
{
	template<class K>
	class unordered_set
	{
		typedef HTNode<K> Node; // must match the node type of HashTable<K, K, SetKeyOfT>

		struct SetKeyOfT
		{
			const K& operator()(const K& key)
			{
				return key;
			}
		};

	public:
		bool insert(const K& key)
		{
			return _ht.Insert(key);
		}

		bool erase(const K& key)
		{
			return _ht.Erase(key);
		}

		Node* find(const K& key)
		{
			return _ht.Find(key);
		}

	private:
		HashTable<K, K, SetKeyOfT> _ht;
	};
}

//Myunordered_map.hpp
#pragma once
#include "MyHashTable.hpp"

namespace LTL
{
	template<class K, class V>
	class unordered_map
	{
		typedef HTNode<pair<K, V>> Node; // must match the node type of HashTable<K, pair<K, V>, MapKeyOfT>

		struct MapKeyOfT
		{
			const K& operator()(const pair<K, V>& kv)
			{
				return kv.first;
			}
		};

	public:
		bool insert(const pair<K, V>& kv)
		{
			return _ht.Insert(kv);
		}

		bool erase(const K& key)
		{
			return _ht.Erase(key);
		}

		Node* find(const K& key)
		{
			return _ht.Find(key);
		}

	private:
		HashTable<K, pair<K, V>, MapKeyOfT> _ht;
	};
}

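Making keys immutable

This step is simple. Change the type of the _ht member of unordered_set to HashTable<K, const K, SetKeyOfT>, so that T inside HashTable is const K and the stored key can no longer be modified. Likewise, change the _ht member of unordered_map to HashTable<K, pair<const K, V>, MapKeyOfT>. The change is small enough that the code is not listed here; see the complete code at the end if needed.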

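Implementing the iterator

The iterators of unordered_map and unordered_set are forward iterators, so the iterator implements operator++() only, with no operator--(). Before implementing it, we need to understand how a hash table is traversed:

Traversal starts at bucket 0 and walks through every bucket in turn, visiting each node of a bucket's chain before moving on; once the last bucket is done, the traversal is complete. This is exactly the logic of operator++, and it is also why iterating over a hash table does not generally produce sorted output.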

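The rest of the iterator (operator*, operator->, operator==, operator!=, and how the regular and const iterators are generated from one template) was already covered when we implemented list, map and other containers, so it is not repeated here.

There is one important point specific to the hash table iterator, though: the iterator must store a pointer to its HashTable. When the chain of one bucket has been fully traversed, we have to find the next non-empty bucket, and that requires access to the table. Once the iterator is in place, remember to change the return type of Find to an iterator as well.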


#pragma once
#include <iostream>
#include <vector>
#include <string>

using namespace std;

namespace LTL
{
	//......

	// Forward declaration of HashTable
	template<class K, class T, class KeyOfT, class Hash>
	class HashTable;

	template<class K, class T, class KeyOfT, class Hash, class Ptr, class Ref>
	struct _Ht_Iterator
	{
		typedef HTNode<T> Node;
		typedef _Ht_Iterator<K, T, KeyOfT, Hash, Ptr, Ref> Self;


		Node* _node;
		const HashTable<K, T, KeyOfT, Hash>* _ht; // pointer back to the table, needed by operator++; const so that const Begin()/End() can pass this

		_Ht_Iterator(Node* node, const HashTable<K, T, KeyOfT, Hash>* ht)
			:_node(node)
			, _ht(ht)
		{}

		Ptr operator->()
		{
			return &(_node->_data);
		}

		Ref operator*()
		{
			return _node->_data;
		}

		bool operator==(const Self& it)
		{
			return _node == it._node;
		}

		bool operator!=(const Self& it)
		{
			return _node != it._node;
		}

		// Pre-increment
		Self& operator++()
		{
			if (_node == nullptr)
				return *this;

			// If the current bucket has another node, move to it
			if (_node->_next)
			{
				_node = _node->_next;
			}
			else
			{
				// This bucket is exhausted: look for the next non-empty bucket
				KeyOfT kot;
				Hash hs;
				size_t hashi = hs(kot(_node->_data)) % _ht->bucket_count();
				++hashi;
				while (hashi < _ht->bucket_count() && _ht->_tables[hashi] == nullptr)
				{
					++hashi;
				}

				if (hashi == _ht->bucket_count())
				{
					_node = nullptr; // reached end()
				}
				else
				{
					_node = _ht->_tables[hashi];
				}
			}

			return *this;
		}

		// Post-increment
		Self operator++(int)
		{
			Self tmp = *this;
			this->operator++();
			return tmp;
		}
	};

	template<class K, class T, class KeyOfT, class Hash = HashFunc<K>>
	class HashTable
	{
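		// using can also declare a type alias, much like typedef
		using Node = HTNode<T>;

		// The friend template uses its own parameter names so they do not shadow the class's
		template<class K1, class T1, class KeyOfT1, class Hash1, class Ptr, class Ref>
		friend struct _Ht_Iterator;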

	public:
		typedef _Ht_Iterator<K, T, KeyOfT, Hash, T*, T&> Iterator;
		typedef _Ht_Iterator<K, T, KeyOfT, Hash, const T*, const T&> Const_Iterator;

		Iterator Begin()
		{
			for (size_t i = 0; i < _tables.size(); i++)
			{
				if (_tables[i])
					return { _tables[i], this };
			}

			return { nullptr, this };
		}

		Iterator End()
		{
			return { nullptr, this };
		}

		Const_Iterator Begin() const
		{
			for (size_t i = 0; i < _tables.size(); i++)
			{
				if (_tables[i])
					return { _tables[i], this };
			}

			return { nullptr, this };
		}

		Const_Iterator End() const
		{
			return { nullptr, this };
		}

		HashTable(size_t size = __stl_next_prime(0))
			:_tables(size)
			, _n(0)
		{}

		//......
        
        Iterator Find(const K& key)
        {
	        KeyOfT kot;
	        Hash hs;
	        // 1. Use the hash function to compute the bucket index for this key
	        size_t hashi = hs(key) % _tables.size();
	        // 2. Walk the bucket's singly linked list
	        Node* cur = _tables[hashi];
	        while (cur)
	        {
		        if (kot(cur->_data) == key)
			        return { cur, this };
		        cur = cur->_next;
	        }

	        return { nullptr, this };
        }

        //......

	private:
		vector<Node*> _tables;
		size_t _n;
	};
}


//Myunordered_set.hpp
#pragma once
#include "MyHashTable.hpp"

namespace LTL
{
	template<class K>
	class unordered_set
	{
		typedef HTNode<const K> Node;

		struct SetKeyOfT
		{
			const K& operator()(const K& key)
			{
				return key;
			}
		};

	public:
		// typename is required here to tell the compiler that Iterator names a type
		typedef typename HashTable<K, const K, SetKeyOfT>::Iterator iterator;
		typedef typename HashTable<K, const K, SetKeyOfT>::Const_Iterator const_iterator;

		iterator begin()
		{
			return _ht.Begin();
		}

		iterator end()
		{
			return _ht.End();
		}

		const_iterator begin() const
		{
			return _ht.Begin();
		}

		const_iterator end() const
		{
			return _ht.End();
		}

		//......

        iterator find(const K& key)
        {
	        return _ht.Find(key);
        }

	private:
		HashTable<K, const K, SetKeyOfT> _ht;
	};
}

//Myunordered_map.hpp
#pragma once
#include "MyHashTable.hpp"

namespace LTL
{
	template<class K, class V>
	class unordered_map
	{
		typedef HTNode<pair<const K, V>> Node;

		struct MapKeyOfT
		{
			const K& operator()(const pair<const K, V>& kv)
			{
				return kv.first;
			}
		};

	public:
		// typename is required here to tell the compiler that Iterator names a type
		typedef typename HashTable<K, pair<const K, V>, MapKeyOfT>::Iterator iterator;
		typedef typename HashTable<K, pair<const K, V>, MapKeyOfT>::Const_Iterator const_iterator;

		iterator begin()
		{
			return _ht.Begin();
		}

		iterator end()
		{
			return _ht.End();
		}

		const_iterator begin() const
		{
			return _ht.Begin();
		}

		const_iterator end() const
		{
			return _ht.End();
		}

		//......

        iterator find(const K& key)
        {
	        return _ht.Find(key);
        }

	private:
		HashTable<K, pair<const K, V>, MapKeyOfT> _ht;
	};
}

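Implementing operator[]

unordered_map has one core interface left, operator[], and it is implemented the same way as operator[] in map. We change the return type of insert to pair<iterator, bool>, where the iterator refers either to the existing element or to the newly inserted one. operator[] can then use the first member of that pair to reach the second member of the node's _data, which is the mapped value, and return a reference to it.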


#pragma once
#include <iostream>
#include <vector>
#include <string>

using namespace std;

namespace LTL
{
	//......

	template<class K, class T, class KeyOfT, class Hash = HashFunc<K>>
	class HashTable
	{
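		// using can also declare a type alias, much like typedef
		using Node = HTNode<T>;

		// The friend template uses its own parameter names so they do not shadow the class's
		template<class K1, class T1, class KeyOfT1, class Hash1, class Ptr, class Ref>
		friend struct _Ht_Iterator;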

	public:
		//......

		pair<Iterator, bool> Insert(const T& data)
		{
			KeyOfT kot;
			Hash hs;
			Iterator it = Find(kot(data));
			// Duplicate keys are not allowed: return the existing element
			if (it != End())
				return { it, false };

			// 0. When the load factor reaches 1, grow the table
			if (_n == _tables.size())
			{
				// Relink every node of the old table into the new, larger table
				HashTable newht(__stl_next_prime(_tables.size() + 1));
				for (size_t i = 0; i < _tables.size(); i++)
				{
					Node* cur = _tables[i];
					Node* next = nullptr;
					while (cur)
					{
						next = cur->_next;
						size_t hashi = hs(kot(cur->_data)) % newht._tables.size();
						cur->_next = newht._tables[hashi];
						newht._tables[hashi] = cur;

						cur = next;
					}

					_tables[i] = nullptr;
				}

				// Swap the new table with the old one
				_tables.swap(newht._tables);
			}

			// 1. Use the hash function to compute the bucket index for this key
			size_t hashi = hs(kot(data)) % _tables.size();
			// 2. Create a node and push it to the front of that bucket's singly linked list
			Node* newnode = new Node(data);
			Node* cur = _tables[hashi];
			newnode->_next = cur;
			_tables[hashi] = newnode;
			++_n;

			return { {newnode, this}, true };
		}

		//......

	private:
		vector<Node*> _tables;
		size_t _n;
	};
}

//Myunordered_set.hpp
#pragma once
#include "MyHashTable.hpp"

namespace LTL
{
	template<class K>
	class unordered_set
	{
		//.......

	public:
		
        //......

		pair<iterator, bool> insert(const K& key)
		{
			return _ht.Insert(key);
		}

		//......

	private:
		HashTable<K, const K, SetKeyOfT> _ht;
	};
}

//Myunordered_map.hpp
#pragma once
#include "MyHashTable.hpp"

namespace LTL
{
	template<class K, class V>
	class unordered_map
	{
		//......

	public:
		//......

		pair<iterator, bool> insert(const pair<K, V>& kv)
		{
			return _ht.Insert(kv);
		}

		//......

		V& operator[](const K& key)
		{
			pair<iterator, bool> it = insert({ key, V() });
			return it.first->second;
		}

	private:
		HashTable<K, pair<const K, V>, MapKeyOfT> _ht;
	};
}

Complete code


//MyHashTable.hpp
#pragma once
#include <iostream>
#include <vector>
#include <string>
#include <algorithm> // std::lower_bound

using namespace std;

namespace LTL
{
	// Prime table and lookup function used by the library's hashtable to choose bucket counts
	static const int __stl_num_primes = 28;
	static const unsigned long __stl_prime_list[__stl_num_primes] =
	{
	  53,         97,         193,       389,       769,
	  1543,       3079,       6151,      12289,     24593,
	  49157,      98317,      196613,    393241,    786433,
	  1572869,    3145739,    6291469,   12582917,  25165843,
	  50331653,   100663319,  201326611, 402653189, 805306457,
	  1610612741, 3221225473, 4294967291
	};

	inline unsigned long __stl_next_prime(unsigned long n)
	{
		const unsigned long* first = __stl_prime_list;
		const unsigned long* last = __stl_prime_list + __stl_num_primes;
		const unsigned long* pos = std::lower_bound(first, last, n);
		return pos == last ? *(last - 1) : *pos;
	}

	template<class K>
	struct HashFunc
	{
		size_t operator()(const K& key)
		{
			return (size_t)key;
		}
	};

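	// string keys are very common, so the template is specialized for string
	template<>
	struct HashFunc<string>
	{
		size_t operator()(const string& key)
		{
			// BKDR-style hash over the characters, which makes it less likely that different strings map to the same integer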
			size_t hash = 0;
			for (auto ch : key)
			{
				hash *= 131;
				hash += ch;
			}

			return hash;
		}
	};

	template<class T>
	struct HTNode
	{
		T _data;
		HTNode<T>* _next; // pointer to the next node in the bucket's chain

		HTNode(const T& data)
			:_data(data)
			, _next(nullptr)
		{}
	};

	// Forward declaration of HashTable
	template<class K, class T, class KeyOfT, class Hash>
	class HashTable;

	template<class K, class T, class KeyOfT, class Hash, class Ptr, class Ref>
	struct _Ht_Iterator
	{
		typedef HTNode<T> Node;
		typedef _Ht_Iterator<K, T, KeyOfT, Hash, Ptr, Ref> Self;


		Node* _node;
		const HashTable<K, T, KeyOfT, Hash>* _ht; // pointer back to the table, needed by operator++; const so that const Begin()/End() can pass this

		_Ht_Iterator(Node* node, const HashTable<K, T, KeyOfT, Hash>* ht)
			:_node(node)
			, _ht(ht)
		{}

		Ptr operator->()
		{
			return &(_node->_data);
		}

		Ref operator*()
		{
			return _node->_data;
		}

		bool operator==(const Self& it)
		{
			return _node == it._node;
		}

		bool operator!=(const Self& it)
		{
			return _node != it._node;
		}

		// Pre-increment
		Self& operator++()
		{
			if (_node == nullptr)
				return *this;

			// If the current bucket has another node, move to it
			if (_node->_next)
			{
				_node = _node->_next;
			}
			else
			{
				// This bucket is exhausted: look for the next non-empty bucket
				KeyOfT kot;
				Hash hs;
				size_t hashi = hs(kot(_node->_data)) % _ht->bucket_count();
				++hashi;
				while (hashi < _ht->bucket_count() && _ht->_tables[hashi] == nullptr)
				{
					++hashi;
				}

				if (hashi == _ht->bucket_count())
				{
					_node = nullptr; // reached end()
				}
				else
				{
					_node = _ht->_tables[hashi];
				}
			}

			return *this;
		}

		// Post-increment
		Self operator++(int)
		{
			Self tmp = *this;
			this->operator++();
			return tmp;
		}
	};

	template<class K, class T, class KeyOfT, class Hash = HashFunc<K>>
	class HashTable
	{
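		// using can also declare a type alias, much like typedef
		using Node = HTNode<T>;

		// The friend template uses its own parameter names so they do not shadow the class's
		template<class K1, class T1, class KeyOfT1, class Hash1, class Ptr, class Ref>
		friend struct _Ht_Iterator;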

	public:
		typedef _Ht_Iterator<K, T, KeyOfT, Hash, T*, T&> Iterator;
		typedef _Ht_Iterator<K, T, KeyOfT, Hash, const T*, const T&> Const_Iterator;

		Iterator Begin()
		{
			for (size_t i = 0; i < _tables.size(); i++)
			{
				if (_tables[i])
					return { _tables[i], this };
			}

			return { nullptr, this };
		}

		Iterator End()
		{
			return { nullptr, this };
		}

		Const_Iterator Begin() const
		{
			for (size_t i = 0; i < _tables.size(); i++)
			{
				if (_tables[i])
					return { _tables[i], this };
			}

			return { nullptr, this };
		}

		Const_Iterator End() const
		{
			return { nullptr, this };
		}

		HashTable(size_t size = __stl_next_prime(0))
			:_tables(size)
			, _n(0)
		{}

		pair<Iterator, bool> Insert(const T& data)
		{
			KeyOfT kot;
			Hash hs;
			Iterator it = Find(kot(data));
			// Duplicate keys are not allowed: return the existing element
			if (it != End())
				return { it, false };

			// 0. When the load factor reaches 1, grow the table
			if (_n == _tables.size())
			{
				// Relink every node of the old table into the new, larger table
				HashTable newht(__stl_next_prime(_tables.size() + 1));
				for (size_t i = 0; i < _tables.size(); i++)
				{
					Node* cur = _tables[i];
					Node* next = nullptr;
					while (cur)
					{
						next = cur->_next;
						size_t hashi = hs(kot(cur->_data)) % newht._tables.size();
						cur->_next = newht._tables[hashi];
						newht._tables[hashi] = cur;

						cur = next;
					}

					_tables[i] = nullptr;
				}

				// Swap the new table with the old one
				_tables.swap(newht._tables);
			}

			// 1. Use the hash function to compute the bucket index for this key
			size_t hashi = hs(kot(data)) % _tables.size();
			// 2. Create a node and push it to the front of that bucket's singly linked list
			Node* newnode = new Node(data);
			Node* cur = _tables[hashi];
			newnode->_next = cur;
			_tables[hashi] = newnode;
			++_n;

			return { {newnode, this}, true };
		}

		bool Erase(const K& key)
		{
			Iterator it = Find(key);
			if (it == End())
				return false;

			// Remove the node
			KeyOfT kot;
			Hash hs;
			// 1. Use the hash function to compute the bucket index for this key
			size_t hashi = hs(key) % _tables.size();
			// 2. Walk the bucket's singly linked list
			Node* prev = nullptr;
			Node* cur = _tables[hashi];
			while (cur)
			{
				if (kot(cur->_data) == key)
					break;

				prev = cur;
				cur = cur->_next;
			}

			if (prev == nullptr)
			{
				// cur is the head of the chain
				_tables[hashi] = cur->_next;
			}
			else
			{
				prev->_next = cur->_next;
			}

			delete cur;
			--_n;

			return true;
		}

		Iterator Find(const K& key)
		{
			KeyOfT kot;
			Hash hs;
			//1. Use the hash function to compute the bucket index for this key
			size_t hashi = hs(key) % _tables.size();
			//2. Walk the bucket's singly linked list
			Node* cur = _tables[hashi];
			while (cur)
			{
				if (kot(cur->_data) == key)
					return { cur, this };
				cur = cur->_next;
			}

			return { nullptr, this };
		}

		size_t bucket_count() const
		{
			return  _tables.size();
		}

		~HashTable()
		{
			for (auto& ptr : _tables)
			{
				Node* cur = ptr;
				Node* next = nullptr;
				while (cur)
				{
					next = cur->_next;
					// free the current node
					delete cur;
					cur = next;
				}

				ptr = nullptr;
			}
		}

	private:
		vector<Node*> _tables;
		size_t _n;
	};
}


//Myunordered_set.hpp
#pragma once
#include "MyHashTable.hpp"

namespace LTL
{
	template<class K>
	class unordered_set
	{
		typedef HTNode<const K> Node;

		struct SetKeyOfT
		{
			const K& operator()(const K& key)
			{
				return key;
			}
		};

	public:
		// typename is required here to tell the compiler that the dependent name Iterator is a type
		typedef typename HashTable<K, const K, SetKeyOfT>::Iterator iterator;
		typedef typename HashTable<K, const K, SetKeyOfT>::Const_Iterator const_iterator;

		iterator begin()
		{
			return _ht.Begin();
		}

		iterator end()
		{
			return _ht.End();
		}

		const_iterator begin() const
		{
			return _ht.Begin();
		}

		const_iterator end() const
		{
			return _ht.End();
		}

		pair<iterator, bool> insert(const K& key)
		{
			return _ht.Insert(key);
		}

		bool erase(const K& key)
		{
			return _ht.Erase(key);
		}

		iterator find(const K& key)
		{
			return _ht.Find(key);
		}

	private:
		HashTable<K, const K, SetKeyOfT> _ht;
	};
}

//Myunordered_map.hpp
#pragma once
#include "MyHashTable.hpp"

namespace LTL
{
	template<class K, class V>
	class unordered_map
	{
		typedef HTNode<pair<const K, V>> Node;

		struct MapKeyOfT
		{
			const K& operator()(const pair<const K, V>& kv)
			{
				return kv.first;
			}
		};

	public:
		// typename is required here to tell the compiler that the dependent name Iterator is a type
		typedef typename HashTable<K, pair<const K, V>, MapKeyOfT>::Iterator iterator;
		typedef typename HashTable<K, pair<const K, V>, MapKeyOfT>::Const_Iterator const_iterator;

		iterator begin()
		{
			return _ht.Begin();
		}

		iterator end()
		{
			return _ht.End();
		}

		const_iterator begin() const
		{
			return _ht.Begin();
		}

		const_iterator end() const
		{
			return _ht.End();
		}

		pair<iterator, bool> insert(const pair<K, V>& kv)
		{
			return _ht.Insert(kv);
		}

		bool erase(const K& key)
		{
			return _ht.Erase(key);
		}

		iterator find(const K& key)
		{
			return _ht.Find(key);
		}

		V& operator[](const K& key)
		{
			pair<iterator, bool> it = insert({ key, V() });
			return it.first->second;
		}

	private:
		HashTable<K, pair<const K, V>, MapKeyOfT> _ht;
	};
}
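
Both wrappers above reuse one HashTable and differ only in the key extractor they pass in. To make that idea easier to see in isolation, here is a minimal sketch under our own assumptions: `SetKeyOf`, `MapKeyOf` and `contains_key` are hypothetical names invented for this illustration (they are not part of the code above), and a plain `std::vector` stands in for the bucket array.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical key extractors, modeled on SetKeyOfT / MapKeyOfT.
template<class K>
struct SetKeyOf
{
	// For a set, the stored element is the key itself.
	const K& operator()(const K& key) const { return key; }
};

template<class K, class V>
struct MapKeyOf
{
	// For a map, the key is the first member of the stored pair.
	const K& operator()(const std::pair<const K, V>& kv) const { return kv.first; }
};

// One lookup routine for both element types. It never inspects T directly;
// it only compares the key that KeyOfT extracts from each element.
template<class K, class T, class KeyOfT>
bool contains_key(const std::vector<T>& items, const K& key)
{
	KeyOfT kot;
	for (const T& item : items)
	{
		if (kot(item) == key)
			return true;
	}
	return false;
}
```

A call such as `contains_key<int, int, SetKeyOf<int>>(v, 2)` handles set-like data, and swapping in `MapKeyOf` handles pairs, which is the same trick that lets Find and Erase work for both containers.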


//Main.cc -- test code

#include "Myunordered_map.hpp"
#include "Myunordered_set.hpp"
#include <iostream>
#include <string>

using namespace std;

void test_set()
{
	LTL::unordered_set<int> s;
	int a[] = { 4, 2, 6, 1, 3, 5, 15, 7, 16, 14, 3,3,15 };
	for (auto e : a)
	{
		s.insert(e);
	}
	for (auto e : s)
	{
		cout << e << " ";
	}
	cout << endl;

	LTL::unordered_set<int>::iterator it = s.begin();
	while (it != s.end())
	{
		// the key cannot be modified
		//*it += 1;
		cout << *it << " ";
		++it;
	}
	cout << endl;
}

void test_map()
{
	LTL::unordered_map<string, string> um;
	um.insert({ "sort", "排序" });
	um.insert({ "left", "左边" });
	um.insert({ "right", "右边" });
	um.insert({ "algorithm", "算法" });

	auto it = um.begin();
	while (it != um.end())
	{
		// the key cannot be modified
		//it->first += 'x';
		it->second += "xxxxx";
		cout << it->first << ":" << it->second << endl;
		++it;
	}
	cout << endl;

	// test operator[]
	um["computer"];
	um["apple"] = "苹果";
	um["left"] = "左边、剩余";
	for (auto& kv : um)
	{
		cout << kv.first << ":" << kv.second << endl;
	}
	cout << endl;
}

int main()
{
	//test_set();
	test_map();

	return 0;
}
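
The three operator[] calls in test_map cover three cases: inserting a default value, inserting and then assigning, and overwriting an existing value. As a cross-check against the standard container, the sketch below spells out the same "insert, then return the mapped value" logic on top of `std::unordered_map::insert`. The helper name `subscript_via_insert` is hypothetical and exists only for this illustration.

```cpp
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical helper: the behavior of operator[], written in terms of insert.
template<class K, class V>
V& subscript_via_insert(std::unordered_map<K, V>& m,
	const typename std::unordered_map<K, V>::key_type& key)
{
	// If the key already exists, insert leaves the element untouched and the
	// returned iterator points at it; otherwise { key, V() } is added.
	std::pair<typename std::unordered_map<K, V>::iterator, bool> ret = m.insert({ key, V() });
	// Either way, ret.first refers to the element for this key.
	return ret.first->second;
}
```

Because the helper returns a reference, `subscript_via_insert(um, "left") = "..."` overwrites the existing value, while a bare `subscript_via_insert(um, "computer")` only adds an empty string.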

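Summary

Using unordered_map and unordered_set is almost identical to using map and set, and the mock implementations are nearly identical too: each one simply wraps a core underlying data structure. The biggest difference lies in the iterator's operator++. Traversing a hash table starts at bucket 0 and visits the buckets in order; an empty bucket is skipped, and a non-empty one has its linked list walked node by node. In short, once you have wrapped map and set, implementing unordered_set and unordered_map is straightforward.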
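
The traversal order described above can be sketched on its own, outside the iterator class. This is a simplified illustration under our own assumptions: `DemoNode`, `push_front`, `next_node`, `traverse` and `destroy` are hypothetical names, keys are non-negative ints, and the "hash function" is just the key value modulo the bucket count.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal node for this sketch.
struct DemoNode
{
	int _data;
	DemoNode* _next;
};

// Push a value onto the front of its bucket's chain (assumes value >= 0).
void push_front(std::vector<DemoNode*>& tables, int value)
{
	std::size_t hashi = static_cast<std::size_t>(value) % tables.size();
	tables[hashi] = new DemoNode{ value, tables[hashi] };
}

// The core of operator++: return the node after cur, or nullptr at the end.
DemoNode* next_node(const std::vector<DemoNode*>& tables, DemoNode* cur)
{
	assert(cur && !tables.empty());
	// 1. Still inside a chain: step to the next node of the same bucket.
	if (cur->_next)
		return cur->_next;
	// 2. Chain exhausted: find cur's bucket, then scan forward, skipping empty buckets.
	std::size_t hashi = static_cast<std::size_t>(cur->_data) % tables.size();
	for (std::size_t i = hashi + 1; i < tables.size(); ++i)
	{
		if (tables[i])
			return tables[i];
	}
	return nullptr;
}

// Collect every value in iteration order, starting from the first non-empty bucket.
std::vector<int> traverse(const std::vector<DemoNode*>& tables)
{
	std::vector<int> out;
	DemoNode* cur = nullptr;
	for (DemoNode* head : tables)
	{
		if (head)
		{
			cur = head;
			break;
		}
	}
	while (cur)
	{
		out.push_back(cur->_data);
		cur = next_node(tables, cur);
	}
	return out;
}

// Free every node and reset the bucket heads.
void destroy(std::vector<DemoNode*>& tables)
{
	for (DemoNode*& head : tables)
	{
		while (head)
		{
			DemoNode* next = head->_next;
			delete head;
			head = next;
		}
	}
}
```

With 5 buckets and the values 7, 2, 4, 10 inserted in that order, the traversal visits 10 (bucket 0), then 2 and 7 (bucket 2, most recent insertion first), then 4 (bucket 4), so the order depends on bucket positions and not on insertion order or key order.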