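[Data Structures Handbook 006] Mapping Relationships - A Deep Dive into map and unordered_map

The Philosophy of Key-Value Pairs: From Dictionaries to Maps

In the real world, we routinely look up information through some identifier: a phone number by name, a definition by word, a personal record by ID number. This mapping from key to value is one of the most fundamental and powerful abstractions in computer science.

// Mapping relationships in the real world
// Dictionary: word → definition
// Phone book: name → phone number
// Database: primary key → record
// Cache: key → computed result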

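Basic Concepts of Maps: Understanding Key-Value Pairs

The Abstract Definition of a Map

A map is a data structure that stores key-value pairs and supports the following core operations:

Insert: store a key-value pair in the map
Lookup: find the value associated with a key
Delete: remove the key-value pair associated with a key
Update: modify the value associated with an existing key

// The basic concept of a key-value pair
// (std::map itself stores its elements as std::pair<const Key, Value>)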
template<typename Key, typename Value>
struct KeyValuePair {
    Key first;
    Value second;
    
    KeyValuePair(const Key& k, const Value& v) : first(k), second(v) {}
};
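The four core operations listed above can be sketched on top of std::map as follows. The PhoneBook alias and the helper names are hypothetical, chosen only for this illustration, and std::optional assumes a C++17 compiler.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical phone book type used only for this illustration.
using PhoneBook = std::map<std::string, std::string>;

// Insert: returns true if the name was new, false if it already existed.
bool addEntry(PhoneBook& book, const std::string& name, const std::string& number) {
    return book.insert({name, number}).second;
}

// Lookup: returns the number, or std::nullopt if the name is absent.
std::optional<std::string> findNumber(const PhoneBook& book, const std::string& name) {
    auto it = book.find(name);
    if (it == book.end()) return std::nullopt;
    return it->second;
}

// Update: changes the number only if the name already exists.
bool updateNumber(PhoneBook& book, const std::string& name, const std::string& number) {
    auto it = book.find(name);
    if (it == book.end()) return false;
    it->second = number;
    return true;
}

// Delete: returns true if an entry was removed.
bool removeEntry(PhoneBook& book, const std::string& name) {
    return book.erase(name) > 0;
}
```

Keeping update separate from insert makes the intent explicit at the call site; operator[] would blur the two by silently creating missing keys.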

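std::map: An Ordered Map Built on a Red-Black Tree

The Underlying Implementation of map: Red-Black Tree

std::map is typically implemented as a red-black tree, a self-balancing binary search tree. The C++ standard only requires sorted keys and logarithmic-time operations, but mainstream standard libraries use a red-black tree to meet those requirements.

#include <map>
#include <iostream>
#include <string>

void mapBasicDemo() {
    std::map<std::string, int> studentScores;

    // Insertion
    studentScores["Alice"] = 95;
    studentScores["Bob"] = 87;
    studentScores["Charlie"] = 92;
    studentScores.insert({"David", 78});

    // Lookup (operator[] would insert a default value if the key were missing)
    std::cout << "Alice's score: " << studentScores["Alice"] << std::endl;

    // Iteration (sorted by key)
    std::cout << "All student scores (sorted by name):" << std::endl;
    for (const auto& pair : studentScores) {
        std::cout << pair.first << ": " << pair.second << std::endl;
    }
    // Output order: Alice, Bob, Charlie, David (lexicographic order)
}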

Characteristics of map

void mapCharacteristics() {
    std::map<int, std::string> orderedMap;

    // 1. Automatic ordering
    orderedMap[3] = "Three";
    orderedMap[1] = "One";
    orderedMap[2] = "Two";

    std::cout << "Automatically sorted result: ";
    for (const auto& p : orderedMap) {
        std::cout << p.first << " ";  // 1 2 3
    }
    std::cout << std::endl;

    // 2. Unique keys
    orderedMap[2] = "Two Modified";  // Updates the existing key
    auto result = orderedMap.insert({1, "One Again"});  // Fails: key 1 already exists, its value is unchanged
    std::cout << "Insertion succeeded: " << std::boolalpha << result.second << std::endl;  // false

    // 3. Logarithmic time complexity
    // Lookup: O(log n)
    // Insertion: O(log n)
    // Deletion: O(log n)
}
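The automatic ordering shown above follows the map's comparator, which defaults to std::less. As a small sketch, passing std::greater as the third template argument reverses the iteration order; the function name here is made up for the example.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Returns the keys in iteration order for a map ordered with std::greater,
// that is, from the largest key to the smallest.
std::vector<int> keysInDescendingOrder() {
    std::map<int, std::string, std::greater<int>> descending;
    descending[3] = "Three";
    descending[1] = "One";
    descending[2] = "Two";

    std::vector<int> keys;
    for (const auto& entry : descending) {
        keys.push_back(entry.first);
    }
    return keys;  // 3, 2, 1
}
```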

Iterators and Lookup in map

void mapIterationAndSearch() {
    std::map<std::string, double> productPrices = {
        {"apple", 1.20},
        {"banana", 0.80},
        {"orange", 1.50},
        {"grape", 2.50}
    };

    // Iterate with iterators
    std::cout << "Product price list:" << std::endl;
    for (auto it = productPrices.begin(); it != productPrices.end(); ++it) {
        std::cout << it->first << ": $" << it->second << std::endl;
    }

    // Lookup with find
    auto findIt = productPrices.find("banana");
    if (findIt != productPrices.end()) {
        std::cout << "Found banana, price: $" << findIt->second << std::endl;
    } else {
        std::cout << "Banana not found" << std::endl;
    }

    // Check existence with count
    if (productPrices.count("watermelon") > 0) {
        std::cout << "Watermelon is available" << std::endl;
    } else {
        std::cout << "No watermelon" << std::endl;
    }

    // Range query
    auto lower = productPrices.lower_bound("b");  // first key >= "b"
    auto upper = productPrices.upper_bound("o");  // first key > "o"
    std::cout << "Products between b and o:" << std::endl;
    for (auto it = lower; it != upper; ++it) {
        std::cout << it->first << " ";  // banana grape ("orange" sorts after "o")
    }
    std::cout << std::endl;
}
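One pitfall worth illustrating next to find and count: operator[] inserts a default-constructed value when the key is missing, whereas find and at leave the map untouched. The sketch below records the map size after each kind of lookup; LookupSizes and compareLookups are hypothetical names for this example.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Map size observed after each kind of lookup for a missing key.
struct LookupSizes {
    std::size_t afterFind;
    std::size_t afterAt;
    std::size_t afterSubscript;
};

LookupSizes compareLookups() {
    std::map<std::string, double> prices = {{"apple", 1.20}};
    LookupSizes sizes{};

    // find: returns end() for a missing key and does not modify the map.
    bool found = prices.find("pear") != prices.end();
    (void)found;
    sizes.afterFind = prices.size();

    // at: throws std::out_of_range for a missing key and does not modify the map.
    try {
        prices.at("pear");
    } catch (const std::out_of_range&) {
        // expected: "pear" is not in the map
    }
    sizes.afterAt = prices.size();

    // operator[]: inserts {"pear", 0.0} and returns a reference to the new value.
    double value = prices["pear"];
    (void)value;
    sizes.afterSubscript = prices.size();

    return sizes;
}
```

For read-only access, find or at is therefore the safer choice, and both also work on a const map, which operator[] does not.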

std::unordered_map: An Unordered Map Built on a Hash Table

The Underlying Implementation of unordered_map: Hash Table

unordered_map is implemented as a hash table: a hash function maps each key to a bucket index.

#include <unordered_map>
#include <iostream>
#include <string>

void unorderedMapBasicDemo() {
    std::unordered_map<std::string, int> wordFrequency;

    // Insertion
    wordFrequency["the"] = 15;
    wordFrequency["and"] = 12;
    wordFrequency["is"] = 8;
    wordFrequency["of"] = 10;

    // Access
    std::cout << "Occurrences of 'the': " << wordFrequency["the"] << std::endl;

    // Iteration (unordered)
    std::cout << "Word frequency statistics:" << std::endl;
    for (const auto& pair : wordFrequency) {
        std::cout << pair.first << ": " << pair.second << std::endl;
    }
    // The output order is unspecified; it depends on the hash function,
    // the bucket count, and the insertion history
}

Inside the Hash Table

void hashTableMechanism() {
    // How a hash table works:
    // 1. Hash function: key → hash_value → bucket_index
    // 2. Collision resolution: separate chaining (each bucket holds a linked list of entries)
    // 3. Dynamic growth: rehash when the load factor exceeds max_load_factor()

    std::unordered_map<std::string, int> hashMap;

    // Inspect the state of the hash table
    std::cout << "Bucket count: " << hashMap.bucket_count() << std::endl;
    std::cout << "Load factor: " << hashMap.load_factor() << std::endl;
    std::cout << "Max load factor: " << hashMap.max_load_factor() << std::endl;

    // Insert some data
    for (int i = 0; i < 100; ++i) {
        hashMap["key" + std::to_string(i)] = i;
    }

    std::cout << "Bucket count after insertion: " << hashMap.bucket_count() << std::endl;
    std::cout << "Load factor after insertion: " << hashMap.load_factor() << std::endl;

    // Walk the buckets
    for (size_t i = 0; i < hashMap.bucket_count(); ++i) {
        std::cout << "Bucket " << i << " holds " << hashMap.bucket_size(i) << " element(s)" << std::endl;
    }
}
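To make the three steps above concrete, here is a toy string-to-int hash table using separate chaining. It is a teaching sketch under simplifying assumptions (no erase, no iterators, growth at a load factor of 1.0) and is not how any particular standard library implements unordered_map; ToyHashMap and its members are invented names.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <utility>
#include <vector>

// A toy string → int hash table using separate chaining (illustrative only).
class ToyHashMap {
public:
    explicit ToyHashMap(std::size_t bucketCount = 8)
        : buckets_(bucketCount == 0 ? 1 : bucketCount) {}

    // Inserts the key, or overwrites its value if it is already present.
    void put(const std::string& key, int value) {
        Bucket& bucket = buckets_[indexFor(key, buckets_.size())];
        for (auto& entry : bucket) {
            if (entry.first == key) {
                entry.second = value;
                return;
            }
        }
        bucket.emplace_back(key, value);
        ++size_;
        // Step 3: grow once the load factor (size / bucket count) exceeds 1.0.
        if (size_ > buckets_.size()) {
            rehash(buckets_.size() * 2);
        }
    }

    // Returns a pointer to the value, or nullptr if the key is absent.
    const int* get(const std::string& key) const {
        const Bucket& bucket = buckets_[indexFor(key, buckets_.size())];
        for (const auto& entry : bucket) {
            if (entry.first == key) return &entry.second;
        }
        return nullptr;
    }

    std::size_t size() const { return size_; }
    std::size_t bucketCount() const { return buckets_.size(); }

private:
    // Step 2: each bucket is a linked list of colliding entries.
    using Bucket = std::list<std::pair<std::string, int>>;

    // Step 1: key → hash value → bucket index.
    static std::size_t indexFor(const std::string& key, std::size_t bucketCount) {
        return std::hash<std::string>{}(key) % bucketCount;
    }

    // Redistributes every entry into a larger bucket array.
    void rehash(std::size_t newBucketCount) {
        std::vector<Bucket> newBuckets(newBucketCount);
        for (auto& bucket : buckets_) {
            for (auto& entry : bucket) {
                newBuckets[indexFor(entry.first, newBucketCount)].push_back(std::move(entry));
            }
        }
        buckets_ = std::move(newBuckets);
    }

    std::vector<Bucket> buckets_;
    std::size_t size_ = 0;
};
```

Note that this toy version moves entries into new list nodes on rehash, while std::unordered_map guarantees that references to existing elements stay valid across a rehash.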

Custom Hash Functions and Equality Comparison

// Custom key type
struct Person {
    std::string name;
    int age;

    bool operator==(const Person& other) const {
        return name == other.name && age == other.age;
    }
};

// Custom hash function
struct PersonHash {
    std::size_t operator()(const Person& p) const {
        std::size_t h1 = std::hash<std::string>{}(p.name);
        std::size_t h2 = std::hash<int>{}(p.age);
        return h1 ^ (h2 << 1);  // Combine the two hash values (simple, adequate for a demo)
    }
};

void customHashDemo() {
    std::unordered_map<Person, std::string, PersonHash> personJob;

    Person alice{"Alice", 25};
    Person bob{"Bob", 30};

    personJob[alice] = "Engineer";
    personJob[bob] = "Designer";

    std::cout << "Alice's job: " << personJob[alice] << std::endl;
    std::cout << "Bob's job: " << personJob[bob] << std::endl;
}
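A plain XOR-and-shift combination can produce more collisions when fields are correlated. One commonly used alternative is a mixing step in the style of boost::hash_combine, sketched below. The Employee type and the helper names are hypothetical, and the constant and shifts follow the well-known Boost formula rather than anything specified by the C++ standard.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical key type for this sketch.
struct Employee {
    std::string name;
    int age;

    bool operator==(const Employee& other) const {
        return name == other.name && age == other.age;
    }
};

// Mixes the hash of `value` into `seed`, in the style of boost::hash_combine.
template <typename T>
void hashCombine(std::size_t& seed, const T& value) {
    seed ^= std::hash<T>{}(value) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Hasher that folds every field of Employee into one seed.
struct EmployeeHash {
    std::size_t operator()(const Employee& e) const {
        std::size_t seed = 0;
        hashCombine(seed, e.name);
        hashCombine(seed, e.age);
        return seed;
    }
};

using EmployeeJobs = std::unordered_map<Employee, std::string, EmployeeHash>;
```

The same hashCombine helper extends naturally to keys with more fields: call it once per field in a fixed order.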

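In-Depth Performance Comparison: map vs unordered_map

Time Complexity Analysis

// Expected cost per operation:
//   map:           O(log n) for lookup, insertion, and deletion
//   unordered_map: O(1) on average, O(n) in the worst case (many collisions)
// The benchmark below measures wall-clock time for a single run;
// results vary with the machine, compiler, and optimization level.

#include <algorithm>
#include <chrono>
#include <iostream>
#include <map>
#include <random>
#include <unordered_map>
#include <vector>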

void performanceComparison() {
    const int ELEMENT_COUNT = 100000;
    
    std::vector<int> keys;
    for (int i = 0; i < ELEMENT_COUNT; ++i) {
        keys.push_back(i);
    }
    
    // Shuffle the keys
    std::random_device rd;
    std::mt19937 g(rd());
    std::shuffle(keys.begin(), keys.end(), g);
    
    // Measure map insertion performance
    auto start = std::chrono::high_resolution_clock::now();
    std::map<int, int> orderedMap;
    for (int key : keys) {
        orderedMap[key] = key * 2;
    }
    auto end = std::chrono::high_resolution_clock::now();
    auto mapInsertTime = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
    
    // Measure unordered_map insertion performance
    start = std::chrono::high_resolution_clock::now();
    std::unordered_map<int, int> unorderedMap;
    for (int key : keys) {
        unorderedMap[key] = key * 2;
    }
    end = std::chrono::high_resolution_clock::now();
    auto unorderedMapInsertTime = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
    
    // Measure lookup performance
    start = std::chrono::high_resolution_clock::now();
    for (int key : keys) {
        volatile int value = orderedMap[key];  // volatile keeps the compiler from optimizing the lookup away
    }
    end = std::chrono::high_resolution_clock::now();
    auto mapLookupTime = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
    
    start = std::chrono::high_resolution_clock::now();
    for (int key : keys) {
        volatile int value = unorderedMap[key];
    }
    end = std::chrono::high_resolution_clock::now();
    auto unorderedMapLookupTime = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
    
    std::cout << "Benchmark results (" << ELEMENT_COUNT << " elements):" << std::endl;
    std::cout << "map insertion time: " << mapInsertTime.count() << "ms" << std::endl;
    std::cout << "unordered_map insertion time: " << unorderedMapInsertTime.count() << "ms" << std::endl;
    std::cout << "map lookup time: " << mapLookupTime.count() << "ms" << std::endl;
    std::cout << "unordered_map lookup time: " << unorderedMapLookupTime.count() << "ms" << std::endl;
}

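Memory Usage Comparison

void memoryUsageAnalysis() {
    // Memory characteristics of map:
    // - Each node typically stores parent, left, and right pointers plus a color flag
    // - Per-element overhead is roughly constant, so total memory grows linearly with size

    // Memory characteristics of unordered_map:
    // - Maintains a bucket array in addition to the linked-list nodes
    // - Unused buckets waste some memory
    // - During a rehash the old and new bucket arrays briefly coexist
    //   (the element nodes themselves are relinked, not copied)

    std::map<int, int> orderedMap;
    std::unordered_map<int, int> unorderedMap;

    const int COUNT = 1000;
    for (int i = 0; i < COUNT; ++i) {
        orderedMap[i] = i;
        unorderedMap[i] = i;
    }

    std::cout << "map size: " << orderedMap.size() << std::endl;
    std::cout << "unordered_map size: " << unorderedMap.size() << std::endl;
    std::cout << "unordered_map bucket count: " << unorderedMap.bucket_count() << std::endl;
    std::cout << "unordered_map load factor: " << unorderedMap.load_factor() << std::endl;

    // In principle, unordered_map uses more memory when its load factor is low,
    // in exchange for faster lookups
}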

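Advanced Features and Techniques

Return Values and Efficiency of Insertion

void insertionTechniques() {
    std::map<std::string, int> scores;

    // Method 1: operator[] (can be inefficient)
    scores["Alice"] = 95;  // If Alice is absent, the value is default-constructed first, then assigned

    // Method 2: insert (no default construction; an existing key is left untouched)
    auto result = scores.insert({"Bob", 87});
    if (result.second) {
        std::cout << "Insertion succeeded" << std::endl;
    } else {
        std::cout << "Key already exists, insertion failed" << std::endl;
    }

    // Method 3: emplace (constructs the pair in place from its arguments;
    // it may still build a node and discard it if the key already exists)
    auto emplaceResult = scores.emplace("Charlie", 92);
    if (emplaceResult.second) {
        std::cout << "emplace succeeded" << std::endl;
    }

    // Method 4: try_emplace (C++17)
    // Constructs the value only if the key is absent
    auto tryResult = scores.try_emplace("David", 78);
    if (tryResult.second) {
        std::cout << "try_emplace succeeded" << std::endl;
    }

    // Method 5: insert_or_assign (C++17)
    // Inserts if the key is absent, otherwise overwrites the value
    auto updateResult = scores.insert_or_assign("Alice", 96);
    if (updateResult.second) {
        std::cout << "Inserted new value" << std::endl;
    } else {
        std::cout << "Updated existing value" << std::endl;
    }
}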
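The difference between emplace and try_emplace matters most for move-only values. The sketch below relies on the C++17 guarantee that try_emplace does not move from its arguments when the key already exists, so the caller keeps ownership after a failed insertion. The registry and function names are hypothetical.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Tries to register `widget` under `name`. If the name is already taken,
// try_emplace leaves `widget` untouched, so the caller still owns it.
bool registerWidget(std::map<std::string, std::unique_ptr<int>>& registry,
                    const std::string& name,
                    std::unique_ptr<int>& widget) {
    auto result = registry.try_emplace(name, std::move(widget));
    return result.second;
}
```

With emplace or insert in the same position, the standard makes no such promise, and the pointer may already have been moved from even though nothing was inserted.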

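Exception Safety and Transactional Operations

#include <stdexcept>

void exceptionSafety() {
    std::map<int, std::string> data;

    try {
        // Insert several elements; a single-element insertion that throws leaves the map unchanged
        data[1] = "One";
        data[2] = "Two";

        // If an exception is thrown here, the elements inserted above remain valid;
        // the sequence as a whole is not rolled back
        throw std::runtime_error("simulated exception");

        data[3] = "Three";  // never reached
    } catch (const std::exception& e) {
        std::cout << "Caught exception: " << e.what() << std::endl;
        std::cout << "Data is still intact, size: " << data.size() << std::endl;  // 2
    }
}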
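The demo above keeps the elements inserted before the exception, which is not transactional. When a batch must be applied entirely or not at all, one common approach is copy-and-swap, sketched here. The function name and the "empty value is invalid" rule are assumptions made up for the example, and copying the whole map costs O(n), so this suits small or infrequently updated maps.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Inserts every entry of `batch` into `target`, or none of them.
// Throws std::invalid_argument if any value is empty (a made-up validation rule).
void insertAllOrNothing(std::map<int, std::string>& target,
                        const std::vector<std::pair<int, std::string>>& batch) {
    std::map<int, std::string> working = target;  // work on a copy
    for (const auto& entry : batch) {
        if (entry.second.empty()) {
            throw std::invalid_argument("empty value in batch");
        }
        working.emplace(entry.first, entry.second);
    }
    target.swap(working);  // commit; swap does not throw with the default allocator
}
```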

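Practical Application Scenarios

Scenario 1: Word Frequency Counting

#include <algorithm>
#include <cctype>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

std::map<std::string, int> countWordFrequency(const std::string& text) {
    std::map<std::string, int> frequency;
    std::istringstream iss(text);
    std::string word;

    while (iss >> word) {
        // Convert to lowercase (take unsigned char: passing a negative char to tolower is undefined)
        std::transform(word.begin(), word.end(), word.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        // Strip punctuation
        word.erase(std::remove_if(word.begin(), word.end(),
                                  [](unsigned char c) { return std::ispunct(c) != 0; }),
                   word.end());

        if (!word.empty()) {
            frequency[word]++;
        }
    }

    return frequency;
}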

void displayTopWords(const std::map<std::string, int>& frequency, int topN) {
    // Copy the map into a vector so it can be sorted by value
    std::vector<std::pair<std::string, int>> words(frequency.begin(), frequency.end());
    
    // Sort by frequency in descending order
    std::sort(words.begin(), words.end(), 
              [](const auto& a, const auto& b) {
                  return a.second > b.second;
              });
    
    std::cout << "Top " << topN << " most frequent words:" << std::endl;
    for (int i = 0; i < std::min(topN, static_cast<int>(words.size())); ++i) {
        std::cout << words[i].first << ": " << words[i].second << std::endl;
    }
}

void wordFrequencyDemo() {
    std::string text = "Hello world hello there world hello again and again";
    auto frequency = countWordFrequency(text);
    displayTopWords(frequency, 5);
}
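When the counts do not need to stay sorted by word, the same task can be sketched with an unordered_map and std::partial_sort, which orders only the first N positions instead of the whole vector. The topWords function is a hypothetical variant of displayTopWords; the alphabetical tie-break is an added assumption to make the result deterministic.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Returns the `topN` most frequent words, highest count first.
// Ties are broken alphabetically so the result is deterministic.
std::vector<std::pair<std::string, int>> topWords(
        const std::unordered_map<std::string, int>& frequency, std::size_t topN) {
    std::vector<std::pair<std::string, int>> words(frequency.begin(), frequency.end());
    const std::size_t count = std::min(topN, words.size());

    // Only the first `count` positions need to end up sorted.
    std::partial_sort(words.begin(),
                      words.begin() + static_cast<std::ptrdiff_t>(count),
                      words.end(),
                      [](const auto& a, const auto& b) {
                          if (a.second != b.second) return a.second > b.second;
                          return a.first < b.first;
                      });
    words.resize(count);
    return words;
}
```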

Scenario 2: Implementing an LRU Cache

#include <cstddef>
#include <iostream>
#include <list>
#include <stdexcept>
#include <unordered_map>
#include <utility>

template<typename Key, typename Value>
class LRUCache {
private:
    size_t capacity_;

    // The list keeps access order (most recently used at the front)
    std::list<std::pair<Key, Value>> cacheList;

    // The hash table provides fast lookup
    std::unordered_map<Key, typename std::list<std::pair<Key, Value>>::iterator> cacheMap;

public:
    explicit LRUCache(size_t capacity) : capacity_(capacity) {}

    Value get(const Key& key) {
        auto it = cacheMap.find(key);
        if (it == cacheMap.end()) {
            throw std::runtime_error("Key not found");
        }

        // Move the node to the front of the list (mark it as most recently used)
        cacheList.splice(cacheList.begin(), cacheList, it->second);
        return it->second->second;
    }

    void put(const Key& key, const Value& value) {
        if (capacity_ == 0) {
            return;  // A zero-capacity cache stores nothing
        }

        auto it = cacheMap.find(key);
        if (it != cacheMap.end()) {
            // Update the value and move the node to the front
            it->second->second = value;
            cacheList.splice(cacheList.begin(), cacheList, it->second);
        } else {
            // Insert a new node
            if (cacheList.size() == capacity_) {
                // Evict the least recently used node (the back of the list)
                cacheMap.erase(cacheList.back().first);
                cacheList.pop_back();
            }

            cacheList.emplace_front(key, value);
            cacheMap[key] = cacheList.begin();
        }
    }

    bool contains(const Key& key) const {
        return cacheMap.find(key) != cacheMap.end();
    }

    size_t size() const { return cacheList.size(); }

    void display() const {
        std::cout << "LRU cache contents (most recent → least recent): ";
        for (const auto& pair : cacheList) {
            std::cout << "{" << pair.first << ":" << pair.second << "} ";
        }
        std::cout << std::endl;
    }
};

void lruCacheDemo() {
    LRUCache<std::string, int> cache(3);
    
    cache.put("A", 1);
    cache.put("B", 2);
    cache.put("C", 3);
    cache.display();  // C:3, B:2, A:1
    
    cache.get("A");   // Access A, moving it to the front
    cache.display();  // A:1, C:3, B:2

    cache.put("D", 4); // Insert D, evicting B (the least recently used)
    cache.display();  // D:4, A:1, C:3
}

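Scenario 3: A Configuration Management System

#include <iostream>
#include <map>
#include <stdexcept>
#include <string>
#include <unordered_map>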

class ConfigurationManager {
private:
    std::unordered_map<std::string, std::string> configMap;
    std::map<std::string, std::string> defaultConfig;
    
public:
    ConfigurationManager() {
        // Set the default configuration
        defaultConfig = {
            {"server.host", "localhost"},
            {"server.port", "8080"},
            {"database.url", "jdbc:mysql://localhost:3306/mydb"},
            {"log.level", "INFO"},
            {"cache.size", "1000"}
        };
    }
    
    void loadConfig(const std::string& /*filename*/) {
        // Simulates loading from a file; the filename is ignored in this demo
        configMap["server.host"] = "192.168.1.100";
        configMap["server.port"] = "9090";
        configMap["log.level"] = "DEBUG";
    }
    
    std::string getConfig(const std::string& key) const {
        // Prefer the user configuration, then fall back to the defaults
        auto it = configMap.find(key);
        if (it != configMap.end()) {
            return it->second;
        }
        
        auto defaultIt = defaultConfig.find(key);
        if (defaultIt != defaultConfig.end()) {
            return defaultIt->second;
        }
        
        throw std::runtime_error("Configuration key not found: " + key);
    }
    
    void setConfig(const std::string& key, const std::string& value) {
        configMap[key] = value;
    }
    
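    void displayAllConfigs() const {
        std::cout << "Current configuration:" << std::endl;

        // Merge the configurations (user settings override the defaults).
        // insert() never overwrites an existing key, so start from the user
        // settings and let the defaults fill in only the missing keys.
        std::map<std::string, std::string> mergedConfig(configMap.begin(), configMap.end());
        mergedConfig.insert(defaultConfig.begin(), defaultConfig.end());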
        
        for (const auto& pair : mergedConfig) {
            std::cout << pair.first << " = " << pair.second << std::endl;
        }
    }
};

void configDemo() {
    ConfigurationManager configManager;
    configManager.loadConfig("app.conf");
    configManager.setConfig("cache.size", "5000");
    
    std::cout << "Server port: " << configManager.getConfig("server.port") << std::endl;
    std::cout << "Log level: " << configManager.getConfig("log.level") << std::endl;
    
    configManager.displayAllConfigs();
}

Special Variants: multimap and unordered_multimap

Maps That Allow Duplicate Keys

void multiMapDemo() {
    // multimap: an ordered map that allows duplicate keys
    std::multimap<std::string, std::string> authorBooks;

    authorBooks.insert({"Author A", "Book 1"});
    authorBooks.insert({"Author A", "Book 2"});
    authorBooks.insert({"Author B", "Book 3"});
    authorBooks.insert({"Author A", "Book 4"});

    std::cout << "All books by Author A:" << std::endl;
    auto range = authorBooks.equal_range("Author A");
    for (auto it = range.first; it != range.second; ++it) {
        std::cout << " - " << it->second << std::endl;
    }

    // unordered_multimap: an unordered map that allows duplicate keys
    std::unordered_multimap<std::string, int> studentScores;
    studentScores.insert({"Alice", 95});
    studentScores.insert({"Alice", 88});
    studentScores.insert({"Bob", 92});

    std::cout << "All of Alice's scores:" << std::endl;
    auto aliceScores = studentScores.equal_range("Alice");
    for (auto it = aliceScores.first; it != aliceScores.second; ++it) {
        std::cout << " - " << it->second << std::endl;
    }
}
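A frequently used alternative to multimap is a regular map whose value is a container. The sketch below stores each author's titles in a std::vector; the alias and function names are hypothetical. Whether this or multimap fits better depends on the access pattern: the vector form makes "all values for one key" a single lookup, while multimap makes it easier to treat every pair uniformly.

```cpp
#include <map>
#include <string>
#include <vector>

using BooksByAuthor = std::map<std::string, std::vector<std::string>>;

// Appends a title to the author's list, creating the list on first use.
void addBook(BooksByAuthor& catalog, const std::string& author, const std::string& title) {
    catalog[author].push_back(title);
}

// Returns the author's titles in insertion order, or an empty list if the author is unknown.
std::vector<std::string> booksBy(const BooksByAuthor& catalog, const std::string& author) {
    auto it = catalog.find(author);
    return it == catalog.end() ? std::vector<std::string>{} : it->second;
}
```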

Performance Optimization Techniques

1. Reserve Space to Reduce Rehashing

void reserveOptimization() {
    std::unordered_map<int, std::string> largeMap;

    // Allocate enough buckets up front to avoid repeated rehashing
    largeMap.reserve(100000);

    for (int i = 0; i < 100000; ++i) {
        largeMap[i] = "value" + std::to_string(i);
    }

    std::cout << "Final bucket count: " << largeMap.bucket_count() << std::endl;
    std::cout << "Load factor: " << largeMap.load_factor() << std::endl;
}
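The effect of reserve can be observed by counting how often bucket_count() changes during insertion. This is a rough sketch with a hypothetical helper name; the exact number of rehashes without reserve depends on the standard library's growth policy, so only "zero versus more than zero" is meaningful.

```cpp
#include <cstddef>
#include <unordered_map>

// Inserts `count` integer keys and returns how many times the bucket array
// changed size along the way. If `reserveFirst` is true, space is reserved up front.
std::size_t countRehashes(std::size_t count, bool reserveFirst) {
    std::unordered_map<int, int> table;
    if (reserveFirst) {
        table.reserve(count);
    }

    std::size_t rehashes = 0;
    std::size_t buckets = table.bucket_count();
    for (std::size_t i = 0; i < count; ++i) {
        table[static_cast<int>(i)] = 0;
        if (table.bucket_count() != buckets) {
            buckets = table.bucket_count();
            ++rehashes;
        }
    }
    return rehashes;
}
```

Calling countRehashes(100000, true) should report no rehashes, because reserve(n) sizes the table so that n elements fit within max_load_factor().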

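2. Using a Custom Allocator

#include <memory>

void customAllocatorDemo() {
    // The fifth template parameter is the allocator. std::allocator, shown here,
    // is the default; a pool allocator would be substituted in its place.
    std::unordered_map<int, int, std::hash<int>, std::equal_to<int>,
                      std::allocator<std::pair<const int, int>>> optimizedMap;

    // In real applications a more efficient custom allocator can be plugged in,
    // such as boost::pool_allocator. (tcmalloc, by contrast, replaces malloc
    // for the whole process rather than being passed as a template argument.)
}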

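3. Key Selection Optimization

void keyOptimization() {
    // Prefer small, simple key types
    std::unordered_map<int, std::string> good;        // Good: integer key
    std::unordered_map<std::string, int> acceptable;  // Acceptable: string key

    // Avoid complex key types
    struct ComplexKey {
        std::string name;
        std::vector<int> data;
        // Requires a custom hash function and equality comparison
    };

    // If a complex key is unavoidable, a pointer key is one option, with a caveat:
    // pointers are hashed and compared by address, not by the pointed-to contents,
    // so this only works when object identity is the key and the pointee outlives
    // the map. (References cannot be used as keys.)
    std::unordered_map<std::string*, int> keyByPointer;
}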

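Selection Guide: map vs unordered_map

Decision Flowchart

Need to iterate in key order?
    Yes → choose std::map
    No  ↓
Is performance critical, with ordering irrelevant?
    Yes → choose std::unordered_map
    No  ↓
Need stable, predictable performance?
    Yes → choose std::map (avoids the worst case of hash collisions)
    No  ↓
Is memory usage critical?
    Yes → choose std::map (usually lower memory overhead)
    No  → choose std::unordered_map (usually faster)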

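Recommendations for Specific Scenarios

// Placeholder record type so the example compiles; its fields are illustrative only
struct UserProfile {
    std::string name;
};

void usageRecommendations() {
    // When to use map:
    // 1. Ordered iteration is required
    std::map<std::string, int> sortedDictionary;

    // 2. Range queries are required
    std::map<int, std::string> sortedRecords;
    auto start = sortedRecords.lower_bound(100);  // first key >= 100
    auto end = sortedRecords.upper_bound(200);    // first key > 200

    // 3. Key comparison is expensive, but each operation needs only O(log n) comparisons

    // When to use unordered_map:
    // 1. Only fast lookup is needed and order does not matter
    std::unordered_map<int, double> cache;

    // 2. Hashing the key is fast and the hash values are well distributed
    std::unordered_map<int, std::string> idToName;

    // 3. The data set is large and rarely iterated
    std::unordered_map<std::string, UserProfile> userDatabase;
}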
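The range-query case is where the ordered map has no direct unordered equivalent. As a minimal sketch, the hypothetical helper below collects every value whose key lies in a closed interval, using lower_bound and upper_bound to find both ends in O(log n).

```cpp
#include <map>
#include <string>
#include <vector>

// Returns the values whose keys lie in the closed interval [low, high].
std::vector<std::string> valuesInRange(const std::map<int, std::string>& records,
                                       int low, int high) {
    std::vector<std::string> result;
    if (low > high) return result;

    auto first = records.lower_bound(low);   // first key >= low
    auto last = records.upper_bound(high);   // first key > high
    for (auto it = first; it != last; ++it) {
        result.push_back(it->second);
    }
    return result;
}
```

With an unordered_map the same query would require scanning every element, since hash order carries no information about key order.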

Applying Modern C++ Features

Structured Bindings

void structuredBinding() {
    std::map<std::string, int> scores = {{"Alice", 95}, {"Bob", 87}, {"Charlie", 92}};

    // C++17 structured bindings
    for (const auto& [name, score] : scores) {
        std::cout << name << ": " << score << std::endl;
    }

    // Traditional approach
    for (const auto& pair : scores) {
        std::cout << pair.first << ": " << pair.second << std::endl;
    }
}

Transparent Comparators

void transparentComparator() {
    // C++14 transparent comparator
    std::map<std::string, int, std::less<>> transparentMap;

    transparentMap["hello"] = 42;

    // find can take a string literal directly, without constructing a std::string
    auto it = transparentMap.find("hello");
    if (it != transparentMap.end()) {
        std::cout << "Found: " << it->second << std::endl;
    }
}
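Heterogeneous lookup is especially handy with std::string_view (C++17), for example when keys are slices of a larger buffer. The sketch below assumes a map declared with std::less<>; lookupOr is a hypothetical helper name.

```cpp
#include <functional>
#include <map>
#include <string>
#include <string_view>

// Looks up `key` without building a std::string from it.
// Requires the map to use a transparent comparator such as std::less<>.
int lookupOr(const std::map<std::string, int, std::less<>>& table,
             std::string_view key, int fallback) {
    auto it = table.find(key);  // compares std::string keys with the string_view directly
    return it != table.end() ? it->second : fallback;
}
```

Without the transparent comparator, find would only accept a const std::string&, so each string_view lookup would need an explicit std::string conversion and a possible allocation.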

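Summary

map and unordered_map embody two different design philosophies and performance profiles:

Advantages of std::map:

Keeps elements ordered and supports range queries
Stable O(log n) performance that does not depend on a hash function
Good iterator stability: insertion never invalidates existing iterators, whereas a rehash invalidates unordered_map iterators
Relatively predictable memory usage

Advantages of std::unordered_map:

Faster lookup and insertion on average
Well suited to workloads that only need fast access
A clear performance advantage on large data sets

Core selection principles:

Need ordering → map
Need speed → unordered_map
Need stability → map
Need memory efficiency → measure for your specific case

Understanding the internal mechanisms and performance characteristics of these two maps helps us make the most suitable choice for a given problem and write code that is both efficient and reliable.

Next chapter preview: "Data Structures Handbook 007: Set Structures - A Focused Analysis of set and unordered_set"

We will explore the set, an important abstract data type, in depth, looking at the distinctive strengths of set and unordered_set for deduplication, set operations, and membership testing, as well as their close relationship to map structures.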