[Data Structures] Hash Tables Explained

1. Core Concepts

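A hash table is a data structure that accesses values directly by their keys. Its core idea is to use a hash function to map each key to a specific position in an array, so the value can be reached without searching the whole collection.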

Key characteristics:

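Key-value storage: it stores key-value pairs
Fast access: on average, insertion, deletion, and lookup all take O(1) time
Unordered: elements have no fixed order (some implementations preserve insertion order)
Unique keys: each key appears at most once in the table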
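The same properties can be observed with the standard java.util.HashMap. This small sketch (the class name CoreProperties is only for illustration) shows that a second put with an existing key replaces the value instead of adding another entry:

```java
import java.util.HashMap;
import java.util.Map;

class CoreProperties {
    // Builds a map with two puts for the same key and one put for a new key
    static Map<String, Integer> demo() {
        Map<String, Integer> map = new HashMap<>();
        map.put("apple", 1);
        map.put("apple", 2);   // same key: the old value 1 is replaced, size stays 1
        map.put("banana", 3);  // new key: a second entry is added
        return map;
    }
}
```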

2. How a Hash Table Works

2.1 Basic Workflow

Input key (Key)
    ↓
Hash function
    ↓
Hash value (hash code)
    ↓
Array index
    ↓
Access the corresponding array slot
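The workflow above can be sketched in Java as follows. This is a minimal illustration that assumes String keys, String.hashCode() as the hash function, and Math.floorMod to reduce the hash code to an index. The toy examples in this article use % directly, which can produce a negative index for a negative hash code. HashPipeline is a hypothetical class name.

```java
class HashPipeline {
    // Steps 1-3: key -> hash code -> array index in [0, capacity)
    static int indexFor(String key, int capacity) {
        int hashCode = key.hashCode();
        // floorMod keeps the index non-negative even when hashCode is negative
        return Math.floorMod(hashCode, capacity);
    }

    // Step 4: the index selects the array slot that is read
    static String lookup(String[] table, String key) {
        return table[indexFor(key, table.length)];
    }
}
```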

2.2 A Simple Example

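// A runnable version: an array of size 10 and a toy hash function
public class SimpleExample {
    static String[] table = new String[10];

    // Hash function: ASCII code of the first character, modulo 10
    static int hash(String key) {
        return key.charAt(0) % 10;
    }

    public static void main(String[] args) {
        // Store key-value pairs ("苹果" and "香蕉" are Chinese for apple and banana)
        table[hash("apple")] = "苹果";  // index: 'a' % 10 = 97 % 10 = 7
        table[hash("banana")] = "香蕉"; // index: 'b' % 10 = 98 % 10 = 8

        // Lookup
        String value = table[hash("apple")]; // finds "苹果" directly
        System.out.println(value);
    }
}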

3. Hash Collisions

A hash collision occurs when the hash function maps two different keys to the same index. Collisions are the central problem in hash table design.

Collision example:

// "apple" and "avocado" both start with 'a'
hash("apple")   = 'a' % 10 = 97 % 10 = 7
hash("avocado") = 'a' % 10 = 97 % 10 = 7 ← collision!
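Collisions are not just a side effect of the toy first-letter hash. Even Java's full String.hashCode() maps "Aa" and "BB" to the same value (31·65 + 97 = 31·66 + 66 = 2112). By the pigeonhole principle, any table with fewer buckets than keys must also contain a collision. The sketch below finds the first pair of keys that share a bucket. CollisionDemo is a hypothetical helper, and it assumes floorMod of hashCode() as the index function.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {
    // Returns the first two distinct keys that land in the same bucket, or null if none do
    static String[] firstCollision(String[] keys, int capacity) {
        Map<Integer, String> seen = new HashMap<>(); // bucket index -> first key placed there
        for (String key : keys) {
            int index = Math.floorMod(key.hashCode(), capacity);
            String other = seen.putIfAbsent(index, key); // previous key at this index, or null
            if (other != null && !other.equals(key)) {
                return new String[] {other, key};
            }
        }
        return null;
    }
}
```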

4. Resolving Hash Collisions

4.1 Separate Chaining ★ Most Common

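Principle: each array slot (bucket) does not store an element directly. Instead it holds a linked list (or another structure), and every element that hashes to that slot goes into that list.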

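// A simplified implementation of separate chaining
class ChainingHashMap<K, V> {
    // One node of a bucket's linked list
    static class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;

        Node(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    private final Node<K, V>[] table;
    private final int capacity;

    @SuppressWarnings("unchecked")
    ChainingHashMap(int capacity) {
        this.capacity = capacity;
        this.table = (Node<K, V>[]) new Node[capacity];
    }

    // Hash code of the key (null keys are not supported in this sketch)
    private int hash(K key) {
        return key.hashCode();
    }

    // Insert or update an element
    public void put(K key, V value) {
        // floorMod keeps the index non-negative even if hashCode() is negative
        int index = Math.floorMod(hash(key), capacity);

        // If the key is already in this chain, overwrite its value
        for (Node<K, V> current = table[index]; current != null; current = current.next) {
            if (current.key.equals(key)) {
                current.value = value;
                return;
            }
        }

        // Otherwise prepend a new node (also correct when the bucket is empty)
        Node<K, V> newNode = new Node<>(key, value);
        newNode.next = table[index];
        table[index] = newNode;
    }

    // Look up an element by walking the chain of its bucket
    public V get(K key) {
        int index = Math.floorMod(hash(key), capacity);
        Node<K, V> current = table[index];

        while (current != null) {
            if (current.key.equals(key)) {
                return current.value;
            }
            current = current.next;
        }
        return null;
    }
}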

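Java's HashMap uses separate chaining. Since JDK 1.8, once a bucket's chain grows beyond 8 nodes (and the table has at least 64 buckets), the chain is converted into a red-black tree to speed up lookups.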
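The chaining sketch above covers put and get. Deletion is just as local, because only the chain of the key's own bucket is touched. The following self-contained sketch adds remove. It is hypothetical: the class is named ChainedBuckets, and it uses java.util.LinkedList for the chains instead of hand-written nodes.

```java
import java.util.Iterator;
import java.util.LinkedList;
import java.util.Objects;

class ChainedBuckets<K, V> {
    // One key-value pair stored in a chain
    private static final class Entry<K, V> {
        final K key;
        V value;

        Entry(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    // Each bucket is a java.util.LinkedList acting as the chain
    private final LinkedList<Entry<K, V>>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedBuckets(int capacity) {
        buckets = (LinkedList<Entry<K, V>>[]) new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) {
            buckets[i] = new LinkedList<>();
        }
    }

    // Chain for a key; Objects.hashCode maps null to 0
    private LinkedList<Entry<K, V>> bucketFor(K key) {
        return buckets[Math.floorMod(Objects.hashCode(key), buckets.length)];
    }

    void put(K key, V value) {
        LinkedList<Entry<K, V>> chain = bucketFor(key);
        for (Entry<K, V> e : chain) {
            if (Objects.equals(e.key, key)) {
                e.value = value; // existing key: update in place
                return;
            }
        }
        chain.add(new Entry<>(key, value));
    }

    V get(K key) {
        for (Entry<K, V> e : bucketFor(key)) {
            if (Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null;
    }

    // Deletion only scans the one chain the key hashes to
    V remove(K key) {
        Iterator<Entry<K, V>> it = bucketFor(key).iterator();
        while (it.hasNext()) {
            Entry<K, V> e = it.next();
            if (Objects.equals(e.key, key)) {
                it.remove(); // unlink the entry from the chain
                return e.value;
            }
        }
        return null;
    }
}
```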

4.2 Open Addressing

Principle: when a collision occurs, probe other slots in a fixed sequence until a free one is found.

4.2.1 Linear Probing

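// Linear probing: check the next slot, then the one after it, and so on
public int linearProbe(int index, int attempt, int capacity) {
    return (index + attempt) % capacity;
}

// Example: insert "apple" and "avocado" (both hash to 7)
// table[7] = "apple"
// table[8] = "avocado" (slot 7 is taken, so the next slot, 8, is checked)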
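Here is the probe function put to work in a minimal open-addressing table with linear probing. It is a sketch under simplifying assumptions: String keys, the first-letter toy hash from section 2.2, a fixed capacity, and no deletion. LinearProbingTable is a hypothetical class. It reproduces the apple/avocado example above.

```java
class LinearProbingTable {
    private final String[] keys;
    private final String[] values;

    LinearProbingTable(int capacity) {
        keys = new String[capacity];
        values = new String[capacity];
    }

    // Same toy hash as section 2.2: first character modulo capacity
    int hash(String key) {
        return key.charAt(0) % keys.length;
    }

    // Probe index, index+1, index+2, ... (wrapping around) until an empty slot or the same key.
    // Returns the slot that was used; throws if every slot is taken.
    int put(String key, String value) {
        int index = hash(key);
        for (int attempt = 0; attempt < keys.length; attempt++) {
            int slot = (index + attempt) % keys.length;
            if (keys[slot] == null || keys[slot].equals(key)) {
                keys[slot] = key;
                values[slot] = value;
                return slot;
            }
        }
        throw new IllegalStateException("table is full");
    }

    // Lookup follows the same probe sequence and stops at the first empty slot
    String get(String key) {
        int index = hash(key);
        for (int attempt = 0; attempt < keys.length; attempt++) {
            int slot = (index + attempt) % keys.length;
            if (keys[slot] == null) return null;
            if (keys[slot].equals(key)) return values[slot];
        }
        return null;
    }
}
```

Deleting from such a table needs a "tombstone" marker rather than resetting the slot to null. Otherwise a later get could stop early at the emptied slot and miss keys stored further along the probe sequence.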

4.2.2 Quadratic Probing

// Quadratic probing: jump ahead from the home slot by attempt * attempt
public int quadraticProbe(int index, int attempt, int capacity) {
    return (index + attempt * attempt) % capacity;
}
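Quadratic probing avoids the long runs of occupied slots (primary clustering) that linear probing builds up, but its probe sequence can skip slots. The sketch below lists the distinct slots that are visited; QuadraticCoverage is a hypothetical helper. Starting from index 7 in a table of 10, only 6 slots (7, 8, 1, 6, 3, 2) are ever reached, so an insert can fail while free slots remain. This is why quadratic probing is usually paired with a prime capacity and a load factor below 0.5: for a prime p, the first (p + 1) / 2 probes are all distinct.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class QuadraticCoverage {
    // Distinct slots visited by (index + i*i) % capacity for i = 0 .. attempts-1, in probe order
    static Set<Integer> visitedSlots(int index, int capacity, int attempts) {
        Set<Integer> slots = new LinkedHashSet<>();
        for (int i = 0; i < attempts; i++) {
            slots.add((index + i * i) % capacity);
        }
        return slots;
    }
}
```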

4.2.3 Double Hashing

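// Double hashing: a second hash function determines the step size
// (secondaryHash is assumed to be defined elsewhere and must never return 0)
public <K> int doubleHash(int index, int attempt, int capacity, K key) {
    int hash2 = secondaryHash(key);
    return (index + attempt * hash2) % capacity;
}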
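Double hashing only works if the step hash2 is never zero and shares no factor with the capacity; otherwise some slots are never probed. A common textbook choice, assumed here, is a prime capacity with hash2(k) = R - (k mod R) for a smaller prime R. The sketch below uses integer keys with CAPACITY = 13 and R = 7 (illustrative values) and counts how many slots a probe sequence reaches. DoubleHashing is a hypothetical class name.

```java
import java.util.HashSet;
import java.util.Set;

class DoubleHashing {
    static final int CAPACITY = 13; // prime table size
    static final int R = 7;         // prime smaller than CAPACITY

    static int hash1(int key) {
        return Math.floorMod(key, CAPACITY);
    }

    // Secondary hash: always in 1..R, so the step is never zero and is coprime with 13
    static int hash2(int key) {
        return R - Math.floorMod(key, R);
    }

    // Slot probed on a given attempt: (h1 + attempt * h2) mod CAPACITY
    static int probe(int key, int attempt) {
        return (hash1(key) + attempt * hash2(key)) % CAPACITY;
    }

    // Number of distinct slots reached in CAPACITY attempts
    static int coverage(int key) {
        Set<Integer> slots = new HashSet<>();
        for (int attempt = 0; attempt < CAPACITY; attempt++) {
            slots.add(probe(key, attempt));
        }
        return slots.size();
    }
}
```

For key 15, for example, hash1 = 2 and hash2 = 6, so the probes run 2, 8, 1, 7, ... and eventually cover all 13 slots.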

5. Designing Hash Functions

A good hash function should be:

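Uniform: keys should be spread evenly across the array
Efficient: fast to compute
Deterministic: the same key must always produce the same hash value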

5.1 Common Hash Function Designs

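// 1. Division method (most common)
// floorMod keeps the index in [0, capacity) even when hashCode() is negative
public <K> int hashByDivision(K key, int capacity) {
    return Math.floorMod(key.hashCode(), capacity);
}

// 2. Multiplication method
public <K> int hashByMultiplication(K key, int capacity) {
    double A = 0.6180339887; // fractional part of the golden ratio
    double hash = key.hashCode() * A;
    // hash - Math.floor(hash) is the fractional part, always in [0, 1)
    return (int)(capacity * (hash - Math.floor(hash)));
}

// 3. Java String's hash function (same formula as String.hashCode())
public int javaStringHash(String s) {
    int hash = 0;
    for (int i = 0; i < s.length(); i++) {
        hash = 31 * hash + s.charAt(i);
    }
    return hash;
}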
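The loop in javaStringHash relies on int overflow wrapping silently, which matches the formula documented for String.hashCode(): s[0]·31^(n-1) + s[1]·31^(n-2) + ... + s[n-1] in int arithmetic. This small check confirms the two agree. StringHashCheck is a hypothetical class that repeats the loop so it is self-contained.

```java
class StringHashCheck {
    // Same loop as javaStringHash: h = 31*h + c, with int overflow wrapping around
    static int javaStringHash(String s) {
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            hash = 31 * hash + s.charAt(i);
        }
        return hash;
    }

    // True when the hand-written loop agrees with the built-in String.hashCode()
    static boolean matchesBuiltIn(String s) {
        return javaStringHash(s) == s.hashCode();
    }
}
```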

6. HashMap in Java

6.1 Core Structure of HashMap

// Simplified structure of HashMap in JDK 1.8+
class HashMap<K, V> {
    static class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;
    }
    
    // Since JDK 1.8, a bucket whose chain grows too long is converted into a red-black tree of TreeNodes
    static final class TreeNode<K, V> extends LinkedHashMap.Entry<K, V> {
        TreeNode<K, V> parent;
        TreeNode<K, V> left;
        TreeNode<K, V> right;
    }
    
    transient Node<K, V>[] table; // bucket array
    int size;                     // number of key-value mappings
    int threshold;                // resize threshold (capacity * load factor)
    final float loadFactor;       // load factor (0.75 by default)
}
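The simplified structure leaves out two details of the real implementation. First, JDK 8+ HashMap.hash() XORs the high 16 bits of hashCode() into the low 16 bits. Second, because the table length n is always a power of two, the bucket index is computed as (n - 1) & hash rather than with %. The sketch below mirrors those two expressions; HashMapIndexing is a hypothetical wrapper class.

```java
class HashMapIndexing {
    // Same expression as JDK 8+ HashMap.hash(Object): spread high bits into the low bits
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Bucket index for a power-of-two table length n (equivalent to a non-negative modulo)
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }
}
```

Without the XOR step, hash codes that differ only in their high bits, such as 0x10000 and 0x20000, would always share bucket 0 of a 16-bucket table. After spreading, they land in buckets 1 and 2.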

6.2 Key HashMap Parameters

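// Default initial capacity: 16
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;

// Maximum capacity: 2^30
static final int MAXIMUM_CAPACITY = 1 << 30;

// Default load factor: 0.75
static final float DEFAULT_LOAD_FACTOR = 0.75f;

// Treeify threshold: a bin is converted to a red-black tree when an element
// is added to a bin that already holds at least 8 nodes
static final int TREEIFY_THRESHOLD = 8;

// Untreeify threshold: during a resize, a tree bin left with ≤ 6 nodes turns back into a linked list
static final int UNTREEIFY_THRESHOLD = 6;

// Minimum treeify capacity: 64 (a smaller table is resized instead of treeifying its bins)
static final int MIN_TREEIFY_CAPACITY = 64;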

6.3 HashMap Resizing

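// A resize is triggered when, after an insertion, size > threshold (threshold = capacity * loadFactor)
HashMap<String, Integer> map = new HashMap<>(16, 0.75f);
// Threshold = 16 * 0.75 = 12
// Inserting the 13th element pushes size above 12 and triggers a resize

// Resize steps:
// 1. Allocate a new array with twice the old capacity
// 2. Recompute each element's bucket index (rehash)
// 3. Move the elements into the new array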
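In JDK 8+ HashMap, step 2 is cheaper than a full rehash. Because the capacity doubles and stays a power of two, each entry's new index differs from its old index j by only one extra hash bit, so the entry either stays at j or moves to j + oldCap. (The JDK splits each bucket into a "low" and a "high" list based on this bit.) Below is a minimal sketch of the index arithmetic; ResizeSplit is a hypothetical class name.

```java
class ResizeSplit {
    // Bucket index in a power-of-two table of length n
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    // Index after doubling from oldCap to 2 * oldCap, derived from the old index j:
    // the single bit (hash & oldCap) decides between "stay at j" and "move to j + oldCap"
    static int newIndex(int hash, int oldCap) {
        int j = indexFor(hash, oldCap);
        return (hash & oldCap) == 0 ? j : j + oldCap;
    }
}
```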

7. Time Complexity Analysis

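Operation	Average	Worst case	Notes
Insert	O(1) (amortized)	O(n) or O(log n)	Worst case: every element hashes to the same bucket
Lookup	O(1)	O(n) or O(log n)	O(n) for a chained bucket, O(log n) once it is treeified
Delete	O(1)	O(n) or O(log n)	Same as lookup
Space	O(n)	O(n)	Stores n elements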

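Note: Java's HashMap uses treeification to lower the worst case from O(n) to O(log n). The O(log n) bound assumes the colliding keys implement Comparable (as String does).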

8. Applications of Hash Tables

8.1 Caching (LRU Cache)

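// Sketch: Node, DoublyLinkedList, and moveToHead are helpers not shown here
class LRUCache<K, V> {
    private HashMap<K, Node> map;
    private DoublyLinkedList list;
    private int capacity;
    
    // Combines a hash table (fast lookup) with a doubly linked list (recency order)
    public V get(K key) {
        if (!map.containsKey(key)) return null;
        Node node = map.get(key);
        moveToHead(node); // move to the head to mark it as most recently used
        return node.value;
    }
}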
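The sketch above leaves out the linked-list helpers. In Java, LinkedHashMap already combines a hash table with a doubly linked list. Constructed with accessOrder = true, it keeps entries in access order, and overriding removeEldestEntry lets put evict the least recently used entry. Here is a minimal LRU cache built that way; LinkedLRUCache is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LinkedLRUCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LinkedLRUCache(int capacity) {
        // accessOrder = true: iteration runs from least to most recently accessed
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    // Called by put(); returning true evicts the least recently used entry
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
```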

8.2 Word Frequency Counting

public Map<String, Integer> wordFrequency(String text) {
    Map<String, Integer> freqMap = new HashMap<>();
    String[] words = text.split("\\s+");
    
    for (String word : words) {
        freqMap.put(word, freqMap.getOrDefault(word, 0) + 1);
    }
    return freqMap;
}

// Usage: counting "hello world hello" → {"hello":2, "world":1}
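Since Java 8, Map.merge can replace the getOrDefault + put pair with a single call. The variant below also trims the input first, because split("\\s+") on a string with leading whitespace returns an empty first element that would otherwise be counted as a word. WordCount is a hypothetical class.

```java
import java.util.HashMap;
import java.util.Map;

class WordCount {
    static Map<String, Integer> wordFrequency(String text) {
        Map<String, Integer> freq = new HashMap<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) return freq; // blank input: no words at all
        for (String word : trimmed.split("\\s+")) {
            // merge stores 1 for a new word, otherwise adds 1 to the existing count
            freq.merge(word, 1, Integer::sum);
        }
        return freq;
    }
}
```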

8.3 Fast Lookup Tables

// In-memory imitation of database indexes (User is assumed to be a class with getId() and getEmail())
class UserDatabase {
    private Map<Integer, User> idIndex = new HashMap<>();
    private Map<String, User> emailIndex = new HashMap<>();
    
    public void addUser(User user) {
        idIndex.put(user.getId(), user);
        emailIndex.put(user.getEmail(), user);
    }
    
    // Look up a user by ID or email in O(1) average time
    public User findById(int id) { return idIndex.get(id); }
    public User findByEmail(String email) { return emailIndex.get(email); }
}

8.4 Duplicate Detection

public boolean hasDuplicate(int[] nums) {
    Set<Integer> set = new HashSet<>();
    for (int num : nums) {
        if (set.contains(num)) return true; // O(1) average check
        set.add(num);
    }
    return false;
}
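The contains + add pair performs two hash lookups per element. Set.add returns false when the element is already present, so a single call is enough. DuplicateCheck is a hypothetical class.

```java
import java.util.HashSet;
import java.util.Set;

class DuplicateCheck {
    // add() returns false if num was already in the set, which signals a duplicate
    static boolean hasDuplicate(int[] nums) {
        Set<Integer> seen = new HashSet<>();
        for (int num : nums) {
            if (!seen.add(num)) return true;
        }
        return false;
    }
}
```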

8.5 The Two Sum Problem

public int[] twoSum(int[] nums, int target) {
    Map<Integer, Integer> map = new HashMap<>(); // maps value -> index
    
    for (int i = 0; i < nums.length; i++) {
        int complement = target - nums[i];
        if (map.containsKey(complement)) {
            return new int[]{map.get(complement), i};
        }
        map.put(nums[i], i);
    }
    return new int[0];
}
// Time complexity: O(n), space complexity: O(n)

9. Hash Table Implementation Example

9.1 A Simple Hash Table Implementation

import java.util.Objects;

class SimpleHashMap<K, V> {
    private static final int DEFAULT_CAPACITY = 16;
    private static final double LOAD_FACTOR = 0.75;

    static class Entry<K, V> {
        final K key;
        V value;
        Entry<K, V> next;

        Entry(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    private Entry<K, V>[] table;
    private int size;

    @SuppressWarnings("unchecked")
    public SimpleHashMap() {
        table = (Entry<K, V>[]) new Entry[DEFAULT_CAPACITY];
        size = 0;
    }

    // Bucket index for a key; null keys go to bucket 0.
    // floorMod avoids the negative index that Math.abs(Integer.MIN_VALUE) would produce.
    private int hash(K key) {
        return key == null ? 0 : Math.floorMod(key.hashCode(), table.length);
    }

    public void put(K key, V value) {
        // Update in place if the key already exists (checks every node, including the last)
        for (Entry<K, V> current = table[hash(key)]; current != null; current = current.next) {
            if (Objects.equals(current.key, key)) {
                current.value = value;
                return;
            }
        }

        // New key: grow first if the table has reached its load-factor limit
        if (size >= table.length * LOAD_FACTOR) {
            resize();
        }

        // Prepend the new entry to its bucket's chain
        int index = hash(key);
        Entry<K, V> newEntry = new Entry<>(key, value);
        newEntry.next = table[index];
        table[index] = newEntry;
        size++;
    }

    public V get(K key) {
        for (Entry<K, V> current = table[hash(key)]; current != null; current = current.next) {
            if (Objects.equals(current.key, key)) { // null-safe comparison
                return current.value;
            }
        }
        return null;
    }

    @SuppressWarnings("unchecked")
    private void resize() {
        Entry<K, V>[] oldTable = table;
        table = (Entry<K, V>[]) new Entry[oldTable.length * 2];

        // Relink every entry into the larger table; keys are already unique,
        // so no duplicate check is needed and size stays the same
        for (Entry<K, V> entry : oldTable) {
            while (entry != null) {
                Entry<K, V> next = entry.next;
                int index = hash(entry.key);
                entry.next = table[index];
                table[index] = entry;
                entry = next;
            }
        }
    }
}

10. Hash Table Problems and Solutions

10.1 Hash Flooding Attacks: Problem and Solutions

Problem description

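A hash flooding attack is a denial-of-service (DoS) attack in which the attacker deliberately submits a large number of keys with the same hash value, causing the hash table's performance to collapse.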

How the attack works

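// The attacker crafts many distinct keys that all share one hash code
String[] attackKeys = { /* thousands of distinct keys with identical hashCode() values */ };

// After the attack, every key lands in the same bucket
Map<String, String> map = new HashMap<>();
for (String key : attackKeys) {
    // Each put walks the shared bucket: O(n) per insertion instead of O(1), O(n²) in total
    // (Java 8+ treeifies the bucket, which caps this at O(log n) per insertion for String keys)
    map.put(key, "value");
}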
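To make the attack concrete: for String keys, one well-known recipe builds keys from the two-character blocks "Aa" and "BB", which share the hash code 2112. String.hashCode() is a polynomial in the characters, so any two strings made of the same number of such blocks also collide. The sketch below produces 2^blocks such keys, which could fill an attackKeys array like the one above. CollidingKeys is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.List;

class CollidingKeys {
    // Every string built from `blocks` copies of "Aa"/"BB" has the same String.hashCode(),
    // because both blocks have length 2 and hash code 2112
    static List<String> generate(int blocks) {
        List<String> keys = new ArrayList<>();
        keys.add("");
        for (int b = 0; b < blocks; b++) {
            List<String> next = new ArrayList<>();
            for (String prefix : keys) {
                next.add(prefix + "Aa");
                next.add(prefix + "BB");
            }
            keys = next;
        }
        return keys;
    }
}
```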

Solutions

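Use a keyed hash function with cryptographic strength, such as SipHash
Randomize the hash seed (the main defense), so attackers cannot precompute colliding keys
Treeify long buckets (Java HashMap's last line of defense): a flooded bucket then costs O(log n) per operation instead of O(n)
Rate-limit requests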

10.2 Memory Usage Optimization: Problem and Solutions

Problem analysis

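Hash tables waste memory in three main places: unused array slots, the per-entry overhead of list and tree nodes, and the size of the references (pointers) each node stores.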

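// With a load factor of 0.75, a chained table never fills all of its buckets
HashMap<String, String> map = new HashMap<>(16, 0.75f);
// Threshold = 16 * 0.75 = 12: inserting the 13th entry triggers a resize.
// While it holds 12 entries, at least 4 of the 16 buckets (25%) are empty,
// and usually more, because colliding entries share a bucket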

Solutions

Choose an appropriate load factor
Pre-allocate a suitable capacity
Use specialized data structures
Use compact storage techniques
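Pre-allocation follows directly from the resize rule in section 6.3: a HashMap created with capacity c resizes once its size exceeds c * loadFactor. The minimal sketch below picks a capacity large enough for an expected number of entries. Presize is a hypothetical helper.

```java
import java.util.HashMap;

class Presize {
    // Smallest initial capacity whose threshold (capacity * loadFactor) is >= expected,
    // so inserting `expected` entries never triggers a resize
    static int initialCapacityFor(int expected, float loadFactor) {
        return (int) Math.ceil(expected / (double) loadFactor);
    }

    static <K, V> HashMap<K, V> newPresized(int expected) {
        return new HashMap<>(initialCapacityFor(expected, 0.75f), 0.75f);
    }
}
```

HashMap rounds the requested capacity up to a power of two, so a request for 134 becomes 256. JDK 19 and later also provide HashMap.newHashMap(int numMappings), which performs this calculation for you.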

10.3 Consistent Hashing: Problem and Solution

The problem in distributed environments

With traditional hashing (node = hash(key) mod N), changing the number of nodes forces most of the data to move:

// Going from 3 to 4 nodes, about 75% of the keys map to a different node and must be moved
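The 75% figure can be checked directly. This sketch assumes the hash values are the consecutive integers 0..keyCount-1 and that a key is placed on node hash mod N. ModuloRemap is a hypothetical name. By the Chinese remainder theorem, a hash keeps its node only when h mod 3 equals h mod 4, which happens for 3 of every 12 values.

```java
class ModuloRemap {
    // Fraction of hashes 0..keyCount-1 whose node changes when going from oldNodes to newNodes
    static double movedFraction(int keyCount, int oldNodes, int newNodes) {
        int moved = 0;
        for (int key = 0; key < keyCount; key++) {
            if (key % oldNodes != key % newNodes) moved++;
        }
        return (double) moved / keyCount;
    }
}
```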

The consistent hashing solution

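import java.util.SortedMap;
import java.util.TreeMap;

public class ConsistentHash<T> {
    private final SortedMap<Integer, T> circle = new TreeMap<>();
    private final int virtualNodes; // number of virtual nodes per physical node

    public ConsistentHash(int virtualNodes) { this.virtualNodes = virtualNodes; }

    // Add a physical node by placing several virtual nodes for it on the ring
    public void addNode(T node) {
        for (int i = 0; i < virtualNodes; i++) {
            circle.put(hash(node.toString() + "#" + i), node);
        }
    }

    // Find a key's node: walk clockwise to the first virtual node whose hash is ≥ the key's hash
    public T getNode(Object key) {
        if (circle.isEmpty()) return null;
        SortedMap<Integer, T> tail = circle.tailMap(hash(key));
        return circle.get(tail.isEmpty() ? circle.firstKey() : tail.firstKey()); // wrap around
    }

    // Placeholder: a production ring would use a well-mixed hash such as MurmurHash or MD5
    private int hash(Object o) { return o.hashCode(); }
}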
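The property that makes consistent hashing useful is that adding a node only moves keys onto the new node, never between existing nodes. This separate, self-contained sketch illustrates it; it is not a reference implementation. HashRing is a hypothetical class and the replica count is arbitrary. It uses NavigableMap.ceilingEntry instead of tailMap, scatters String.hashCode() with MurmurHash3's 32-bit finalizer, and ignores the rare case of two virtual nodes landing on the same ring position.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class HashRing {
    private final NavigableMap<Integer, String> ring = new TreeMap<>();
    private final int replicas; // virtual nodes per physical node

    HashRing(int replicas) { this.replicas = replicas; }

    // MurmurHash3 fmix32 finalizer: spreads nearby hashCode() values across the ring
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    void addNode(String node) {
        for (int i = 0; i < replicas; i++) ring.put(mix((node + "#" + i).hashCode()), node);
    }

    void removeNode(String node) {
        for (int i = 0; i < replicas; i++) ring.remove(mix((node + "#" + i).hashCode()));
    }

    // First virtual node clockwise from the key's position, wrapping to the start of the ring
    String nodeFor(String key) {
        if (ring.isEmpty()) return null;
        Map.Entry<Integer, String> e = ring.ceilingEntry(mix(key.hashCode()));
        return (e != null ? e : ring.firstEntry()).getValue();
    }
}
```

After adding a fourth node "D" to a ring of "A", "B", and "C", every key whose owner changed is now owned by "D". Removing "D" again restores the original assignment.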

Real-world use cases

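Distributed caches (Redis Cluster uses a related scheme with 16384 fixed hash slots rather than a ring)
Load balancers
Sharding in distributed databases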

Advantages of hash tables:

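Very fast operations: O(1) average insertion, deletion, and lookup
Flexibility: can store key-value pairs of many types
Efficient memory use: the load factor balances time against space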

Disadvantages of hash tables:

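Unordered: elements have no fixed order (LinkedHashMap addresses this)
Space overhead: the array is normally not fully used
Hash collisions: a poorly designed hash function hurts performance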

Choosing a map in Java:

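HashMap: the most common choice; not thread-safe; allows one null key and null values
Hashtable: thread-safe (its methods are synchronized) but slower; considered legacy
ConcurrentHashMap: the best choice for highly concurrent code (it does not allow null keys or values)
LinkedHashMap: preserves insertion order or access order
TreeMap: keeps keys sorted; based on a red-black tree rather than hashing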

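Hash tables are among the most important data structures in modern programming, and nearly every programming language ships with a built-in implementation. Understanding how they work and where their limits lie is essential for writing efficient programs!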