Implementing a Hash Table in C (with C Source Code)

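The hash table is one of the fundamental data structures and is used very widely in real-world engineering: compiler symbol tables, database indexes, and cache systems all rely on it. This article walks through a hash table built on separate chaining with dynamic resizing. The code is about 400 lines long and is meant for learning how a hash structure works internally.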

Basic Principles of Hash Tables

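The core idea of a hash table is to use a hash function to map each key to a position in an array, which gives lookups an average time complexity of O(1). Because the key space is usually far larger than the array, different keys can map to the same position. This is called a hash collision.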
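
To make the collision idea concrete, here is a minimal sketch that uses a deliberately weak hash (the sum of the key's bytes). It is not the function used later in this article, and `toyIndex` is a hypothetical helper introduced only for illustration.

```c
#include <stddef.h>

/* Hypothetical toy hash, for illustration only: sums the bytes of the key
 * and reduces the sum modulo the number of buckets. */
size_t toyIndex(const char* key, size_t bucketCount) {
    size_t sum = 0;
    while (*key) {
        sum += (unsigned char)*key++;
    }
    return sum % bucketCount;
}
```

With 8 buckets, "ab" and "ba" have the same byte sum (97 + 98 = 195), so both land in bucket 195 % 8 = 3. That is a collision the table has to resolve.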

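There are two common strategies for handling collisions:

Open addressing: look for the next free slot inside the array itself
Separate chaining: keep a linked list at each bucket and store colliding elements in that list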

This article uses separate chaining, which is relatively straightforward to implement and makes deletion simpler.
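
[Figure: hash table structure. A key is passed through the hash function to produce an index into the bucket array; each bucket points to a linked list (list 1, list 2, ..., list n).]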
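
For contrast with chaining, the following is a rough sketch of open addressing with linear probing over a small fixed-size array of non-negative integer keys. It is not part of this article's implementation; `PROBE_SLOTS`, `PROBE_EMPTY`, `probeInit`, and `probeInsert` are hypothetical names chosen for the example.

```c
#include <stddef.h>

#define PROBE_SLOTS 8      /* hypothetical fixed table size */
#define PROBE_EMPTY (-1)   /* sentinel for an unused slot (keys must be >= 0) */

/* Marks every slot as empty. */
void probeInit(int slots[PROBE_SLOTS]) {
    for (size_t i = 0; i < PROBE_SLOTS; i++) {
        slots[i] = PROBE_EMPTY;
    }
}

/* Inserts a non-negative key. On a collision it steps to the next slot,
 * wrapping around, until it finds a free slot or the key itself.
 * Returns the slot used, or -1 if the key is negative or the table is full. */
int probeInsert(int slots[PROBE_SLOTS], int key) {
    if (key < 0) return -1;
    size_t start = (size_t)key % PROBE_SLOTS;
    for (size_t step = 0; step < PROBE_SLOTS; step++) {
        size_t i = (start + step) % PROBE_SLOTS;
        if (slots[i] == PROBE_EMPTY || slots[i] == key) {
            slots[i] = key;
            return (int)i;
        }
    }
    return -1;
}
```

Inserting 3 and then 11 (both are 3 modulo 8) puts 3 in slot 3 and 11 in slot 4. Simply clearing slot 3 afterwards would break the probe sequence that leads to 11, which is why open addressing usually needs deletion markers, whereas chaining only has to unlink a node.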

Data Structure Definitions

We start by defining the two core structs:

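// Key-value node
typedef struct HashNode {
    char* key;              // key (dynamically allocated string)
    int value;              // value
    struct HashNode* next;  // next node in the chain
} HashNode;

// Hash table
typedef struct {
    HashNode** buckets;     // bucket array (each entry is the head pointer of a chain)
    int size;               // current number of elements
    int capacity;           // number of buckets
} HashTable;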

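A few design considerations are worth noting:

Keys are dynamically allocated strings: each key is copied into memory obtained from malloc, so its lifetime is independent of the string passed in by the caller
The bucket array stores pointers: with HashNode**, an empty bucket costs only one pointer, which keeps the space overhead low
size and capacity are tracked: they make the load factor cheap to compute, which is what drives dynamic resizing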
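
The first point can be sketched as follows. The article calls a `createNode` helper later but does not show its body, so this is only one plausible way to write it, under the assumption that the key is copied byte for byte with its terminator.

```c
#include <stdlib.h>
#include <string.h>

typedef struct HashNode {
    char* key;
    int value;
    struct HashNode* next;
} HashNode;

/* Sketch of a node constructor: copies the key into freshly allocated
 * memory so the node does not depend on the caller's buffer.
 * Returns NULL if either allocation fails. */
HashNode* createNode(const char* key, int value) {
    HashNode* node = malloc(sizeof *node);
    if (!node) return NULL;

    size_t len = strlen(key) + 1;   /* include the terminating '\0' */
    node->key = malloc(len);
    if (!node->key) {
        free(node);                 /* do not leak the half-built node */
        return NULL;
    }
    memcpy(node->key, key, len);

    node->value = value;
    node->next = NULL;
    return node;
}
```

Because the node owns its own copy, the caller can modify or free the original buffer right after the call without affecting the table.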

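Choosing a Hash Function

The quality of the hash function directly affects the performance of the table. A good hash function should:

Be fast to compute
Distribute its output evenly, so that collisions are rare

This implementation uses the classic DJB2 algorithm, designed by Daniel J. Bernstein: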

static unsigned long hashString(const char* str) {
    unsigned long hash = 5381;
    int c;
    while ((c = *str++)) {
        hash = ((hash << 5) + hash) + c; // hash * 33 + c
    }
    return hash;
}

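The heart of the algorithm is hash * 33 + c. The multiplier 33 has held up well in practice for hashing strings. The initial value 5381 is likewise an empirical choice and is said to help reduce collisions among short strings.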
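
A short worked example may help. The sketch below repeats the same recurrence under the hypothetical name `djb2` so that it compiles on its own, and adds `djb2Index`, which reduces the hash modulo the bucket count. That reduction is an assumption about how the article's `getIndex` works, since its body is not shown.

```c
/* Same recurrence as hashString above, repeated so this sketch compiles
 * on its own. Each byte is read as unsigned char so the result does not
 * depend on whether plain char is signed on the target platform. */
unsigned long djb2(const char* str) {
    unsigned long hash = 5381;
    int c;
    while ((c = (unsigned char)*str++)) {
        hash = hash * 33 + (unsigned long)c;
    }
    return hash;
}

/* Assumed reduction step: hash value modulo the number of buckets.
 * capacity must be greater than zero. */
int djb2Index(const char* key, int capacity) {
    return (int)(djb2(key) % (unsigned long)capacity);
}
```

The empty string hashes to 5381. For "a" the result is 5381 * 33 + 97 = 177670, and for "ab" it is 177670 * 33 + 98 = 5863208. With 16 buckets, "a" maps to 177670 % 16 = 6 and "ab" maps to 5863208 % 16 = 8.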

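Dynamic Resizing

Hash table performance is closely tied to the load factor, defined as the number of elements divided by the number of buckets. When the load factor is too high, chains grow longer and lookups degrade; when it is too low, space is wasted.

This implementation sets two thresholds:

Grow threshold: 0.75
Shrink threshold: 0.125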

#define LOAD_FACTOR_THRESHOLD 0.75
#define SHRINK_FACTOR_THRESHOLD 0.125
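
As a rough illustration of how these two thresholds behave, the sketch below computes the load factor from a size and a capacity. The helper names `loadFactor`, `shouldGrow`, and `shouldShrink` are hypothetical; the article's own `getLoadFactor` takes the table itself and is not shown.

```c
#include <stdbool.h>

#define LOAD_FACTOR_THRESHOLD 0.75
#define SHRINK_FACTOR_THRESHOLD 0.125

/* Assumed definition: load factor = elements / buckets (capacity > 0). */
double loadFactor(int size, int capacity) {
    return (double)size / (double)capacity;
}

/* True once the table is at least 75% full. */
bool shouldGrow(int size, int capacity) {
    return loadFactor(size, capacity) >= LOAD_FACTOR_THRESHOLD;
}

/* True once the table is at most 12.5% full. */
bool shouldShrink(int size, int capacity) {
    return loadFactor(size, capacity) <= SHRINK_FACTOR_THRESHOLD;
}
```

With 16 buckets, growth triggers at 12 elements (12 / 16 = 0.75) and shrinking at 2 or fewer (2 / 16 = 0.125). After doubling to 32 buckets, those 12 elements give a load factor of 0.375, far from the shrink threshold, so the wide gap between the two values keeps the table from bouncing between sizes.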

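Resizing must recompute the bucket index of every element, which means rehashing each key because nodes do not cache their hash. The result of the modulo operation changes once the bucket count changes: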

static bool hashTableResize(HashTable* ht, int newCapacity) {
    HashNode** oldBuckets = ht->buckets;
    int oldCapacity = ht->capacity;

    // Allocate the new, zero-initialized bucket array
    HashNode** newBuckets = (HashNode**)calloc(newCapacity, sizeof(HashNode*));
    if (!newBuckets) return false;  // the table is left unchanged on failure

    ht->buckets = newBuckets;
    ht->capacity = newCapacity;

    // Relink every existing node into the new array (no node is reallocated)
    for (int i = 0; i < oldCapacity; i++) {
        HashNode* node = oldBuckets[i];
        while (node) {
            HashNode* next = node->next;  // save before overwriting node->next

            // Compute the index under the new capacity and insert at the head
            int newIndex = getIndex(ht, node->key);
            node->next = ht->buckets[newIndex];
            ht->buckets[newIndex] = node;

            node = next;
        }
    }

    free(oldBuckets);
    return true;
}

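The resize flow is as follows:

[Flowchart: resize condition triggered → allocate a new bucket array with twice the capacity → walk the old bucket array → if the current bucket is empty, continue with the next bucket; otherwise walk its chain, and for each node compute the new index and head-insert it into the new bucket → once every bucket has been visited, free the old bucket array → done.]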
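
One consequence of doubling is easy to check in code. Assuming the index is computed as hash % capacity (an assumption, since `getIndex` is not shown), every element either keeps its old index or moves up by exactly the old capacity. The helper `staysOrShifts` below is hypothetical and exists only to state that property.

```c
#include <stdbool.h>

/* Returns true if, for the given hash value, the bucket index after
 * doubling the capacity is either unchanged or larger by exactly
 * oldCapacity. With index = hash % capacity this holds for every hash. */
bool staysOrShifts(unsigned long hash, unsigned long oldCapacity) {
    unsigned long oldIndex = hash % oldCapacity;
    unsigned long newIndex = hash % (oldCapacity * 2);
    return newIndex == oldIndex || newIndex == oldIndex + oldCapacity;
}
```

For example, hash values 21 and 29 both fall into bucket 5 when there are 8 buckets. With 16 buckets, 21 stays in bucket 5 while 29 moves to bucket 13 (5 + 8), so a chain that held both is split in two.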

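Core Operations

Insertion

Insertion first checks whether the table needs to grow and then handles two cases: if the key already exists, its value is updated; otherwise a new node is created: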

bool hashTablePut(HashTable* ht, const char* key, int value) {
    if (!ht || !key) return false;

    // Grow first if the load factor has reached the threshold
    if (getLoadFactor(ht) >= LOAD_FACTOR_THRESHOLD) {
        if (!hashTableResize(ht, ht->capacity * 2)) {
            return false;
        }
    }

    int index = getIndex(ht, key);
    HashNode* current = ht->buckets[index];

    // Check whether the key already exists
    while (current) {
        if (strcmp(current->key, key) == 0) {
            current->value = value;  // update in place; size is unchanged
            return true;
        }
        current = current->next;
    }

    // Key not found: create a new node and insert it at the head of the chain
    HashNode* newNode = createNode(key, value);
    if (!newNode) return false;

    newNode->next = ht->buckets[index];
    ht->buckets[index] = newNode;
    ht->size++;

    return true;
}

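The new node is inserted at the head of the chain, so the linking step itself is O(1) and needs no walk to the tail. The duplicate check before it still scans the chain.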
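
Head insertion also determines the order of nodes within a bucket. The sketch below shows this on a plain list of integers; `IntNode` and `pushFront` are hypothetical names used only for this example.

```c
#include <stdlib.h>

typedef struct IntNode {
    int value;
    struct IntNode* next;
} IntNode;

/* Pushes value at the head of the list in O(1).
 * Returns the new head, or NULL if allocation fails
 * (the existing list is left untouched in that case). */
IntNode* pushFront(IntNode* head, int value) {
    IntNode* node = malloc(sizeof *node);
    if (!node) return NULL;
    node->value = value;
    node->next = head;
    return node;
}
```

Pushing 1, 2, and 3 in that order produces the chain 3 -> 2 -> 1, so the most recently inserted element is found first. The order inside a bucket does not affect correctness, only which keys are reached with fewer comparisons.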

Lookup and Removal

Lookup walks the chain of the matching bucket and compares keys as strings:

bool hashTableGet(HashTable* ht, const char* key, int* outValue) {
    if (!ht || !key || !outValue) return false;
    
    int index = getIndex(ht, key);
    HashNode* current = ht->buckets[index];
    
    while (current) {
        if (strcmp(current->key, key) == 0) {
            *outValue = current->value;
            return true;
        }
        current = current->next;
    }
    
    return false;
}

Removal has to track the predecessor pointer so that the chain can be relinked correctly:

bool hashTableRemove(HashTable* ht, const char* key) {
    if (!ht || !key) return false;
    
    int index = getIndex(ht, key);
    HashNode* current = ht->buckets[index];
    HashNode* prev = NULL;
    
    while (current) {
        if (strcmp(current->key, key) == 0) {
            if (prev) {
                prev->next = current->next;
            } else {
                ht->buckets[index] = current->next;
            }
            
            free(current->key);
            free(current);
            ht->size--;
            
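            // Check whether the table should shrink (never below the initial capacity).
            // A failed shrink is ignored: the table simply keeps its current capacity.
            if (ht->capacity > INITIAL_CAPACITY && 
                getLoadFactor(ht) <= SHRINK_FACTOR_THRESHOLD) {
                hashTableResize(ht, ht->capacity / 2);
            }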
            
            return true;
        }
        prev = current;
        current = current->next;
    }
    
    return false;
}

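After a successful removal, the table checks the shrink condition so that a mostly empty table does not hold on to an oversized bucket array.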
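
The predecessor pointer is not the only way to unlink a node. A common alternative, shown here as a sketch rather than as the article's method, walks a pointer to the link itself, which removes the special case for the head of the chain. `chainRemove` is a hypothetical name, and the sketch leaves out the size bookkeeping and the shrink check.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef struct HashNode {
    char* key;
    int value;
    struct HashNode* next;
} HashNode;

/* Unlinks and frees the first node in the chain whose key matches.
 * `link` always points at the pointer that refers to the current node:
 * first the bucket slot itself, then each node's `next` field. */
bool chainRemove(HashNode** bucket, const char* key) {
    for (HashNode** link = bucket; *link; link = &(*link)->next) {
        HashNode* node = *link;
        if (strcmp(node->key, key) == 0) {
            *link = node->next;   /* same statement for head and interior nodes */
            free(node->key);
            free(node);
            return true;
        }
    }
    return false;
}
```

A caller would pass the address of the bucket slot, for example `chainRemove(&ht->buckets[index], key)`, and decrement the element count when the function returns true.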

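Memory Management

Every dynamic allocation in this implementation has a clear release path:

Node destruction: destroyNodeChain walks a chain and frees each key string and then the node itself
Table destruction: hashTableDestroy walks all buckets, calls the node destruction function on each, and finally frees the bucket array and the table struct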

static void destroyNodeChain(HashNode* node) {
    while (node) {
        HashNode* temp = node;
        node = node->next;
        free(temp->key);
        free(temp);
    }
}

void hashTableDestroy(HashTable* ht) {
    if (!ht) return;
    
    for (int i = 0; i < ht->capacity; i++) {
        destroyNodeChain(ht->buckets[i]);
    }
    
    free(ht->buckets);
    free(ht);
}

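Testing

The code ships with test cases that cover the following scenarios:

Basic insertion and lookup
Value update
Removal
Resize triggering and verification
Key traversal
Collision handling
Empty-value lookups

The hashTablePrint function shows the internal state of the table, including how the chains are distributed across buckets, which helps with debugging and with seeing what changes before and after a resize.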

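Summary

The hash table presented here is small, but it covers the core elements of a hash-based data structure: hash function design, collision handling, dynamic resizing, and memory management. It can serve as a reference for developers who want to understand how hash tables work internally, or who need a lightweight hash implementation for settings such as embedded systems.

The complete code is open source. Questions and discussion are welcome.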

⚠️ Source code: https://github.com/anjuxi/C-hash_table