Trie: A C++ Implementation

Table of Contents

Preface
I. What Is a Trie?
II. Basic Operations and Algorithms
- 1. Insert
- 2. Search
- 3. Prefix Check (StartsWith)
- 4. Remove
- 5. Autocomplete
III. C++ Implementation
IV. Testing
V. Complexity Analysis

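Preface

A trie (also called a prefix tree) is well suited to features such as autocomplete, prefix search, word dictionaries, and sensitive-word filtering.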

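I. What Is a Trie?

A trie is a multi-way tree in which each edge (equivalently, each non-root node) stands for one character. The path from the root to a node spells a prefix of a stored string, or the whole string. Typical properties:

- The root represents the empty string.
- Each edge corresponds to one character (for example the lowercase English letters a-z), and a node may have several children.
- A node usually stores a flag (isEnd) that says whether a word ends there.
- Looking up a word or testing a prefix walks the tree one character per level, so the running time is linear in the length of the word.

Advantages: insert, search, and prefix queries are fast (O(L)), which suits prefix operations over very large string sets. Disadvantage: memory use can be high, especially when the alphabet is large or the stored strings share few prefixes.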
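To make the prefix-sharing property concrete, here is a minimal sketch that counts how many nodes a set of words needs. `MiniNode` and `countNodes` are hypothetical names used only for this illustration, and the children are kept in a `std::map` for brevity, which is an assumption of the sketch rather than the layout used in the implementation later in this article.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical minimal node, for illustration only: children keyed by character.
struct MiniNode {
    bool isEnd = false;
    std::map<char, std::unique_ptr<MiniNode>> children;
};

// Builds a trie from `words` and returns the number of nodes, root included.
std::size_t countNodes(const std::vector<std::string>& words) {
    MiniNode root;
    std::size_t nodes = 1;  // the root represents the empty string
    for (const std::string& w : words) {
        MiniNode* cur = &root;
        for (char c : w) {
            std::unique_ptr<MiniNode>& slot = cur->children[c];
            if (!slot) {  // create a node only when the edge is missing
                slot = std::make_unique<MiniNode>();
                ++nodes;
            }
            cur = slot.get();
        }
        cur->isEnd = true;
    }
    return nodes;
}
```

For the words "app", "apple", and "apt" the total length is 11 characters, but `countNodes` reports 7 nodes (the root plus a, p, p, l, e, t), because "app" is stored once and "apt" reuses the "ap" path.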

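II. Basic Operations and Algorithms

1. Insert

Start at the root and, for each character of the word:

- If the current node has no child for that character, create one.
- Move to that child and continue with the next character.
- After the last character, mark the current node as the end of a word.

Time complexity: O(L) for a word of length L.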

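2. Search

Search works like insert, except that it never creates nodes and only follows existing ones:

- If the child for any character is missing, the word is not stored.
- If all characters have been consumed and isEnd is true on the current node, the word is stored.

Time complexity: O(L).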

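3. Prefix Check (StartsWith)

Follow the characters of the prefix; if the walk reaches the end of the prefix without hitting a missing child, at least one stored word has that prefix. Time complexity: O(P), where P is the length of the prefix.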
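Search and the prefix check differ only in the final test, which a shared walking helper makes explicit. The following is a minimal sketch under stated assumptions, not the implementation given later in this article: `WalkNode`, `walk`, `insertWord`, `searchWord`, and `hasPrefix` are hypothetical names, and `std::unique_ptr` is assumed for node ownership.

```cpp
#include <array>
#include <cassert>
#include <memory>
#include <string>

// Hypothetical standalone node type using the 26-letter layout (a-z only).
struct WalkNode {
    bool isEnd = false;
    std::array<std::unique_ptr<WalkNode>, 26> children;
};

// Follows `s` from `root`. Returns the node reached, or nullptr if the path
// breaks or `s` contains a character outside a-z.
const WalkNode* walk(const WalkNode& root, const std::string& s) {
    const WalkNode* cur = &root;
    for (char c : s) {
        if (c < 'a' || c > 'z') return nullptr;
        cur = cur->children[c - 'a'].get();
        if (!cur) return nullptr;
    }
    return cur;
}

void insertWord(WalkNode& root, const std::string& word) {
    WalkNode* cur = &root;
    for (char c : word) {
        assert(c >= 'a' && c <= 'z');  // sketch: the caller guarantees lowercase input
        std::unique_ptr<WalkNode>& slot = cur->children[c - 'a'];
        if (!slot) slot = std::make_unique<WalkNode>();
        cur = slot.get();
    }
    cur->isEnd = true;
}

bool searchWord(const WalkNode& root, const std::string& word) {
    const WalkNode* n = walk(root, word);
    return n != nullptr && n->isEnd;  // the path exists AND a word ends there
}

bool hasPrefix(const WalkNode& root, const std::string& prefix) {
    return walk(root, prefix) != nullptr;  // the path exists; isEnd is irrelevant
}
```

After `insertWord(root, "apple")`, `hasPrefix(root, "app")` is true while `searchWord(root, "app")` is false, because the node for "app" exists but is not marked as a word end.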

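4. Remove

Removal is more involved, because only nodes that no other word still uses may be deleted. The usual approach is recursive:

- Recurse to the node at the end of the word and clear its isEnd flag.
- If that node has no children, return true to tell the parent that the node can be deleted. The parent deletes it, sets the pointer to null, and then applies the same test to itself: it can be deleted only if it has no remaining children and is not the end of another word.
- Otherwise the node must stay, so return false.

Time complexity: O(L).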
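The recursive pruning described above can be sketched as follows. This is an illustrative variant that assumes `std::unique_ptr` ownership, so releasing a child is a single `reset()`; `RmNode`, `addWord`, `findNode`, and `eraseWord` are hypothetical names introduced for this sketch only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical node type for this sketch (a-z only).
struct RmNode {
    bool isEnd = false;
    std::array<std::unique_ptr<RmNode>, 26> children;
    bool hasChildren() const {
        for (const auto& c : children) if (c) return true;
        return false;
    }
};

void addWord(RmNode& root, const std::string& w) {
    RmNode* cur = &root;
    for (char c : w) {
        assert(c >= 'a' && c <= 'z');  // sketch: lowercase input only
        std::unique_ptr<RmNode>& slot = cur->children[c - 'a'];
        if (!slot) slot = std::make_unique<RmNode>();
        cur = slot.get();
    }
    cur->isEnd = true;
}

// Returns the node spelled by `s`, or nullptr if that path does not exist.
const RmNode* findNode(const RmNode& root, const std::string& s) {
    const RmNode* cur = &root;
    for (char c : s) {
        if (c < 'a' || c > 'z') return nullptr;
        cur = cur->children[c - 'a'].get();
        if (!cur) return nullptr;
    }
    return cur;
}

// Returns true when `node` is no longer needed: it ends no word and has no children.
bool eraseWord(RmNode& node, const std::string& w, std::size_t depth = 0) {
    if (depth == w.size()) {
        node.isEnd = false;  // harmless if the word was never stored
    } else {
        char c = w[depth];
        if (c < 'a' || c > 'z') return false;
        std::unique_ptr<RmNode>& child = node.children[c - 'a'];
        if (!child) return false;  // the word is absent: nothing to prune
        if (eraseWord(*child, w, depth + 1)) child.reset();  // free the unused child
    }
    return !node.isEnd && !node.hasChildren();
}
```

With "app" and "apple" stored, `eraseWord(root, "apple")` frees only the nodes for "appl" and "apple"; the node for "app" survives because it still ends a word. The value returned for the root itself can be ignored by the caller.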

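5. Autocomplete

First walk to the node for the prefix, then run a DFS or BFS over that subtree to collect either up to k words or all of them. Time complexity: O(P) to locate the prefix, plus the cost of traversing the matching subtree, which depends on how many words are output and how deep they are.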
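The choice between DFS and BFS changes the order of the suggestions. As a hedged illustration of the BFS option, the sketch below returns shorter completions first, with ties in alphabetical order. `AcNode`, `acInsert`, and `completeBfs` are hypothetical names, and the node layout is an assumption of this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node type for this sketch (a-z only).
struct AcNode {
    bool isEnd = false;
    std::array<std::unique_ptr<AcNode>, 26> children;
};

void acInsert(AcNode& root, const std::string& w) {
    AcNode* cur = &root;
    for (char c : w) {
        assert(c >= 'a' && c <= 'z');  // sketch: lowercase input only
        std::unique_ptr<AcNode>& slot = cur->children[c - 'a'];
        if (!slot) slot = std::make_unique<AcNode>();
        cur = slot.get();
    }
    cur->isEnd = true;
}

// Breadth-first completion: visits the subtree level by level, so shorter
// words are reported before longer ones. Stops once `limit` words are found.
std::vector<std::string> completeBfs(const AcNode& root, const std::string& prefix,
                                     std::size_t limit) {
    std::vector<std::string> out;
    const AcNode* cur = &root;
    for (char c : prefix) {  // step 1: locate the prefix node
        if (c < 'a' || c > 'z') return out;
        cur = cur->children[c - 'a'].get();
        if (!cur) return out;
    }
    std::queue<std::pair<const AcNode*, std::string>> pending;
    pending.emplace(cur, prefix);
    while (!pending.empty() && out.size() < limit) {  // step 2: BFS over the subtree
        std::pair<const AcNode*, std::string> item = std::move(pending.front());
        pending.pop();
        if (item.first->isEnd) out.push_back(item.second);
        for (int i = 0; i < 26; ++i) {
            const AcNode* child = item.first->children[i].get();
            if (child) pending.emplace(child, item.second + static_cast<char>('a' + i));
        }
    }
    return out;
}
```

With "apple", "app", "application", and "apt" stored, `completeBfs(root, "ap", 3)` yields app, apt, apple, whereas a depth-first traversal in alphabetical order would yield app, apple, application. Which ordering is preferable depends on the application.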

III. C++ Implementation

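#include <array>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>
using namespace std;

/*
 Trie implementation (26 lowercase letters)
 Provides: insert, search, startsWith, remove, autocomplete
*/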
class TrieNode {
public:
    bool isEnd;
    // Child pointers, one per letter a-z (26 in total)
    array<TrieNode*, 26> children;
    TrieNode() : isEnd(false) {
        children.fill(nullptr);
    }
    ~TrieNode() {
        for (auto p : children) {
            if (p) delete p;
        }
    }
};

class Trie {
private:
    TrieNode* root;

    // Recursive helper for remove; returns true if the caller may delete `node`
    bool removeHelper(TrieNode* node, const string& word, int depth) {
        if (!node) return false;
        if (depth == (int)word.size()) {
            if (!node->isEnd) return false; // the word is not stored
            node->isEnd = false;
            // The node can be deleted only if it has no children
            for (auto ch : node->children) if (ch) return false;
            return true;
        }
        int idx = word[depth] - 'a';
        if (idx < 0 || idx >= 26) return false; // unsupported character: the word cannot be stored
        TrieNode* child = node->children[idx];
        if (!child) return false;
        bool shouldDeleteChild = removeHelper(child, word, depth + 1);
        if (shouldDeleteChild) {
            delete child;
            node->children[idx] = nullptr;
            // Can the current node be deleted too? Only if it ends no word and has no children left
            if (!node->isEnd) {
                for (auto ch : node->children) if (ch) return false;
                return true;
            } else {
                return false;
            }
        }
        return false;
    }

    // Autocomplete: DFS from node, collecting at most limit words into out
    void dfsCollect(TrieNode* node, string& path, vector<string>& out, int limit) {
        if (!node) return;
        if ((int)out.size() >= limit) return;
        if (node->isEnd) out.push_back(path);
        for (int i = 0; i < 26 && (int)out.size() < limit; ++i) {
            if (node->children[i]) {
                path.push_back('a' + i);
                dfsCollect(node->children[i], path, out, limit);
                path.pop_back();
            }
        }
    }

public:
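    Trie() {
        root = new TrieNode();
    }
    ~Trie() {
        delete root;
    }
    // The trie owns its nodes through raw pointers, so a copy would delete them twice.
    Trie(const Trie&) = delete;
    Trie& operator=(const Trie&) = delete;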

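    void insert(const string& word) {
        // Only lowercase a-z is supported. Reject anything else before touching the
        // tree: skipping a character would silently store a different word.
        for (char c : word) {
            if (c < 'a' || c > 'z') throw invalid_argument("Trie::insert: only a-z is supported");
        }
        TrieNode* cur = root;
        for (char c : word) {
            int idx = c - 'a';
            if (!cur->children[idx]) cur->children[idx] = new TrieNode();
            cur = cur->children[idx];
        }
        cur->isEnd = true;
    }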

    bool search(const string& word) const {
        TrieNode* cur = root;
        for (char c : word) {
            int idx = c - 'a';
            if (idx < 0 || idx >= 26) return false;
            if (!cur->children[idx]) return false;
            cur = cur->children[idx];
        }
        return cur->isEnd;
    }

    bool startsWith(const string& prefix) const {
        TrieNode* cur = root;
        for (char c : prefix) {
            int idx = c - 'a';
            if (idx < 0 || idx >= 26) return false;
            if (!cur->children[idx]) return false;
            cur = cur->children[idx];
        }
        return true;
    }

    void remove(const string& word) {
        removeHelper(root, word, 0);
    }

    // Returns at most limit words that start with prefix, in lexicographic order
    vector<string> autocomplete(const string& prefix, int limit = 10) {
        vector<string> res;
        TrieNode* cur = root;
        for (char c : prefix) {
            int idx = c - 'a';
            if (idx < 0 || idx >= 26) return res;
            if (!cur->children[idx]) return res;
            cur = cur->children[idx];
        }
        string path = prefix;
        dfsCollect(cur, path, res, limit);
        return res;
    }
};

IV. Testing

int main() {
    Trie trie;
    vector<string> words = {"apple", "app", "application", "apt", "banana", "band", "bandit", "bat"};
    for (auto &w : words) trie.insert(w);

    // Test search
    cout << boolalpha;
    cout << "search(\"app\") = " << trie.search("app") << "\n";         // true
    cout << "search(\"apply\") = " << trie.search("apply") << "\n";     // false

    // Test startsWith
    cout << "startsWith(\"ap\") = " << trie.startsWith("ap") << "\n";   // true
    cout << "startsWith(\"ba\") = " << trie.startsWith("ba") << "\n";   // true
    cout << "startsWith(\"cat\") = " << trie.startsWith("cat") << "\n"; // false

    // Autocomplete: prints app, apple, application, apt (only four words match)
    auto cands = trie.autocomplete("ap", 5);
    cout << "autocomplete(\"ap\"):\n";
    for (auto &s : cands) cout << "  " << s << "\n";

    // Remove
    trie.remove("app");
    cout << "after remove(\"app\") search(\"app\") = " << trie.search("app") << "\n"; // false: "app" is no longer a stored word
    cout << "after remove(\"app\") startsWith(\"app\") = " << trie.startsWith("app") << "\n"; // true: "apple" and "application" still share this prefix

    // Remove "application", then test again
    trie.remove("application");
    cout << "after remove(\"application\") startsWith(\"app\") = " << trie.startsWith("app") << "\n"; // still true because of "apple"
    trie.remove("apple");
    cout << "after remove(\"apple\") startsWith(\"app\") = " << trie.startsWith("app") << "\n"; // false: only "apt" is left under "ap"

    return 0;
}

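V. Complexity Analysis

Insert / search / prefix check: O(L) for a word of length L. Each character costs one array index and one pointer dereference, so there are at most L steps.
Remove: also O(L) in the worst case, because the recursion walks down the path and then checks the deletion condition at each node on the way back up.
Space complexity: proportional to the number of nodes in the tree. In the worst case (no shared prefixes) the node count equals the sum of all word lengths, \(\sum_{w\in S} |w|\), plus the root. Each node stores 26 pointers (a map or hash table can be used instead to save memory in sparse trees).
In practice, memory use can be reduced in the following ways:
1. Replace the children member, array<TrieNode*,26>, with vector<pair<char, TrieNode*>> or unordered_map<char, TrieNode*>. This saves memory in sparse trees but makes each child lookup more expensive.
2. Use a memory pool (pool allocator) to cut the overhead of frequent new/delete calls.
3. Use a compressed trie (Radix Tree / Patricia Trie), which merges chains of single-child nodes and so reduces the node count.
4. If only lowercase letters are handled and the data set is large, keep the fixed array and add a bitset of present children for an implementation that favors speed over memory.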
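As a rough illustration of the goal behind optimization 2, the sketch below stores every node in one `std::vector` and links nodes by index instead of by pointer. This is a simpler stand-in for a real pool allocator, not a pool allocator itself, and `PooledTrie` with its members is a hypothetical name. The sketch supports insert and search only; removal is left out.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical index-based trie: all nodes live in one vector, and children
// hold indices into it. Index 0 is the root and is never anyone's child,
// so 0 doubles as "no child".
class PooledTrie {
    struct Node {
        bool isEnd = false;
        std::array<int, 26> next{};  // value-initialized: all zeros
    };
    std::vector<Node> nodes_;

public:
    PooledTrie() : nodes_(1) {}  // node 0 is the root

    // Returns false (and stores nothing) if word has a character outside a-z.
    bool insert(const std::string& word) {
        for (char c : word) if (c < 'a' || c > 'z') return false;
        int cur = 0;
        for (char c : word) {
            int idx = c - 'a';
            if (nodes_[cur].next[idx] == 0) {
                int created = static_cast<int>(nodes_.size());
                nodes_.emplace_back();  // may reallocate, hence indices, not references
                nodes_[cur].next[idx] = created;
            }
            cur = nodes_[cur].next[idx];
        }
        nodes_[cur].isEnd = true;
        return true;
    }

    bool search(const std::string& word) const {
        int cur = 0;
        for (char c : word) {
            if (c < 'a' || c > 'z') return false;
            cur = nodes_[cur].next[c - 'a'];
            if (cur == 0) return false;
        }
        return nodes_[cur].isEnd;
    }

    std::size_t nodeCount() const { return nodes_.size(); }
};
```

Growing the vector replaces one heap allocation per node with occasional bulk reallocations, and on typical 64-bit platforms an `int` index is half the size of a pointer. The trade-off is that freed nodes are not reclaimed unless a free list is added.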