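[C++ Study Notes] Word Counter: Review Notes on Going from "Code Stew" to "Clear and Digestible"

Preface

Hello, future me (or any C++ beginner passing by)! This note exists to rescue the me who used to code on autopilot. Two weeks ago, when I wrote this word counter, my only standard was "as long as it runs": the main function was called dataProcessing (a name as generic as "John Doe", so nobody could tell what it did), comments were practically nonexistent, and global variables were tossed around at will. Looking back now, I almost mistook it for someone else's "encrypted code".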

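To keep future you from staring blankly at the screen (or wanting to punch past you), I did three things on purpose:

① gave functions and variables human-readable names;

② added comments longer than the code itself;

③ wrote down every pitfall I fell into.

In short, the goal is "no swearing during future reviews", and never having to call my own code a mountain of junk!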

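I. First things first: what does this code actually do?

Simply put, this is a **"word-processing mini tool for text files"**, and the core flow has just 4 steps:

Read a given text file (for example data.txt) and show its contents first;
Read the file again and split out the words (stripping commas, quotes, and other punctuation);
Use a set to deduplicate the words (so nothing is listed twice) and a map to count them (how many times each word appears);
Finally, print the deduplicated words and the count for each word.

The overall architecture works like a small factory: the Word class (the word's "ID card") → the WordSet class (the deduplication guard) / the WordMap class (the counting accountant) → the data-processing function (the assembly line) → the display function (the results presenter).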
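The four steps above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the code from this post: `WordStats` and `AnalyzeText` are hypothetical names, the input is a string rather than a file, and plain `std::string` stands in for the Word class.

```cpp
#include <map>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper type (not in the original code): both results of one pass.
struct WordStats {
    std::set<std::string> unique_words;      // deduplicated, sorted lexicographically
    std::map<std::string, int> frequencies;  // word -> number of occurrences
};

// Replaces every delimiter with a space, then splits the text on whitespace.
WordStats AnalyzeText(std::string text,
                      const std::string& delimiters = ".,?!'\";()[]") {
    for (char& c : text) {
        if (delimiters.find(c) != std::string::npos) {
            c = ' ';
        }
    }

    WordStats stats;
    std::istringstream stream(text);
    std::string word;
    while (stream >> word) {            // operator>> skips all whitespace
        stats.unique_words.insert(word);  // the set drops duplicates
        ++stats.frequencies[word];        // the map counts occurrences
    }
    return stats;
}
```

Calling `AnalyzeText("the cat, the dog!")` would fill `unique_words` with three words and record a count of 2 for "the".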

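II. Core refactoring: giving the "unsung heroes" real names, plus comments

Back then I threw function names together to save effort. Now they all follow a "verb + noun" pattern, so one glance tells you what each does. Let's take them one at a time:

1. Base class: Word, the word's "ID card"

The Word class is the smallest unit. It wraps a single word and also does the groundwork for set/map (a set has to sort and a map has to look up keys, so both must know how to compare words).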

cpp

#pragma once
#include <string>
#include "myCode.h"

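// Word class: wraps a single word and provides comparison (the "entry ticket" for set/map)
class Word {
private:
    std::string word_content;  // the actual text of the word (renamed; the original `s` was too abstract)

public:
    // Constructor: initializes the word from a string (registers the word, so to speak)
    Word(std::string s) : word_content(s) {}

    // Returns the word's text (read-only, so outside code cannot tamper with internal data)
    std::string GetWordContent() const {
        return this->word_content;
    }

    // Overloaded operator<: set/map order elements with "less than" by default, so this is required!
    // It tells set/map which of two words comes first (lexicographic order)
    bool operator<(const Word& other_word) const {
        // Compare this word's text with the other word's text (via the public getter)
        return this->word_content < other_word.GetWordContent();
    }

    // Overloaded operator==: checks whether this word equals a given string (e.g. "apple" vs "apple")
    // Marked const so it can also be called on const Word objects
    bool operator==(std::string target_word) const {
        return this->word_content == target_word;
    }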
};

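Review points:

set and map are "ordered containers" that must know how to compare their elements, so the Word class has to overload operator< (or be paired with a custom comparator);
operator== lets other code ask "is this word the target word?". set and map never call it themselves: they treat two keys as the same when neither is less than the other, so deduplication and lookup depend only on operator<.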
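To see that ordered containers rely on operator< alone, here is a minimal sketch. `MiniWord`, `AreEquivalent`, and `InsertTwiceAndCount` are hypothetical names made up for illustration; `MiniWord` is a stand-in for the Word class that deliberately has no operator==.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for Word: it has operator< but no operator==.
class MiniWord {
public:
    explicit MiniWord(std::string text) : text_(std::move(text)) {}
    bool operator<(const MiniWord& other) const { return text_ < other.text_; }

private:
    std::string text_;
};

// The equivalence rule std::set and std::map apply to keys:
// two keys count as the same when neither is less than the other.
bool AreEquivalent(const MiniWord& a, const MiniWord& b) {
    return !(a < b) && !(b < a);
}

// Inserts the same word twice and returns how many elements the set holds.
std::size_t InsertTwiceAndCount(const std::string& word) {
    std::set<MiniWord> words;
    words.insert(MiniWord(word));
    words.insert(MiniWord(word));  // rejected: equivalent to the existing element
    return words.size();
}
```

The second insert is dropped even though `MiniWord` cannot be compared with ==, which is why a wrong operator< would break counting while a missing operator== would not.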

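2. Deduplication class: WordSet, the "duplicate-check guard"

WordSet is built on a set and has one core job: store the words and deduplicate them automatically (a set keeps only one copy of equal elements).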

cpp

#pragma once
#include "myCode.h"
#include "Word.h"
#include <set>

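// WordSet class: manages the set of deduplicated words (a wrapper around set)
class WordSet {
private:
    std::set<Word> unique_word_set;  // underlying container: deduplicates automatically and sorts lexicographically

public:
    // Renamed: wordset_add -> AddWordToUniqueSet (makes "add to the deduplicated set" explicit)
    // Adds a word to the set; duplicates are dropped automatically
    bool AddWordToUniqueSet(std::string word) {
        // set.insert() returns a pair: first is an iterator, second says whether the insert happened
        // For now this just returns true; second could report whether the word was new (future improvement)
        unique_word_set.insert(Word(word));
        return true;
    }

    // Renamed: show -> ShowAllUniqueWords (makes "show all deduplicated words" explicit)
    // Prints every deduplicated word in the set
    void ShowAllUniqueWords() {
        // Iterate over the set (it is ordered, so the output is in lexicographic order too)
        for (auto it = unique_word_set.begin(); it != unique_word_set.end(); it++) {
            // GetWordContent() is the public accessor for the word's text
            std::cout << it->GetWordContent() << " ";
        }
    }

    // Number of unique words; declared here because the definition lives outside the class
    size_t GetSetSize() const;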
};

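Review points:

set's insert method deduplicates on its own, so there is no need to check by hand whether a word is already present;
A set must be traversed with iterators (or a range-based for), not with an index: it is not stored contiguously and has no operator[].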
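The code comment above mentions that the `second` member of insert's return value could be used later. A minimal sketch of that idea follows, assuming a hypothetical free function `AddIfNew` and plain `std::string` elements instead of Word objects.

```cpp
#include <set>
#include <string>

// Returns true only when `word` was not already in the set.
// Hypothetical helper; the original stores Word objects inside a WordSet.
bool AddIfNew(std::set<std::string>& unique_words, const std::string& word) {
    // insert() returns std::pair<iterator, bool>;
    // .second is true if the element was actually inserted.
    return unique_words.insert(word).second;
}
```

With this shape, the return value carries real information (new word or repeat) instead of an unconditional true.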

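3. Counting class: WordMap, the "bookkeeping accountant"

WordMap is built on a map (key-value pairs: key = Word, value = occurrence count). Its core job is to count how many times each word appears.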

cpp

#pragma once
#include <map>
#include "myCode.h"
#include "Word.h"

// WordMap class: counts word frequencies (a wrapper around map)
class WordMap {
private:
    // Underlying container: key = Word (the word, unique by construction), value = int (occurrence count)
    std::map<Word, int> word_frequency_map;

public:
    // Renamed: wordmap_add -> AddWordToFrequencyMap (makes "add to the frequency map" explicit)
    // Adds a word to the map: increments the count if it exists, otherwise creates a record with count 1
    bool AddWordToFrequencyMap(std::string word) {
        // 1. Look the word up in the map first (find takes a key of type Word)
        auto it = word_frequency_map.find(Word(word));

        // 2. Not found (it points to the end of the map): create a new pair with count 1
        if (it == word_frequency_map.end()) {
            // Build a pair: key = Word(word), value = 1
            std::pair<Word, int> new_word_record(Word(word), 1);
            word_frequency_map.insert(new_word_record);
        }
        // 3. Found: increment the count (it->second is the value, i.e. the count)
        else {
            it->second++;
        }

        return true;
    }

    // Renamed: show -> ShowWordFrequency (makes "show word frequencies" explicit)
    // Prints each word together with its occurrence count
    void ShowWordFrequency() {
        // Iterate over the map (it is ordered as well, by the key's lexicographic order)
        for (auto it = word_frequency_map.begin(); it != word_frequency_map.end(); it++) {
            // it->first is the key (a Word), it->second is the value (the count)
            std::cout << it->first.GetWordContent() 
                      << "  appeared " 
                      << it->second << "  time(s)" << std::endl;
        }
    }
};

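Review points:

map's find method looks up by key and returns an iterator; if nothing is found it returns map.end();
it->first is the key (here a Word object) and it->second is the value (here the count). Don't mix them up!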
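The find-then-insert pattern above has a shorter equivalent: the original source at the end of this post mentions `map_1[s]++` in a comment. A minimal sketch of that approach, assuming a hypothetical function `CountWords` and `std::string` keys in place of Word:

```cpp
#include <map>
#include <string>
#include <vector>

// Counts occurrences with operator[] instead of find() + insert().
std::map<std::string, int> CountWords(const std::vector<std::string>& words) {
    std::map<std::string, int> counts;
    for (const std::string& word : words) {
        // operator[] inserts the key with a value-initialized int (0) when it is
        // missing and returns a reference either way, so one increment covers
        // both the "new word" and the "seen before" case.
        ++counts[word];
    }
    return counts;
}
```

The trade-off is that operator[] always inserts missing keys, so it suits counting but not read-only lookups, where find is the safer choice.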

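4. Data-processing function: from "busy for nothing" to "well organized"

The original name dataProcessing was too abstract, so it became ProcessTextFileAndExtractWords ("process the text file and extract words"), with step-by-step comments that read like a recipe.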

cpp

#include "test.h"
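// Globals: renamed from set1/map1 and commented (note: keep globals to a minimum; a candidate for later cleanup)
WordSet global_unique_word_set;  // global deduplicated word set
WordMap global_word_frequency_map;  // global word-frequency map

// Forward declaration (needed unless test.h already declares it):
// IsAllWhitespace is defined after its first use below
bool IsAllWhitespace(const std::string& str);

/*
 * Function: ProcessTextFileAndExtractWords
 * Purpose: 1. read data.txt and display its contents; 2. read the file again, split out the words, and store them in the set and the map
 * Pitfall: the file is opened twice (a bit wasteful); the original logic is kept for now and could later be reduced to a single pass
 */
void ProcessTextFileAndExtractWords() {
    std::string current_line = "";  // the line currently being read (was `s`; the new name is clearer)
    // Delimiter set: punctuation to strip (was `delimet`, now `delimiters`; semicolon, parentheses, and brackets added for coverage)
    std::string delimiters = ".,?!'\";()[]";  
    // Current search position for delimiters (was `pos`). size_type matches what
    // find_first_of returns and what npos is defined as, unlike the original int
    std::string::size_type current_find_pos = 0;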

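    // -------------------------- Step 1: open the file and display its contents --------------------------
    std::ifstream text_file_reader1;  // file stream 1 (used only to display the contents)
    // Open the file in read-only mode (ios::in); the file name is data.txt
    text_file_reader1.open("data.txt", std::ios::in);
    // Check that the file opened (it fails if, say, the file does not exist; always check!)
    if (!text_file_reader1.is_open()) {
        std::cerr << "Error: failed to open data.txt! It may not be in the current directory." << std::endl;
        return;  // no point continuing if the open failed
    }

    std::string temp_line;  // holds each line temporarily (for display)
    std::cout << "=== Showing file contents ===" << std::endl;
    // Read the file line by line (getline reads a whole line, spaces included, up to the newline)
    while (std::getline(text_file_reader1, temp_line)) {
        std::cout << temp_line << std::endl;  // print each line
    }
    text_file_reader1.close();  // close stream 1 (close it when done so it does not hold resources)
    std::cout << "=== Finished showing file contents ===" << std::endl;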

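    // -------------------------- Step 2: reopen the file, split out the words, and count them --------------------------
    std::ifstream text_file_reader2;  // file stream 2 (used only to process words)
    text_file_reader2.open("data.txt", std::ios::in);
    if (!text_file_reader2.is_open()) {
        std::cerr << "Error: failed to reopen data.txt!" << std::endl;
        return;
    }

    // Read the file line by line (until end of file)
    while (std::getline(text_file_reader2, current_line)) {
        // Skip empty lines and whitespace-only lines (original bug: only s == " " was checked; this check is more complete)
        if (current_line.empty() || IsAllWhitespace(current_line)) {
            continue;  // blank line: skip it
        }

        // Key step! Reset the search position to 0 before each new line (original bug: no reset, so delimiters on the next line were missed)
        current_find_pos = 0;
        // Keep searching the current line for delimiters until none is left (string::npos means "not found")
        while ((current_find_pos = current_line.find_first_of(delimiters, current_find_pos)) != std::string::npos) {
            // Replace the delimiter with a space (so the line can be split on whitespace afterwards)
            current_line.replace(current_find_pos, 1, " ");
            current_find_pos++;  // move on so the same position is not examined again
        }

        // Split with istringstream: wrap the cleaned line in a string stream and read words as if from a file
        std::istringstream line_stream(current_line);
        std::string extracted_word;  // one extracted word (was `word`; the new name is clearer)

        // Read words until the stream ends: >> skips whitespace by itself, no manual handling needed
        while (!line_stream.eof()) {
            line_stream >> extracted_word;  // read one word from the stream

            // Handle a failed read (e.g. only trailing whitespace was left, so no word was extracted)
            if (line_stream.fail()) {
                break;  // stop on failure
            }

            // Skip empty words (in theory >> never yields one, but better safe than sorry)
            if (extracted_word == " " || extracted_word.empty()) {
                continue;
            }

            // Hand the word to the set for deduplication and to the map for counting
            global_unique_word_set.AddWordToUniqueSet(extracted_word);
            global_word_frequency_map.AddWordToFrequencyMap(extracted_word);
        }
    }

    text_file_reader2.close();  // close stream 2
    std::cout << "=== Word splitting and counting finished ===" << std::endl;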
}

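/*
 * Helper function: IsAllWhitespace
 * Purpose: checks whether a string consists only of whitespace (spaces, tabs, etc.); used to skip blank lines
 * Note: std::isspace is declared in <cctype>
 */
bool IsAllWhitespace(const std::string& str) {
    // Walk the string; return false as soon as one non-whitespace character is found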
    for (char c : str) {
        if (!std::isspace(static_cast<unsigned char>(c))) {
            return false;
        }
    }
    return true;
}

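Review points (pitfall log!):

Pitfall 1: current_find_pos was not reset. When the next line was processed, pos still held the last position from the previous line, so delimiters were missed. Fix: set current_find_pos = 0 after reading each new line;
Pitfall 2: the blank-line check was incomplete. Only current_line == " " was tested, so empty lines and lines made of several spaces (such as "   ") were not skipped. Fix: add the IsAllWhitespace helper;
Pitfall 3: no check that the file actually opened. Without the is_open() check, a missing file makes every read fail silently and the program carries on with nothing to count and no hint as to why. Fix: check after every open;
Using istringstream: splitting words with it is far easier than hunting for spaces by hand, because >> skips all whitespace (spaces, tabs, etc.) automatically.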
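The per-line cleanup described in these pitfalls can be isolated into one small function. This is a minimal sketch, assuming a hypothetical helper named `SplitLineIntoWords` that returns the words instead of writing them into the global set and map.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: turns one line of text into a list of clean words.
std::vector<std::string> SplitLineIntoWords(
        std::string line, const std::string& delimiters = ".,?!'\";()[]") {
    // The position is local, so every call starts at 0 (pitfall 1 cannot occur).
    // size_type is the type find_first_of returns and npos is defined as.
    std::string::size_type pos = 0;
    while ((pos = line.find_first_of(delimiters, pos)) != std::string::npos) {
        line[pos] = ' ';  // turn the delimiter into a space
        ++pos;
    }

    std::vector<std::string> words;
    std::istringstream stream(line);
    std::string word;
    // The extraction itself is the loop condition, so empty and
    // whitespace-only lines simply produce no words (pitfall 2).
    while (stream >> word) {
        words.push_back(word);
    }
    return words;
}
```

Because an all-whitespace line yields an empty vector, a caller built on this helper would not need a separate blank-line check.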

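5. Display function: letting the results "speak"

The original name showData became ShowWordStatisticResults ("show word statistics results"), with separator lines added so the output is easier to read.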

cpp

/*
 * Function: ShowWordStatisticResults
 * Purpose: prints the final statistics (the deduplicated words + the occurrence count of each word)
 */
void ShowWordStatisticResults() {
    std::cout << "\n==================== Final Results ====================" << std::endl;
    std::cout << "1. Deduplicated word set (" << global_unique_word_set.GetSetSize() << " words):" << std::endl;
    global_unique_word_set.ShowAllUniqueWords();  // show the deduplicated words

    std::cout << "\n\n2. Word frequency statistics:" << std::endl;
    global_word_frequency_map.ShowWordFrequency();  // show the per-word counts

    std::cout << "======================================================" << std::endl;
}

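// Size accessor for WordSet (not in the original code; handy for showing the count).
// An out-of-class definition like this only compiles if the WordSet class
// also declares it: size_t GetSetSize() const;
size_t WordSet::GetSetSize() const {
    return unique_word_set.size();
}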
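The comment on the globals earlier notes that they should be kept to a minimum. One possible direction, sketched under stated assumptions: `WordCounter` is a hypothetical class, not part of the original code, and it uses `std::string` directly instead of the Word, WordSet, and WordMap classes.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Hypothetical wrapper: owns both containers so no global variables are needed.
class WordCounter {
public:
    // Records one occurrence of `word` in both containers.
    void Add(const std::string& word) {
        unique_words_.insert(word);
        ++frequencies_[word];
    }

    std::size_t UniqueCount() const { return unique_words_.size(); }

    // Returns 0 for a word that was never added, without inserting it.
    int CountOf(const std::string& word) const {
        auto it = frequencies_.find(word);
        return it == frequencies_.end() ? 0 : it->second;
    }

private:
    std::set<std::string> unique_words_;
    std::map<std::string, int> frequencies_;
};
```

A processing function could then take a `WordCounter&` parameter, which makes it clear who owns the data and lets two files be counted independently.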

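III. The pitfalls I fell into, written down (key review material!)

| No. | The dumb thing I did | Consequence | Fix |
|---|---|---|---|
| 1 | Named functions `dataProcessing`, `wordmap_add` | Forgot what they did within two weeks | Name with "verb + noun", e.g. `ProcessTextFileAndExtractWords` |
| 2 | Did not reset `pos` when handling delimiters | Delimiters on the next line were missed and words kept their punctuation | Set `current_find_pos = 0` after reading each new line |
| 3 | Blank-line check was only `current_line == " "` | Empty and multi-space lines were not skipped, so empty words got counted | Write the `IsAllWhitespace` helper to detect all-whitespace lines |
| 4 | Did not check whether the file opened | With a missing file, every read fails silently and nothing is counted | Check with `is_open()` after every `open` |
| 5 | Used `!line_stream.eof()` to detect the end of the stream, without `fail()` | An extra, invalid word may be processed | Add `if (line_stream.fail()) break;`, or simply use `while (line_stream >> extracted_word)` |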
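Row 5 is easiest to understand side by side. The sketch below uses two hypothetical functions, `SplitWithEofLoop` and `SplitWithExtractionLoop`; the first reproduces the anti-pattern without the fail() check purely for comparison.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Anti-pattern, for comparison only: eof() is tested before reading,
// and the result of the read is never checked.
std::vector<std::string> SplitWithEofLoop(const std::string& line) {
    std::vector<std::string> words;
    std::istringstream stream(line);
    std::string word;
    while (!stream.eof()) {
        stream >> word;         // can fail, leaving `word` with its old contents
        words.push_back(word);  // so a stale word may be pushed again
    }
    return words;
}

// Idiomatic form: the extraction itself is the loop condition.
std::vector<std::string> SplitWithExtractionLoop(const std::string& line) {
    std::vector<std::string> words;
    std::istringstream stream(line);
    std::string word;
    while (stream >> word) {    // false as soon as no word could be read
        words.push_back(word);
    }
    return words;
}
```

For an input with trailing whitespace such as "a b ", the first version returns three entries and the second returns two: after "b" is read the stream is not yet at end of file, so the eof loop runs once more and the failed read goes unnoticed.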

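IV. Review tips (future me, look here!)

If you forget why set/map can sort: go back to operator< in the Word class, which is their "sorting manual";
If you forget how to split words: find the istringstream part, the "word-splitting workhorse" that skips whitespace automatically;
If the file fails to open: first check that it sits in the program's working directory (when launched from inside Visual Studio this is normally the project directory; when the .exe is run directly it is usually the directory you start it from, such as Debug);
If you want to optimize the code: turn "open the file twice" into "open it once", displaying and processing in the same pass to save resources.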
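The last tip can be sketched as follows. This is a minimal sketch under stated assumptions: `EchoAndCount` is a hypothetical function, it counts into a plain `std::map<std::string, int>` rather than the WordMap class, and it takes generic streams so it works with an `std::ifstream` as well as an in-memory `std::istringstream`.

```cpp
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical single-pass variant: echoes each line and counts its words
// in the same loop, so the input only has to be read once.
void EchoAndCount(std::istream& input, std::ostream& echo,
                  std::map<std::string, int>& frequencies,
                  const std::string& delimiters = ".,?!'\";()[]") {
    std::string line;
    while (std::getline(input, line)) {
        echo << line << '\n';  // show the raw line before modifying it

        for (char& c : line) {  // replace delimiters with spaces
            if (delimiters.find(c) != std::string::npos) {
                c = ' ';
            }
        }

        std::istringstream line_stream(line);
        std::string word;
        while (line_stream >> word) {
            ++frequencies[word];
        }
    }
}
```

Passing an opened `std::ifstream` as `input` and `std::cout` as `echo` would give the display-then-count behavior of the original with a single open.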

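Closing

The code finally looks presentable. When I review it later and see these comments and function names, I should be grateful to the present me for taking the trouble. Learning C++ goes like this: coasting at the start is hard to avoid, but as long as you tidy up in time and record the pitfalls, you gradually learn to write code that you yourself can read.

Next time I write a similar tool, no more "John Doe" style function names!

The source code is attached below (hopefully future me can read it without swearing):

Header files:

cpp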


#pragma once
#include <string>
#include "myCode.h"

// Word: declaration of the basic word class
class Word {
private:
	string s;

public:
	Word(string s);

	string getWord() const;

	bool operator<(const Word& a)const;

	bool operator==(string s);

};


#pragma once
#include <map>
#include "myCode.h"
#include "Word.h"

// Counts how many times each word occurs
class WordMap {

private:
	map<Word, int> map_1;

public:

	// add a word to the map
	bool wordmap_add(string s);
	// display function
	void show();

};

#pragma once
#include "myCode.h"
#include "Word.h"
#include <set>

class WordSet {

private:
	set<Word> set_1;

public:
	// add a word to the set
	bool wordset_add(string s);
	// display function
	void show();
};

#pragma once
#include "myCode.h"
#include "Word.h"
#include "WordSet.h"
#include "WordMap.h"

#include <sstream>
#include <fstream>
#include <string>
#include <algorithm>
#include <numeric>
#include <iterator>

/*
	Function declarations
*/
// data-processing function
void dataProcessing();
// data-display function
void showData();

Source files:

cpp

#include "Word.h"

/*
	Definitions of the Word class members
*/
Word::Word(string s):s(s){}

string Word::getWord()const {
	return this->s;
}

//bool Word::operator(const Word& a) const {
bool Word::operator<(const Word & a)const{
	return s < a.getWord();

}

bool Word::operator==(string s) {
	return this->s == s;
}


#include "WordMap.h"


// add a word to the map
bool WordMap::wordmap_add(string s) {

	//map<Word, int>::iterator it = map_1.find(s);
	auto it = map_1.find(s);

	// if s is not found in map_1, create a new key-value pair
	if (it == map_1.end()) {
		pair<Word, int> p(Word(s), 1);
		map_1.insert(p);
	}
	// if s is found in map_1, add 1 to its value
	else {
		//map_1[s]++;	this form also reaches the value whose key is s
		it->second++;
	}
	return true;
}
// display function, using iterators
void WordMap::show() {
	for (auto it = map_1.begin(); it != map_1.end(); it++) {
		cout << it->first.getWord() 
			<< "  appeared "
			<< it->second <<"  time(s)" << endl;
	}
}



#include "WordSet.h"
#include "WordSet.h"

/*
	Definitions of the class members
*/
// add a word to the set
bool WordSet::wordset_add(string s) {
	set_1.insert(Word(s));
	return true;//1
}
// display function
void WordSet::show() {

	//for (auto c : set_1) {
	//	cout << c.getWord() << endl;
	//}

	// using iterators
	for (auto i = set_1.begin(); i != set_1.end(); i++) {
		cout << i->getWord() << " ";
	}
}



#include "test.h"

// global variables
WordSet set1;
WordMap map1;

/*
	Function definitions
*/
// data-processing function
void dataProcessing() {
	string s = "";// starts out empty
	string delimet = ".,?!'\"";// leaves a bug involving ！
	int pos = 0;
	//ifstream input1("data.txt",ios::in);

	// print the contents of data.txt as they are
	ifstream input1;
	input1.open("data.txt", ios::in);
	char ch;
	//while (input1.peek() != EOF) {
	//	input1.read(&ch, 1);
	//	cout << ch;
	//}
	//cout << endl;

	//char buf[1024];
	//while (input1.getline(buf,sizeof(buf))){
	//	cout << buf << endl;
	//}

	string buf;
	while (getline(input1, buf)) {
		cout << buf << endl;
	}

	// close the file
	input1.close();

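	// split the words in data.txt and load them into the set and the map
	ifstream inf("data.txt", ios::in);	// open the file
	// while inf has not reached the end of the file
	while (getline(inf, s)) {// read line by line
		if (s==" ") { // if the line is a single space, move on to the next line
			continue;
		}
		// pos starts at 0 and is reset to 0 after each line is read
		pos = 0;
		// scan the line delimiter by delimiter
		//while (pos = s.find_first_of(delimet, pos) != string::npos) {  // buggy code
		while ( (pos = s.find_first_of(delimet, pos))!=string::npos ) {
			s.replace(pos, 1, " ");// replace each delimiter found with a space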
		}

		istringstream strings(s);
		string word;
		while (!strings.eof()) {
			strings >> word;// read from s into word

			// fix for an existing bug
			if (strings.fail()) {
				break;
			}

			if (word == " ") {
				continue;
			}
			set1.wordset_add(word);
			map1.wordmap_add(word);
		}

	}

	// close the file
	inf.close();
	

}
// data-display function
void showData() {
	cout << endl;
	cout << "The set of words is:" << endl;
	set1.show();

	cout << endl;

	cout << "Words and their occurrence counts:" << endl;
	map1.show();
	cout << endl;
}