One-Click Grammar Error Augmentation Tool

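Welcome to my recently open-sourced one-click grammar error augmentation tool. It can generate 14 types of grammatical errors. Teams in different industries can apply these error substitutions to their own data to train their own grammar and spelling correction models. I hope it helps advance text correction across industries, and stars are welcome. The 14 error types, and how to use each one, are shown below.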

Installation

pip install ChineseErrorCorrector
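
After installing, it can be useful to confirm that the package is visible to the Python interpreter you plan to use. The sketch below is only a convenience check and is not part of the tool itself; `is_installed` is a hypothetical helper name, and the package name is taken from the import statements used later in this post.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if a top-level package can be found by the import system."""
    return importlib.util.find_spec(package) is not None


# Prints True only when the package is installed in the current environment.
print(is_installed("ChineseErrorCorrector"))
```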

Data augmentation by error type

1. Missing characters

from ChineseErrorCorrector.dat import GrammarErrorDat

cged_tool = GrammarErrorDat()
print(cged_tool.lack_word("小明住在北京"))

# Output: 小明在北京
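
To make the idea of a missing-character error concrete, here is a minimal standalone sketch. It is not how `lack_word` is implemented (the library may rely on word segmentation or other rules); `drop_random_char` is a hypothetical helper that simply deletes one character at a random position.

```python
import random


def drop_random_char(sentence: str, rng: random.Random) -> str:
    """Hypothetical helper: delete one character at a random position."""
    if len(sentence) < 2:
        # Too short to corrupt meaningfully; return it unchanged.
        return sentence
    i = rng.randrange(len(sentence))
    return sentence[:i] + sentence[i + 1:]


rng = random.Random(0)  # fixed seed so repeated runs give the same result
original = "小明住在北京"
corrupted = drop_random_char(original, rng)
print(corrupted)
```

The result is always exactly one character shorter than the input, which makes this kind of corruption easy to sanity-check in bulk.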

2. Wrong characters (typos)

from ChineseErrorCorrector.dat import GrammarErrorDat

cged_tool = GrammarErrorDat()
print(cged_tool.wrong_word("小明住在北京"))
# Output: 小明住在北鲸

3. Missing punctuation

from ChineseErrorCorrector.dat import GrammarErrorDat

cged_tool = GrammarErrorDat()
print(cged_tool.lack_char("小明住在北京，热爱NLP。"))
# Output: 小明住在北京热爱NLP。

4. Wrong punctuation

from ChineseErrorCorrector.dat import GrammarErrorDat

cged_tool = GrammarErrorDat()
print(cged_tool.wrong_char("小明住在北京，热爱NLP。"))
# Output: 小明住在北京。热爱NLP。

5. Unclear subject

from ChineseErrorCorrector.dat import GrammarErrorDat

cged_tool = GrammarErrorDat()
print(cged_tool.unknow_sub("小明住在北京"))
# Output: 住在北京

6. Missing predicate

from ChineseErrorCorrector.dat import GrammarErrorDat

cged_tool = GrammarErrorDat()
print(cged_tool.unknow_pred("小明住在北京"))
# Output: 小明在北京

7. Missing object

from ChineseErrorCorrector.dat import GrammarErrorDat

cged_tool = GrammarErrorDat()
print(cged_tool.lack_obj("小明住在北京，热爱NLP。"))
# Output: 小明住在北京，热爱。

8. Other missing components

from ChineseErrorCorrector.dat import GrammarErrorDat

cged_tool = GrammarErrorDat()
print(cged_tool.lack_others("小明住在北京，热爱NLP。"))
# Output: 小明住北京，热爱NLP。

9. Redundant function words

from ChineseErrorCorrector.dat import GrammarErrorDat

cged_tool = GrammarErrorDat()
print(cged_tool.red_fun("小明住在北京，热爱NLP。"))
# Output: 小明所住的在北京，热爱NLP。

10. Other redundant components

from ChineseErrorCorrector.dat import GrammarErrorDat

cged_tool = GrammarErrorDat()
print(cged_tool.red_component("小明住在北京，热爱NLP。"))
# Output: 小明住在北京，热爱NLP。，看着

11. Redundant subject

from ChineseErrorCorrector.dat import GrammarErrorDat

cged_tool = GrammarErrorDat()
print(cged_tool.red_sub("小明住在北京，热爱NLP。"))
# Output: 小明住在北京，小明热爱NLP。

12. Improper word order

from ChineseErrorCorrector.dat import GrammarErrorDat

cged_tool = GrammarErrorDat()
print(cged_tool.wrong_sentence_order("小明住在北京，热爱NLP。"))
# Output: 热爱NLP。，小明住在北京

13. Improper verb-object collocation

from ChineseErrorCorrector.dat import GrammarErrorDat

cged_tool = GrammarErrorDat()
print(cged_tool.wrong_ver_obj("小明住在北京，热爱NLP。"))
# Output: None, meaning this error type cannot be generated for this sentence

14. Other improper collocations

from ChineseErrorCorrector.dat import GrammarErrorDat

cged_tool = GrammarErrorDat()
print(cged_tool.other_wrong("小明住在北京，热爱NLP。"))
# Output: None, meaning this error type cannot be generated for this sentence
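
Because some methods return None when an error type cannot be generated for a sentence, code that builds training data needs to skip those cases. The sketch below shows one way to collect (corrupted, correct) pairs. `build_pairs`, `drop_first_comma`, and `never_applies` are hypothetical names introduced for illustration, and the two stand-in functions only mimic the behaviour shown above so that the example runs without the library.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# A corruption function returns the corrupted text, or None when
# the error type cannot be applied to the given sentence.
Corruptor = Callable[[str], Optional[str]]


def build_pairs(
    sentences: Iterable[str], corruptors: Iterable[Corruptor]
) -> List[Tuple[str, str]]:
    """Return (corrupted, correct) pairs, skipping failed or no-op corruptions."""
    corruptor_list = list(corruptors)
    pairs = []
    for correct in sentences:
        for corrupt in corruptor_list:
            noisy = corrupt(correct)
            if noisy is None or noisy == correct:
                continue  # nothing usable was produced for this error type
            pairs.append((noisy, correct))
    return pairs


# Stand-in corruptors for demonstration only (assumptions, not library code).
def drop_first_comma(text: str) -> Optional[str]:
    return text.replace("，", "", 1) if "，" in text else None


def never_applies(text: str) -> Optional[str]:
    return None


pairs = build_pairs(["小明住在北京，热爱NLP。"], [drop_first_comma, never_applies])
print(pairs)
# [('小明住在北京热爱NLP。', '小明住在北京，热爱NLP。')]
```

With the library installed, bound methods such as `cged_tool.lack_char` or `cged_tool.wrong_ver_obj` could be passed in place of the stand-ins, since they take a sentence and return either a string or None in the examples above.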

**Code:** https://github.com/TW-NLP/ChineseErrorCorrector
