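Basic usage
Steps for implementing sensitive-word filtering
Add the sensitive-word library dependency
Add the following dependency to the project's pom.xml: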
xml
<dependency>
    <groupId>com.github.houbb</groupId>
    <artifactId>sensitive-word</artifactId>
    <version>0.21.0</version>
</dependency>
Create a utility class that wraps the core functions
The SensitiveWordUtil class exposes sensitive-word detection, replacement, and lookup:
java
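import com.github.houbb.sensitive.word.bs.SensitiveWordBs;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

import java.util.List;

@Component
public class SensitiveWordUtil {

    @Autowired
    private SensitiveWordBs sensitiveWordBs;

    // Re-initialize the engine so it reloads the deny and allow lists
    public void refresh() {
        sensitiveWordBs.init();
    }

    // True if the text contains at least one sensitive word
    public boolean contains(String text) {
        return sensitiveWordBs.contains(text);
    }

    // Replace every hit using the engine's replacement strategy
    public String replace(String text) {
        return sensitiveWordBs.replace(text);
    }

    // Replace every hit with the given character
    public String replace(String text, char replaceChar) {
        return sensitiveWordBs.replace(text, replaceChar);
    }

    // List every sensitive word found in the text
    public List<String> findAll(String text) {
        return sensitiveWordBs.findAll(text);
    }
}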
Configure the detection strategy
Define the detection rules and word lists in SensitiveConfig:
java
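import com.github.houbb.sensitive.word.bs.SensitiveWordBs;
import com.github.houbb.sensitive.word.support.allow.WordAllows;
import com.github.houbb.sensitive.word.support.deny.WordDenys;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class SensitiveConfig {

    @Bean
    public SensitiveWordBs sensitiveWordBs() {
        return SensitiveWordBs.newInstance()
                .ignoreCase(true)           // ignore upper/lower case
                .ignoreWidth(true)          // ignore full-width vs half-width forms
                .ignoreNumStyle(true)       // ignore digit writing styles
                .ignoreChineseStyle(true)   // ignore traditional vs simplified Chinese
                .ignoreEnglishStyle(true)   // ignore English letter styles
                .ignoreRepeat(false)        // do not ignore repeated characters
                .enableNumCheck(true)       // detect runs of digits
                .enableEmailCheck(true)     // detect email addresses
                .enableUrlCheck(true)       // detect URLs
                .numCheckLen(8)             // digit-run length used by the number check
                .wordDeny(WordDenys.chains(WordDenys.defaults()))
                .wordAllow(WordAllows.chains(WordAllows.defaults()))
                .init();
    }
}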
Updating the word library dynamically
The word library can be updated at runtime:
java
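// Method of SensitiveWordUtil: swap in a new deny list, then reload.
// Assumes the library version in use exposes getContext().
public void updateWordDeny(List<String> newWords) {
    IWordDeny wordDeny = () -> newWords;
    sensitiveWordBs.getContext().wordDeny(wordDeny);
    refresh(); // re-initialize so the new list takes effect
}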
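The idea behind "update the source, then refresh" can be shown without the library. The sketch below is plain Java, and `RefreshableWords` is a hypothetical class, not part of sensitive-word. It assumes the words come from some supplier (a database query, a file) and that readers should keep seeing the old list until `refresh()` swaps in a new snapshot.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical stand-in for a refreshable word store; not the library's implementation. */
class RefreshableWords {
    private final Supplier<List<String>> loader;
    // volatile: readers see either the old or the new snapshot, never a half-built one
    private volatile Set<String> snapshot = Collections.emptySet();

    RefreshableWords(Supplier<List<String>> loader) {
        this.loader = loader;
        refresh();
    }

    /** Reloads the words from the source and swaps the snapshot in one assignment. */
    void refresh() {
        snapshot = Collections.unmodifiableSet(new LinkedHashSet<>(loader.get()));
    }

    /** Returns true if the text contains any word of the current snapshot. */
    boolean contains(String text) {
        for (String word : snapshot) {
            if (text.contains(word)) {
                return true;
            }
        }
        return false;
    }
}
```

Changes to the underlying source stay invisible until `refresh()` is called, which is why forgetting the refresh step is the usual reason a newly added word is not detected.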
Example application scenario
Typical usage in a content-moderation flow:
java
public void checkContent(String content) {
    List<String> sensitiveWords = sensitiveWordUtil.findAll(content);
    if (!sensitiveWords.isEmpty()) {
        // Log a masked copy of the content alongside the words that were hit
        String maskedContent = sensitiveWordUtil.replace(content, '*');
        log.warn("Sensitive words triggered: {} -> {}", sensitiveWords, maskedContent);
        throw new ContentViolationException("Content contains sensitive words");
    }
}
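To make the masking step concrete, here is a minimal plain-Java sketch of what replacing hits with a mask character amounts to. `MaskSketch` and `mask` are hypothetical names for illustration only; the library's own replacement works on match positions and may behave differently for overlapping hits.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of masking; not the library's replacement logic. */
class MaskSketch {
    /** Replaces every occurrence of each hit with maskChar repeated to the hit's length. */
    static String mask(String text, List<String> hits, char maskChar) {
        String result = text;
        for (String hit : hits) {
            if (hit.isEmpty()) {
                continue; // an empty hit would match everywhere
            }
            char[] filler = new char[hit.length()];
            Arrays.fill(filler, maskChar);
            result = result.replace(hit, new String(filler));
        }
        return result;
    }
}
```

Keeping the mask the same length as the hit preserves the shape of the original text, which makes the log line easier to compare with the input.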
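Advanced configuration options
The following parameters adjust detection behavior:
- ignoreRepeat: whether consecutive repeated characters are ignored when matching
- numCheckLen: the minimum length of a digit sequence that is treated as sensitive
- enableUrlCheck: enables detection of URLs
- enableEmailCheck: enables detection of email addresses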
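A rough way to picture two of these options is shown below in plain Java. This is only a conceptual sketch under my own assumptions: `OptionSketch`, `collapseRepeats`, and `findDigitRuns` are hypothetical helpers, and the library's actual matching is not implemented this way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helpers that mimic the intent of ignoreRepeat and numCheckLen. */
class OptionSketch {
    /** Collapses runs of the same character, e.g. "baaad" becomes "bad". */
    static String collapseRepeats(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (i == 0 || c != text.charAt(i - 1)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Returns every run of ASCII digits that is at least minLen characters long. */
    static List<String> findDigitRuns(String text, int minLen) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile("\\d{" + minLen + ",}").matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }
}
```

With a threshold of 8, a phone-number-like sequence is reported while short numbers such as prices or years pass through.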
Custom word-list sources
Word lists can be loaded in several ways:
java
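// Load from a file; readAllLines throws a checked IOException, so wrap it
IWordDeny fileWordDeny = () -> {
    try { return Files.readAllLines(Paths.get("sensitive_words.txt")); }
    catch (IOException e) { throw new UncheckedIOException(e); }
};
// Load from a database
IWordDeny dbWordDeny = () -> jdbcTemplate.queryForList("SELECT word FROM sensitive_words", String.class);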
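When several sources feed one list, it usually helps to trim entries and drop blanks and duplicates before handing the result to the engine. The following plain-Java sketch shows one way to do that; `WordSourceSketch.merge` is a hypothetical helper, and each `Supplier` stands in for a file or database loader.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical helper for combining word sources; not part of the library. */
class WordSourceSketch {
    /** Merges several word sources into one list: trimmed, without blanks or duplicates. */
    @SafeVarargs
    static List<String> merge(Supplier<List<String>>... sources) {
        Set<String> merged = new LinkedHashSet<>(); // keeps first-seen order
        for (Supplier<List<String>> source : sources) {
            for (String word : source.get()) {
                String trimmed = word.trim();
                if (!trimmed.isEmpty()) {
                    merged.add(trimmed);
                }
            }
        }
        return new ArrayList<>(merged);
    }
}
```

A blank line in a word file is worth filtering out explicitly, since an empty entry has no sensible meaning as a sensitive word.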
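Advanced usage
- Global filtering with AOP
- Adding, deleting, and updating sensitive words in a database
Implementing a sensitive-word filtering system
The following is a complete implementation that loads the sensitive words and the allow list from a database and runs detection through AOP:
Database model design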
java
@Data
public class KeyWord {
    private Long id;
    private String content;
    private String sensitiveType; // BLACK or WHITE
}
Implementing the word-loading interfaces
java
// Allow list: loads the WHITE entries from MongoDB
@Slf4j
public class MyWordAllow implements IWordAllow {
    @Override
    public List<String> allow() {
        return SpringUtil.getBean(MongoTemplate.class)
                .find(new Query(Criteria.where("sensitiveType").is(Constant.SensitiveType.WHITE)),
                        KeyWord.class)
                .stream().map(KeyWord::getContent).collect(Collectors.toList());
    }
}

// Deny list: loads the BLACK entries from MongoDB
@Slf4j
public class MyWordDeny implements IWordDeny {
    @Override
    public List<String> deny() {
        return SpringUtil.getBean(MongoTemplate.class)
                .find(new Query(Criteria.where("sensitiveType").is(Constant.SensitiveType.BLACK)),
                        KeyWord.class)
                .stream().map(KeyWord::getContent).collect(Collectors.toList());
    }
}
Initializing the sensitive-word data
java
@SpringBootTest
class SensitiveWordInitTest {

    @Autowired
    MongoTemplate mongoTemplate;

    // Import the deny-list entries from the dictionary files
    @Test
    void initBlackWords() {
        List<String> words = StreamUtil.readAllLines("/sensitive_word_dict.txt");
        words.addAll(StreamUtil.readAllLines("/sensitive_word_deny.txt"));
        mongoTemplate.insertAll(words.stream().map(s -> {
            KeyWord kw = new KeyWord();
            kw.setContent(s);
            kw.setSensitiveType(Constant.SensitiveType.BLACK);
            return kw;
        }).collect(Collectors.toList()));
    }

    // Import the custom allow list
    @Test
    void initWhiteWords() {
        List<String> words = FileUtil.readUtf8Lines(
                ResourceUtil.getResource("mySensitiveWordsAllow.txt"));
        mongoTemplate.insertAll(words.stream().map(s -> {
            KeyWord kw = new KeyWord();
            kw.setContent(s);
            kw.setSensitiveType(Constant.SensitiveType.WHITE);
            return kw;
        }).collect(Collectors.toList()));
    }
}
Detection configuration
java
@Configuration
public class SensitiveConfig {

    @Bean
    public SensitiveWordBs sensitiveWordBs() {
        return SensitiveWordBs.newInstance()
                .wordDeny(new MyWordDeny())
                .wordAllow(new MyWordAllow())
                .ignoreCase(true)
                .ignoreWidth(true)
                .ignoreNumStyle(true)
                .enableNumCheck(true)
                .enableUrlCheck(true)
                .init();
    }
}
AOP aspect implementation
java
@Aspect
@Component
@Slf4j
public class SensitiveAspect {

    @Autowired
    private SensitiveWordBs sensitiveWordBs;

    // SpEL parser and parameter-name lookup used by generateKeyBySpEL
    private final ExpressionParser parserSpEL = new SpelExpressionParser();
    private final ParameterNameDiscoverer parameterNameDiscoverer = new DefaultParameterNameDiscoverer();

    // Runs before every method annotated with @Sensitive
    @Before("@annotation(sensitive)")
    public void checkSensitive(JoinPoint jp, Sensitive sensitive) {
        String text = generateKeyBySpEL(sensitive.value(), jp);
        List<String> words = sensitiveWordBs.findAll(text);
        if (CollUtil.isNotEmpty(words)) {
            throw new BusinessException("Contains sensitive words: " + words);
        }
    }

    // Evaluates the SpEL expression with the method arguments bound by parameter name
    public String generateKeyBySpEL(String key, JoinPoint pjp) {
        Expression expression = parserSpEL.parseExpression(key);
        EvaluationContext context = new StandardEvaluationContext();
        MethodSignature methodSignature = (MethodSignature) pjp.getSignature();
        Object[] args = pjp.getArgs();
        String[] paramNames = parameterNameDiscoverer.getParameterNames(methodSignature.getMethod());
        for (int i = 0; i < args.length; i++) {
            context.setVariable(Objects.requireNonNull(paramNames)[i], args[i]);
        }
        return Objects.requireNonNull(expression.getValue(context)).toString();
    }
}
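For readers unfamiliar with SpEL, an expression such as `#dto.question` means "take the argument named dto and read its question property". The plain-Java sketch below imitates just that one case with reflection so the lookup is easy to follow. It is not SpEL and not what Spring does internally; `ArgPathSketch`, `resolve`, and the nested `ReqDto` are hypothetical names used for illustration.

```java
import java.lang.reflect.Method;
import java.util.Map;

/** Hypothetical reflection-based stand-in for a simple "#name.property" lookup. */
class ArgPathSketch {
    /** Hypothetical request type with a single readable property. */
    public static class ReqDto {
        private final String question;
        public ReqDto(String question) { this.question = question; }
        public String getQuestion() { return question; }
    }

    /** Looks up the variable after '#', then follows each property through its public getter. */
    static Object resolve(String path, Map<String, Object> variables) {
        if (!path.startsWith("#")) {
            throw new IllegalArgumentException("path must start with '#': " + path);
        }
        String[] parts = path.substring(1).split("\\.");
        Object current = variables.get(parts[0]);
        try {
            for (int i = 1; i < parts.length && current != null; i++) {
                String getter = "get" + Character.toUpperCase(parts[i].charAt(0)) + parts[i].substring(1);
                Method method = current.getClass().getMethod(getter);
                current = method.invoke(current);
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot resolve " + path, e);
        }
        return current;
    }
}
```

In the aspect, the variables map corresponds to the method arguments keyed by parameter name, which is why the expression in the annotation has to use the exact parameter name of the controller method.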
Custom annotation
java
@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
public @interface Sensitive {
    String value() default "";
}
Usage example
java
@RestController
public class TestController {

    // @Sensitive only declares value(), so the expression is passed as the value
    @Sensitive("#dto.question")
    @Operation(summary = "Chat")
    @PostMapping
    public String duihua(@Valid @RequestBody ReqDto dto) {
        return "ok";
    }
}
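Maintaining the sensitive-word library
- Create, read, update, and delete entries
After updating the word library, remember to refresh it: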
java
sensitiveWordUtil.refresh();
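Key points
- The storage layer uses MongoDB and distinguishes the deny list from the allow list through the sensitiveType field; any other database can be substituted.
- Detection uses SensitiveWordBs from the sensitive-word library (com.github.houbb), which supports several filtering strategies. Hutool only supplies helper utilities here, such as SpringUtil and CollUtil.
- The AOP aspect uses a SpEL expression to pick out, at runtime, the text that needs to be checked.
- The base word library can be loaded from multiple files during initialization.
- The allow list lets specific words bypass sensitive-word detection.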
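One simple way to read the allow-list rule is "effective deny list = deny list minus allow list". The plain-Java sketch below implements that reading with a naive substring scan. `AllowDenySketch` is a hypothetical class, and the library may resolve allow and deny entries differently (for example during matching rather than up front), so treat this as an illustration of the idea only.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration of how an allow list can suppress deny-list hits. */
class AllowDenySketch {
    /** Returns the deny words found in text, skipping any word that is also allow-listed. */
    static List<String> findAll(String text, List<String> deny, List<String> allow) {
        Set<String> effective = new LinkedHashSet<>(deny);
        effective.removeAll(allow); // allow-listed words never count as hits
        List<String> hits = new ArrayList<>();
        for (String word : effective) {
            if (!word.isEmpty() && text.contains(word)) {
                hits.add(word);
            }
        }
        return hits;
    }
}
```

Under this reading, adding a word to the allow list has no effect unless the same word is also on the deny list.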