Integrating HanLP 1.7.7 with Spring Boot 2.3

AskHarries, 2024-01-04 16:35

1. Use case

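Extract the district, sub-district or township, and village fields from an address string. Use the extracted fields to query the Bureau of Economy and Information Technology (经信局) for the latitude, longitude, and point name, then save the result to the database.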

HanLP + Spring Boot

Add the JARs to the pom:

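    <dependency>
        <groupId>com.hankcs</groupId>
        <artifactId>hanlp</artifactId>
        <version>1.7.7</version>
    </dependency>
    <dependency>
        <groupId>com.hankcs</groupId>
        <artifactId>hanlp-sources</artifactId>
        <version>1.7.7</version>
    </dependency>
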
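Custom dictionary: in hanlp.properties, register the custom dictionary and set the nature (part-of-speech tag) of its words to na: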

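    # Custom dictionary paths. Separate multiple dictionaries with ";". A path that starts with a space is in the same directory as the previous one. The form "filename nature" makes that nature the default for the whole dictionary. Priority decreases from left to right.
    # All dictionaries use UTF-8 encoding. Each line holds one word, in the format [word] [nature A] [frequency of A] [nature B] [frequency of B] ... If no nature is given, the dictionary's default nature is used.
    CustomDictionaryPath=data/dictionary/custom/add_place.txt na; non-place.txt n;
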
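
To make the line format described in the comments concrete, here is a minimal sketch that parses one dictionary line into a word and its nature/frequency pairs. It is not HanLP code: the class `DictLineSketch`, the `parse` method, the sample words, and the placeholder frequency used when a line has no nature are all assumptions made for illustration. HanLP applies its own default frequency internally.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative parser for one custom-dictionary line; hypothetical, not part of HanLP. */
class DictLineSketch {

    /** Placeholder frequency for lines without an explicit nature (an assumption, not HanLP's value). */
    static final int PLACEHOLDER_FREQUENCY = 1;

    /** A word plus its nature -> frequency pairs, kept in file order. */
    static final class Entry {
        final String word;
        final Map<String, Integer> natures = new LinkedHashMap<>();

        Entry(String word) {
            this.word = word;
        }
    }

    /**
     * Parses a line of the form "[word] [natureA] [freqA] [natureB] [freqB] ...".
     * When the line contains only the word, the dictionary's default nature is used.
     */
    static Entry parse(String line, String defaultNature) {
        String[] parts = line.trim().split("\\s+");
        Entry entry = new Entry(parts[0]);
        if (parts.length == 1) {
            entry.natures.put(defaultNature, PLACEHOLDER_FREQUENCY);
            return entry;
        }
        // Natures and frequencies come in pairs after the word.
        for (int i = 1; i + 1 < parts.length; i += 2) {
            entry.natures.put(parts[i], Integer.parseInt(parts[i + 1]));
        }
        return entry;
    }

    public static void main(String[] args) {
        // Made-up sample lines, as they might appear in add_place.txt (default nature na).
        Entry bare = parse("朝阳区", "na");
        Entry tagged = parse("望京街道 na 100 ns 5", "na");
        System.out.println(bare.word + " " + bare.natures);
        System.out.println(tagged.word + " " + tagged.natures);
    }
}
```

With `add_place.txt na` in `CustomDictionaryPath`, a line that contains only the word is enough for the word to receive the na nature, which is what the extraction method relies on.
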
Extract the terms whose nature is na. The method keeps the longest one:

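    import com.hankcs.hanlp.seg.common.Term;
    import com.hankcs.hanlp.tokenizer.StandardTokenizer;

    import java.util.List;

    /**
     * Finds the longest term tagged with the custom nature "na" in an address.
     * (ns and nt are HanLP's built-in tags for place names and organization
     * names; this method only checks the custom na tag.)
     *
     * @return element 0: "true" or "false", whether an na term was found;
     *         element 1: the longest na term, or "" if none was found
     */
    public static String[] getNaStr(String address) {
        List<Term> termList = StandardTokenizer.segment(address);
        String word = "";
        String hasNa = "false";
        for (Term term : termList) {
            if ("na".equals(term.nature.toString())) {
                // Keep the longest na term; on a tie the later term wins.
                if (word.length() <= term.word.length()) {
                    word = term.word;
                    hasNa = "true";
                }
            }
        }
        // Element 0: whether an na term was found; element 1: the longest na term.
        return new String[]{hasNa, word};
    }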
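
The selection rule in the method above can be tried without HanLP on the classpath. The following is a minimal sketch under stated assumptions: `Token` is a hypothetical stand-in for HanLP's `Term`, the token list is written by hand rather than produced by `StandardTokenizer`, and the result is returned as an `Optional<String>` instead of the original `String[]` pair, which is a design variation and not the author's code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Stand-alone sketch of the "longest na term" rule; hypothetical, not part of HanLP. */
class LongestNaSketch {

    /** Hypothetical stand-in for HanLP's Term: a word and the name of its nature. */
    static final class Token {
        final String word;
        final String nature;

        Token(String word, String nature) {
            this.word = word;
            this.nature = nature;
        }
    }

    /** Returns the longest token tagged "na"; on a tie the later token wins. */
    static Optional<String> longestNa(List<Token> tokens) {
        String best = null;
        for (Token token : tokens) {
            boolean isNa = "na".equals(token.nature);
            if (isNa && (best == null || best.length() <= token.word.length())) {
                best = token.word;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        // Hand-written tokens for illustration; real output depends on HanLP and the dictionaries.
        List<Token> tokens = Arrays.asList(
                new Token("北京市", "ns"),
                new Token("朝阳区", "na"),
                new Token("望京街道", "na"),
                new Token("办事处", "n"));
        System.out.println(longestNa(tokens).orElse("(no na term)"));
    }
}
```

Returning an `Optional` removes the need to compare the first array element against the string "true", at the cost of changing the method's signature for existing callers.
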