Verifying an Elasticsearch Tokenization Bug
Environment
Elasticsearch
Version: 6.7.0
Reproducing the Bug
Create the test index
json
PUT test_2022
{
"settings": {
"analysis": {
"filter": {
"pinyin_filter": {
"type": "pinyin"
}
},
"analyzer": {
"custome_standard": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"asciifolding"
]
},
"custome_chinese": {
"type": "custom",
"tokenizer": "ik_max_word",
"filter": [
"lowercase",
"asciifolding"
]
},
"custome_chinese_search":{
"type": "custom",
"tokenizer": "ik_smart",
"filter": [
"lowercase",
"asciifolding"
]
}
},
"normalizer": {
"custome_normalizer": {
"type": "custom",
"char_filter": [],
"filter": [
"lowercase",
"asciifolding"
]
}
}
},
"index.mapping.coerce": false,
"index.mapping.ignore_malformed":false,
"number_of_shards" : 5,
"number_of_replicas" : 1
},
"mappings": {
"_doc": {
"properties": {
"id": {
"type": "long"
},
"workContent": {
"type": "text",
"analyzer": "custome_standard",
"fields": {
"raw": {
"type": "keyword",
"normalizer": "custome_normalizer"
},
"chinese": {
"type": "text",
"analyzer": "custome_chinese",
"search_analyzer":"custome_chinese_search"
}
}
}
}
}
}
}
Run the request that triggers the exception
json
POST test_2022/_doc/1
{
"id": 1,
"workContent": "<span>0-1冷启动阶段,构建小鹏汽车社群营销体系,通过内容营销、活动营销两只抓手吸引沉淀潜在车主。结合公司重大传播节点,策划事件营销活动占领用户心智,在 G3 未上市之前进行品牌产品背书,提高潜客对产品的信任度。<br> <br>业绩荣誉: <br>1、配合 G3 发布会节奏,策划组织执行小鹏汽车《一路向北》品牌事件营销,通过 6 天 8 人 2553 公里的极客朝圣之旅,为小鹏汽车制造品质背书,铺垫口碑。这是国内新造车 势力的超长途自驾第一测。视频分别于小鹏汽车品牌媒体发布会、G3 发布会何小鹏讲演前及小鹏汽车各城市端展厅播放,获得行业及公司内部好评,相关链接: <br>A、外部视频链接:https://chejiahao.autohome.com.cn/info/2290990 <br>B、内部报道链接:https://mp.weixin.qq.com/s/A2AfghYRJ8Hc2gMMr_C8dQ <br> <br>2、配合 G3 发布会节奏,联合汽车之家资源组织策划《G3 价格竞猜》项目。该项目以极底成本,获得较好的曝光声量及活动参与效果。项目外部论坛总点击量环比 增长 108.9%,总回复数环比增长 97.3% ,成果如下:<br>A、新闻曝光举例:https://www.autohome.com.cn/news/201804/915688.html <br>B、论坛活动地址: https://club.autohome.com.cn/bbs/thread/7fc721404489d922/72434411-1.html <br> <br>3、撰写原创内容作品,抢占汽车之家首页首屏文字链和论坛相关位置,争取免费露出位:<br>A、小鹏汽车 1.0 评测:宝剑锋从磨砺出 https://chejiahao.autohome.com.cn/info/2239269 <br>B、一路向北 小鹏汽车广州-北京自驾游记 https://chejiahao.autohome.com.cn/info/2290598 </span>"
}
Result
json
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=750,endOffset=751,lastStartOffset=751 for field 'workContent.chinese'"
}
],
"type": "illegal_argument_exception",
"reason": "startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=750,endOffset=751,lastStartOffset=751 for field 'workContent.chinese'"
},
"status": 400
}
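The offsets in the `reason` string are the key to the rest of the investigation, so it can help to pull them out programmatically. The following is a minimal sketch; the helper name `parse_offset_error` and the regular expression are illustrative assumptions that match only the message format shown above.

```python
import re

# Matches the offsets and field name in the error message shown above.
# The pattern is an assumption based on this one message, not a stable format.
OFFSET_RE = re.compile(
    r"startOffset=(-?\d+),endOffset=(-?\d+),lastStartOffset=(-?\d+)"
    r" for field '([^']+)'"
)


def parse_offset_error(reason):
    """Return the offsets and field name found in `reason`, or None."""
    match = OFFSET_RE.search(reason)
    if match is None:
        return None
    start, end, last_start, field = match.groups()
    return {
        "field": field,
        "start_offset": int(start),
        "end_offset": int(end),
        "last_start_offset": int(last_start),
    }


reason = (
    "startOffset must be non-negative, and endOffset must be >= startOffset, "
    "and offsets must not go backwards "
    "startOffset=750,endOffset=751,lastStartOffset=751 "
    "for field 'workContent.chinese'"
)
info = parse_offset_error(reason)
print(info)
```

Here the parsed `start_offset` (750) is smaller than `last_start_offset` (751), which is the "offsets must not go backwards" condition named in the message.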
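Bug Analysis
The error message points to tokenization: a token was emitted with startOffset=750 even though the previous token started at 751, so the offsets went backwards. The next step is therefore to run the text manually through the field's analyzer configuration and locate the content that produces the bad token.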
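Once the `_analyze` response is available, scanning it by eye is tedious for a text this long. The sketch below applies the same three conditions named in the error message to a token list. The function name `find_offset_violations` is hypothetical, and the second sample uses placeholder token text, since only the offsets (not the token itself) are known from the error.

```python
def find_offset_violations(tokens):
    """Return tokens whose offsets violate the conditions in the error message.

    `tokens` is the "tokens" list of an _analyze response, in emitted order.
    """
    violations = []
    last_start = 0
    for tok in tokens:
        start, end = tok["start_offset"], tok["end_offset"]
        # Negative start, end before start, or start before the previous start.
        if start < 0 or end < start or start < last_start:
            violations.append({"token": tok, "last_start_offset": last_start})
        last_start = start
    return violations


# Tokens taken from the beginning of the real output: offsets never go backwards.
ok_tokens = [
    {"token": "冷启动", "start_offset": 9, "end_offset": 12},
    {"token": "冷", "start_offset": 9, "end_offset": 10},
    {"token": "启动", "start_offset": 10, "end_offset": 12},
]

# Hypothetical tokens that reuse only the offsets reported in the error.
bad_tokens = [
    {"token": "placeholder-1", "start_offset": 751, "end_offset": 752},
    {"token": "placeholder-2", "start_offset": 750, "end_offset": 751},
]

print(find_offset_violations(ok_tokens))
print(find_offset_violations(bad_tokens))
```

Mapping a reported offset back to the source text (for example `text[750:751]` in Python) then shows which characters the analyzer was processing when it emitted the out-of-order token.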
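Inspect the field's tokenization output to locate the offending text
Run the text through the analyzer of the target field and collect the resulting tokens: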
json
POST test_2022/_analyze
{
"field": "workContent.chinese",
"text": "<span>0-1冷启动阶段,构建小鹏汽车社群营销体系,通过内容营销、活动营销两只抓手吸引沉淀潜在车主。结合公司重大传播节点,策划事件营销活动占领用户心智,在 G3 未上市之前进行品牌产品背书,提高潜客对产品的信任度。<br> <br>业绩荣誉: <br>1、配合 G3 发布会节奏,策划组织执行小鹏汽车《一路向北》品牌事件营销,通过 6 天 8 人 2553 公里的极客朝圣之旅,为小鹏汽车制造品质背书,铺垫口碑。这是国内新造车 势力的超长途自驾第一测。视频分别于小鹏汽车品牌媒体发布会、G3 发布会何小鹏讲演前及小鹏汽车各城市端展厅播放,获得行业及公司内部好评,相关链接: <br>A、外部视频链接:https://chejiahao.autohome.com.cn/info/2290990 <br>B、内部报道链接:https://mp.weixin.qq.com/s/A2AfghYRJ8Hc2gMMr_C8dQ <br> <br>2、配合 G3 发布会节奏,联合汽车之家资源组织策划《G3 价格竞猜》项目。该项目以极底成本,获得较好的曝光声量及活动参与效果。项目外部论坛总点击量环比 增长 108.9%,总回复数环比增长 97.3% ,成果如下:<br>A、新闻曝光举例:https://www.autohome.com.cn/news/201804/915688.html <br>B、论坛活动地址: https://club.autohome.com.cn/bbs/thread/7fc721404489d922/72434411-1.html <br> <br>3、撰写原创内容作品,抢占汽车之家首页首屏文字链和论坛相关位置,争取免费露出位:<br>A、小鹏汽车 1.0 评测:宝剑锋从磨砺出 https://chejiahao.autohome.com.cn/info/2239269 <br>B、一路向北 小鹏汽车广州-北京自驾游记 https://chejiahao.autohome.com.cn/info/2290598 </span>"
}
The tokenization result is:
json
{
"tokens" : [
{
"token" : "span",
"start_offset" : 1,
"end_offset" : 5,
"type" : "ENGLISH",
"position" : 0
},
{
"token" : "0-1",
"start_offset" : 6,
"end_offset" : 9,
"type" : "LETTER",
"position" : 1
},
{
"token" : "0",
"start_offset" : 6,
"end_offset" : 7,
"type" : "ARABIC",
"position" : 2
},
{
"token" : "1",
"start_offset" : 8,
"end_offset" : 9,
"type" : "ARABIC",
"position" : 3
},
{
"token" : "冷启动",
"start_offset" : 9,
"end_offset" : 12,
"type" : "CN_WORD",
"position" : 4
},
{
"token" : "冷",
"start_offset" : 9,
"end_offset" : 10,
"type" : "CN_CHAR",
"position" : 5
},
{
"token" : "启动",
"start_offset" : 10,
"end_offset" : 12,
"type" : "CN_WORD",
"position" : 6
},
{
"token" : "阶段",
"start_offset" : 12,
"end_offset" : 14,
"type" : "CN_WORD",
"position" : 7
},
{
"token" : "构建",
"start_offset" : 15,
"end_offset" : 17,
"type" : "CN_WORD",
"position" : 8
},
{
"token" : "小",
"start_offset" : 17,
"end_offset" : 18,
"type" : "CN_CHAR",
"position" : 9
},
{
"token" : "鹏",
"start_offset" : 18,
"end_offset" : 19,
"type" : "CN_CHAR",
"position" : 10
},
{
"token" : "汽车",
"start_offset" : 19,
"end_offset" : 21,
"type" : "CN_WORD",
"position" : 11
},
{
"token" : "社群",
"start_offset" : 21,
"end_offset" : 23,
"type" : "CN_WORD",
"position" : 12
},
{
"token" : "营销",
"start_offset" : 23,
"end_offset" : 25,
"type" : "CN_WORD",
"position" : 13
},
{
"token" : "体系",
"start_offset" : 25,
"end_offset" : 27,
"type" : "CN_WORD",
"position" : 14
},
{
"token" : "通过",
"start_offset" : 28,
"end_offset" : 30,
"type" : "CN_WORD",
"position" : 15
},
{
"token" : "内容",
"start_offset" : 30,
"end_offset" : 32,
"type" : "CN_WORD",
"position" : 16
},
{
"token" : "营销",
"start_offset" : 32,
"end_offset" : 34,
"type" : "CN_WORD",
"position" : 17
},
{
"token" : "活动",
"start_offset" : 35,
"end_offset" : 37,
"type" : "CN_WORD",
"position" : 18
},
{
"token" : "营销",
"start_offset" : 37,
"end_offset" : 39,
"type" : "CN_WORD",
"position" : 19
},
{
"token" : "两只",
"start_offset" : 39,
"end_offset" : 41,
"type" : "CN_WORD",
"position" : 20
},
{
"token" : "两",
"start_offset" : 39,
"end_offset" : 40,
"type" : "COUNT",
"position" : 21
},
{
"token" : "只",
"start_offset" : 40,
"end_offset" : 41,
"type" : "CN_CHAR",
"position" : 22
},
{
"token" : "抓手",
"start_offset" : 41,
"end_offset" : 43,
"type" : "CN_WORD",
"position" : 23
},
{
"token" : "吸引",
"start_offset" : 43,
"end_offset" : 45,
"type" : "CN_WORD",
"position" : 24
},
{
"token" : "沉淀",
"start_offset" : 45,
"end_offset" : 47,
"type" : "CN_WORD",
"position" : 25
},
{
"token" : "潜在",
"start_offset" : 47,
"end_offset" : 49,
"type" : "CN_WORD",
"position" : 26
},
{
"token" : "潜",
"start_offset" : 47,
"end_offset" : 48,
"type" : "CN_CHAR",
"position" : 27
},
{
"token" : "在车",
"start_offset" : 48,
"end_offset" : 50,
"type" : "CN_WORD",
"position" : 28
},
{
"token" : "车主",
"start_offset" : 49,
"end_offset" : 51,
"type" : "CN_WORD",
"position" : 29
},
{
"token" : "结合",
"start_offset" : 52,
"end_offset" : 54,
"type" : "CN_WORD",
"position" : 30
},
{
"token" : "公司",
"start_offset" : 54,
"end_offset" : 56,
"type" : "CN_WORD",
"position" : 31
},
{
"token" : "重大",
"start_offset" : 56,
"end_offset" : 58,
"type" : "CN_WORD",
"position" : 32
},
{
"token" : "传播",
"start_offset" : 58,
"end_offset" : 60,
"type" : "CN_WORD",
"position" : 33
},
{
"token" : "节点",
"start_offset" : 60,
"end_offset" : 62,
"type" : "CN_WORD",
"position" : 34
},
{
"token" : "策划",
"start_offset" : 63,
"end_offset" : 65,
"type" : "CN_WORD",
"position" : 35
},
{
"token" : "事件",
"start_offset" : 65,
"end_offset" : 67,
"type" : "CN_WORD",
"position" : 36
},
{
"token" : "营销",
"start_offset" : 67,
"end_offset" : 69,
"type" : "CN_WORD",
"position" : 37
},
{
"token" : "活动",
"start_offset" : 69,
"end_offset" : 71,
"type" : "CN_WORD",
"position" : 38
},
{
"token" : "占领",
"start_offset" : 71,
"end_offset" : 73,
"type" : "CN_WORD",
"position" : 39
},
{
"token" : "占",
"start_offset" : 71,
"end_offset" : 72,
"type" : "CN_CHAR",
"position" : 40
},
{
"token" : "领用",
"start_offset" : 72,
"end_offset" : 74,
"type" : "CN_WORD",
"position" : 41
},
{
"token" : "用户",
"start_offset" : 73,
"end_offset" : 75,
"type" : "CN_WORD",
"position" : 42
},
{
"token" : "心智",
"start_offset" : 75,
"end_offset" : 77,
"type" : "CN_WORD",
"position" : 43
},
{
"token" : "在",
"start_offset" : 78,
"end_offset" : 79,
"type" : "CN_CHAR",
"position" : 44
},
{
"token" : "g3",
"start_offset" : 80,
"end_offset" : 82,
"type" : "LETTER",
"position" : 45
},
{
"token" : "g",
"start_offset" : 80,
"end_offset" : 81,
"type" : "ENGLISH",
"position" : 46
},
{
"token" : "3",
"start_offset" : 81,
"end_offset" : 82,
"type" : "ARABIC",
"position" : 47
},
{
"token" : "未上市",
"start_offset" : 83,
"end_offset" : 86,
"type" : "CN_WORD",
"position" : 48
},
{
"token" : "未上",
"start_offset" : 83,
"end_offset" : 85,
"type" : "CN_WORD",
"position" : 49
},
{
"token" : "上市",
"start_offset" : 84,
"end_offset" : 86,
"type" : "CN_WORD",
"position" : 50
},
{
"token" : "之前",
"start_offset" : 86,
"end_offset" : 88,
"type" : "CN_WORD",
"position" : 51
},
{
"token" : "之",
"start_offset" : 86,
"end_offset" : 87,
"type" : "CN_CHAR",
"position" : 52
},
{
"token" : "前进",
"start_offset" : 87,
"end_offset" : 89,
"type" : "CN_WORD",
"position" : 53
},
{
"token" : "进行",
"start_offset" : 88,
"end_offset" : 90,
"type" : "CN_WORD",
"position" : 54
},
{
"token" : "品牌",
"start_offset" : 90,
"end_offset" : 92,
"type" : "CN_WORD",
"position" : 55
},
{
"token" : "产品",
"start_offset" : 92,
"end_offset" : 94,
"type" : "CN_WORD",
"position" : 56
},
{
"token" : "背书",
"start_offset" : 94,
"end_offset" : 96,
"type" : "CN_WORD",
"position" : 57
},
{
"token" : "提高",
"start_offset" : 97,
"end_offset" : 99,
"type" : "CN_WORD",
"position" : 58
},
{
"token" : "潜",
"start_offset" : 99,
"end_offset" : 100,
"type" : "CN_CHAR",
"position" : 59
},
{
"token" : "客",
"start_offset" : 100,
"end_offset" : 101,
"type" : "CN_CHAR",
"position" : 60
},
{
"token" : "对",
"start_offset" : 101,
"end_offset" : 102,
"type" : "CN_CHAR",
"position" : 61
},
{
"token" : "产品",
"start_offset" : 102,
"end_offset" : 104,
"type" : "CN_WORD",
"position" : 62
},
{
"token" : "的",
"start_offset" : 104,
"end_offset" : 105,
"type" : "CN_CHAR",
"position" : 63
},
{
"token" : "信任",
"start_offset" : 105,
"end_offset" : 107,
"type" : "CN_WORD",
"position" : 64
},
{
"token" : "度",
"start_offset" : 107,
"end_offset" : 108,
"type" : "CN_CHAR",
"position" : 65
},
{
"token" : "br",
"start_offset" : 110,
"end_offset" : 112,
"type" : "ENGLISH",
"position" : 66
},
{
"token" : "br",
"start_offset" : 115,
"end_offset" : 117,
"type" : "ENGLISH",
"position" : 67
},
{
"token" : "业绩",
"start_offset" : 118,
"end_offset" : 120,
"type" : "CN_WORD",
"position" : 68
},
{
"token" : "荣誉",
"start_offset" : 120,
"end_offset" : 122,
"type" : "CN_WORD",
"position" : 69
},
{
"token" : "br",
"start_offset" : 125,
"end_offset" : 127,
"type" : "ENGLISH",
"position" : 70
},
{
"token" : "1",
"start_offset" : 128,
"end_offset" : 129,
"type" : "ARABIC",
"position" : 71
},
{
"token" : "配合",
"start_offset" : 130,
"end_offset" : 132,
"type" : "CN_WORD",
"position" : 72
},
{
"token" : "g3",
"start_offset" : 133,
"end_offset" : 135,
"type" : "LETTER",
"position" : 73
},
{
"token" : "g",
"start_offset" : 133,
"end_offset" : 134,
"type" : "ENGLISH",
"position" : 74
},
{
"token" : "3",
"start_offset" : 134,
"end_offset" : 135,
"type" : "ARABIC",
"position" : 75
},
{
"token" : "发布会",
"start_offset" : 136,
"end_offset" : 139,
"type" : "CN_WORD",
"position" : 76
},
{
"token" : "发布",
"start_offset" : 136,
"end_offset" : 138,
"type" : "CN_WORD",
"position" : 77
},
{
"token" : "会",
"start_offset" : 138,
"end_offset" : 139,
"type" : "CN_CHAR",
"position" : 78
},
{
"token" : "节奏",
"start_offset" : 139,
"end_offset" : 141,
"type" : "CN_WORD",
"position" : 79
},
{
"token" : "策划",
"start_offset" : 142,
"end_offset" : 144,
"type" : "CN_WORD",
"position" : 80
},
{
"token" : "组织",
"start_offset" : 144,
"end_offset" : 146,
"type" : "CN_WORD",
"position" : 81
},
{
"token" : "执行",
"start_offset" : 146,
"end_offset" : 148,
"type" : "CN_WORD",
"position" : 82
},
{
"token" : "小",
"start_offset" : 148,
"end_offset" : 149,
"type" : "CN_CHAR",
"position" : 83
},
{
"token" : "鹏",
"start_offset" : 149,
"end_offset" : 150,
"type" : "CN_CHAR",
"position" : 84
},
{
"token" : "汽车",
"start_offset" : 150,
"end_offset" : 152,
"type" : "CN_WORD",
"position" : 85
},
{
"token" : "一路",
"start_offset" : 153,
"end_offset" : 155,
"type" : "CN_WORD",
"position" : 86
},
{
"token" : "一",
"start_offset" : 153,
"end_offset" : 154,
"type" : "TYPE_CNUM",
"position" : 87
},
{
"token" : "路向",
"start_offset" : 154,
"end_offset" : 156,
"type" : "CN_WORD",
"position" : 88
},
{
"token" : "路",
"start_offset" : 154,
"end_offset" : 155,
"type" : "COUNT",
"position" : 89
},
{
"token" : "向北",
"start_offset" : 155,
"end_offset" : 157,
"type" : "CN_WORD",
"position" : 90
},
{
"token" : "品牌",
"start_offset" : 158,
"end_offset" : 160,
"type" : "CN_WORD",
"position" : 91
},
{
"token" : "事件",
"start_offset" : 160,
"end_offset" : 162,
"type" : "CN_WORD",
"position" : 92
},
{
"token" : "营销",
"start_offset" : 162,
"end_offset" : 164,
"type" : "CN_WORD",
"position" : 93
},
{
"token" : "通过",
"start_offset" : 165,
"end_offset" : 167,
"type" : "CN_WORD",
"position" : 94
},
{
"token" : "6",
"start_offset" : 168,
"end_offset" : 169,
"type" : "ARABIC",
"position" : 95
},
{
"token" : "天",
"start_offset" : 170,
"end_offset" : 171,
"type" : "CN_CHAR",
"position" : 96
},
{
"token" : "8",
"start_offset" : 172,
"end_offset" : 173,
"type" : "ARABIC",
"position" : 97
},
{
"token" : "人",
"start_offset" : 174,
"end_offset" : 175,
"type" : "CN_CHAR",
"position" : 98
},
{
"token" : "2553",
"start_offset" : 176,
"end_offset" : 180,
"type" : "ARABIC",
"position" : 99
},
{
"token" : "公里",
"start_offset" : 181,
"end_offset" : 183,
"type" : "CN_WORD",
"position" : 100
},
{
"token" : "的",
"start_offset" : 183,
"end_offset" : 184,
"type" : "CN_CHAR",
"position" : 101
},
{
"token" : "极",
"start_offset" : 184,
"end_offset" : 185,
"type" : "CN_CHAR",
"position" : 102
},
{
"token" : "客",
"start_offset" : 185,
"end_offset" : 186,
"type" : "CN_CHAR",
"position" : 103
},
{
"token" : "朝圣",
"start_offset" : 186,
"end_offset" : 188,
"type" : "CN_WORD",
"position" : 104
},
{
"token" : "之旅",
"start_offset" : 188,
"end_offset" : 190,
"type" : "CN_WORD",
"position" : 105
},
{
"token" : "为",
"start_offset" : 191,
"end_offset" : 192,
"type" : "CN_CHAR",
"position" : 106
},
{
"token" : "小",
"start_offset" : 192,
"end_offset" : 193,
"type" : "CN_CHAR",
"position" : 107
},
{
"token" : "鹏",
"start_offset" : 193,
"end_offset" : 194,
"type" : "CN_CHAR",
"position" : 108
},
{
"token" : "汽车",
"start_offset" : 194,
"end_offset" : 196,
"type" : "CN_WORD",
"position" : 109
},
{
"token" : "制造品",
"start_offset" : 196,
"end_offset" : 199,
"type" : "CN_WORD",
"position" : 110
},
{
"token" : "制造",
"start_offset" : 196,
"end_offset" : 198,
"type" : "CN_WORD",
"position" : 111
},
{
"token" : "品质",
"start_offset" : 198,
"end_offset" : 200,
"type" : "CN_WORD",
"position" : 112
},
{
"token" : "背书",
"start_offset" : 200,
"end_offset" : 202,
"type" : "CN_WORD",
"position" : 113
},
{
"token" : "铺垫",
"start_offset" : 203,
"end_offset" : 205,
"type" : "CN_WORD",
"position" : 114
},
{
"token" : "口碑",
"start_offset" : 205,
"end_offset" : 207,
"type" : "CN_WORD",
"position" : 115
},
{
"token" : "这是",
"start_offset" : 208,
"end_offset" : 210,
"type" : "CN_WORD",
"position" : 116
},
{
"token" : "国内",
"start_offset" : 210,
"end_offset" : 212,
"type" : "CN_WORD",
"position" : 117
},
{
"token" : "新造",
"start_offset" : 212,
"end_offset" : 214,
"type" : "CN_WORD",
"position" : 118
},
{
"token" : "车",
"start_offset" : 214,
"end_offset" : 215,
"type" : "CN_CHAR",
"position" : 119
},
{
"token" : "势力",
"start_offset" : 216,
"end_offset" : 218,
"type" : "CN_WORD",
"position" : 120
},
{
"token" : "的",
"start_offset" : 218,
"end_offset" : 219,
"type" : "CN_CHAR",
"position" : 121
},
{
"token" : "超长",
"start_offset" : 219,
"end_offset" : 221,
"type" : "CN_WORD",
"position" : 122
},
{
"token" : "超",
"start_offset" : 219,
"end_offset" : 220,
"type" : "CN_CHAR",
"position" : 123
},
{
"token" : "长途",
"start_offset" : 220,
"end_offset" : 222,
"type" : "CN_WORD",
"position" : 124
},
{
"token" : "自驾",
"start_offset" : 222,
"end_offset" : 224,
"type" : "CN_WORD",
"position" : 125
},
{
"token" : "第一",
"start_offset" : 224,
"end_offset" : 226,
"type" : "CN_WORD",
"position" : 126
},
{
"token" : "第",
"start_offset" : 224,
"end_offset" : 225,
"type" : "CN_CHAR",
"position" : 127
},
{
"token" : "一测",
"start_offset" : 225,
"end_offset" : 227,
"type" : "CN_WORD",
"position" : 128
},
{
"token" : "一",
"start_offset" : 225,
"end_offset" : 226,
"type" : "TYPE_CNUM",
"position" : 129
},
{
"token" : "测",
"start_offset" : 226,
"end_offset" : 227,
"type" : "CN_CHAR",
"position" : 130
},
{
"token" : "视频",
"start_offset" : 228,
"end_offset" : 230,
"type" : "CN_WORD",
"position" : 131
},
{
"token" : "分别",
"start_offset" : 230,
"end_offset" : 232,
"type" : "CN_WORD",
"position" : 132
},
{
"token" : "于",
"start_offset" : 232,
"end_offset" : 233,
"type" : "CN_CHAR",
"position" : 133
},
{
"token" : "小",
"start_offset" : 233,
"end_offset" : 234,
"type" : "CN_CHAR",
"position" : 134
},
{
"token" : "鹏",
"start_offset" : 234,
"end_offset" : 235,
"type" : "CN_CHAR",
"position" : 135
},
{
"token" : "汽车品牌",
"start_offset" : 235,
"end_offset" : 239,
"type" : "CN_WORD",
"position" : 136
},
{
"token" : "汽车",
"start_offset" : 235,
"end_offset" : 237,
"type" : "CN_WORD",
"position" : 137
},
{
"token" : "品牌",
"start_offset" : 237,
"end_offset" : 239,
"type" : "CN_WORD",
"position" : 138
},
{
"token" : "媒体",
"start_offset" : 239,
"end_offset" : 241,
"type" : "CN_WORD",
"position" : 139
},
{
"token" : "发布会",
"start_offset" : 241,
"end_offset" : 244,
"type" : "CN_WORD",
"position" : 140
},
{
"token" : "发布",
"start_offset" : 241,
"end_offset" : 243,
"type" : "CN_WORD",
"position" : 141
},
{
"token" : "会",
"start_offset" : 243,
"end_offset" : 244,
"type" : "CN_CHAR",
"position" : 142
},
{
"token" : "g3",
"start_offset" : 245,
"end_offset" : 247,
"type" : "LETTER",
"position" : 143
},
{
"token" : "g",
"start_offset" : 245,
"end_offset" : 246,
"type" : "ENGLISH",
"position" : 144
},
{
"token" : "3",
"start_offset" : 246,
"end_offset" : 247,
"type" : "ARABIC",
"position" : 145
},
{
"token" : "发布会",
"start_offset" : 248,
"end_offset" : 251,
"type" : "CN_WORD",
"position" : 146
},
{
"token" : "发布",
"start_offset" : 248,
"end_offset" : 250,
"type" : "CN_WORD",
"position" : 147
},
{
"token" : "会",
"start_offset" : 250,
"end_offset" : 251,
"type" : "CN_CHAR",
"position" : 148
},
{
"token" : "何",
"start_offset" : 251,
"end_offset" : 252,
"type" : "CN_CHAR",
"position" : 149
},
{
"token" : "小",
"start_offset" : 252,
"end_offset" : 253,
"type" : "CN_CHAR",
"position" : 150
},
{
"token" : "鹏",
"start_offset" : 253,
"end_offset" : 254,
"type" : "CN_CHAR",
"position" : 151
},
{
"token" : "讲演",
"start_offset" : 254,
"end_offset" : 256,
"type" : "CN_WORD",
"position" : 152
},
{
"token" : "前",
"start_offset" : 256,
"end_offset" : 257,
"type" : "CN_CHAR",
"position" : 153
},
{
"token" : "及",
"start_offset" : 257,
"end_offset" : 258,
"type" : "CN_CHAR",
"position" : 154
},
{
"token" : "小",
"start_offset" : 258,
"end_offset" : 259,
"type" : "CN_CHAR",
"position" : 155
},
{
"token" : "鹏",
"start_offset" : 259,
"end_offset" : 260,
"type" : "CN_CHAR",
"position" : 156
},
{
"token" : "汽车",
"start_offset" : 260,
"end_offset" : 262,
"type" : "CN_WORD",
"position" : 157
},
{
"token" : "各城市",
"start_offset" : 262,
"end_offset" : 265,
"type" : "CN_WORD",
"position" : 158
},
{
"token" : "各城",
"start_offset" : 262,
"end_offset" : 264,
"type" : "CN_WORD",
"position" : 159
},
{
"token" : "城市",
"start_offset" : 263,
"end_offset" : 265,
"type" : "CN_WORD",
"position" : 160
},
{
"token" : "端",
"start_offset" : 265,
"end_offset" : 266,
"type" : "CN_CHAR",
"position" : 161
},
{
"token" : "展厅",
"start_offset" : 266,
"end_offset" : 268,
"type" : "CN_WORD",
"position" : 162
},
{
"token" : "播放",
"start_offset" : 268,
"end_offset" : 270,
"type" : "CN_WORD",
"position" : 163
},
{
"token" : "获得",
"start_offset" : 271,
"end_offset" : 273,
"type" : "CN_WORD",
"position" : 164
},
{
"token" : "行业",
"start_offset" : 273,
"end_offset" : 275,
"type" : "CN_WORD",
"position" : 165
},
{
"token" : "及",
"start_offset" : 275,
"end_offset" : 276,
"type" : "CN_CHAR",
"position" : 166
},
{
"token" : "公司内部",
"start_offset" : 276,
"end_offset" : 280,
"type" : "CN_WORD",
"position" : 167
},
{
"token" : "公司",
"start_offset" : 276,
"end_offset" : 278,
"type" : "CN_WORD",
"position" : 168
},
{
"token" : "内部",
"start_offset" : 278,
"end_offset" : 280,
"type" : "CN_WORD",
"position" : 169
},
{
"token" : "好评",
"start_offset" : 280,
"end_offset" : 282,
"type" : "CN_WORD",
"position" : 170
},
{
"token" : "相关",
"start_offset" : 283,
"end_offset" : 285,
"type" : "CN_WORD",
"position" : 171
},
{
"token" : "链接",
"start_offset" : 285,
"end_offset" : 287,
"type" : "CN_WORD",
"position" : 172
},
{
"token" : "br",
"start_offset" : 290,
"end_offset" : 292,
"type" : "ENGLISH",
"position" : 173
},
{
"token" : "外部",
"start_offset" : 295,
"end_offset" : 297,
"type" : "CN_WORD",
"position" : 174
},
{
"token" : "视频",
"start_offset" : 297,
"end_offset" : 299,
"type" : "CN_WORD",
"position" : 175
},
{
"token" : "链接",
"start_offset" : 299,
"end_offset" : 301,
"type" : "CN_WORD",
"position" : 176
},
{
"token" : "https",
"start_offset" : 302,
"end_offset" : 307,
"type" : "ENGLISH",
"position" : 177
},
{
"token" : "chejiahao.autohome.com.cn",
"start_offset" : 310,
"end_offset" : 335,
"type" : "LETTER",
"position" : 178
},
{
"token" : "chejiahao",
"start_offset" : 310,
"end_offset" : 319,
"type" : "ENGLISH",
"position" : 179
},
{
"token" : "autohome",
"start_offset" : 320,
"end_offset" : 328,
"type" : "ENGLISH",
"position" : 180
},
{
"token" : "com",
"start_offset" : 329,
"end_offset" : 332,
"type" : "ENGLISH",
"position" : 181
},
{
"token" : "cn",
"start_offset" : 333,
"end_offset" : 335,
"type" : "ENGLISH",
"position" : 182
},
{
"token" : "info",
"start_offset" : 336,
"end_offset" : 340,
"type" : "ENGLISH",
"position" : 183
},
{
"token" : "2290990",
"start_offset" : 341,
"end_offset" : 348,
"type" : "ARABIC",
"position" : 184
},
{
"token" : "br",
"start_offset" : 350,
"end_offset" : 352,
"type" : "ENGLISH",
"position" : 185
},
{
"token" : "b",
"start_offset" : 353,
"end_offset" : 354,
"type" : "ENGLISH",
"position" : 186
},
{
"token" : "内部",
"start_offset" : 355,
"end_offset" : 357,
"type" : "CN_WORD",
"position" : 187
},
{
"token" : "报道",
"start_offset" : 357,
"end_offset" : 359,
"type" : "CN_WORD",
"position" : 188
},
{
"token" : "链接",
"start_offset" : 359,
"end_offset" : 361,
"type" : "CN_WORD",
"position" : 189
},
{
"token" : "https",
"start_offset" : 362,
"end_offset" : 367,
"type" : "ENGLISH",
"position" : 190
},
{
"token" : "mp.weixin.qq.com",
"start_offset" : 370,
"end_offset" : 386,
"type" : "LETTER",
"position" : 191
},
{
"token" : "mp",
"start_offset" : 370,
"end_offset" : 372,
"type" : "ENGLISH",
"position" : 192
},
{
"token" : "weixin",
"start_offset" : 373,
"end_offset" : 379,
"type" : "ENGLISH",
"position" : 193
},
{
"token" : "qq",
"start_offset" : 380,
"end_offset" : 382,
"type" : "ENGLISH",
"position" : 194
},
{
"token" : "com",
"start_offset" : 383,
"end_offset" : 386,
"type" : "ENGLISH",
"position" : 195
},
{
"token" : "s",
"start_offset" : 387,
"end_offset" : 388,
"type" : "ENGLISH",
"position" : 196
},
{
"token" : "a2afghyrj8hc2gmmr_c8dq",
"start_offset" : 389,
"end_offset" : 411,
"type" : "LETTER",
"position" : 197
},
{
"token" : "2",
"start_offset" : 390,
"end_offset" : 391,
"type" : "ARABIC",
"position" : 198
},
{
"token" : "afghyrj",
"start_offset" : 391,
"end_offset" : 398,
"type" : "ENGLISH",
"position" : 199
},
{
"token" : "8",
"start_offset" : 398,
"end_offset" : 399,
"type" : "ARABIC",
"position" : 200
},
{
"token" : "hc",
"start_offset" : 399,
"end_offset" : 401,
"type" : "ENGLISH",
"position" : 201
},
{
"token" : "2",
"start_offset" : 401,
"end_offset" : 402,
"type" : "ARABIC",
"position" : 202
},
{
"token" : "gmmr",
"start_offset" : 402,
"end_offset" : 406,
"type" : "ENGLISH",
"position" : 203
},
{
"token" : "c",
"start_offset" : 407,
"end_offset" : 408,
"type" : "ENGLISH",
"position" : 204
},
{
"token" : "8",
"start_offset" : 408,
"end_offset" : 409,
"type" : "ARABIC",
"position" : 205
},
{
"token" : "dq",
"start_offset" : 409,
"end_offset" : 411,
"type" : "ENGLISH",
"position" : 206
},
{
"token" : "br",
"start_offset" : 413,
"end_offset" : 415,
"type" : "ENGLISH",
"position" : 207
},
{
"token" : "br",
"start_offset" : 418,
"end_offset" : 420,
"type" : "ENGLISH",
"position" : 208
},
{
"token" : "2",
"start_offset" : 421,
"end_offset" : 422,
"type" : "ARABIC",
"position" : 209
},
{
"token" : "配合",
"start_offset" : 423,
"end_offset" : 425,
"type" : "CN_WORD",
"position" : 210
},
{
"token" : "g3",
"start_offset" : 426,
"end_offset" : 428,
"type" : "LETTER",
"position" : 211
},
{
"token" : "g",
"start_offset" : 426,
"end_offset" : 427,
"type" : "ENGLISH",
"position" : 212
},
{
"token" : "3",
"start_offset" : 427,
"end_offset" : 428,
"type" : "ARABIC",
"position" : 213
},
{
"token" : "发布会",
"start_offset" : 429,
"end_offset" : 432,
"type" : "CN_WORD",
"position" : 214
},
{
"token" : "发布",
"start_offset" : 429,
"end_offset" : 431,
"type" : "CN_WORD",
"position" : 215
},
{
"token" : "会",
"start_offset" : 431,
"end_offset" : 432,
"type" : "CN_CHAR",
"position" : 216
},
{
"token" : "节奏",
"start_offset" : 432,
"end_offset" : 434,
"type" : "CN_WORD",
"position" : 217
},
{
"token" : "联合",
"start_offset" : 435,
"end_offset" : 437,
"type" : "CN_WORD",
"position" : 218
},
{
"token" : "汽车",
"start_offset" : 437,
"end_offset" : 439,
"type" : "CN_WORD",
"position" : 219
},
{
"token" : "之家",
"start_offset" : 439,
"end_offset" : 441,
"type" : "CN_WORD",
"position" : 220
},
{
"token" : "之",
"start_offset" : 439,
"end_offset" : 440,
"type" : "CN_CHAR",
"position" : 221
},
{
"token" : "家资",
"start_offset" : 440,
"end_offset" : 442,
"type" : "CN_WORD",
"position" : 222
},
{
"token" : "资源",
"start_offset" : 441,
"end_offset" : 443,
"type" : "CN_WORD",
"position" : 223
},
{
"token" : "组织",
"start_offset" : 443,
"end_offset" : 445,
"type" : "CN_WORD",
"position" : 224
},
{
"token" : "策划",
"start_offset" : 445,
"end_offset" : 447,
"type" : "CN_WORD",
"position" : 225
},
{
"token" : "g3",
"start_offset" : 448,
"end_offset" : 450,
"type" : "LETTER",
"position" : 226
},
{
"token" : "g",
"start_offset" : 448,
"end_offset" : 449,
"type" : "ENGLISH",
"position" : 227
},
{
"token" : "3",
"start_offset" : 449,
"end_offset" : 450,
"type" : "ARABIC",
"position" : 228
},
{
"token" : "价格",
"start_offset" : 451,
"end_offset" : 453,
"type" : "CN_WORD",
"position" : 229
},
{
"token" : "竞猜",
"start_offset" : 453,
"end_offset" : 455,
"type" : "CN_WORD",
"position" : 230
},
{
"token" : "项目",
"start_offset" : 456,
"end_offset" : 458,
"type" : "CN_WORD",
"position" : 231
},
{
"token" : "该项",
"start_offset" : 459,
"end_offset" : 461,
"type" : "CN_WORD",
"position" : 232
},
{
"token" : "该",
"start_offset" : 459,
"end_offset" : 460,
"type" : "CN_CHAR",
"position" : 233
},
{
"token" : "项目",
"start_offset" : 460,
"end_offset" : 462,
"type" : "CN_WORD",
"position" : 234
},
{
"token" : "以",
"start_offset" : 462,
"end_offset" : 463,
"type" : "CN_CHAR",
"position" : 235
},
{
"token" : "极",
"start_offset" : 463,
"end_offset" : 464,
"type" : "CN_CHAR",
"position" : 236
},
{
"token" : "底",
"start_offset" : 464,
"end_offset" : 465,
"type" : "CN_CHAR",
"position" : 237
},
{
"token" : "成本",
"start_offset" : 465,
"end_offset" : 467,
"type" : "CN_WORD",
"position" : 238
},
{
"token" : "获得",
"start_offset" : 468,
"end_offset" : 470,
"type" : "CN_WORD",
"position" : 239
},
{
"token" : "较好",
"start_offset" : 470,
"end_offset" : 472,
"type" : "CN_WORD",
"position" : 240
},
{
"token" : "的",
"start_offset" : 472,
"end_offset" : 473,
"type" : "CN_CHAR",
"position" : 241
},
{
"token" : "曝光",
"start_offset" : 473,
"end_offset" : 475,
"type" : "CN_WORD",
"position" : 242
},
{
"token" : "声",
"start_offset" : 475,
"end_offset" : 476,
"type" : "CN_CHAR",
"position" : 243
},
{
"token" : "量",
"start_offset" : 476,
"end_offset" : 477,
"type" : "CN_CHAR",
"position" : 244
},
{
"token" : "及",
"start_offset" : 477,
"end_offset" : 478,
"type" : "CN_CHAR",
"position" : 245
},
{
"token" : "活动",
"start_offset" : 478,
"end_offset" : 480,
"type" : "CN_WORD",
"position" : 246
},
{
"token" : "参与",
"start_offset" : 480,
"end_offset" : 482,
"type" : "CN_WORD",
"position" : 247
},
{
"token" : "效果",
"start_offset" : 482,
"end_offset" : 484,
"type" : "CN_WORD",
"position" : 248
},
{
"token" : "项目",
"start_offset" : 485,
"end_offset" : 487,
"type" : "CN_WORD",
"position" : 249
},
{
"token" : "外部",
"start_offset" : 487,
"end_offset" : 489,
"type" : "CN_WORD",
"position" : 250
},
{
"token" : "论坛",
"start_offset" : 489,
"end_offset" : 491,
"type" : "CN_WORD",
"position" : 251
},
{
"token" : "总",
"start_offset" : 491,
"end_offset" : 492,
"type" : "CN_CHAR",
"position" : 252
},
{
"token" : "点击",
"start_offset" : 492,
"end_offset" : 494,
"type" : "CN_WORD",
"position" : 253
},
{
"token" : "量",
"start_offset" : 494,
"end_offset" : 495,
"type" : "CN_CHAR",
"position" : 254
},
{
"token" : "环",
"start_offset" : 495,
"end_offset" : 496,
"type" : "CN_CHAR",
"position" : 255
},
{
"token" : "比",
"start_offset" : 496,
"end_offset" : 497,
"type" : "CN_CHAR",
"position" : 256
},
{
"token" : "增长",
"start_offset" : 498,
"end_offset" : 500,
"type" : "CN_WORD",
"position" : 257
},
{
"token" : "108.9",
"start_offset" : 501,
"end_offset" : 506,
"type" : "ARABIC",
"position" : 258
},
{
"token" : "总",
"start_offset" : 508,
"end_offset" : 509,
"type" : "CN_CHAR",
"position" : 259
},
{
"token" : "回复数",
"start_offset" : 509,
"end_offset" : 512,
"type" : "CN_WORD",
"position" : 260
},
{
"token" : "回复",
"start_offset" : 509,
"end_offset" : 511,
"type" : "CN_WORD",
"position" : 261
},
{
"token" : "复数",
"start_offset" : 510,
"end_offset" : 512,
"type" : "CN_WORD",
"position" : 262
},
{
"token" : "环",
"start_offset" : 512,
"end_offset" : 513,
"type" : "CN_CHAR",
"position" : 263
},
{
"token" : "比增",
"start_offset" : 513,
"end_offset" : 515,
"type" : "CN_WORD",
"position" : 264
},
{
"token" : "比",
"start_offset" : 513,
"end_offset" : 514,
"type" : "CN_CHAR",
"position" : 265
},
{
"token" : "增长",
"start_offset" : 514,
"end_offset" : 516,
"type" : "CN_WORD",
"position" : 266
},
{
"token" : "97.3",
"start_offset" : 517,
"end_offset" : 521,
"type" : "ARABIC",
"position" : 267
},
{
"token" : "成果",
"start_offset" : 524,
"end_offset" : 526,
"type" : "CN_WORD",
"position" : 268
},
{
"token" : "成",
"start_offset" : 524,
"end_offset" : 525,
"type" : "CN_CHAR",
"position" : 269
},
{
"token" : "果如",
"start_offset" : 525,
"end_offset" : 527,
"type" : "CN_WORD",
"position" : 270
},
{
"token" : "如下",
"start_offset" : 526,
"end_offset" : 528,
"type" : "CN_WORD",
"position" : 271
},
{
"token" : "br",
"start_offset" : 530,
"end_offset" : 532,
"type" : "ENGLISH",
"position" : 272
},
{
"token" : "新闻",
"start_offset" : 535,
"end_offset" : 537,
"type" : "CN_WORD",
"position" : 273
},
{
"token" : "曝光",
"start_offset" : 537,
"end_offset" : 539,
"type" : "CN_WORD",
"position" : 274
},
{
"token" : "举例",
"start_offset" : 539,
"end_offset" : 541,
"type" : "CN_WORD",
"position" : 275
},
{
"token" : "https",
"start_offset" : 542,
"end_offset" : 547,
"type" : "ENGLISH",
"position" : 276
},
{
"token" : "www.autohome.com.cn",
"start_offset" : 550,
"end_offset" : 569,
"type" : "LETTER",
"position" : 277
},
{
"token" : "www",
"start_offset" : 550,
"end_offset" : 553,
"type" : "ENGLISH",
"position" : 278
},
{
"token" : "autohome",
"start_offset" : 554,
"end_offset" : 562,
"type" : "ENGLISH",
"position" : 279
},
{
"token" : "com",
"start_offset" : 563,
"end_offset" : 566,
"type" : "ENGLISH",
"position" : 280
},
{
"token" : "cn",
"start_offset" : 567,
"end_offset" : 569,
"type" : "ENGLISH",
"position" : 281
},
{
"token" : "news",
"start_offset" : 570,
"end_offset" : 574,
"type" : "ENGLISH",
"position" : 282
},
{
"token" : "201804",
"start_offset" : 575,
"end_offset" : 581,
"type" : "ARABIC",
"position" : 283
},
{
"token" : "915688.html",
"start_offset" : 582,
"end_offset" : 593,
"type" : "LETTER",
"position" : 284
},
{
"token" : "915688",
"start_offset" : 582,
"end_offset" : 588,
"type" : "ARABIC",
"position" : 285
},
{
"token" : "html",
"start_offset" : 589,
"end_offset" : 593,
"type" : "ENGLISH",
"position" : 286
},
{
"token" : "br",
"start_offset" : 595,
"end_offset" : 597,
"type" : "ENGLISH",
"position" : 287
},
{
"token" : "b",
"start_offset" : 598,
"end_offset" : 599,
"type" : "ENGLISH",
"position" : 288
},
{
"token" : "论坛",
"start_offset" : 600,
"end_offset" : 602,
"type" : "CN_WORD",
"position" : 289
},
{
"token" : "活动",
"start_offset" : 602,
"end_offset" : 604,
"type" : "CN_WORD",
"position" : 290
},
{
"token" : "活",
"start_offset" : 602,
"end_offset" : 603,
"type" : "CN_CHAR",
"position" : 291
},
{
"token" : "动地",
"start_offset" : 603,
"end_offset" : 605,
"type" : "CN_WORD",
"position" : 292
},
{
"token" : "地址",
"start_offset" : 604,
"end_offset" : 606,
"type" : "CN_WORD",
"position" : 293
},
{
"token" : "https",
"start_offset" : 608,
"end_offset" : 613,
"type" : "ENGLISH",
"position" : 294
},
{
"token" : "club.autohome.com.cn",
"start_offset" : 616,
"end_offset" : 636,
"type" : "LETTER",
"position" : 295
},
{
"token" : "club",
"start_offset" : 616,
"end_offset" : 620,
"type" : "ENGLISH",
"position" : 296
},
{
"token" : "autohome",
"start_offset" : 621,
"end_offset" : 629,
"type" : "ENGLISH",
"position" : 297
},
{
"token" : "com",
"start_offset" : 630,
"end_offset" : 633,
"type" : "ENGLISH",
"position" : 298
},
{
"token" : "cn",
"start_offset" : 634,
"end_offset" : 636,
"type" : "ENGLISH",
"position" : 299
},
{
"token" : "bbs",
"start_offset" : 637,
"end_offset" : 640,
"type" : "ENGLISH",
"position" : 300
},
{
"token" : "thread",
"start_offset" : 641,
"end_offset" : 647,
"type" : "ENGLISH",
"position" : 301
},
{
"token" : "7fc721404489d922",
"start_offset" : 648,
"end_offset" : 664,
"type" : "LETTER",
"position" : 302
},
{
"token" : "7",
"start_offset" : 648,
"end_offset" : 649,
"type" : "ARABIC",
"position" : 303
},
{
"token" : "fc",
"start_offset" : 649,
"end_offset" : 651,
"type" : "ENGLISH",
"position" : 304
},
{
"token" : "721404489",
"start_offset" : 651,
"end_offset" : 660,
"type" : "ARABIC",
"position" : 305
},
{
"token" : "d",
"start_offset" : 660,
"end_offset" : 661,
"type" : "ENGLISH",
"position" : 306
},
{
"token" : "922",
"start_offset" : 661,
"end_offset" : 664,
"type" : "ARABIC",
"position" : 307
},
{
"token" : "72434411-1.html",
"start_offset" : 665,
"end_offset" : 680,
"type" : "LETTER",
"position" : 308
},
{
"token" : "72434411",
"start_offset" : 665,
"end_offset" : 673,
"type" : "ARABIC",
"position" : 309
},
{
"token" : "1",
"start_offset" : 674,
"end_offset" : 675,
"type" : "ARABIC",
"position" : 310
},
{
"token" : "html",
"start_offset" : 676,
"end_offset" : 680,
"type" : "ENGLISH",
"position" : 311
},
{
"token" : "br",
"start_offset" : 682,
"end_offset" : 684,
"type" : "ENGLISH",
"position" : 312
},
{
"token" : "br",
"start_offset" : 687,
"end_offset" : 689,
"type" : "ENGLISH",
"position" : 313
},
{
"token" : "3",
"start_offset" : 690,
"end_offset" : 691,
"type" : "ARABIC",
"position" : 314
},
{
"token" : "撰写",
"start_offset" : 692,
"end_offset" : 694,
"type" : "CN_WORD",
"position" : 315
},
{
"token" : "原创",
"start_offset" : 694,
"end_offset" : 696,
"type" : "CN_WORD",
"position" : 316
},
{
"token" : "内容",
"start_offset" : 696,
"end_offset" : 698,
"type" : "CN_WORD",
"position" : 317
},
{
"token" : "作品",
"start_offset" : 698,
"end_offset" : 700,
"type" : "CN_WORD",
"position" : 318
},
{
"token" : "抢占",
"start_offset" : 701,
"end_offset" : 703,
"type" : "CN_WORD",
"position" : 319
},
{
"token" : "汽车",
"start_offset" : 703,
"end_offset" : 705,
"type" : "CN_WORD",
"position" : 320
},
{
"token" : "之家",
"start_offset" : 705,
"end_offset" : 707,
"type" : "CN_WORD",
"position" : 321
},
{
"token" : "首页",
"start_offset" : 707,
"end_offset" : 709,
"type" : "CN_WORD",
"position" : 322
},
{
"token" : "首",
"start_offset" : 709,
"end_offset" : 710,
"type" : "CN_CHAR",
"position" : 323
},
{
"token" : "屏",
"start_offset" : 710,
"end_offset" : 711,
"type" : "CN_CHAR",
"position" : 324
},
{
"token" : "文字",
"start_offset" : 711,
"end_offset" : 713,
"type" : "CN_WORD",
"position" : 325
},
{
"token" : "链",
"start_offset" : 713,
"end_offset" : 714,
"type" : "CN_CHAR",
"position" : 326
},
{
"token" : "和",
"start_offset" : 714,
"end_offset" : 715,
"type" : "CN_CHAR",
"position" : 327
},
{
"token" : "论坛",
"start_offset" : 715,
"end_offset" : 717,
"type" : "CN_WORD",
"position" : 328
},
{
"token" : "相关",
"start_offset" : 717,
"end_offset" : 719,
"type" : "CN_WORD",
"position" : 329
},
{
"token" : "位置",
"start_offset" : 719,
"end_offset" : 721,
"type" : "CN_WORD",
"position" : 330
},
{
"token" : "争取",
"start_offset" : 722,
"end_offset" : 724,
"type" : "CN_WORD",
"position" : 331
},
{
"token" : "免费",
"start_offset" : 724,
"end_offset" : 726,
"type" : "CN_WORD",
"position" : 332
},
{
"token" : "露出",
"start_offset" : 726,
"end_offset" : 728,
"type" : "CN_WORD",
"position" : 333
},
{
"token" : "露",
"start_offset" : 726,
"end_offset" : 727,
"type" : "CN_CHAR",
"position" : 334
},
{
"token" : "出位",
"start_offset" : 727,
"end_offset" : 729,
"type" : "CN_WORD",
"position" : 335
},
{
"token" : "br",
"start_offset" : 731,
"end_offset" : 733,
"type" : "ENGLISH",
"position" : 336
},
{
"token" : "小",
"start_offset" : 736,
"end_offset" : 737,
"type" : "CN_CHAR",
"position" : 337
},
{
"token" : "鹏",
"start_offset" : 737,
"end_offset" : 738,
"type" : "CN_CHAR",
"position" : 338
},
{
"token" : "汽车",
"start_offset" : 738,
"end_offset" : 740,
"type" : "CN_WORD",
"position" : 339
},
{
"token" : "1.0",
"start_offset" : 741,
"end_offset" : 744,
"type" : "ARABIC",
"position" : 340
},
{
"token" : "评测",
"start_offset" : 745,
"end_offset" : 747,
"type" : "CN_WORD",
"position" : 341
},
{
"token" : "宝剑锋从磨砺出",
"start_offset" : 748,
"end_offset" : 755,
"type" : "CN_WORD",
"position" : 342
},
{
"token" : "宝剑锋",
"start_offset" : 748,
"end_offset" : 751,
"type" : "CN_WORD",
"position" : 343
},
{
"token" : "宝剑",
"start_offset" : 748,
"end_offset" : 750,
"type" : "CN_WORD",
"position" : 344
},
{
"token" : "从",
"start_offset" : 751,
"end_offset" : 752,
"type" : "CN_CHAR",
"position" : 345
},
{
"token" : "锋",
"start_offset" : 750,
"end_offset" : 751,
"type" : "CN_CHAR",
"position" : 346
},
{
"token" : "从",
"start_offset" : 751,
"end_offset" : 752,
"type" : "CN_CHAR",
"position" : 347
},
{
"token" : "磨砺",
"start_offset" : 752,
"end_offset" : 754,
"type" : "CN_WORD",
"position" : 348
},
{
"token" : "出",
"start_offset" : 754,
"end_offset" : 755,
"type" : "CN_CHAR",
"position" : 349
},
{
"token" : "https",
"start_offset" : 756,
"end_offset" : 761,
"type" : "ENGLISH",
"position" : 350
},
{
"token" : "chejiahao.autohome.com.cn",
"start_offset" : 764,
"end_offset" : 789,
"type" : "LETTER",
"position" : 351
},
{
"token" : "chejiahao",
"start_offset" : 764,
"end_offset" : 773,
"type" : "ENGLISH",
"position" : 352
},
{
"token" : "autohome",
"start_offset" : 774,
"end_offset" : 782,
"type" : "ENGLISH",
"position" : 353
},
{
"token" : "com",
"start_offset" : 783,
"end_offset" : 786,
"type" : "ENGLISH",
"position" : 354
},
{
"token" : "cn",
"start_offset" : 787,
"end_offset" : 789,
"type" : "ENGLISH",
"position" : 355
},
{
"token" : "info",
"start_offset" : 790,
"end_offset" : 794,
"type" : "ENGLISH",
"position" : 356
},
{
"token" : "2239269",
"start_offset" : 795,
"end_offset" : 802,
"type" : "ARABIC",
"position" : 357
},
{
"token" : "br",
"start_offset" : 804,
"end_offset" : 806,
"type" : "ENGLISH",
"position" : 358
},
{
"token" : "b",
"start_offset" : 807,
"end_offset" : 808,
"type" : "ENGLISH",
"position" : 359
},
{
"token" : "一路",
"start_offset" : 809,
"end_offset" : 811,
"type" : "CN_WORD",
"position" : 360
},
{
"token" : "一",
"start_offset" : 809,
"end_offset" : 810,
"type" : "TYPE_CNUM",
"position" : 361
},
{
"token" : "路向",
"start_offset" : 810,
"end_offset" : 812,
"type" : "CN_WORD",
"position" : 362
},
{
"token" : "路",
"start_offset" : 810,
"end_offset" : 811,
"type" : "COUNT",
"position" : 363
},
{
"token" : "向北",
"start_offset" : 811,
"end_offset" : 813,
"type" : "CN_WORD",
"position" : 364
},
{
"token" : "小",
"start_offset" : 814,
"end_offset" : 815,
"type" : "CN_CHAR",
"position" : 365
},
{
"token" : "鹏",
"start_offset" : 815,
"end_offset" : 816,
"type" : "CN_CHAR",
"position" : 366
},
{
"token" : "汽车",
"start_offset" : 816,
"end_offset" : 818,
"type" : "CN_WORD",
"position" : 367
},
{
"token" : "广州",
"start_offset" : 818,
"end_offset" : 820,
"type" : "CN_WORD",
"position" : 368
},
{
"token" : "北京",
"start_offset" : 821,
"end_offset" : 823,
"type" : "CN_WORD",
"position" : 369
},
{
"token" : "自驾游",
"start_offset" : 823,
"end_offset" : 826,
"type" : "CN_WORD",
"position" : 370
},
{
"token" : "自驾",
"start_offset" : 823,
"end_offset" : 825,
"type" : "CN_WORD",
"position" : 371
},
{
"token" : "游记",
"start_offset" : 825,
"end_offset" : 827,
"type" : "CN_WORD",
"position" : 372
},
{
"token" : "https",
"start_offset" : 828,
"end_offset" : 833,
"type" : "ENGLISH",
"position" : 373
},
{
"token" : "chejiahao.autohome.com.cn",
"start_offset" : 836,
"end_offset" : 861,
"type" : "LETTER",
"position" : 374
},
{
"token" : "chejiahao",
"start_offset" : 836,
"end_offset" : 845,
"type" : "ENGLISH",
"position" : 375
},
{
"token" : "autohome",
"start_offset" : 846,
"end_offset" : 854,
"type" : "ENGLISH",
"position" : 376
},
{
"token" : "com",
"start_offset" : 855,
"end_offset" : 858,
"type" : "ENGLISH",
"position" : 377
},
{
"token" : "cn",
"start_offset" : 859,
"end_offset" : 861,
"type" : "ENGLISH",
"position" : 378
},
{
"token" : "info",
"start_offset" : 862,
"end_offset" : 866,
"type" : "ENGLISH",
"position" : 379
},
{
"token" : "2290598",
"start_offset" : 867,
"end_offset" : 874,
"type" : "ARABIC",
"position" : 380
},
{
"token" : "span",
"start_offset" : 877,
"end_offset" : 881,
"type" : "ENGLISH",
"position" : 381
}
]
}
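The problematic spot: positions 345 to 347, where the token "从" (offsets 751-752) appears twice and its first occurrence precedes "锋" (offsets 750-751), so start_offset steps backwards from 751 to 750.
The corresponding text: "宝剑锋从磨砺出".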
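Try modifying the text to see whether the problem disappears
After deleting the character "从" from "宝剑锋从磨砺出" in the text and running the command again, the exception disappears and the content is tokenized normally.
Try shortening the text to see whether the problem disappears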
json
POST test_2022/_analyze
{
"field": "workContent.chinese",
"text": "宝剑锋从磨砺出"
}
Test result:
json
{
"tokens" : [
{
"token" : "宝剑锋从磨砺出",
"start_offset" : 0,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 0
},
{
"token" : "宝剑锋",
"start_offset" : 0,
"end_offset" : 3,
"type" : "CN_WORD",
"position" : 1
},
{
"token" : "宝剑",
"start_offset" : 0,
"end_offset" : 2,
"type" : "CN_WORD",
"position" : 2
},
{
"token" : "从",
"start_offset" : 3,
"end_offset" : 4,
"type" : "CN_CHAR",
"position" : 3
},
{
"token" : "锋",
"start_offset" : 2,
"end_offset" : 3,
"type" : "CN_CHAR",
"position" : 4
},
{
"token" : "从",
"start_offset" : 3,
"end_offset" : 4,
"type" : "CN_CHAR",
"position" : 5
},
{
"token" : "磨砺",
"start_offset" : 4,
"end_offset" : 6,
"type" : "CN_WORD",
"position" : 6
},
{
"token" : "出",
"start_offset" : 6,
"end_offset" : 7,
"type" : "CN_CHAR",
"position" : 7
}
]
}
The problem still occurs, so it is unrelated to the length of the text; this short fragment alone is enough to trigger it.
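The anomaly in the result above can also be spotted mechanically. Lucene expects the start offsets of a token stream never to decrease, yet here "锋" (start_offset 2) follows "从" (start_offset 3). The following is a minimal sketch that scans an _analyze response for such backward steps; the function name find_backward_offsets is illustrative and not part of the original test, and the token list is copied from the short test result with the "type" field left out.

```python
from typing import Dict, List, Tuple


def find_backward_offsets(tokens: List[Dict]) -> List[Tuple[Dict, Dict]]:
    """Return (previous, current) token pairs where start_offset decreases."""
    pairs = []
    for prev, cur in zip(tokens, tokens[1:]):
        if cur["start_offset"] < prev["start_offset"]:
            pairs.append((prev, cur))
    return pairs


# Tokens copied from the short test result (only the fields the check needs).
tokens = [
    {"token": "宝剑锋从磨砺出", "start_offset": 0, "end_offset": 7, "position": 0},
    {"token": "宝剑锋", "start_offset": 0, "end_offset": 3, "position": 1},
    {"token": "宝剑", "start_offset": 0, "end_offset": 2, "position": 2},
    {"token": "从", "start_offset": 3, "end_offset": 4, "position": 3},
    {"token": "锋", "start_offset": 2, "end_offset": 3, "position": 4},
    {"token": "从", "start_offset": 3, "end_offset": 4, "position": 5},
    {"token": "磨砺", "start_offset": 4, "end_offset": 6, "position": 6},
    {"token": "出", "start_offset": 6, "end_offset": 7, "position": 7},
]

for prev, cur in find_backward_offsets(tokens):
    print(
        f'position {cur["position"]}: {cur["token"]!r} starts at {cur["start_offset"]}, '
        f'but the previous token {prev["token"]!r} starts at {prev["start_offset"]}'
    )
```

For this input the check flags exactly one pair: "从" at position 3 followed by "锋" at position 4.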
Narrowing down the failing text further
Testing shows that "宝剑锋从磨砺出" is already the shortest text that triggers the exception.
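The narrowing step can be automated with a brute-force search over substrings. The sketch below is one possible way to do it, not the procedure used in this write-up: shortest_failing_substring and fake_fails are hypothetical names, and fake_fails merely stands in for a real predicate that would send the candidate to _analyze and report whether the output is broken.

```python
from typing import Callable, Optional


def shortest_failing_substring(text: str, fails: Callable[[str], bool]) -> Optional[str]:
    """Return a shortest contiguous substring of text for which fails() is true.

    Lengths are tried from 1 upward, so the first hit is a shortest one.
    This needs O(n^2) calls to fails(), which is only practical for short inputs.
    """
    for length in range(1, len(text) + 1):
        for start in range(len(text) - length + 1):
            candidate = text[start:start + length]
            if fails(candidate):
                return candidate
    return None


def fake_fails(candidate: str) -> bool:
    # Stand-in predicate (assumption): pretend the analyzer misbehaves
    # whenever the full phrase is present in the candidate.
    return "宝剑锋从磨砺出" in candidate


# One line of the original document text that contains the phrase.
sample = "A、小鹏汽车 1.0 评测:宝剑锋从磨砺出 https://chejiahao.autohome.com.cn/info/2239269"
print(shortest_failing_substring(sample, fake_fails))
```

With a real predicate every call costs one request, so it is worth trimming the input to a single sentence by hand before running the search.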
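Investigating the root cause of the bug
Solutions to the bug
Solution 1: upload a custom dictionary
Register "宝剑锋从磨砺出" as a standalone dictionary term so that it is no longer split automatically.
Create a text file named extend.dic containing a single line: 宝剑锋从磨砺出.
Place the file in the ES installation directory and edit IKAnalyzer.cfg.xml to add the extension dictionary configuration. Then restart the ES instance.
Indices that already exist must be recreated; otherwise the new tokenization does not take effect.
On Alibaba Cloud's ES service, use its hot update feature directly (the first IK hot update triggers a restart; later updates to a dictionary with the same name do not).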
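For a self-managed instance, the dictionary and configuration steps can be sketched as follows. This is a minimal illustration under assumptions: the helper write_ik_extension is hypothetical, the target directory depends on how the IK plugin was installed, and the sketch writes a fresh IKAnalyzer.cfg.xml rather than merging into an existing one, so any entries already present would have to be carried over by hand. The ext_dict key and the properties-style XML layout follow the configuration file shipped with the IK plugin.

```python
import tempfile
from pathlib import Path
from typing import Iterable

# Java properties XML, the format IKAnalyzer.cfg.xml uses.
IK_CONFIG_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- extension dictionaries, relative to this file, separated by ';' -->
    <entry key="ext_dict">{ext_dict}</entry>
    <!-- extension stopword dictionaries -->
    <entry key="ext_stopwords"></entry>
</properties>
"""


def write_ik_extension(config_dir: Path, words: Iterable[str],
                       dict_name: str = "extend.dic") -> Path:
    """Write the dictionary (one term per line, UTF-8) and a config referencing it."""
    config_dir.mkdir(parents=True, exist_ok=True)
    (config_dir / dict_name).write_text("\n".join(words) + "\n", encoding="utf-8")
    config_path = config_dir / "IKAnalyzer.cfg.xml"
    # Overwrites any existing file in config_dir (see the caveat in the lead-in).
    config_path.write_text(IK_CONFIG_TEMPLATE.format(ext_dict=dict_name), encoding="utf-8")
    return config_path


# Demo in a throwaway directory; point config_dir at the real IK config
# directory only after backing up the existing file.
with tempfile.TemporaryDirectory() as tmp:
    path = write_ik_extension(Path(tmp) / "analysis-ik", ["宝剑锋从磨砺出"])
    print(path.read_text(encoding="utf-8"))
```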
After the instance restarts, rebuild the index and run the test again
json
POST test_2022/_analyze
{
"field": "workContent.chinese",
"text": "宝剑锋从磨砺出"
}
Test result:
json
{
"tokens" : [
{
"token" : "宝剑锋从磨砺出",
"start_offset" : 0,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 0
},
{
"token" : "宝剑锋",
"start_offset" : 0,
"end_offset" : 3,
"type" : "CN_WORD",
"position" : 1
},
{
"token" : "宝剑",
"start_offset" : 0,
"end_offset" : 2,
"type" : "CN_WORD",
"position" : 2
},
{
"token" : "锋",
"start_offset" : 2,
"end_offset" : 3,
"type" : "CN_CHAR",
"position" : 3
},
{
"token" : "从",
"start_offset" : 3,
"end_offset" : 4,
"type" : "CN_CHAR",
"position" : 4
},
{
"token" : "磨砺",
"start_offset" : 4,
"end_offset" : 6,
"type" : "CN_WORD",
"position" : 5
},
{
"token" : "出",
"start_offset" : 6,
"end_offset" : 7,
"type" : "CN_CHAR",
"position" : 6
}
]
}
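To repeat this verification from a script instead of the console, the same _analyze request can be issued over HTTP. The sketch below uses only the Python standard library; the helper names and the cluster URL http://localhost:9200 are assumptions, while the index, field and text are the ones used in this write-up.

```python
import json
import urllib.request


def build_analyze_request(base_url: str, index: str, field: str,
                          text: str) -> urllib.request.Request:
    """Build the POST <index>/_analyze request used in the tests above."""
    body = json.dumps({"field": field, "text": text}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(base_url: str, index: str, field: str, text: str) -> list:
    """Send the request and return the token list from the response."""
    req = build_analyze_request(base_url, index, field, text)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["tokens"]


def offsets_are_ordered(tokens: list) -> bool:
    """True when start_offset never decreases from one token to the next."""
    starts = [t["start_offset"] for t in tokens]
    return all(a <= b for a, b in zip(starts, starts[1:]))


# Usage (needs a reachable cluster; the URL is an assumption):
# tokens = analyze("http://localhost:9200", "test_2022",
#                  "workContent.chinese", "宝剑锋从磨砺出")
# print(offsets_are_ordered(tokens))
```

Applied to the two results in this write-up, offsets_are_ordered is false for the original token list (start offsets 0, 0, 0, 3, 2, 3, 4, 6) and true for the list returned after the dictionary change (0, 0, 0, 2, 3, 4, 6).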