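Background
An Elasticsearch index needs a new field whose initial value is derived from an existing field. The index holds roughly 10 million documents.
Two options were considered:
1. Use the _update_by_query API to update every document in place.
2. Create a new index and migrate all data into it, setting the new field's initial value during the migration.
In testing, option 1 took about 2 hours, while option 2 took about 20 minutes, so option 2 was chosen.
Create the index
Create the target index: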
PUT /tbproj_1211
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 1,
      "analysis": {
        "normalizer": {
          "CustomNormalizer": {
            "filter": [
              "lowercase",
              "asciifolding"
            ],
            "type": "custom"
          }
        },
        "analyzer": {
          "optimizeIK": {
            "type": "custom",
            "tokenizer": "ik_max_word"
          }
        }
      }
    }
  },
  "mappings": {
    "dynamic": "true",
    "properties": {
      "createTime": {
        "type": "long",
        "coerce": false
      },
      "creator": {
        "type": "keyword"
      },
      "credit": {
        "type": "keyword"
      },
      "dbId": {
        "type": "keyword"
      },
      "definition": {
        "type": "keyword"
      },
      "fileName": {
        "type": "keyword"
      },
      "fromDbId": {
        "type": "keyword"
      },
      "id": {
        "type": "keyword"
      },
      "isApproved": {
        "type": "keyword"
      },
      "modifiedTime": {
        "type": "long",
        "coerce": false
      },
      "modifier": {
        "type": "keyword"
      },
      "note": {
        "type": "keyword"
      },
      "original": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          },
          "normalizer": {
            "type": "keyword",
            "normalizer": "CustomNormalizer"
          },
          "pattern": {
            "type": "text",
            "norms": false,
            "analyzer": "pattern"
          },
          "standard": {
            "type": "text",
            "norms": false,
            "analyzer": "standard"
          }
        },
        "norms": false,
        "analyzer": "optimizeIK"
      },
      "originalLang": {
        "type": "keyword"
      },
      "remark": {
        "type": "keyword"
      },
      "reviewTime": {
        "type": "long",
        "coerce": false
      },
      "reviewer": {
        "type": "keyword"
      },
      "translation": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          },
          "normalizer": {
            "type": "keyword",
            "normalizer": "CustomNormalizer"
          },
          "pattern": {
            "type": "text",
            "norms": false,
            "analyzer": "pattern"
          },
          "standard": {
            "type": "text",
            "norms": false,
            "analyzer": "standard"
          }
        },
        "norms": false,
        "analyzer": "optimizeIK"
      },
      "translationLang": {
        "type": "keyword"
      }
    }
  }
}
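Migrate the data
Because of the data volume, the migration is split into a full migration and an incremental migration. The full migration runs in the early morning; the incremental migration can run at release time.
Full migration
Migration script: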
### Full migration
POST _reindex?wait_for_completion=false
{
  "conflicts": "proceed",
  "source": {
    "index": "tbproj",
    "size": 10000
  },
  "dest": {
    "index": "tbproj_1211",
    "op_type": "index"
  },
  "script": {
    "lang": "painless",
    "params": {
      "lv1_terms": ["", "译员术语"],
      "lv2_terms": ["PM术语"],
      "lv3_terms": ["客户术语", "行业术语"]
    },
    "source": """
      // Handle documents where the field does not exist
      if (!ctx._source.containsKey('remark')) {
        ctx._source.credit = 2;
        return;
      }
      def remark = ctx._source.remark;
      // Handle null values
      if (remark == null) {
        ctx._source.credit = 2;
        return;
      }
      String remarkStr = remark.toString();
      if (params.lv1_terms.contains(remarkStr)) {
        ctx._source.credit = 2;
      } else if (params.lv2_terms.contains(remarkStr)) {
        ctx._source.credit = 3;
      } else if (params.lv3_terms.contains(remarkStr)) {
        ctx._source.credit = 4;
      } else {
        ctx._source.credit = 2;
      }
    """
  }
}
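The remark-to-credit rule in the script is easier to check offline than inside a running reindex. The following is a minimal sketch that mirrors the Painless logic in Python so the rule can be unit-tested; the function name `credit_for` and the constant names are hypothetical and not part of the original procedure. It assumes `remark` is a string or absent, since Python's `str()` and Java's `toString()` can differ for other types.

```python
# Hypothetical Python mirror of the Painless script, meant only for
# checking the remark -> credit rule offline.
LV1_TERMS = ["", "译员术语"]
LV2_TERMS = ["PM术语"]
LV3_TERMS = ["客户术语", "行业术语"]
DEFAULT_CREDIT = 2


def credit_for(source: dict) -> int:
    """Return the credit the script would assign to one document's _source."""
    # A missing key and an explicit null both come back as None here,
    # which covers the two early-return branches of the script.
    remark = source.get("remark")
    if remark is None:
        return DEFAULT_CREDIT
    remark_str = str(remark)
    if remark_str in LV1_TERMS:
        return 2
    if remark_str in LV2_TERMS:
        return 3
    if remark_str in LV3_TERMS:
        return 4
    return DEFAULT_CREDIT


if __name__ == "__main__":
    for doc in ({}, {"remark": None}, {"remark": "PM术语"}, {"remark": "客户术语"}):
        print(doc, "->", credit_for(doc))
```

Note that every branch except the lv2 and lv3 matches yields 2, so the missing-field, null, lv1, and fallback cases all collapse to the same default.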
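Because the _reindex request sets wait_for_completion=false, it returns a task ID immediately instead of blocking. Use that task ID to query the migration's progress: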
GET _tasks/ouuGEmiVSOaw2ih49hGepg:13942427
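The task response reports counters rather than a percentage. As a rough sketch, progress can be estimated by adding the `created`, `updated`, and `deleted` counters under `task.status` and dividing by `total`, which is the estimate the Elasticsearch reindex documentation describes. The helper name `reindex_progress` and the sample response are hypothetical; documents skipped as no-ops or version conflicts are not counted here, so the ratio may stay below 1.0 when those occur.

```python
def reindex_progress(task_response: dict) -> float:
    """Estimate reindex progress (0.0 to 1.0) from a GET _tasks/<task_id> response body."""
    status = task_response["task"]["status"]
    total = status.get("total", 0)
    if total == 0:
        # Nothing counted yet (or an empty source index).
        return 0.0
    processed = (
        status.get("created", 0)
        + status.get("updated", 0)
        + status.get("deleted", 0)
    )
    return processed / total


if __name__ == "__main__":
    # Made-up response fragment, for illustration only.
    sample = {
        "completed": False,
        "task": {
            "status": {"total": 10000000, "created": 2500000, "updated": 0, "deleted": 0}
        },
    }
    print(f"{reindex_progress(sample):.0%}")
```

The top-level `completed` flag in the same response indicates whether the task has finished.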
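Incremental migration
After the full migration finishes, use the script below to migrate the incremental data, that is, documents created or modified after the full migration command was issued: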
### Incremental migration
POST _reindex?wait_for_completion=false
{
  "conflicts": "proceed", ### keep migrating when a conflict occurs
  "source": {
    "index": "tbproj",
    "size": 10000,
    "query": {
      "range": {
        "modifiedTime": {
          "gte": 1764910963831 ### time at which the full migration was started (epoch milliseconds)
        }
      }
    }
  },
  "dest": {
    "index": "tbproj_1211",
    "op_type": "index" ### overwrite the existing document when the ID already exists
  },
  "script": {
    "lang": "painless",
    "params": {
      "lv1_terms": ["", "译员术语"],
      "lv2_terms": ["PM术语"],
      "lv3_terms": ["客户术语", "行业术语"]
    },
    "source": """
      // Handle documents where the field does not exist
      if (!ctx._source.containsKey('remark')) {
        ctx._source.credit = 2;
        return;
      }
      def remark = ctx._source.remark;
      // Handle null values
      if (remark == null) {
        ctx._source.credit = 2;
        return;
      }
      String remarkStr = remark.toString();
      if (params.lv1_terms.contains(remarkStr)) {
        ctx._source.credit = 2;
      } else if (params.lv2_terms.contains(remarkStr)) {
        ctx._source.credit = 3;
      } else if (params.lv3_terms.contains(remarkStr)) {
        ctx._source.credit = 4;
      } else {
        ctx._source.credit = 2;
      }
    """
  }
}
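The `gte` value in the range query is an epoch timestamp in milliseconds, which is easy to get wrong by hand. The sketch below shows one way to derive it from a timezone-aware datetime and to build the `source` part of the request; the helper names `to_epoch_millis` and `incremental_source` are hypothetical. It uses integer timedelta division rather than `timestamp() * 1000` to avoid floating-point rounding.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(moment: datetime) -> int:
    """Convert a timezone-aware datetime to epoch milliseconds with integer arithmetic."""
    return (moment - EPOCH) // timedelta(milliseconds=1)


def incremental_source(index: str, since_millis: int, batch_size: int = 10000) -> dict:
    """Build the "source" section of a reindex body that selects recently modified documents."""
    return {
        "index": index,
        "size": batch_size,
        "query": {"range": {"modifiedTime": {"gte": since_millis}}},
    }


if __name__ == "__main__":
    # The cutoff used in the script above corresponds to this UTC instant.
    started = datetime(2025, 12, 5, 5, 2, 43, 831000, tzinfo=timezone.utc)
    print(incremental_source("tbproj", to_epoch_millis(started)))
```

Choosing a cutoff slightly earlier than the actual start of the full migration is harmless here, because documents that already exist in the destination are simply overwritten.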
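Delete the old index
Once the migration is complete, verify data integrity. After the check passes, the old index can be deleted (this step can be skipped if the new index does not need to be reachable under the old index's name).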
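The post does not spell out how integrity was verified. As one possible sketch, two cheap checks are comparing the `count` values returned by `GET /tbproj/_count` and `GET /tbproj_1211/_count`, and confirming that a terms aggregation on `credit` in the new index only contains the values the script can produce. The helper names and the sample inputs below are hypothetical; the functions only inspect response bodies that have already been fetched.

```python
# Values the migration script can assign. The field is mapped as keyword,
# so aggregation bucket keys are compared as strings.
EXPECTED_CREDITS = {"2", "3", "4"}


def counts_match(old_count: dict, new_count: dict) -> bool:
    """Compare the "count" fields of two _count API response bodies."""
    return old_count["count"] == new_count["count"]


def unexpected_credits(buckets: list) -> list:
    """Return bucket keys from a terms aggregation on credit that the script cannot produce."""
    return sorted(
        str(bucket["key"])
        for bucket in buckets
        if str(bucket["key"]) not in EXPECTED_CREDITS
    )


if __name__ == "__main__":
    # Made-up response fragments, for illustration only.
    print(counts_match({"count": 10}, {"count": 10}))
    print(unexpected_credits([{"key": "2", "doc_count": 7}, {"key": "3", "doc_count": 3}]))
```

Counts can legitimately differ while writes to the old index are still in flight, so a comparison like this is most meaningful after writes have stopped and the incremental migration has run. Once a check of this kind passes, the old index can be removed with the request that follows.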
DELETE /tbproj
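Create aliases
After the old index is deleted, create an alias on the new index with the same name as the old index, so that existing operations can keep using the old index name. An alias cannot share a name with an existing index, which is why the old index has to be removed first.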
### List existing aliases
GET /tbproj_1211/_alias
### Add or remove aliases
POST _aliases
{
  "actions": [
    {
      "add": { ### add an alias
        "index": "tb_1211",
        "alias": "tb"
      }
    },
    {
      "add": {
        "index": "tbproj_1211",
        "alias": "tbproj"
      }
    },
    {
      "remove": { ### remove an alias
        "index": "tbproj_1211",
        "alias": "tbproj_01"
      }
    }
  ]
}
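Deleting the old index and adding the alias as two separate requests leaves a short window in which the name tbproj resolves to nothing. The aliases API also documents a `remove_index` action, which allows both steps to be submitted in one `POST _aliases` request. This is a variation on the procedure above, not what the original steps use, and the helper name `swap_actions` is hypothetical. A minimal sketch that only builds the request body:

```python
import json


def swap_actions(old_index: str, new_index: str) -> dict:
    """Build a POST _aliases body that aliases new_index as old_index and drops old_index."""
    return {
        "actions": [
            # Point the old name at the new index...
            {"add": {"index": new_index, "alias": old_index}},
            # ...and delete the old index in the same request.
            {"remove_index": {"index": old_index}},
        ]
    }


if __name__ == "__main__":
    print(json.dumps(swap_actions("tbproj", "tbproj_1211"), indent=2))
```

Since `remove_index` deletes the old index just as DELETE does, it should only be sent after the data has been verified.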