Elasticsearch 7.x Production Issues Roundup (1)

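Problem description:

The health status of the production Elasticsearch cluster turned red, and some shards could not be assigned. Below is an excerpt from the output of the cluster allocation explain API (GET /_cluster/allocation/explain):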

    "note": "No shard was specified in the explain API request, so this response explains a randomly chosen unassigned shard. There may be other unassigned shards in this cluster which cannot be assigned for different reasons. It may not be possible to assign this shard until one of the other shards is assigned correctly. To explain the allocation of other shards (whether assigned or unassigned) you must specify the target shard in the request to this API.",
    "index": "demo-2022.02.06",
    "shard": 3,
    "primary": true,
    "current_state": "unassigned",
    "unassigned_info": {
        "reason": "CLUSTER_RECOVERED",
        "at": "2023-05-29T08:08:22.697Z",
        "last_allocation_status": "no_valid_shard_copy"
    },
    "can_allocate": "no_valid_shard_copy",
    "allocate_explanation": "cannot allocate because all found copies of the shard are either stale or corrupt",
    ...
    "store": {
        "in_sync": true,
        "allocation_id": "82iRvG0KTTm9NT_5Fx8BRA",
        "store_exception": {
            "type": "corrupt_index_exception",
            "reason": "failed engine (reason: [corrupt file (source: [start])]) (resource=preexisting_corruption)",
            "caused_by": {
                "type": "i_o_exception",
                "reason": "failed engine (reason: [corrupt file (source: [start])])",
                "caused_by": {
                    "type": "corrupt_index_exception",
                    "reason": "checksum passed (d87020fd). possibly transient resource issue, or a Lucene or JVM bug (resource=BufferedChecksumIndexInput(NIOFSIndexInput(path=\"/data/es/data/nodes/0/indices/dzcoAoZjSzGus0qj1sKTFg/3/index/segments_6\")))"
                }
            }
        }
    }
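As the note in the output says, the explain API picks a random unassigned shard unless the request names one. The sketch below shows one way to build a request body that targets a specific shard and to pull the corrupt copies out of a response. It assumes the "store" object shown above sits inside the response's node_allocation_decisions array; the node name node-1 and both function names are made up for illustration.

```python
import json
from typing import Any, Dict, List


def explain_request_body(index: str, shard: int, primary: bool = True) -> str:
    """Build the JSON body that asks the explain API about one specific shard."""
    return json.dumps({"index": index, "shard": shard, "primary": primary})


def corrupt_copies(explain: Dict[str, Any]) -> List[Dict[str, Any]]:
    """List every node whose on-disk copy of the shard reported a store exception."""
    found = []
    for decision in explain.get("node_allocation_decisions", []):
        store = decision.get("store", {})
        exc = store.get("store_exception")
        if exc:
            found.append({
                "node": decision.get("node_name"),
                "allocation_id": store.get("allocation_id"),
                "in_sync": store.get("in_sync"),
                "exception": exc.get("type"),
            })
    return found


# Trimmed-down response modelled on the excerpt above.
sample = {
    "index": "demo-2022.02.06",
    "shard": 3,
    "primary": True,
    "can_allocate": "no_valid_shard_copy",
    "node_allocation_decisions": [
        {
            "node_name": "node-1",  # hypothetical node name
            "store": {
                "in_sync": True,
                "allocation_id": "82iRvG0KTTm9NT_5Fx8BRA",
                "store_exception": {"type": "corrupt_index_exception"},
            },
        }
    ],
}

print(explain_request_body("demo-2022.02.06", 3))
print(corrupt_copies(sample))
```

If the only in-sync copy shows up in this list, as it does here, there is no healthy copy left for the cluster to promote, which matches the no_valid_shard_copy status.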

Solution:

  1. Step 1: Check the shard stores

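Run GET /_shard_stores?pretty to get the details of the corrupted shard copies so that they can be repaired. The result looks like this:

[Figure: output of GET /_shard_stores?pretty]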
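When many indices are affected, the _shard_stores output can be long, so it may help to reduce it to the copies that actually failed. The following is a minimal sketch that assumes the 7.x response layout (indices → shards → stores, with an optional store_exception per store). The node ID and node name in the sample are invented, and failed_stores is a hypothetical helper, not part of any client library.

```python
from typing import Any, Dict, List


def failed_stores(shard_stores: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten a _shard_stores response into one row per store copy with an exception."""
    rows = []
    for index, index_info in shard_stores.get("indices", {}).items():
        for shard, shard_info in index_info.get("shards", {}).items():
            for store in shard_info.get("stores", []):
                exc = store.get("store_exception")
                if exc is None:
                    continue  # this copy opened cleanly
                rows.append({
                    "index": index,
                    "shard": int(shard),
                    "allocation": store.get("allocation"),
                    "allocation_id": store.get("allocation_id"),
                    "exception": exc.get("type"),
                })
    return rows


# Hand-written sample in the assumed response shape; node ID and name are invented.
sample = {
    "indices": {
        "demo-2022.02.06": {
            "shards": {
                "3": {
                    "stores": [
                        {
                            "hypothetical-node-id": {"name": "node-1"},
                            "allocation_id": "82iRvG0KTTm9NT_5Fx8BRA",
                            "allocation": "primary",
                            "store_exception": {"type": "corrupt_index_exception"},
                        }
                    ]
                }
            }
        }
    }
}

for row in failed_stores(sample):
    print(row)
```

Each row carries the index name and shard number that the reroute command in the next step needs.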

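  2. Step 2: Reroute the index

Allocate an empty primary for the corrupted shard. This discards all data the shard previously held, which is why accept_data_loss must be set to true. Replace the index, shard, and {nodename} values with those of the affected shard (for the explain excerpt above, that would be demo-2022.02.06, shard 3):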

    POST /_cluster/reroute?master_timeout=5m
    {
        "commands": [
            {
                "allocate_empty_primary": {
                    "index": "demo-2023.04.04",
                    "shard": 2,
                    "node": "{nodename}",
                    "accept_data_loss": true
                }
            }
        ]
    }
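Because allocate_empty_primary permanently discards the shard's data, it can be worth building the command in a script and checking it with the reroute API's dry_run parameter before applying it. Here is a minimal sketch using only the Python standard library. The base URL http://localhost:9200, the node name node-1, and both function names are assumptions for illustration, and nothing is sent until the request is passed to urllib.request.urlopen.

```python
import json
import urllib.request
from typing import Any, Dict


def allocate_empty_primary(index: str, shard: int, node: str) -> Dict[str, Any]:
    """Build the reroute body that forces an empty primary onto the given node."""
    return {
        "commands": [
            {
                "allocate_empty_primary": {
                    "index": index,
                    "shard": shard,
                    "node": node,
                    "accept_data_loss": True,  # required: the shard's old data is discarded
                }
            }
        ]
    }


def reroute_request(base_url: str, body: Dict[str, Any],
                    dry_run: bool = True) -> urllib.request.Request:
    """Prepare (but do not send) the POST /_cluster/reroute request."""
    query = "master_timeout=5m" + ("&dry_run=true" if dry_run else "")
    return urllib.request.Request(
        url=f"{base_url}/_cluster/reroute?{query}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed endpoint and node name; adjust to the real cluster before sending.
body = allocate_empty_primary("demo-2023.04.04", 2, "node-1")
req = reroute_request("http://localhost:9200", body, dry_run=True)
print(req.get_method(), req.full_url)
print(json.dumps(body, indent=2))
```

With dry_run=true the cluster only computes the resulting state without applying it. Once the response looks right, rebuilding the request with dry_run=False and sending it performs the actual allocation.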
