Elasticsearch 7.x Production Issues Roundup, Part 1

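Problem description:

The health status of the production Elasticsearch cluster turned red, and some shards could not be assigned. An excerpt from the allocation explain output follows: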

    "note": "No shard was specified in the explain API request, so this response explains a randomly chosen unassigned shard. There may be other unassigned shards in this cluster which cannot be assigned for different reasons. It may not be possible to assign this shard until one of the other shards is assigned correctly. To explain the allocation of other shards (whether assigned or unassigned) you must specify the target shard in the request to this API.",
    "index": "demo-2022.02.06",
    "shard": 3,
    "primary": true,
    "current_state": "unassigned",
    "unassigned_info": {
        "reason": "CLUSTER_RECOVERED",
        "at": "2023-05-29T08:08:22.697Z",
        "last_allocation_status": "no_valid_shard_copy"
    },
    "can_allocate": "no_valid_shard_copy",
    "allocate_explanation": "cannot allocate because all found copies of the shard are either stale or corrupt",
    ...
            "store": {
                "in_sync": true,
                "allocation_id": "82iRvG0KTTm9NT_5Fx8BRA",
                "store_exception": {
                    "type": "corrupt_index_exception",
                    "reason": "failed engine (reason: [corrupt file (source: [start])]) (resource=preexisting_corruption)",
                    "caused_by": {
                        "type": "i_o_exception",
                        "reason": "failed engine (reason: [corrupt file (source: [start])])",
                        "caused_by": {
                            "type": "corrupt_index_exception",
                            "reason": "checksum passed (d87020fd). possibly transient resource issue, or a Lucene or JVM bug (resource=BufferedChecksumIndexInput(NIOFSIndexInput(path=\"/data/es/data/nodes/0/indices/dzcoAoZjSzGus0qj1sKTFg/3/index/segments_6\")))"
                        }
                    }
                }
            }
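
The note in the output says the API picked an unassigned shard at random, so it can help to ask about one specific shard and pull out only the store exceptions. The following is a minimal Python sketch of that idea, not part of the original troubleshooting session. The function names and the `base_url` parameter are hypothetical, and it assumes the response has the `node_allocation_decisions` / `store` / `store_exception` layout seen in the excerpt above.

```python
import json
from urllib import request


def explain_request_body(index, shard, primary=True):
    """Body for the allocation explain API that targets one specific shard."""
    return {"index": index, "shard": shard, "primary": primary}


def fetch_explain(base_url, index, shard, primary=True):
    """POST the body to /_cluster/allocation/explain.

    base_url is an assumption (for example "http://localhost:9200");
    authentication and TLS are left out of this sketch.
    """
    data = json.dumps(explain_request_body(index, shard, primary)).encode("utf-8")
    req = request.Request(
        f"{base_url}/_cluster/allocation/explain",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.load(resp)


def store_exceptions(explain):
    """Return (node_name, outer_exception_type, innermost_reason) per broken copy."""
    found = []
    for decision in explain.get("node_allocation_decisions", []):
        exc = decision.get("store", {}).get("store_exception")
        if not exc:
            continue
        # Walk the caused_by chain down to the root cause.
        innermost = exc
        while "caused_by" in innermost:
            innermost = innermost["caused_by"]
        found.append((decision.get("node_name"), exc.get("type"), innermost.get("reason")))
    return found
```

For the shard in the excerpt, `store_exceptions(fetch_explain(base_url, "demo-2022.02.06", 3))` would list each node that holds a copy with a store exception, along with the innermost reason (here, the message that names the `segments_6` file).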

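Solution:

  1. Step 1: Check the shard stores

Run GET /_shard_stores?pretty to get the details of the corrupted shard copies so that they can be repaired.

[Figure: output of GET /_shard_stores]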
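
On a cluster with many indices the `_shard_stores` output can be long, so a small filter helps to find the copies that actually report an exception. The sketch below is one possible way to do that in Python. `corrupted_copies` is a hypothetical helper, and it assumes the usual response layout in which `indices` maps to `shards`, each shard has a `stores` list, and each store entry carries a node entry keyed by node ID plus an optional `store_exception`.

```python
def corrupted_copies(shard_stores):
    """List store copies that report a store_exception.

    Takes the parsed JSON of GET /_shard_stores and returns tuples of
    (index, shard, node_name, allocation, exception_type).
    """
    rows = []
    for index, index_info in shard_stores.get("indices", {}).items():
        for shard_id, shard_info in index_info.get("shards", {}).items():
            for store in shard_info.get("stores", []):
                exc = store.get("store_exception")
                if not exc:
                    continue
                # The node entry is keyed by node ID; pick the nested
                # dict that carries a "name" field.
                node_name = next(
                    (value.get("name") for key, value in store.items()
                     if key != "store_exception"
                     and isinstance(value, dict) and "name" in value),
                    None,
                )
                rows.append((index, int(shard_id), node_name,
                             store.get("allocation"), exc.get("type")))
    rows.sort(key=lambda row: (row[0], row[1]))
    return rows
```

Each returned tuple supplies the index, shard number, and node name that a later reroute command needs.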

  2. Step 2: Reroute the index

    POST /_cluster/reroute?master_timeout=5m
    {
      "commands": [
        {
          "allocate_empty_primary": {
            "index": "demo-2023.04.04",
            "shard": 2,
            "node": "{nodename}",
            "accept_data_loss": true
          }
        }
      ]
    }
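
Because `allocate_empty_primary` creates an empty primary and discards whatever data the shard held (hence the required `accept_data_loss` flag), it can be worth sending the command with the reroute API's `dry_run` parameter first. The following Python sketch shows one way to build and send the same request. The function names and `base_url` are hypothetical, and authentication is omitted.

```python
import json
from urllib import parse, request


def allocate_empty_primary_body(index, shard, node):
    """Build the reroute body shown above for one shard."""
    return {
        "commands": [
            {
                "allocate_empty_primary": {
                    "index": index,
                    "shard": shard,
                    "node": node,
                    "accept_data_loss": True,  # data in this shard is lost
                }
            }
        ]
    }


def reroute_url(base_url, dry_run=True, master_timeout="5m"):
    """URL for POST /_cluster/reroute; dry_run=true only simulates the change."""
    query = parse.urlencode(
        {"dry_run": str(dry_run).lower(), "master_timeout": master_timeout}
    )
    return f"{base_url}/_cluster/reroute?{query}"


def reroute(base_url, body, dry_run=True):
    """Send the reroute command; base_url is an assumption, e.g. http://localhost:9200."""
    req = request.Request(
        reroute_url(base_url, dry_run),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

A cautious sequence would be to call `reroute(base_url, body)` with the default `dry_run=True`, inspect the returned cluster state, and only then repeat the call with `dry_run=False`.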
