ElasticSearch 7.x现网运行问题汇集1

问题描述:

现网ElasticSearch health状态变为red,有分片无法assign。如下摘录explain的结果部分:

复制代码
    "note": "No shard was specified in the explain API request, so this response explains a randomly chosen unassigned shard. There may be other unassigned shards in this cluster which cannot be assigned for different reasons. It may not be possible to assign this shard until one of the other shards is assigned correctly. To explain the allocation of other shards (whether assigned or unassigned) you must specify the target shard in the request to this API.",
    "index": "demo-2022.02.06",
    "shard": 3,
    "primary": true,
    "current_state": "unassigned",
    "unassigned_info": {
        "reason": "CLUSTER_RECOVERED",
        "at": "2023-05-29T08:08:22.697Z",
        "last_allocation_status": "no_valid_shard_copy"
    },
    "can_allocate": "no_valid_shard_copy",
    "allocate_explanation": "cannot allocate because all found copies of the shard are either stale or corrupt",
。。。
"store": {
                "in_sync": true,
                "allocation_id": "82iRvG0KTTm9NT_5Fx8BRA",
                "store_exception": {
                    "type": "corrupt_index_exception",
                    "reason": "failed engine (reason: [corrupt file (source: [start])]) (resource=preexisting_corruption)",
                    "caused_by": {
                        "type": "i_o_exception",
                        "reason": "failed engine (reason: [corrupt file (source: [start])])",
                        "caused_by": {
                            "type": "corrupt_index_exception",
                            "reason": "checksum passed (d87020fd). possibly transient resource issue, or a Lucene or JVM bug (resource=BufferedChecksumIndexInput(NIOFSIndexInput(path=\"/data/es/data/nodes/0/indices/dzcoAoZjSzGus0qj1sKTFg/3/index/segments_6\")))"
                        }
                    }
                }
            }

解决方案:

  1. 步骤1: 检查shard stores

GET /_shard_stores?pretty ,得到分片损坏的明细,以便进行修复,得到如图:

  1. 步骤2: reroute index

    POST /_cluster/reroute?master_timeout=5m
    {
    "commands": [
    {
    "allocate_empty_primary": {
    "index": "demo-2023.04.04",
    "shard": 2 ,
    "node": "{nodename}",
    "accept_data_loss": true
    }
    }
    ]
    }

相关推荐
Databend1 小时前
2KB histogram 背后:Databend 如何低成本追踪长尾延迟
大数据·数据分析·agent
Databend3 小时前
从湖仓升级为 Agent 时代的数据控制面,Snowflake 和 Databricks 有哪些布局
大数据·数据库·agent
Elasticsearch8 小时前
深入解析 simdvec:Elasticsearch 如何利用神经网络和视频编解码 CPU 指令实现向量搜索
elasticsearch
阿里云大数据AI技术1 天前
StarRocks x Fluss x Paimon湖流一体方案:构建秒级响应、湖流一体的实时数据引擎
大数据·人工智能
Databend1 天前
Agent 轨迹分析与归因的数据工程实践
大数据·数据库·agent
喵个咪1 天前
Go Wind UBA 拆解系列 - 架构总览:三服务、数据流与契约优先
大数据·后端·go
喵个咪1 天前
Go Wind UBA 拆解系列 - 多租户与安全:两套隔离机制的边界
大数据·后端·go
喵个咪1 天前
Go Wind UBA 拆解系列 - OLAP 与 SQL 硬核:25 个分析模型怎么落地
大数据·后端·go
喵个咪1 天前
Go Wind UBA 拆解系列 - SDK 与采集层:从浏览器到 Kafka
大数据·后端·go
Elasticsearch1 天前
一条命令。自然语言。你的 Elasticsearch 数据,直接进入终端
elasticsearch