In-place updates for dense fields on Elasticsearch 8.17

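Background: what problem are we actually trying to solve

Some business fields are just annoying: page views (PV), download counts, online status, real-time counters. They are written often, change fast, and have to participate in sorting and aggregation, yet they almost never need full-text search.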

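Handling this kind of data through Elasticsearch's standard path runs into several hard constraints:

Lucene segments are immutable: once a document is flushed into a segment, its field values are frozen. Changing a single number usually means a reindex, or an _update that rewrites the whole document.
_update / reindex is expensive: in a high-frequency counter scenario every +1 triggers a document-level update, which puts heavy pressure on IO and merging.
_source and the search-side fields need to be decoupled: the presentation layer may still read _source, but sort / agg / filter want the latest count, and we do not want to touch the whole JSON blob just to change one long.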

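Our goal: on ES 8.17, provide in-place partial updates at the docvalues level for numeric fields. After a document has been indexed, values can be changed in bulk through a REST API, by _id or by query; queries, sorting, and aggregations read the latest value from mmap, and _source is not modified.

That is where the es_update plugin comes from. But when we actually went to production and added replicas, we hit another wall in the ES core: checksum verification in the Store layer. The final solution has two layers:

Plugin layer (es_update): Lucene Codec + mmap + Transport replication, which solves "how to change values and how to query them".
Server patch layer (elasticsearch-server-patch): modifies the ES source, which solves "why recovery marks the shard as corrupt after values have been changed".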

Overall architecture

[Figure: architecture diagram. Groups: business layer, es_update plugin, Elasticsearch core, elasticsearch-server-patch. Labels: patch jar; indexing / full writes; high-frequency counter changes; dense_field Mapper; DenseField912Codec; sidecar mmap files (.dfd / .dfdc / .dfm); _dense_update REST; Transport replication to replicas; standard Lucene segment files; Store file verification and recovery; SkipChecksumFiles (skips the checksum of dense sidecar files).]

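Technical implementation (1): the es_update plugin

1. Field model: the docvalues-only `dense_field`

Rather than building yet another "indexable numeric", we defined dense_field:

Docvalues only, no inverted index (index: true is not supported).
Supports value_type values such as bool / byte / short / int / long.
At index time the Lucene FieldInfo is tagged with attributes: dense_field=true, the byte width, the default value, and so on.

This way sort, terms aggregations, and range filters still go through the docvalues path ES already knows, but no posting lists are maintained for the frequently changing field.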

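2. Storage: sidecar fix-width files + mmap

For each segment, sidecar files are generated for the dense fields:

Extension	Meaning
`.dfd`	Dense numeric data region (laid out as docId × field width)
`.dfm`	Field metadata (offset, type, bytes per value)

At flush time DenseDocValuesConsumer writes them; at query time DenseDocValuesProducer opens them via mmap through DenseMappedBuffer and returns an updatable ReadOrUpdateDenseNumericDocValues.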

Update path:

REST _dense_update
  → TransportDenseUpdateAction (broadcast to each shard)
    → DenseUpdateShardService
      → acquireSearcher → Lucene Query matches docIds
        → collector → updateValue(docId, value) writes to mmap

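The fix-width design

Lucene's standard NumericDocValues are compressed and read-only inside a segment: after flush the layout is decided by the Codec, and there is no public API for "randomly overwrite 8 bytes by docId". What we want is array-style storage, so the dense data lives in separate sidecar files with fixed-width (fix-width) encoding.

Core formulas (one segment, one dense field):

byteOffset(docId) = valuesOffset + docId × bytesPerValue
segment data region size = maxDoc × bytesPerValue

Here bytesPerValue is determined by the mapping's value_type (for example long → 8 bytes), and valuesOffset is recorded in .dfm. The mapping from docId to physical address is therefore linear and O(1): there is no secondary _id → offset index to maintain, and no variable-length structure to parse on update.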
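
The two formulas can be written down as a small helper. This is only a sketch: FixWidthLayout and its field names are hypothetical and are not taken from the plugin's source; the values passed in would come from .dfm.

```java
/** Hypothetical helper mirroring the two formulas; not the plugin's actual class. */
final class FixWidthLayout {
    final long valuesOffset;  // start of the value region, as recorded in .dfm
    final int bytesPerValue;  // derived from value_type, e.g. 8 for long
    final int maxDoc;         // number of doc slots in the segment

    FixWidthLayout(long valuesOffset, int bytesPerValue, int maxDoc) {
        if (valuesOffset < 0 || bytesPerValue <= 0 || maxDoc < 0) {
            throw new IllegalArgumentException("invalid layout");
        }
        this.valuesOffset = valuesOffset;
        this.bytesPerValue = bytesPerValue;
        this.maxDoc = maxDoc;
    }

    /** byteOffset(docId) = valuesOffset + docId × bytesPerValue */
    long byteOffset(int docId) {
        if (docId < 0 || docId >= maxDoc) {
            throw new IndexOutOfBoundsException("docId " + docId);
        }
        // Widen to long before multiplying so large segments do not overflow int.
        return valuesOffset + (long) docId * bytesPerValue;
    }

    /** segment data region size = maxDoc × bytesPerValue */
    long valuesLength() {
        return (long) maxDoc * bytesPerValue;
    }
}
```

The cast to long before the multiplication matters: with 8-byte values, docId × bytesPerValue exceeds the int range once a segment holds more than roughly 268 million docs.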

[Figure: fix-width layout. The .dfm metadata (valuesOffset, bytesPerValue = 8, value_type = long) describes the .dfd data region, a fix-width array in which docId=0, docId=1, ..., docId=n each occupy 8 bytes; _dense_update(docId) writes at offset = off + docId×8.]

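Why fix-width rather than a variable-length or sparse structure?

Layout	Random update	Alignment with Lucene docId	Implementation complexity
Variable-length (like `_source`)	Needs realloc / rewriting the following bytes	Poor	High
Sparse hash / external KV	Possible	Needs a remap when docIds change (merge)	High
Fix-width dense array	O(1) addressing	One-to-one with the segment's maxDoc	Low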

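The business premise of a dense field is that the vast majority of docs in a segment have the field (or at least a default value of 0). That matches the "dense array" assumption: empty slots are filled with the mapping's default at flush time, and later updates only touch the corresponding slot.

At index time DenseDocValuesConsumer scans NumericDocValues in docId order and streams the data into .dfd in 16 MiB chunks, which avoids an OOM from allocating one huge byte[]. .dfm records valuesOffset and valuesLength for each field and checks that valuesLength == maxDoc × bytesPerValue, so the fix-width constraint is locked in at the format level.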
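
A minimal sketch of that flush-time write, under stated assumptions: DenseChunkedWriter is a hypothetical name, a Map stands in for iterating NumericDocValues, the little-endian byte order is an assumption, and chunkBytes plays the role of the 16 MiB chunk size.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Map;

/** Hypothetical sketch of a chunked fix-width writer; not the plugin's consumer. */
final class DenseChunkedWriter {
    /**
     * Writes maxDoc long slots in docId order through a bounded chunk buffer.
     * Docs missing from `values` get `defaultValue`. Returns the bytes written.
     */
    static long writeLongs(OutputStream out, int maxDoc, Map<Integer, Long> values,
                           long defaultValue, int chunkBytes) {
        if (chunkBytes < Long.BYTES) {
            throw new IllegalArgumentException("chunk must hold at least one slot");
        }
        // Round the chunk down to a whole number of 8-byte slots.
        ByteBuffer chunk = ByteBuffer.allocate(chunkBytes - chunkBytes % Long.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN); // byte order is an assumption here
        long written = 0;
        try {
            for (int docId = 0; docId < maxDoc; docId++) {
                if (chunk.remaining() < Long.BYTES) {
                    // Chunk is full: flush it and reuse the same buffer.
                    out.write(chunk.array(), 0, chunk.position());
                    written += chunk.position();
                    chunk.clear();
                }
                chunk.putLong(values.getOrDefault(docId, defaultValue));
            }
            out.write(chunk.array(), 0, chunk.position());
            written += chunk.position();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The same invariant the article says .dfm enforces.
        long expected = (long) maxDoc * Long.BYTES;
        if (written != expected) {
            throw new IllegalStateException("valuesLength " + written + " != " + expected);
        }
        return written;
    }
}
```

Memory use is bounded by the chunk size rather than by maxDoc, which is the point of streaming the data region instead of building it in one array.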

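Advantages of combining mmap with docvalues-only

Fix-width answers "where to write"; mmap answers "how to write fast enough and still be visible to queries". Only the two together give a complete docvalues-only in-place update capability.

1. Update cost: from "document level" down to "a few bytes"

Standard ES update path: parse _source → merge JSON → write translog → eventually a new segment.

Our path: offset = valuesOffset + docId × bytesPerValue → MappedByteBuffer.putLong(offset, value).

A single _dense_update is essentially a constant-time, constant-IO random write, independent of how many other fields the document has or how large _source is.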
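
The write itself can be sketched with plain JDK classes. MmapSlotDemo, its method names, the temp file, and the little-endian byte order are all assumptions made for illustration; the plugin's DenseMappedBuffer is not shown in this article.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical demo class; names are illustrative, not the plugin's API. */
final class MmapSlotDemo {
    private static final ByteOrder ORDER = ByteOrder.LITTLE_ENDIAN; // assumption

    /** Creates a temp file holding maxDoc zeroed long slots (valuesOffset = 0). */
    static Path newZeroedFile(int maxDoc) {
        try {
            Path file = Files.createTempFile("dense-demo", ".dfd");
            Files.write(file, new byte[maxDoc * Long.BYTES]);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Overwrites the 8-byte slot of docId in place through a read-write mapping. */
    static void putLong(Path file, long valuesOffset, int docId, long value) {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, ch.size());
            buf.order(ORDER);
            buf.putLong(slot(valuesOffset, docId), value);
            buf.force(); // ask the OS to flush the dirty page to the file
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the slot back through a separate read-only mapping. */
    static long getLong(Path file, long valuesOffset, int docId) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size()).order(ORDER);
            return buf.getLong(slot(valuesOffset, docId));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static int slot(long valuesOffset, int docId) {
        // One mapping is limited to 2 GiB, so the offset must fit in an int here.
        return Math.toIntExact(valuesOffset + (long) docId * Long.BYTES);
    }
}
```

Nothing outside the 8 bytes of the target slot is touched, which is why the cost does not depend on the size of the document.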

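2. A unified read path: sort / agg / filter still go through docvalues

dense_field deliberately has no inverted index. On the query side, DenseMappedFieldType provides term / range / sort on top of NumericDocValues.

Once the Producer has opened the mmap, longValue() reads straight from the fix-width slot. This matches Lucene's docvalues calling model, so ES aggregation and sorting code needs almost no changes.

In other words:

Write: sidecar fix-width + mmap random writes (the extension).
Read: still NumericDocValues, transparent to the layers above (the compatibility).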
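
To make the read side concrete, here is a sketch of a reader with the same advanceExact / longValue calling shape as Lucene's NumericDocValues. FixWidthLongValues is a hypothetical stand-in that does not extend the Lucene class, and a plain ByteBuffer takes the place of the mmap'd region.

```java
import java.nio.ByteBuffer;

/** Hypothetical stand-in for a NumericDocValues-style reader over fix-width slots. */
final class FixWidthLongValues {
    private final ByteBuffer values; // in the plugin this would be the mmap'd data region
    private final int maxDoc;
    private int doc = -1;

    FixWidthLongValues(ByteBuffer values, int maxDoc) {
        this.values = values;
        this.maxDoc = maxDoc;
    }

    /** Every doc has a slot in a dense layout, so any valid docId has a value. */
    boolean advanceExact(int target) {
        if (target < 0 || target >= maxDoc) {
            throw new IndexOutOfBoundsException("docId " + target);
        }
        doc = target;
        return true;
    }

    /** Reads the current doc's slot directly; there is no decompression step. */
    long longValue() {
        return values.getLong(doc * Long.BYTES);
    }
}
```

Because the reader goes to the buffer on every longValue() call, a value written in place is visible to the next read without reopening anything.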

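3. mmap and fix-width are a natural fit

Variable-length data is hard to resize safely in place once mmapped; with fix-width the slot size is fixed, so the mmap write boundary of every slot is well defined.

DenseMappedBuffer maps the file in 1 GiB chunks (in line with Lucene's MMapDirectory strategy) to support large segments. putLong / compareAndAdd (inc increments) on a single slot take a fine-grained write lock, so concurrent updates within the same chunk cannot tear.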
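
A sketch of the chunk addressing and the locked increment. ChunkedSlots is a hypothetical class: heap buffers stand in for mapped chunks, chunkShift = 30 would correspond to 1 GiB, and striping the locks by docId is one possible reading of "fine-grained", not necessarily what the plugin does.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch of chunked fix-width slots with striped write locks. */
final class ChunkedSlots {
    private final int chunkShift;   // chunk size = 1 << chunkShift bytes
    private final long chunkMask;
    private final ByteBuffer[] chunks;
    private final ReentrantLock[] locks;

    ChunkedSlots(int maxDoc, int chunkShift, int lockStripes) {
        if (chunkShift < 3 || chunkShift > 30 || lockStripes <= 0 || maxDoc < 0) {
            throw new IllegalArgumentException("invalid configuration");
        }
        this.chunkShift = chunkShift;
        this.chunkMask = (1L << chunkShift) - 1;
        long total = (long) maxDoc * Long.BYTES;
        int count = (int) ((total + chunkMask) >>> chunkShift);
        chunks = new ByteBuffer[count];
        for (int i = 0; i < count; i++) {
            long remaining = total - ((long) i << chunkShift);
            chunks[i] = ByteBuffer.allocate((int) Math.min(remaining, 1L << chunkShift));
        }
        locks = new ReentrantLock[lockStripes];
        for (int i = 0; i < lockStripes; i++) {
            locks[i] = new ReentrantLock();
        }
    }

    /** Unlocked read of one slot (the article only describes write-side locking). */
    long get(int docId) {
        long off = (long) docId * Long.BYTES;
        return chunks[(int) (off >>> chunkShift)].getLong((int) (off & chunkMask));
    }

    /** Read-modify-write of one slot under the stripe lock for that docId. */
    long addAndGet(int docId, long delta) {
        long off = (long) docId * Long.BYTES;
        ByteBuffer chunk = chunks[(int) (off >>> chunkShift)];
        int pos = (int) (off & chunkMask);
        ReentrantLock lock = locks[docId % locks.length];
        lock.lock();
        try {
            long updated = chunk.getLong(pos) + delta;
            chunk.putLong(pos, updated);
            return updated;
        } finally {
            lock.unlock();
        }
    }
}
```

Since the chunk size is a power of two of at least 8 bytes and slots are 8 bytes wide, a slot never straddles two chunks, so each update touches exactly one buffer.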

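4. Complementing docvalues-only: no inverted index, no coupling to _source

Capability	Standard numeric + index	dense_field + fix-width mmap
Inverted index	Has postings; an update is semantically a reindex	No postings; no terms maintained for counters
`_source`	Usually changes along with the value	Deliberately unchanged; search side and presentation side are decoupled
Storage	Codec-compressed docvalues	Uncompressed sidecar array, trading space for update speed
Best for	Low change rate, high-QPS search	High change rate, high-QPS sort/agg

Docvalues-only removes inverted-index maintenance; fix-width + mmap removes the "read-only, compressed" limitation of docvalues. Both aim at the same thing: spend resources on "columnar reads, single-point writes" instead of the full rewrite of a general-purpose document engine.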

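5. Costs (ones the design has to accept)

Space: maxDoc × bytesPerValue × number of fields. Deleted docs still occupy a slot (consistent with the docIds of Lucene soft deletes).
docId binding: internally the update API goes _id → Query → docId; docIds change after a merge, but they remain self-consistent within a segment.
No inverted index: a pure term filter takes the slow docvalues path. It suits a post-filter inside a bool query, not a primary search key.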

# Rough per-segment capacity (1 long dense field)
.dfd size ≈ maxDoc × 8 bytes
Example: 30 million docs ≈ 240 MB / segment / field

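3. Why the dense files are kept outside the compound file

By default Lucene packs a segment's small files into a .cfe/.cfs compound. We implemented DenseCompoundFormat: during the compound phase it copies .dfd/.dfm to files outside the compound (.dfdc/.dfmc), so that mmap always targets a real file path.

Otherwise, the inside of a compound file is a read-only view and cannot be written in place via mmap. This is also why the work has to happen at the Codec layer and cannot be done with REST alone.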

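4. Replica consistency: Transport replication

_dense_update goes through TransportReplicationAction: after the primary modifies its mmap, the same request is forwarded to the replicas, which apply the same update. This matches ES's standard write replication model and does not depend on reindex.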

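5. Operational safety net: `_dense_sync_footer`

After mmap changes the payload, the CRC in the Lucene footer may no longer match the bytes. The plugin provides:

POST /{index}/_dense_sync_footer?primary_only=true

For the .dfdc of every committed segment it runs force + refreshFooter, using the same CRC algorithm as ES's VerifyingIndexOutput.

Before the server patch is applied, this step must be run before adding replicas; otherwise you hit the recovery pitfall described below.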
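
Why the footer goes stale can be shown with a simplified model. The sketch assumes the Lucene footer layout as written by CodecUtil (a 4-byte magic, a 4-byte algorithm id of 0, then an 8-byte CRC32 of everything before it, big-endian); FooterSketch and its methods are hypothetical, and the plugin's refreshFooter may differ in detail.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

/** Simplified, hypothetical model of a file with a Lucene-style 16-byte footer. */
final class FooterSketch {
    static final int FOOTER_MAGIC = ~0x3fd76c17; // assumed to match Lucene's footer magic
    static final int FOOTER_LENGTH = 16;         // magic + algorithm id + checksum

    /** CRC32 over every byte before the stored checksum (including magic and id). */
    static long computeChecksum(byte[] file) {
        CRC32 crc = new CRC32();
        crc.update(file, 0, file.length - Long.BYTES);
        return crc.getValue();
    }

    /** The checksum currently stored in the last 8 bytes (big-endian). */
    static long storedChecksum(byte[] file) {
        return ByteBuffer.wrap(file).getLong(file.length - Long.BYTES);
    }

    /** Rewrites the footer so the stored checksum matches the current bytes. */
    static void refreshFooter(byte[] file) {
        ByteBuffer buf = ByteBuffer.wrap(file);
        int footerStart = file.length - FOOTER_LENGTH;
        buf.putInt(footerStart, FOOTER_MAGIC);
        buf.putInt(footerStart + 4, 0); // algorithm id
        buf.putLong(footerStart + 8, computeChecksum(file));
    }

    static boolean isConsistent(byte[] file) {
        return storedChecksum(file) == computeChecksum(file);
    }
}
```

Changing a single payload byte after refreshFooter makes isConsistent return false until the footer is refreshed again, which is the state a shard is in between a _dense_update and the next footer sync.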

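Technical implementation (2): elasticsearch-server-patch

What the incident looked like

A typical production sequence:

The index already has data and _dense_update has been run many times (the mmap payload has changed, the footer CRC has not been refreshed).
Without running _dense_sync_footer first, number_of_replicas is set to 1.
When peer recovery copies .dfdc, VerifyingIndexOutput.verify() fails with:

verification failed : expected=12l7j20 actual=1u0gvr9
resource=name [2...dfdc], length [122], checksum [...]
The shard goes to ALLOCATION_FAILED / no_valid_shard_copy, and the primary may also be marked corrupted_*.

The root cause is not that data was lost, but that the ES Store considers the file checksum inconsistent with the commit metadata. For a scenario like ours, where files are deliberately modified in place, the standard verification semantics conflict with the requirement.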
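
When reading such a log it helps to turn the expected / actual strings back into numbers. The sketch below assumes ES renders the checksum as a base-36 string (Long.toString with Character.MAX_RADIX, as Store.digestToString appears to do); ChecksumText is a hypothetical helper.

```java
/** Hypothetical helper for decoding checksum strings from the error message. */
final class ChecksumText {
    /** Parses a base-36 checksum string such as "12l7j20" into its numeric value. */
    static long parse(String text) {
        return Long.parseLong(text, Character.MAX_RADIX);
    }

    /** Formats a numeric checksum the same way, for comparison against log output. */
    static String format(long checksum) {
        return Long.toString(checksum, Character.MAX_RADIX);
    }
}
```

Both strings in the message above decode to values that fit in 32 bits, which is consistent with a CRC32 computed over the old and the new bytes respectively.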

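What the patch does

In the elasticsearch-server-patch project we fork the Store-related classes of ES 8.17.0 and add SkipChecksumFiles, which skips the checksum and keeps only the length check for the following extensions:

.dfd / .dfdc / .dfm / .dfmc

Modified class	Purpose
`SkipChecksumFiles`	Identifies dense sidecar files
`VerifyingIndexOutput`	No longer compares the CRC after recovery writes a file
`Store.checkIntegrity`	Skips the CRC when a shard is opened / integrity-checked
`Store.MetadataSnapshot.checksumFromLuceneFile`	Does not read the stale footer when building metadata
`StoreFileMetadata.isSame`	Recovery diff compares only name + length for skipped files
`Store.VerifyingIndexInput`	The snapshot restore read path skips as well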
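
The core of the patch is a file-name predicate plus a relaxed comparison. The sketch below models only what the table describes: SkipChecksumSketch, FileMeta, and the method names are hypothetical, and the real ES classes carry more state than this.

```java
import java.util.Set;

/** Hypothetical model of the skip rule; not the patched ES classes themselves. */
final class SkipChecksumSketch {
    private static final Set<String> SKIP_EXTENSIONS = Set.of("dfd", "dfdc", "dfm", "dfmc");

    /** True for dense sidecar files, identified purely by extension. */
    static boolean shouldSkipChecksum(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false;
        }
        return SKIP_EXTENSIONS.contains(fileName.substring(dot + 1));
    }

    /** Minimal stand-in for per-file store metadata. */
    static final class FileMeta {
        final String name;
        final long length;
        final String checksum;

        FileMeta(String name, long length, String checksum) {
            this.name = name;
            this.length = length;
            this.checksum = checksum;
        }
    }

    /** Skipped files match on name + length; all others must also match on checksum. */
    static boolean isSame(FileMeta a, FileMeta b) {
        if (!a.name.equals(b.name) || a.length != b.length) {
            return false;
        }
        return shouldSkipChecksum(a.name) || a.checksum.equals(b.checksum);
    }
}
```

Keeping the length check means a truncated sidecar file is still detected; only changes to the contents of the bytes go unnoticed.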

Deployment (lightweight): after mvn compile, inject the classes into lib/elasticsearch-8.17.0.jar; there is no need to clone the whole ES repository.

cd elasticsearch-server-patch
mvn test compile
ES_HOME=/path/to/elasticsearch-8.17.0 ./scripts/patch-es-jar.sh
# restart ES → then install the es_update plugin

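Why it is designed this way: the key trade-offs

1. Why write in place via mmap instead of calling `_update` on the document every time

Fix-width makes the address of a single update computable; mmap lets that update bypass Lucene's IndexWriter. Neither works alone: with fix-width but no mmap the file still has to be rewritten; with mmap but no fix-width the slot for a docId cannot be located in O(1).

Approach	Pros	Cons
Standard `_update` / reindex	No changes to ES/Lucene	Document-level rewrite; cannot sustain high-frequency counters
External Redis + merge at query time	Simple to implement	sort/agg can hardly use ES's native path
Fix-width + mmap on docvalues	O(1) per update, native sort/agg	Needs a custom Codec and handling of recovery/checksum

Our core requirement is that the search side stays inside ES while the update side is light enough. A fix-width array + mmap + docvalues-only is close to the best we can get within Lucene's semantics.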

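2. Why `_source` is left untouched

_source is a JSON snapshot; changing it amounts to a document-level update. In practice the business often looks like this:

Search / sort / aggregation: need the latest PV (read from docvalues / mmap).
GET /_doc for display: can tolerate an old value in _source, or the business layer reads it from Redis.

Cutting the dense field out of _source semantics avoids rewriting the whole document just for a counter.

3. Why the sidecar files still carry a Lucene header/footer (plugin side)

If the sidecar files had no format at all, ES would neither know the file boundaries nor be able to copy them easily during recovery. We chose to write a standard Lucene footer on the plugin side and call refreshFooter after updates, to stay aligned with ES metadata.

This mitigates the problem, but it cannot fundamentally eliminate the human error of "adding a replica without syncing first", nor can it change how ES verifies files when "the payload has changed but the metadata still holds the old checksum". The latter requires changing Store.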

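4. Why the checksum problem cannot be solved by the plugin alone

This is the point this article most wants to stress.

What can an Elasticsearch plugin extend?

MapperPlugin: new field types.
EnginePlugin: swap the Codec and the Engine factory.
ActionPlugin: new REST / Transport actions.
Codec SPI: custom Lucene storage.

What can a plugin not extend?

How org.elasticsearch.index.store.Store verifies the copied bytes during recovery.
Whether VerifyingIndexOutput.verify() throws CorruptIndexException.
How MetadataSnapshot computes StoreFileMetadata.checksum from the footer.
The hard-coded checkIntegrity logic at shard allocation time.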

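These classes live in the server module, with no SPI, no Setting switch, and no hook to "skip verification by file-name suffix". The recovery path is:

RecoverySourceHandler
  → MultiFileWriter.innerWriteFileChunk
    → Store.createVerifyingOutput
      → VerifyingIndexOutput.verify()   // plugins cannot intercept this

In the plugin we already do the following:

refreshFooter after updates (DenseFileFormat).
The _dense_sync_footer operations API.
Transport replication to replicas.

But as long as peer recovery is triggered even once while the footer is stale, the ES core will still judge the shard corrupt. One more REST endpoint in the plugin cannot fix this, because the verification happens in server code outside the plugin's classpath.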

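5. Why "modify the source + inject into the jar" rather than something else

We also evaluated other routes:

Idea	Conclusion
Force `_dense_sync_footer` before adding replicas	Works, but depends on an operations SOP; miss one step and it blows up
Keep sidecar files out of the commit / no footer	Requires building our own recovery to ship the data; even more complex
Use REST to "repair the checksum" after the fact	For shards that are already unassigned the API often cannot be reached
Modify Store to skip the checksum of dense sidecar files	Fundamentally allows mmap and recovery to coexist

We ended up with server_jar_only: fork only 4 Java files, compile them, and jar uf them into the official ES installation. The cost is that the patch has to be rebased on ES minor-version upgrades; the benefit is that adding replicas is no longer tied to the timing of sync_footer, which is consistent with the mmap update model.

The trade-off must be stated clearly: once the checksum is skipped, ES no longer detects bit rot in the dense sidecar files for us, and _dense_sync_footer is reduced to an optional reconciliation tool rather than a mandatory gate before recovery.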

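Summary

Layer	Responsibility	Plugin only?
es_update	`dense_field`, fix-width sidecar storage, mmap updates, docvalues read path, Transport replication	N/A
elasticsearch-server-patch	Skip checksum for dense sidecar files in recovery / Store	No, the server source must be modified

In one sentence: fix-width + mmap solves "how to change docvalues in O(1)"; modifying the source solves "ES's file verification model is incompatible with in-place file modification". Only the two layers together give a complete solution on ES 8.17 that is operable and can add replicas.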
