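Kafka Data Storage and Cleanup: Topic, Partition, Segment, and Log Deletion

Kafka does not push messages one by one into a traditional queue. Instead, it writes each Topic's data to append-only logs, one per Partition. Understanding this storage layout explains both why Kafka sustains high throughput and why its logs can be cleaned up by time or by size.

In one sentence: a Topic's data lives in Partitions, and each Partition is split into multiple Segments; each Segment usually consists of a .log data file, a .index offset index, and a .timeindex time index. Segmenting speeds up lookups and makes deleting expired log data easier.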


[Figure: Topic `itheima` with Partition 0 and Partition 1; Segment 0 and Segment 1; segment files 000.log (data file), 000.index (offset index), 000.timeindex (time index)]

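How Topic, Partition, and Segment relate

Kafka's storage layout can be pictured like this:

Topic
  ├── Partition 0
  │     ├── Segment 0
  │     │     ├── .log
  │     │     ├── .index
  │     │     └── .timeindex
  │     └── Segment 1
  ├── Partition 1
  └── Partition 2

| Level | Role |
|---|---|
| Topic | Business-level subject, such as order events or user behavior |
| Partition | Physical shard of a Topic; increases parallelism |
| Segment | One chunk of a Partition's log; simplifies lookup and cleanup |
| .log | Stores the actual message data |
| .index | Sparse index from Offset to physical position |
| .timeindex | Index from timestamp to Offset |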

Partitions are the basis of Kafka's parallelism; Segments are the basis of how Kafka manages files on disk.
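
To make the on-disk layout concrete, here is a minimal sketch of how the file names come about. Kafka stores each partition in a directory named `<topic>-<partition>` and names a segment's files after the segment's base offset, zero-padded to 20 digits. The helper names `partition_dir_name` and `segment_file_names` are made up for illustration; they are not Kafka APIs.

```python
def partition_dir_name(topic: str, partition: int) -> str:
    """Directory that holds one partition's segments, e.g. 'itheima-0'."""
    return f"{topic}-{partition}"


def segment_file_names(base_offset: int) -> list[str]:
    """Names of the three files of the segment that starts at base_offset."""
    if base_offset < 0:
        raise ValueError("base_offset must be non-negative")
    stem = f"{base_offset:020d}"  # base offset, zero-padded to 20 digits
    return [f"{stem}.log", f"{stem}.index", f"{stem}.timeindex"]


if __name__ == "__main__":
    print(partition_dir_name("itheima", 0))
    for name in segment_file_names(10500):
        print(name)
```

Because the base offset is in the file name, a sorted directory listing is already ordered by offset, which is what makes "find the segment first" cheap.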

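Why segment the log

If a Partition mapped to a single huge file, both lookups and deletion would be cumbersome.

Segmenting brings two clear benefits:

| Benefit | Explanation |
|---|---|
| Easier lookup | Locate the Segment first, then use its index to locate the message |
| Easier deletion | An old Segment that holds only expired data can be deleted as a whole |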

[Figure: look up offset=10520 → locate the Segment that contains it → consult .index → jump to the corresponding physical position in .log]

This is why Kafka's log cleanup can usually run at Segment granularity rather than deleting messages one at a time.
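
The lookup path described above (segment first, then the sparse index, then the .log file) can be sketched with two binary searches. This is an in-memory model under stated assumptions, not Kafka's implementation: the base offsets and index entries below are invented sample data, and `find_segment` and `find_scan_start` are hypothetical helper names.

```python
from bisect import bisect_right

# Hypothetical base offsets of the segments in one partition, sorted ascending.
SEGMENT_BASE_OFFSETS = [0, 5000, 10000, 15000]

# Hypothetical sparse index of the segment with base offset 10000:
# (offset relative to the base offset, byte position in the .log file).
# Only some offsets have an entry, which is what "sparse" means.
SPARSE_INDEX = [(0, 0), (200, 16384), (450, 32768), (700, 49152)]


def find_segment(base_offsets, target_offset):
    """Return the base offset of the segment that should hold target_offset."""
    i = bisect_right(base_offsets, target_offset) - 1
    if i < 0:
        raise LookupError("offset is below the first retained segment")
    return base_offsets[i]


def find_scan_start(sparse_index, base_offset, target_offset):
    """Return (offset, byte position) of the closest index entry at or before
    target_offset; a reader would scan the .log file forward from there."""
    relative = target_offset - base_offset
    keys = [rel for rel, _ in sparse_index]
    i = bisect_right(keys, relative) - 1
    if i < 0:
        return base_offset, 0  # no earlier entry: scan from the file start
    rel, position = sparse_index[i]
    return base_offset + rel, position


if __name__ == "__main__":
    base = find_segment(SEGMENT_BASE_OFFSETS, 10520)
    print(base, find_scan_start(SPARSE_INDEX, base, 10520))
```

The index only narrows the search to a nearby byte position; the last step is still a short sequential scan of the .log file, which is the price of keeping the index small.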

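Log cleanup strategy 1: by retention time

The course material lists time as the first cleanup strategy. Once messages have been stored in Kafka longer than the configured time, they become eligible for cleanup.

The default retention is commonly 168 hours, i.e. 7 days (the broker setting `log.retention.hours`).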

[Flowchart: Segment finished writing → wait for the retention period → has the retention time been exceeded? No: keep it; otherwise: delete the expired Segment]

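This strategy fits most log, behavioral-data, and event-stream scenarios: the business only cares about a recent window of data, and anything past the retention period can be cleaned up.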
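
As a rough illustration of time-based retention, the sketch below picks the segments that would be deleted. It is a simplified model, not Kafka's code: the `Segment` class and `expired_by_time` function are hypothetical, the timestamps are invented, and the model simply never deletes the last (active) segment.

```python
from dataclasses import dataclass

HOUR_MS = 60 * 60 * 1000


@dataclass
class Segment:
    base_offset: int
    largest_timestamp_ms: int  # timestamp of the newest record in the segment


def expired_by_time(segments, now_ms, retention_ms):
    """Return the oldest closed segments whose newest record is older than
    retention_ms. `segments` is ordered oldest-first; the last one is treated
    as the active segment and is never returned (a simplification)."""
    expired = []
    for seg in segments[:-1]:
        if now_ms - seg.largest_timestamp_ms > retention_ms:
            expired.append(seg)
        else:
            break  # stop at the first segment that is still within retention
    return expired


if __name__ == "__main__":
    now = 1000 * HOUR_MS  # arbitrary "current time" for the example
    segments = [
        Segment(0, now - 200 * HOUR_MS),
        Segment(5000, now - 170 * HOUR_MS),
        Segment(10000, now - 100 * HOUR_MS),
        Segment(15000, now - 1 * HOUR_MS),
    ]
    for seg in expired_by_time(segments, now, 168 * HOUR_MS):
        print("delete segment with base offset", seg.base_offset)
```

Note that the check uses the newest record in each segment, so a segment is only dropped once everything in it has aged out; some records therefore live longer than the configured retention.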

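Log cleanup strategy 2: by storage size

The second strategy is by size. Note that the size limit (`retention.bytes` on a topic, or the broker-level `log.retention.bytes`) applies to each Partition's log rather than to the Topic as a whole, and it is disabled (-1) by default. Once a Partition's log grows past the threshold, Kafka deletes the oldest Segments.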

[Flowchart: log keeps growing → has the size threshold been exceeded? No: keep writing; otherwise: delete starting from the oldest Segment]

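Size-based cleanup is typically used to control disk cost. It has to be configured with the range of data the business can afford to keep in mind; otherwise data may be deleted before downstream consumers have had a chance to process it.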
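
Size-based cleanup can be sketched the same way. The model below assumes whole segments are removed from the oldest end, and only while removing one still leaves the log at or above the limit; the function name `segments_to_delete_by_size` and the sample sizes are illustrative, and the last (active) segment is never deleted in this simplified model.

```python
def segments_to_delete_by_size(segment_sizes, retention_bytes):
    """Return how many of the oldest segments would be deleted.

    segment_sizes: segment sizes in bytes, ordered oldest-first.
    retention_bytes: size limit for the log; a negative value means no limit.
    """
    if retention_bytes < 0:
        return 0
    excess = sum(segment_sizes) - retention_bytes
    deleted = 0
    for size in segment_sizes[:-1]:  # the last segment is treated as active
        if excess - size >= 0:
            excess -= size  # dropping this segment still leaves >= the limit
            deleted += 1
        else:
            break
    return deleted


if __name__ == "__main__":
    sizes = [400, 400, 400, 100]  # invented sizes, total 1300 bytes
    print(segments_to_delete_by_size(sizes, 1000))
    print(segments_to_delete_by_size(sizes, 800))
```

Because deletion happens a whole segment at a time, the log can sit somewhat above the limit: with the sample sizes and a limit of 1000, nothing is deleted, since removing the oldest 400-byte segment would leave only 900 bytes.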

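Engineering implications of the cleanup mechanism

Kafka does not delete a message as soon as it has been consumed. Consumers merely commit their own Offsets; the message stays in Kafka until the retention policy triggers.

This has two important consequences:

| Implication | Explanation |
|---|---|
| Messages can be re-consumed | As long as the log is still there, the Offset can be reset and the data consumed again |
| Disk must be planned | High-throughput Topics need their retention time and disk capacity estimated up front |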
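
For the disk-planning point, a back-of-the-envelope estimate is write rate × retention time × replication factor. The sketch below does only that arithmetic; the function name and the sample numbers (10 MiB/s, 168 hours, 3 replicas) are assumptions for illustration, and it ignores compression, index files, and safety headroom.

```python
def estimated_disk_bytes(write_bytes_per_sec, retention_hours, replication_factor):
    """Rough cluster-wide disk usage for one topic under time-based retention."""
    retained_bytes = write_bytes_per_sec * retention_hours * 3600
    return retained_bytes * replication_factor


if __name__ == "__main__":
    total = estimated_disk_bytes(
        write_bytes_per_sec=10 * 1024**2,  # assumed 10 MiB/s of incoming data
        retention_hours=168,               # 7 days
        replication_factor=3,              # assumed number of replicas
    )
    print(f"{total / 1024**4:.2f} TiB")
```

An estimate like this is a starting point for choosing retention settings; real usage depends on message compression and how evenly partitions are spread across brokers.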

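If the business needs to backfill data, for example after fixing a bug in a consumer program, the consumer group's Offset can be rolled back to an earlier position and the data consumed again. The precondition is that the old log has not yet been cleaned up.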
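
The precondition above can be expressed as a small check. This is only a model of the arithmetic: `replay_start` is a hypothetical helper, and the offsets are made up. Here "log start offset" means the oldest offset still on disk and "log end offset" the next offset to be written; what a real consumer does when it asks for a deleted offset depends on its configuration.

```python
def replay_start(requested_offset, log_start_offset, log_end_offset):
    """Return (offset to start from, number of records already cleaned up)."""
    if requested_offset > log_end_offset:
        raise ValueError("requested offset is beyond the end of the log")
    if requested_offset < log_start_offset:
        # Records in [requested_offset, log_start_offset) were deleted by retention.
        return log_start_offset, log_start_offset - requested_offset
    return requested_offset, 0


if __name__ == "__main__":
    # Assumed state: segments below offset 10000 have already been deleted.
    print(replay_start(12000, 10000, 20000))  # fully replayable
    print(replay_start(8000, 10000, 20000))   # part of the range is gone
```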

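Interview answer template

You can answer like this:

Kafka stores data in a three-level structure of Topic, Partition, and Segment. A Topic is split into multiple Partitions, and each Partition is further divided on disk into multiple Segments. Each Segment usually consists of a .log data file, a .index offset index file, and a .timeindex time index file. Segmenting keeps individual files small, makes lookups more efficient, and makes expired data easy to clean up. Kafka's log deletion follows two main strategies: the first is by retention time, where messages are deleted once they have been kept longer than the configured time, commonly 168 hours by default; the second is by log size, where the oldest data is deleted once a Partition's log exceeds the threshold. A consumer committing its Offset does not cause the message to be deleted immediately; whether a message is deleted is decided by the log retention policy.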

Summary

Kafka's storage structure can be remembered in one line:

[Figure: Topic → Partition → Segment → .log, .index, .timeindex]

Partitions provide parallelism, Segments handle file management, and Retention handles cleanup.
