Milvus Data Backup and Restore

Milvus hit a bug: every collection was stuck at 0% loading. The logs showed that the data in Milvus's internal RocksMQ message queue was corrupted and unrecoverable, so the data had to be re-imported. Time to set up proper backups, then, and to replace RocksMQ with Pulsar while we're at it.

log
[2026/04/07 10:32:03.284 +00:00] [WARN] [adaptor/scanner_switchable.go:200] ["create underlying scanner for wal scanner, start a backoff"] [module=streamingnode] [component=scanner] [name=recovery] [channel=by-dev-rootcoord-dml_12:rw@104] [startMessageID=4/2] [nextInterval=2.535581757s] [error="decode rmqID fail with err: strconv.ParseUint: parsing \"CAQQAg==\": invalid syntax, id: CAQQAg==: invalid message id"] [errorVerbose="decode rmqID fail with err: strconv.ParseUint: parsing \"CAQQAg==\": invalid syntax, id: CAQQAg==: invalid message id\n(1) attached stack trace\n  -- stack trace:\n  | github.com/milvus-io/milvus/pkg/v2/streaming/walimpls/impls/rmq.unmarshalMessageID\n  | \t/workspace/source/pkg/streaming/walimpls/impls/rmq/message_id.go:33\n  | github.com/milvus-io/milvus/pkg/v2/streaming/walimpls/impls/rmq.(*walImpl).Read\n  | \t/workspace/source/pkg/streaming/walimpls/impls/rmq/wal.go:86\n  | github.com/milvus-io/milvus/internal/streamingnode/server/wal/adaptor.(*catchupScanner).createReaderWithBackoff\n  | \t/workspace/source/internal/streamingnode/server/wal/adaptor/scanner_switchable.go:187\n  | github.com/milvus-io/milvus/internal/streamingnode/server/wal/adaptor.(*catchupScanner).Do\n  | \t/workspace/source/internal/streamingnode/server/wal/adaptor/scanner_switchable.go:86\n  | github.com/milvus-io/milvus/internal/streamingnode/server/wal/adaptor.(*scannerAdaptorImpl).produceEventLoop\n  | \t/workspace/source/internal/streamingnode/server/wal/adaptor/scanner_adaptor.go:198\n  | [...repeated from below...]\nWraps: (2) decode rmqID fail with err: strconv.ParseUint: parsing \"CAQQAg==\": invalid syntax, id: CAQQAg==\nWraps: (3) attached stack trace\n  -- stack trace:\n  | github.com/milvus-io/milvus/pkg/v2/streaming/util/message.init\n  | \t/workspace/source/pkg/streaming/util/message/message_id.go:16\n  | runtime.doInit1\n  | \t/usr/local/go/src/runtime/proc.go:7410\n  | runtime.doInit\n  | \t/usr/local/go/src/runtime/proc.go:7377\n  | runtime.main\n  | 
\t/usr/local/go/src/runtime/proc.go:254\n  | runtime.goexit\n  | \t/usr/local/go/src/runtime/asm_amd64.s:1700\nWraps: (4) invalid message id\nError types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.leafError"]
[2026/04/07 10:32:03.456 +00:00] [WARN] [adaptor/scanner_switchable.go:200] ["create underlying scanner for wal scanner, start a backoff"] [module=streamingnode] [component=scanner] [name=recovery] [channel=by-dev-rootcoord-dml_10:rw@104] [startMessageID=2/41] [nextInterval=4.105696061s] [error="decode rmqID fail with err: strconv.ParseUint: parsing \"CAIQKQ==\": invalid syntax, id: CAIQKQ==: invalid message id"] [errorVerbose="decode rmqID fail with err: strconv.ParseUint: parsing \"CAIQKQ==\": invalid syntax, id: CAIQKQ==: invalid message id\n(1) attached stack trace\n  -- stack trace:\n  | github.com/milvus-io/milvus/pkg/v2/streaming/walimpls/impls/rmq.unmarshalMessageID\n  | \t/workspace/source/pkg/streaming/walimpls/impls/rmq/message_id.go:33\n  | github.com/milvus-io/milvus/pkg/v2/streaming/walimpls/impls/rmq.(*walImpl).Read\n  | \t/workspace/source/pkg/streaming/walimpls/impls/rmq/wal.go:86\n  | github.com/milvus-io/milvus/internal/streamingnode/server/wal/adaptor.(*catchupScanner).createReaderWithBackoff\n  | \t/workspace/source/internal/streamingnode/server/wal/adaptor/scanner_switchable.go:187\n  | github.com/milvus-io/milvus/internal/streamingnode/server/wal/adaptor.(*catchupScanner).Do\n  | \t/workspace/source/internal/streamingnode/server/wal/adaptor/scanner_switchable.go:86\n  | github.com/milvus-io/milvus/internal/streamingnode/server/wal/adaptor.(*scannerAdaptorImpl).produceEventLoop\n  | \t/workspace/source/internal/streamingnode/server/wal/adaptor/scanner_adaptor.go:198\n  | [...repeated from below...]\nWraps: (2) decode rmqID fail with err: strconv.ParseUint: parsing \"CAIQKQ==\": invalid syntax, id: CAIQKQ==\nWraps: (3) attached stack trace\n  -- stack trace:\n  | github.com/milvus-io/milvus/pkg/v2/streaming/util/message.init\n  | \t/workspace/source/pkg/streaming/util/message/message_id.go:16\n  | runtime.doInit1\n  | \t/usr/local/go/src/runtime/proc.go:7410\n  | runtime.doInit\n  | \t/usr/local/go/src/runtime/proc.go:7377\n  | runtime.main\n  
| \t/usr/local/go/src/runtime/proc.go:254\n  | runtime.goexit\n  | 

The steps:

  1. Download milvus-backup
  2. Extract it and place it in the target directory
  3. Edit the configuration (backup.yaml and milvus.yaml)
  4. Restart Milvus

Download milvus-backup

Download: github.com/zilliztech/...

milvus-backup appears to support Linux only.

Extract and place in the target directory

bash
# Upload the tarball to the server, then extract it
tar -zxvf milvus-backup_0.5.12_Linux_x86_64.tar.gz

# Place milvus-backup next to a configs/ folder containing backup.yaml:
------configs
 |---------backup.yaml
------milvus-backup
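The layout above can be created with a couple of commands. The directory name `milvus-backup-home` is just an example; by default milvus-backup looks for configs/backup.yaml relative to the directory it is run from.

```shell
# Create the expected layout; "milvus-backup-home" is an example name
mkdir -p milvus-backup-home/configs
touch milvus-backup-home/configs/backup.yaml
# Then extract the downloaded release next to configs/:
# tar -zxvf milvus-backup_0.5.12_Linux_x86_64.tar.gz -C milvus-backup-home
ls milvus-backup-home   # → configs
```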

Edit the configuration

backup.yaml

yaml
# milvus: connection info for the source Milvus instance to back up
milvus:
  address: 127.0.0.1            # use 127.0.0.1 when milvus-backup runs on the host and port 19530 is published
  port: 19530
  # If authentication is not enabled, leave this part commented out
  # authorizationEnabled: true
  # user: "root"                # your username, if authentication is enabled
  # password: "Milvus"          # your password, if authentication is enabled
  # tlsMode: 0                  # 0 means TLS is disabled

# minio: the object storage used by the Milvus instance
minio:
  storageType: "minio"          # fixed value "minio"
  address: 127.0.0.1            # use 127.0.0.1 when port 9000 is published to the host
  port: 9000
  accessKeyID: ${MINIO_USER}          # replace with the actual value of ${MINIO_USER}
  secretAccessKey: ${MINIO_PASSWORD}  # replace with the actual value of ${MINIO_PASSWORD}
  useSSL: false
  bucketName: "a-bucket"        # Milvus default bucket; Docker Compose deployments use a-bucket
  rootPath: "files"             # Milvus storage root path, matching the deployment config

# backup: target storage for the backup files
backup:
  backupStorageType: "minio"    # backup storage type
  address: 127.0.0.1            # backup storage address, same as minio.address
  backupPort: 9000
  backupAccessKeyID: ${MINIO_USER}     # same as accessKeyID
  backupSecretAccessKey: ${MINIO_PASSWORD} # same as secretAccessKey
  backupBucketName: "a-bucket"  # bucket for the backup files; reusing bucketName is fine
  backupRootPath: "backup"      # root path for the backup files; keep it separate from rootPath

In milvus.yaml, change the value of mq.type and pulsar.address:

yaml
...
mq:
  # Default value: "default"
  # Valid values: [default, pulsar, kafka, rocksmq, woodpecker]
  type: pulsar # change this
...
pulsar:
  # IP address of Pulsar service.
  # Environment variable: PULSAR_ADDRESS
  # pulsar.address and pulsar.port together generate the valid access to Pulsar.
  # Pulsar preferentially acquires the valid IP address from the environment variable PULSAR_ADDRESS when Milvus is started.
  # Default value applies when Pulsar is running on the same network with Milvus.
  address: pulsar  # change this: use the Compose service name, not container_name
  

Docker Compose configuration. Make sure your .env file is set up; if pulling the images fails, use a proxy or a registry mirror.

yaml
  minio:
    image: quay.io/minio/minio:RELEASE.2023-12-20T01-00-02Z
    container_name: ragflow-minio
    command: server --console-address ":9001" /data
    ports:
      - ${MINIO_PORT}:9000
      - ${MINIO_CONSOLE_PORT}:9001
    env_file: .env
    environment:
      - MINIO_ROOT_USER=${MINIO_USER}
      - MINIO_ROOT_PASSWORD=${MINIO_PASSWORD}
      - TZ=${TIMEZONE}
    volumes:
      - minio_data:/data
    networks:
      - ragflow
    restart: unless-stopped 

  etcd:
    container_name: ragflow-milvus-etcd
    image: quay.io/coreos/etcd:v3.5.18
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - etcd_data:/etcd          # named volume, to avoid host-path conflicts
    command: etcd -advertise-client-urls=http://etcd:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
    healthcheck:
      test: ["CMD", "etcdctl", "endpoint", "health"]
      interval: 30s
      timeout: 20s
      retries: 3
    networks:
      - ragflow
    restart: unless-stopped 

  standalone:
    container_name: ragflow-milvus-standalone
    image: milvusdb/milvus:v2.6.13
    command: ["milvus", "run", "standalone"]
    security_opt:
      - seccomp:unconfined
    environment:
      MINIO_ACCESS_KEY_ID: ${MINIO_USER}      # matches MINIO_USER in the .env file
      MINIO_SECRET_ACCESS_KEY: ${MINIO_PASSWORD}  # matches MINIO_PASSWORD in the .env file
      ETCD_ENDPOINTS: etcd:2379
      # MinIO address: points at the minio service above (service name "minio", internal port 9000)
      MINIO_ADDRESS: minio:9000
      # MQ_TYPE: woodpecker
    volumes:
      - milvus_data:/var/lib/milvus   # named volume
      - ./milvus.yaml:/milvus/configs/milvus.yaml  # mount the edited config file
    ports:
      - "19530:19530"    # Milvus gRPC
      - "9091:9091"      # Milvus metrics
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: ["gpu"]
              device_ids: ["1"]
    depends_on:
      - etcd
      - minio               # depends on the minio service above
    networks:
      - ragflow
    restart: unless-stopped 

  # Attu: web UI for Milvus
  attu:
    container_name: ragflow-milvus-attu
    image: zilliz/attu:latest
    environment:
      MILVUS_URL: standalone:19530
    ports:
      - "8987:3000"
    depends_on:
      - "standalone"
    # profiles:
    #   - all
    #   - attu
    networks:
      - ragflow   # join the same network (optional, but recommended)
    restart: unless-stopped 

  pulsar:
    container_name: ragflow-milvus-pulsar
    image: apachepulsar/pulsar:3.3.4  # pin a stable release
    command: bin/pulsar standalone
    ports:
      - "6650:6650"   # Pulsar client port
      - "8080:8080"   # Pulsar HTTP API port
    volumes:
      - pulsar_data:/pulsar/data  # persist Pulsar data
    networks:
      - ragflow
    restart: unless-stopped

volumes:
  minio_data:
    driver: local
  etcd_data:
    driver: local
  milvus_data:
    driver: local
  pulsar_data:
    driver: local

networks:
  ragflow:
    driver: bridge

Restart Milvus

bash
# Start the containers
docker compose -f <compose file path> -p <project name> up -d

# Back up Milvus
./milvus-backup create -n my_backup_$(date +%Y%m%d_%H%M%S)
# List existing backups
./milvus-backup list
# Restore a backup; restored collections get the _restored suffix
./milvus-backup restore -n <backup name> -s _restored

To verify that a backup succeeded, open the MinIO console (localhost:9001, or whatever host port you mapped to 9001) and check that the a-bucket bucket contains the backup path.

Scheduled backups

  1. Create milvus_cleanup_backup.sh:
bash
#!/bin/bash

# --- Configuration ---
# 1. Path to the milvus-backup binary
BACKUP_BIN=<path to milvus-backup> # e.g. "/home/ykt-root/文档/_dockers/ragflow/milvus-backup"
# 2. Working directory for the backup tool (the one containing configs/)
WORK_DIR=<path to the folder containing configs> # e.g. "/home/ykt-root/文档/_dockers/ragflow"
# 3. How many days to keep backups
RETENTION_DAYS=30
# 4. Log file path
LOG_FILE="/var/log/milvus_backup.log"

# --- Current timestamp, for age comparison ---
CURRENT_DATE=$(date +%s)

# --- Switch to the working directory ---
cd "$WORK_DIR" || exit 1

echo "$(date) - Starting cleanup of backups older than $RETENTION_DAYS days." >> "$LOG_FILE"

# 1. Fetch all backup names with the list command
BACKUP_LIST=$("$BACKUP_BIN" list 2>/dev/null)

# 2. Iterate over each backup name
echo "$BACKUP_LIST" | while read -r BACKUP_NAME; do
    # Skip empty lines
    if [ -z "$BACKUP_NAME" ]; then
        continue
    fi

    # Extract the date portion; matches names like my_backup_20231015_143000
    # or auto_backup_20231015_143000
    BACKUP_DATE_STR=$(echo "$BACKUP_NAME" | grep -oP '_backup_\K[0-9]{8}')
    if [ -z "$BACKUP_DATE_STR" ]; then
        echo "Warning: cannot parse a date from backup name $BACKUP_NAME, skipping." >> "$LOG_FILE"
        continue
    fi

    # Convert the date string to a timestamp
    BACKUP_DATE=$(date -d "$BACKUP_DATE_STR" +%s 2>/dev/null)
    if [ -z "$BACKUP_DATE" ]; then
        echo "Warning: date $BACKUP_DATE_STR is invalid, skipping." >> "$LOG_FILE"
        continue
    fi

    # Age of the backup in days
    AGE_DAYS=$(( (CURRENT_DATE - BACKUP_DATE) / 86400 ))

    # Delete backups past the retention period
    if [ "$AGE_DAYS" -gt "$RETENTION_DAYS" ]; then
        echo "$(date) - Backup $BACKUP_NAME is $AGE_DAYS days old (limit $RETENTION_DAYS), deleting..." >> "$LOG_FILE"
        if "$BACKUP_BIN" delete --name "$BACKUP_NAME" >> "$LOG_FILE" 2>&1; then
            echo "$(date) - Backup $BACKUP_NAME deleted." >> "$LOG_FILE"
        else
            echo "$(date) - Failed to delete backup $BACKUP_NAME." >> "$LOG_FILE"
        fi
    fi
done

echo "----------------------------------------" >> "$LOG_FILE"
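The age arithmetic used in the script can be sanity-checked in isolation, with no Milvus involved (GNU `date` assumed, demo dates fixed):

```shell
# Days between two YYYYMMDD stamps, same arithmetic as AGE_DAYS above
OLD=$(date -d "20231015" +%s)
NOW=$(date -d "20231115" +%s)
AGE_DAYS=$(( (NOW - OLD) / 86400 ))
echo "$AGE_DAYS"   # → 31
```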
  2. Make the script executable:
bash
chmod +x <path to milvus_cleanup_backup.sh>
  3. Schedule it with cron:
bash
# Edit the crontab:
crontab -e
# (pick an editor if prompted)
# Run the cleanup script daily at 03:00:
0 3 * * * <path to milvus_cleanup_backup.sh>
# List the scheduled jobs:
crontab -l
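Note that the job above only deletes old backups; creating them on a schedule needs its own cron entry. A sketch, with the path as a placeholder (remember that `%` has special meaning in crontab and must be escaped as `\%`):

```bash
# Daily backup at 02:30, before the 03:00 cleanup; run from the directory containing configs/
30 2 * * * cd <path to the folder containing configs> && ./milvus-backup create -n my_backup_$(date +\%Y\%m\%d_\%H\%M\%S) >> /var/log/milvus_backup.log 2>&1
```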