SAM2-Based Eye-Movement Data Tracking (Part 2)

Contents

I. How to convert object-tracking results to .aois

1. What does an object-tracking result file look like?

2. What does a .aois file look like?

3. How to interpret the contents of a .aois file? Out-of-sync frames?

Question: we can take the box coordinates from the json, but where do the DurationMicroseconds and Seconds in the .aois file come from?

4. Converting the video to images

Question: what are ffmpeg.exe and ffprobe.exe?

Question: how to understand the frame-extraction code above?

Question: why parse the video in video.py, write the info into video_meta.json, and then store it in LabelingWidget?

5. json to .aois

6. How are target "ghosts" drawn?

7. json2aois.py

II. Summary

I. How to convert object-tracking results to .aois

1. What does an object-tracking result file look like?

When we generate bounding boxes for the objects in an image, the annotations are saved as a .json file, one json per image. Such files can normally be opened with labelme.exe, but the json generated by X-AnyLabeling cannot, because some of its fields differ. Here is what an X-AnyLabeling json looks like:

{
  "version": "3.1.1",
  "flags": {},
  "shapes": [
    {
      "kie_linking": [],
      "label": "pad",
      "score": null,
      "points": [
        [
          1079.0,
          872.0
        ],
        [
          1508.0,
          872.0
        ],
        [
          1508.0,
          1079.0
        ],
        [
          1079.0,
          1079.0
        ]
      ],
      "group_id": null,
      "description": "",
      "difficult": false,
      "shape_type": "rectangle",
      "flags": {},
      "attributes": {}
    }
  ], 
  "imagePath": "frame_00000.jpg",
  "imageData": null,
  "imageHeight": 1080,
  "imageWidth": 1920,
  "description": ""
}
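
As a reference for later sections, here is a minimal sketch (standard library only; the file name is illustrative) of pulling the rectangle corners back out of such a file:

import json

# Read one X-AnyLabeling frame annotation
with open("frame_00000.json", encoding="utf-8") as f:
    data = json.load(f)

for shape in data["shapes"]:
    if shape["shape_type"] != "rectangle":
        continue
    xs = [p[0] for p in shape["points"]]
    ys = [p[1] for p in shape["points"]]
    # The top-left and bottom-right corners fully define the box
    print(shape["label"], (min(xs), min(ys)), (max(xs), max(ys)))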

2. What does a .aois file look like?

In Tobii Pro Lab we annotated bounding boxes on a video, about 300 frames, and then exported a .aois file. The raw export is very messy; after pulling the fields apart and rewriting them into something readable (GPT did the heavy lifting), it looks like this:

{
  "Version": 2,
  "Tags": [],
  "Media": {
    "MediaType": 1,
    "Height": 1080,
    "Width": 1920,
    "MediaCount": 1,
    "DurationMicroseconds": 714094000
  },
  "Aois": [
    {
      "Name": "pad",
      "Red": 212,
      "Green": 0,
      "Blue": 255,
      "Tags": [],
      "KeyFrames": [
         {
          "IsActive": true,
          "Seconds": 0.0,
          "Vertices": [
            {
              "X": 1079.0,
              "Y": 872.0
            },
            {
              "X": 1508.0,
              "Y": 872.0
            },
            {
              "X": 1508.0,
              "Y": 1079.0
            },
            {
              "X": 1079.0,
              "Y": 1079.0
            }
          ]
        },
        {
          "IsActive": true,
          "Seconds": 0.0,
          "Vertices": [
            {
              "X": 1079.0,
              "Y": 872.0
            },
            {
              "X": 1508.0,
              "Y": 872.0
            },
            {
              "X": 1508.0,
              "Y": 1079.0
            },
            {
              "X": 1079.0,
              "Y": 1079.0
            }
          ]
        }
      ]
    }
  ]
}

If you set up tag groups on the right side of the software's AOI view, you will also see entries like the ones below in the exported aois file. The point of a tag group is that you can delete multiple boxes sharing a tag in one go. We will ignore this part of the .aois file: apart from making label deletion convenient, it is not very useful.

[
  {"Id":"7aa644cc-d141-45a1-b2a9-89aa5a43f65a","GroupName":"pad","TagName":"Tag1"},
  {"Id":"a9138ef3-0e83-4c0f-bcc9-775c51980dea","GroupName":"PAD","TagName":"Tag3"},
  {"Id":"e3c4d294-9a7a-48a9-b186-c462210487c4","GroupName":"pad1","TagName":"Tag2"},
  {"Id":"f0fb7dec-2a4b-4450-aebc-6fe9b6d17de5","GroupName":"ROAD","TagName":"Tag4"}
]

3. How to interpret the contents of a .aois file? Out-of-sync frames?

Clearly the "KeyFrames" field records the bounding box of each keyframe, stored as the x and y of four points counted counterclockwise from the first point. In practice only the 1st and 3rd points, i.e. the rectangle's top-left and bottom-right corners, are needed to define it, so I don't know why all four are stored.
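
A tiny helper makes the corner/vertex relationship explicit; a sketch, assuming (xmin, ymin) is the top-left and (xmax, ymax) the bottom-right corner in image coordinates:

def corners_to_vertices(xmin, ymin, xmax, ymax):
    """Expand two corners into the four-vertex order seen in the .aois export above."""
    return [
        {"X": xmin, "Y": ymin},  # point 1: top-left
        {"X": xmax, "Y": ymin},  # point 2: top-right
        {"X": xmax, "Y": ymax},  # point 3: bottom-right
        {"X": xmin, "Y": ymax},  # point 4: bottom-left
    ]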

Note the "IsActive" field inside "KeyFrames": it simply controls whether the annotation is displayed. true means this frame's annotation is shown; false means it is hidden. Why is there a keyframe concept at all? Most likely the software records the box position only for frames where the target moved, and automatically fills in the box for the frames in between; unlike our tracking algorithm, it does not record a box position for every frame. This has a flaw: after the target disappears, and until it reappears, the software keeps displaying the box from the target's last appearance, what we call a "target ghost". So my understanding is: we can write a box position for every frame, but once the target disappears, until it reappears we should keep writing the box from its last appearance while setting that frame's "IsActive" to false. As for detecting disappearance, we should set a minimum box area: if the box area falls below that threshold, the target is considered gone.

Can we simply not write the box for frames where the target has disappeared? Apparently not. If you skip those frames, the software assumes "the object exists but has not moved" and automatically renders the box for that whole span, effectively repeating the first keyframe between two keyframes. Only when you write a keyframe with "IsActive" set to false does it treat the object as gone. This design feels odd: a single flag saying "the object disappeared", together with the last-seen box, would suffice, instead of recording the last-seen box on every frame.

However it works internally, all we need is a way to produce a .aois file that the software can import successfully.
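
To make the convention concrete, a disappearance keyframe we write would look like the fragment below: the vertices repeat the last-seen box and only IsActive changes (the Seconds value here is made up for illustration):

{
  "IsActive": false,
  "Seconds": 3.2,
  "Vertices": [
    {"X": 1079.0, "Y": 872.0},
    {"X": 1508.0, "Y": 872.0},
    {"X": 1508.0, "Y": 1079.0},
    {"X": 1079.0, "Y": 1079.0}
  ]
}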

Question: we can take the box coordinates from the json, but where do the DurationMicroseconds and Seconds in the .aois file come from?

"DurationMicroseconds": 714094000 means the video duration is 714,094,000 microseconds.
1 second = 1,000,000 microseconds (µs)
1 millisecond = 1,000 microseconds

Conversion steps:
Seconds: 714,094,000 ÷ 1,000,000 = 714.094 s
Minutes + seconds: 714.094 ÷ 60 = 11.901566...
Integer part 11 → 11 minutes
Remainder 0.901566... × 60 ≈ 54.094 s
Conclusion: the video is roughly 11 min 54 s long (54.094 s when precise to the millisecond).
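
The same arithmetic as a quick scripted sanity check:

us = 714_094_000
minutes, seconds = divmod(us / 1_000_000, 60)
print(f"{int(minutes)} min {seconds:.3f} s")   # -> 11 min 54.094 s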

Given the video itself, we can always obtain its duration, and after converting it to frames we can count the total number of frames. The 4th formula below then gives FPS and the 5th gives Seconds. These formulas are essentially distance = speed × time and time = distance / speed.

total_frames = duration_s × FPS
total_frames = duration_ms / 1000 × FPS
total_frames = duration_us / 1000000 × FPS
FPS = 1000000 × total_frames / duration_us
Seconds = frame_index / FPS
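
Plugging in this article's example numbers (17820 extracted frames and a 714094000 µs duration, matching the video_meta.json sample shown later), a sketch:

total_frames = 17820           # counted after frame extraction
duration_us = 714_094_000      # from ffprobe

fps = 1_000_000 * total_frames / duration_us   # ≈ 24.954698
print(fps)
print(300 / fps)               # timestamp (Seconds) of frame 300, ≈ 12.02 s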

4. Converting the video to images

Before covering json to .aois, we first have to solve how to obtain DurationMicroseconds and Seconds. As noted above, Seconds = frame_index / FPS and FPS = 1000000 × total_frames / duration_us. In other words, we must first convert the video into frames, extract the total frame count and the video duration, then compute FPS, and finally compute Seconds.

To parse the video, you first need two things: ffmpeg.exe and ffprobe.exe.

  1. Download the release-full build of FFmpeg:
    https://www.gyan.dev/ffmpeg/builds/ffmpeg-release-full.7z

  2. Extract it to any directory, e.g.
    D:\tools\ffmpeg

  3. Add D:\tools\ffmpeg\bin to the system PATH environment variable
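
After updating PATH, open a new terminal and confirm both tools resolve:

ffmpeg -version
ffprobe -version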

Question: what are ffmpeg.exe and ffprobe.exe?

ffmpeg.exe and ffprobe.exe are two core command-line tools from the FFmpeg project, widely used for audio/video processing. Their roles are as follows:


ffmpeg.exe: the audio/video "Swiss-army knife"

Purpose: transcoding, trimming, merging, compressing, extracting audio, adding subtitles, screen recording, streaming; almost every audio/video processing task.

Common examples:

  • Convert video formats: ffmpeg -i input.avi output.mp4

  • Extract audio: ffmpeg -i video.mp4 -vn audio.aac

  • Compress video: ffmpeg -i input.mp4 -vcodec libx264 -crf 28 output.mp4

  • Burn in subtitles: ffmpeg -i video.mp4 -vf subtitles=subs.srt output.mp4


ffprobe.exe: the media-info "detective"

Purpose: inspect detailed information about a media file, such as codec, resolution, frame rate, bitrate, duration, audio tracks, and subtitle tracks.

Common examples:

  • Inspect a video: ffprobe video.mp4

  • Output as JSON (easy to consume programmatically):
    ffprobe -v quiet -print_format json -show_format -show_streams video.mp4


✅ In one sentence:

  • ffmpeg.exe processes audio/video (transcoding, trimming, etc.)
  • ffprobe.exe inspects audio/video info (does not modify files)

Next, we need to modify the source of anylabeling/views/labeling/utils/video.py.

In the source, the extract_frames_from_video function contains these two lines:

total_frames = int(video_capture.get(cv2.CAP_PROP_FRAME_COUNT))

fps = video_capture.get(cv2.CAP_PROP_FPS)

We do not use the total frame count and FPS from those two lines. Instead, we count the images produced by the extraction as the total frame count, use ffprobe to get the video's duration, and compute FPS with the formula given earlier. The point is to obtain the actual totals: the two lines above return values that OpenCV pre-reads from the header, which can be inaccurate. At first I used those pre-read values to build the .aois and found the resulting Seconds were out of sync with the video frames in the software; only by comparing the Seconds in a software-exported .aois and looking for a pattern did I trace the problem back to this spot.

# anylabeling/views/labeling/utils/video.py

# [New function]
def _get_exact_duration_us(path: str):
    """
    Use ffprobe to get the video's true duration, in microseconds.
    Works around cv2.CAP_PROP_FRAME_COUNT being unreliable for VFR videos.
    """
    import subprocess, json
    cmd = ["ffprobe", "-v", "error",
           "-show_entries", "format=duration", "-of", "json", path]
    dur = float(json.loads(subprocess.check_output(cmd))["format"]["duration"])
    return int(dur * 1000000)          # e.g. 714094000 microseconds

def extract_frames_from_video(self, input_file, out_dir):
    """
    Main entry point. Extraction strategy:
    1. Prefer ffmpeg (fast, accurate, supports hardware acceleration)
    2. Fall back to OpenCV (for environments without ffmpeg)
    Adds live progress reporting and a true frame count.
    """
    temp_video_path = None
    video_capture = None
    opened_successfully = False
    ffmpeg_path = None

    try:
        input_file_str = str(input_file)

        # Load video directly
        # =======================================================
        # Phase 1: try opening directly with OpenCV (opens instantly
        # in ~90% of cases)
        # =======================================================
        video_capture = cv2.VideoCapture(input_file_str)
        if video_capture.isOpened():
            opened_successfully = True
        else:
            video_capture.release()
            logger.warning(
                f"Loading video failed. Trying temporary file workaround."
            )

            # --------------------------------------------------
            # Fallback: read the whole file into memory -> write a
            # temp file -> VideoCapture again. Works around Chinese
            # paths, network paths, and odd encodings
            # --------------------------------------------------
            try:
                with open(input_file, "rb") as f:
                    video_data = f.read()
                _, ext = osp.splitext(input_file)
                suffix = ext if ext else ".mp4"
                temp_file = tempfile.NamedTemporaryFile(
                    suffix=suffix, delete=False
                )
                temp_video_path = temp_file.name
                temp_file.write(video_data)
                temp_file.close()
                logger.debug(
                    f"Writing video data to temporary file: {temp_video_path}"
                )

                video_capture = cv2.VideoCapture(temp_video_path)
                if video_capture.isOpened():
                    opened_successfully = True
                else:
                    video_capture.release()
                    logger.error(
                        f"Failed to open video via temporary file: {temp_video_path}"
                    )
            except Exception as e:
                logger.error(f"Error during temporary file workaround: {e}")
                if video_capture:
                    video_capture.release()

        if not opened_successfully:
            popup = Popup(
                f"Failed to open video file: {osp.basename(input_file)}",
                self,
                icon=new_icon_path("warning", "svg"),
            )
            popup.show_popup(self, position="center")
            return None

        # --- Proceed with frame extraction settings ---
        # We do NOT use the totals returned by the two lines below
        # =======================================================
        # Phase 2: grab "rough" info; only truly relied on when
        # ffmpeg is unavailable
        # =======================================================
        total_frames = int(video_capture.get(cv2.CAP_PROP_FRAME_COUNT))
        fps = video_capture.get(cv2.CAP_PROP_FPS)

        # Handle cases where fps might be 0 or invalid
        # If fps <= 0, default to 30 to avoid a division by zero later
        if not fps or fps <= 0:
            logger.warning(
                f"Invalid or zero FPS ({fps}) detected for video. Defaulting FPS to 30 for calculations."
            )
            fps = 30.0  # Assign a default FPS
        logger.info(
            f"Video opened: Total Frames ~{total_frames}, FPS ~{fps:.2f}"
        )

        # =======================================================
        # New: get the true duration (µs) for the "true average
        # fps" calculation later
        # =======================================================
        total_us = _get_exact_duration_us(input_file_str)
        print(f"total_us:{total_us}")

        # =======================================================
        # Phase 3: dialog for the user to pick interval / prefix /
        # sequence-number width
        # =======================================================
        dialog = FrameExtractionDialog(self, total_frames, fps)
        if not dialog.exec_():
            logger.info(
                "Frame extraction cancelled by user in settings dialog."
            )
            # video_capture is released in the outer finally block
            return None

        interval, prefix, seq_len = dialog.get_values()
        os.makedirs(out_dir, exist_ok=True)

        # --- Check for ffmpeg ---
        # =======================================================
        # Phase 4: check whether ffmpeg is installed
        # =======================================================
        ffmpeg_path = shutil.which("ffmpeg")

        # Inner try: Handle the actual extraction (ffmpeg or OpenCV)
        # =======================================================
        # Phase 5: main extraction logic, ffmpeg branch first
        # =======================================================
        try:
            if ffmpeg_path:
                logger.info(f"Detected ffmpeg for extraction: {ffmpeg_path}")
                # --- FFMPEG Path ---
                # VideoCapture is no longer needed once ffmpeg takes over
                # VideoCapture has served its purpose; release the handle early
                if video_capture and video_capture.isOpened():
                    video_capture.release()

                # --------------------------------------------------
                # Progress bar: range (0,0) means "indeterminate";
                # setValue is applied dynamically later
                # --------------------------------------------------
                progress_dialog = QProgressDialog(
                    self.tr("Extracting frames using ffmpeg..."),
                    self.tr("Cancel"),
                    0,
                    0,
                    self,  # Range (0,0) makes it indeterminate
                )
                progress_dialog.setWindowModality(Qt.WindowModal)
                progress_dialog.setWindowTitle(self.tr("Progress"))
                progress_dialog.setMinimumWidth(400)
                progress_dialog.setMinimumHeight(150)
                progress_dialog.setStyleSheet(
                    get_progress_dialog_style(color="#1d1d1f", height=20)
                )
                progress_dialog.show()
                QApplication.processEvents()  # Ensure dialog is displayed

                # --------------------------------------------------
                # Build the output pattern & target fps
                # --------------------------------------------------
                video_source_path = (
                    temp_video_path if temp_video_path else input_file_str
                )
                output_pattern = osp.join(out_dir, f"{prefix}%0{seq_len}d.jpg")
                output_fps = (
                    fps / interval if interval > 0 else fps
                )  # Avoid division by zero

                cmd = [
                    ffmpeg_path,
                    "-i",
                    video_source_path,
                    "-vf",
                    f"fps={output_fps}",
                    "-qscale:v",
                    "2",  # High quality JPEG
                    "-start_number",
                    "0",
                    output_pattern,
                ]
                logger.info(f"Running ffmpeg command: {' '.join(cmd)}")

                # [Modified]
                # ----------- read the true frame count while extracting -----------
                # ===================================================
                # Key point: make ffmpeg print "frame= 1234" to stderr
                # so it can be parsed in real time
                # ===================================================
                import re
                _frame_re = re.compile(r"frame=\s*(\d+)")
                real_frame_count = 0
                # Make ffmpeg emit progress
                cmd = [
                    ffmpeg_path,
                    "-i", video_source_path,
                    "-vf", f"fps={output_fps}",
                    "-qscale:v", "2",                # high-quality JPEG
                    "-start_number", "0",
                    "-stats", "-v", "info",          # key: print frame= to stderr
                    output_pattern
                ]
                # ----------------------------------------
                # ffmpeg_failed = False
                # try:
                    # Using Popen for potential cancellation, though complex
                    # ......
                # if ffmpeg_failed:
                    # return None  # Indicate failure if ffmpeg path failed
                # [Modified] replace the block above with the following:

                ffmpeg_failed = False
                try:
                    with subprocess.Popen(
                        cmd,
                        stdin=subprocess.DEVNULL,
                        stdout=subprocess.DEVNULL,  # key: keep stdout from filling up and blocking
                        stderr=subprocess.PIPE,
                        bufsize=1,
                        universal_newlines=True
                    ) as proc:
                        for line in proc.stderr:
                            # live logging (optional)
                            logger.debug("ffmpeg: " + line.rstrip())

                            # grab the frame number in real time
                            m_frame = _frame_re.search(line)
                            if m_frame:
                                real_frame_count = int(m_frame.group(1))
                                # update the progress dialog text
                                progress_dialog.setLabelText(
                                    self.tr(f"Extracting frames ... {real_frame_count} decoded")
                                )
                                progress_dialog.setValue(real_frame_count)
                                QApplication.processEvents()   # keep the UI responsive

                            if progress_dialog.wasCanceled():
                                proc.terminate()
                                try:
                                    proc.wait(timeout=2)
                                except subprocess.TimeoutExpired:
                                    proc.kill()
                                ffmpeg_failed = True
                                break

                        # retcode = proc.returncode
                        retcode = proc.wait()
                        progress_dialog.close()  # close as soon as the process exits
                        
                        # --------------------------------------------------
                        # Extraction succeeded: compute the average fps from
                        # the true frame count and the true duration
                        # --------------------------------------------------
                        if not ffmpeg_failed and retcode == 0:
                            logger.info(f"ffmpeg extraction done, frames actually decoded: {real_frame_count}")

                            # Use the real count directly instead of counting files with os.listdir
                            # Compute the true average FPS
                            fps_real = 1000000.0 * real_frame_count / total_us if total_us else 25.0
                            logger.info(f"Real frames: {real_frame_count}  duration: {total_us} us  average FPS: {fps_real:.5f}")
                            saved_frame_count = real_frame_count

                            # Write back so callers can use the values
                            self.total_frames = real_frame_count
                            self.fps = fps_real
                            self.total_us = total_us

                            # Persist a meta json that later labeling/training can read directly
                            meta = {
                                "total_frames": real_frame_count,
                                "fps": fps_real,
                                "total_us": total_us,
                                "source_video": str(video_source_path),
                                "extract_stride": interval,
                            }
                            print(f"meta:{meta}")
                            # Note: this resolves to the frames folder, i.e. the video path minus its extension
                            save_meta_path = (Path(os.path.splitext(video_source_path)[0]) / "video_meta.json")
                            print(f"save_meta_path:{save_meta_path}")
                            save_meta_path.write_text(
                                json.dumps(meta, indent=2), encoding="utf-8"
                            )
                            logger.info(f"ffmpeg extracted {saved_frame_count} frames to {out_dir}")
                        else:
                            logger.error(f"ffmpeg exit code={retcode}")
                            stderr_all = proc.stderr.read()  # read any remaining stderr (lines above were consumed)
                            logger.error(f"ffmpeg stderr:\n{stderr_all}")
                            ffmpeg_failed = True

                except FileNotFoundError:
                    logger.error(f"ffmpeg not found: {ffmpeg_path}")
                    popup = Popup(self.tr("ffmpeg not found."), self, icon=new_icon_path("error", "svg"))
                    popup.show_popup(self, position="center")
                    ffmpeg_failed = True
                except Exception as e:
                    logger.exception(f"Error running ffmpeg: {e}")
                    popup = Popup(f"{self.tr('Error running ffmpeg')}: {e}", self, icon=new_icon_path("error", "svg"))
                    popup.show_popup(self, position="center")
                    ffmpeg_failed = True

                # ----------------------------------------

            # =======================================================
            # Phase 6: fallback branch, frame-by-frame with OpenCV
            #          when ffmpeg is unavailable
            # =======================================================
            else:  # if not ffmpeg_path
                logger.info("ffmpeg not found. Using OpenCV for extraction.")
                # --- OpenCV Path ---
                estimated_frames = (
                    (total_frames + interval - 1) // interval
                    if total_frames > 0 and interval > 0
                    else 0
                )
                progress_dialog = QProgressDialog(
                    self.tr("Extracting frames (OpenCV)... Please wait..."),
                    self.tr("Cancel"),
                    0,
                    estimated_frames,
                    self,
                )
                progress_dialog.setWindowModality(Qt.WindowModal)
                progress_dialog.setWindowTitle(self.tr("Progress"))
                progress_dialog.setMinimumWidth(400)
                progress_dialog.setMinimumHeight(150)
                progress_dialog.setStyleSheet(
                    get_progress_dialog_style(color="#1d1d1f", height=20)
                )
                progress_dialog.setValue(0)
                progress_dialog.show()

                frame_count = 0
                saved_frame_count = 0
                extraction_cancelled = False
                while True:
                    if progress_dialog.wasCanceled():
                        logger.info(
                            "Frame extraction cancelled by user (OpenCV)."
                        )
                        extraction_cancelled = True
                        break

                    if not video_capture.isOpened():
                        logger.warning(
                            "Video capture became unopened during OpenCV processing."
                        )
                        break

                    ret, frame = video_capture.read()
                    if not ret:
                        break

                    if frame_count % interval == 0:
                        frame_filename = osp.join(
                            out_dir,
                            f"{prefix}{str(saved_frame_count).zfill(seq_len)}.jpg",
                        )
                        try:
                            write_success = cv2.imwrite(frame_filename, frame)
                            if not write_success:
                                logger.error(
                                    f"Failed to write frame: {frame_filename}"
                                )
                        except Exception as e:
                            logger.error(
                                f"Error writing frame {frame_filename}: {e}"
                            )

                        saved_frame_count += 1
                        progress_dialog.setValue(saved_frame_count)

                    frame_count += 1
                    QApplication.processEvents()  # Keep UI responsive

                progress_dialog.close()

                if extraction_cancelled:
                    logger.warning(
                        f"Extraction cancelled. Frames saved so far (OpenCV): {saved_frame_count}"
                    )
                    # Decide if cancellation is an error or partial success. Currently returns out_dir.
                else:
                    logger.info(
                        f"OpenCV extraction finished. Saved frames: {saved_frame_count}"
                    )

            # --- Common success return (after ffmpeg or OpenCV) ---
            # =======================================================
            # Phase 7: uniformly return out_dir (success) or None
            # (failure)
            # =======================================================
            return out_dir

        # Except block for the *inner* try (extraction phase: ffmpeg or OpenCV)
        # Inner exception: the extraction phase (ffmpeg/OpenCV) failed
        except Exception as extraction_e:
            logger.exception(
                f"An unexpected error occurred during frame extraction logic: {extraction_e}"
            )
            popup = Popup(
                f"An unexpected error occurred during extraction: {extraction_e}",
                self,
                icon=new_icon_path("warning", "svg"),
            )
            popup.show_popup(self, position="center")
            return None  # Indicate failure of extraction phase

    # Except block for the *outer* try (opening/setup phase)
    # Outer exception: failure while opening the video / during setup
    except Exception as opening_e:
        logger.exception(
            f"An unexpected error occurred during video opening/setup: {opening_e}"
        )
        # Use Popup instead of QMessageBox
        popup = Popup(
            f"An error occurred during setup: {opening_e}",
            self,
            icon=new_icon_path("error", "svg"),
        )
        popup.show_popup(self, position="center")
        return None  # Indicate failure

    # Finally block for the *outer* try (always runs)
    # ===========================================================
    # Cleanup that must always run, success or failure
    # ===========================================================
    finally:
        # Release capture if it exists and is opened (mainly for OpenCV path or if ffmpeg failed early)
        # Release the VideoCapture
        if video_capture is not None and video_capture.isOpened():
            logger.info("Releasing video capture resource.")
            video_capture.release()
        # Clean up the temporary file if created
        # Delete the temporary file
        if temp_video_path and osp.exists(temp_video_path):
            try:
                logger.debug(
                    f"Removing temporary video file: {temp_video_path}"
                )
                os.remove(temp_video_path)
            except OSError as e:
                logger.error(
                    f"Error removing temporary file {temp_video_path}: {e}"
                )

Question: how to understand the frame-extraction code above?

Let's split the code into "what it actually does" and "why it does it that way". After reading this you should be able to see the author's intent at a glance, and know where to hook in for further development.


1. The overall goal

Give AnyLabeling's video frame extraction a thorough upgrade, fixing three old pain points:

  1. OpenCV extraction is slow → prefer ffmpeg with hardware acceleration.

  2. cv2.CAP_PROP_FRAME_COUNT is inaccurate (especially for VFR video) → use ffprobe for the true duration first, then ffmpeg's live "frame=" log for the true frame count, instead of trusting cv2's numbers.

  3. The extraction appears frozen → show a QProgressDialog with the live decoded-frame count and allow the user to Cancel.


2. The new core function

def _get_exact_duration_us(path: str) -> int:

  • ffprobe -of json -show_entries format=duration

  • Returns the exact duration in microseconds (used later to compute the "true average fps").


3. The main flow of extract_frames_from_video(), stage by stage

  • Stage 1, open the video: try a bare cv2.VideoCapture first; ~90% of cases open directly.
    On failure, read the whole file into memory → write a temp file → VideoCapture again, which works around Chinese paths, network paths, and odd encodings that OpenCV cannot open.

  • Stage 2, rough info: total_frames = CAP_PROP_FRAME_COUNT, fps = CAP_PROP_FPS. Only used as estimates when ffmpeg is absent; with ffmpeg present these two lines are effectively deprecated in the modified code and fps is just a reference.

  • Stage 3, true duration: total_us = _get_exact_duration_us(...), the denominator for the true-average-fps calculation later.

  • Stage 4, user dialog: FrameExtractionDialog lets the user pick (a) extract one frame every `interval` frames, (b) the filename prefix and digit width.

  • Stage 5, branching: shutil.which("ffmpeg"). Take the fast ffmpeg branch if it is installed, otherwise fall back to the OpenCV frame-by-frame loop.

4. Three "hidden tricks" in the ffmpeg branch

  • Live progress: ffmpeg -stats -v info prints "frame= 1234" to stderr; the regex frame=\s*(\d+) refreshes the progress text on every match.

  • No more counting files via os.listdir: real_frame_count is taken straight from the last reported frame number, skipping a full directory scan; noticeably faster, especially on network drives.

  • True average fps: fps_real = 1000000 * real_frame_count / total_us, which is more accurate for VFR video; later annotation and playback speed use this value.

Notes:

  • The output pattern prefix%05d.jpg keeps sequence numbers aligned.

  • -qscale:v 2 ≈ JPEG quality 95%, visually lossless.

  • When the user clicks Cancel: proc.terminate() → wait(timeout=2) → kill() on timeout, preventing zombie processes.


5. The OpenCV fallback branch

  • The classic while video_capture.read() loop, writing one image every `interval` frames.

  • The progress bar maximum is estimated_frames = (total_frames + interval - 1) // interval

    (no true frame count is available here, so it can only be estimated).


6. Unified cleanup

  • The finally block guarantees:

    -- video_capture.release()

    -- the temp file temp_video_path is removed if it exists.

  • Whichever branch ran, success returns out_dir; any exception pops a Popup and returns None.


7. A useful by-product

After the ffmpeg branch finishes, a video_meta.json is written into the frames folder
(the directory named after the source video):

{
  "total_frames": 17820,
  "fps": 24.954697840900497,
  "total_us": 714094000,
  "source_video": "/path/to/video.mp4",
  "extract_stride": 1
}

Later annotation, training, and visualization can read this file directly instead of re-running ffprobe.
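
Reading it back later takes a couple of lines (a sketch; frames_dir is assumed to point at the extracted-frames folder):

import json
from pathlib import Path

meta = json.loads((Path(frames_dir) / "video_meta.json").read_text(encoding="utf-8"))
fps, total_us = meta["fps"], meta["total_us"]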


8. In one sentence

This code is the complete engineering version of "get the duration with ffprobe first, then get the frame count from ffmpeg's live log, refreshing a progress bar while extracting", fixing the old OpenCV extraction's three chronic ills: slow, inaccurate, and apparently frozen.

In anylabeling\views\labeling\label_widget.py, we parse the video_meta.json generated earlier by the video-to-images step in video.py and store the values on the instance, so they are available later when the json is saved.

Why do we check for video_meta.json in import_image_folder, parsing the video info into the instance if it exists and falling back to defaults otherwise? Because import_image_folder is the function called when the user opens an image folder, and at that point we cannot know whether the folder was produced from a video. The presence of video_meta.json is what distinguishes a video-frame folder from an ordinary image folder. For an ordinary image folder, tracking is impossible anyway, since the images are not consecutive, and with no video info there is nothing to write into the .aois file either.

class LabelingWidget(LabelDialog):
    def __init__(  # noqa: C901
        self,
        parent=None,
        config=None,
        filename=None,
        output=None,
        output_file=None,
        output_dir=None,
    ):
        self.total_frames = 0
        self.fps = 25
        self.total_us = 0

    ......

    def import_image_folder(self, dirpath, pattern=None, load=True):
        meta_file = Path(dirpath) / "video_meta.json"
        print(f"meta_file:{meta_file}")

        # 1. If the folder carries its own meta info, just read it
        if meta_file.exists():
            meta = json.loads(meta_file.read_text(encoding="utf-8"))
        else:
            # 2. Plain image folder: fall back to defaults
            self.total_frames = len([
                f for f in os.listdir(dirpath)
                if f.lower().endswith(('.jpg', '.png'))
            ])
            self.total_us = self.total_frames / self.fps * 1000000
            meta = {"total_frames": self.total_frames, "fps": self.fps, "total_us": self.total_us, "source_video": None, "extract_stride": 1}
            print("No video_meta.json found: this is a plain image folder, not a video-frame folder; using the image count as total frames and fps=25")
            print(f"meta:{meta}")

        # 3. Save to the instance so save_labels can use them later
        self.fps = meta.get("fps")
        self.total_frames = meta.get("total_frames")
        self.total_us = meta.get("total_us")

        ......

Question: why parse the video in video.py, write the info into video_meta.json, and then store it in LabelingWidget?

(1) The video info parsed in video.py has to reach the function that saves the json somehow, so it is first written to video_meta.json.

(2) Then import_image_folder in anylabeling\views\labeling\label_widget.py can read video_meta.json across files and store the values on the LabelingWidget.

(3) LabelingWidget has a save_labels function that creates a LabelFile instance via label_file = LabelFile() and calls its save method, which is exactly the function that writes the json; we only need to pass the video info in as extra arguments there.

class LabelingWidget(LabelDialog):
    def __init__(  # noqa: C901
        self,
        parent=None,
        config=None,
        filename=None,
        output=None,
        output_file=None,
        output_dir=None,
    ):
        self.total_frames = 0
        self.fps = 25
        self.total_us = 0

    ......

    def save_labels(self, filename):
        # Note: the LabelFile instance is created here
        label_file = LabelFile()
        # Get current shapes
        # Excluding auto labeling special shapes
        shapes = [
            item.shape().to_dict()
            for item in self.label_list
            if item.shape().label
            not in [
                AutoLabelingMode.OBJECT,
                AutoLabelingMode.ADD,
                AutoLabelingMode.REMOVE,
            ]
        ]
        flags = {}
        for i in range(self.flag_widget.count()):
            item = self.flag_widget.item(i)
            key = item.text()
            flag = item.checkState() == Qt.Checked
            flags[key] = flag
        try:
            image_path = osp.relpath(self.image_path, osp.dirname(filename))
            image_data = (
                self.image_data if self._config["store_data"] else None
            )
            if osp.dirname(filename) and not osp.exists(osp.dirname(filename)):
                os.makedirs(osp.dirname(filename))

            label_file.save(
                filename=filename,
                shapes=shapes,
                image_path=image_path,
                image_data=image_data,
                image_height=self.image.height(),
                image_width=self.image.width(),
                other_data=self.other_data,
                flags=flags,
                # [Added]
                total_frames=self.total_frames,
                fps=self.fps,
                total_us=self.total_us
            )
            self.label_file = label_file
            items = self.file_list_widget.findItems(
                self.image_path, Qt.MatchExactly
            )
            if len(items) > 0:
                if len(items) != 1:
                    raise RuntimeError("There are duplicate files.")
                items[0].setCheckState(Qt.Checked)
            # disable allows next and previous image to proceed
            # self.filename = filename
            return True
        except LabelFileError as e:
            self.error_message(
                self.tr("Error saving label data"), self.tr("<b>%s</b>") % e
            )
            return False

5. json to .aois

In anylabeling\views\labeling\label_file.py, find the save function. We keep the original json-writing logic and additionally emit the .aois file. Strictly speaking, since fps was already computed from total_frames, passing total_frames in is no longer necessary, but we still pass it as a spare.

class LabelFile:
    def __init__(self, filename=None, image_dir=None):
        ......
    
    def save(
        self,
        filename=None,
        shapes=None,
        image_path=None,
        image_height=None,
        image_width=None,
        image_data=None,
        other_data=None,
        flags=None,
        # [Added]
        total_frames=None,
        fps=None,
        total_us=None,
    ):
        if image_data is not None:
            image_data = base64.b64encode(image_data).decode("utf-8")
            image_height, image_width = self._check_image_height_and_width(
                image_data, image_height, image_width
            )

        if other_data is None:
            other_data = {}
        if flags is None:
            flags = {}

        is_active = True
        num_active = 0  # counts ghost shapes; any ghost means the target is absent in this frame
        for i, shape in enumerate(shapes):
            if shape["shape_type"] == "rectangle" or shape["shape_type"] == "polygon" or shape["shape_type"] == "rotation":
                sorted_box = LabelConverter.calculate_bounding_box(
                    shape["points"]
                )
                xmin, ymin, xmax, ymax = sorted_box
                shape["points"] = [
                    [xmin, ymin],
                    [xmax, ymin],
                    [xmax, ymax],
                    [xmin, ymax],
                ]
                shapes[i] = shape
                if shape.get("flags", {}).get("is_ghost") is True:
                    print("is ghost")
                    num_active += 1
        if num_active > 0:
            is_active = False
        data = {
            "version": __version__,
            "flags": flags,
            "shapes": shapes,
            "imagePath": image_path,
            "imageData": image_data,
            "imageHeight": image_height,
            "imageWidth": image_width,
        }

        for key, value in other_data.items():
            assert key not in data
            data[key] = value
        try:
            with utils.io_open(filename, "w") as f:
                json.dump(data, f, ensure_ascii=False, indent=2)
                logger.debug(f"[label_file.py save] json.dump() filename:{filename} shapes:{shapes}")
            self.filename = filename
        except Exception as e:  # noqa
            raise LabelFileError(e) from e

        # Write the .aois file
        # AOIS_KEY_FRAMES is a module-level list accumulating keyframes across saves
        frame_num = self.extract_frame_number(filename)
        if frame_num is not None and shapes:
            print(f"[label_file.py][save] frame_num:{frame_num}")
            print(f"[label_file.py][save] fps:{fps}")
            seconds = frame_num / fps
            vertices = [{'X': p[0], 'Y': p[1]} for p in shapes[0]['points']]
            print(f"[label_file.py][save] num_active:{num_active}")
            print(f"[label_file.py][save] is_active:{is_active}")
            AOIS_KEY_FRAMES.append({
                "IsActive": is_active,
                "Seconds": round(seconds, 6),
                "Vertices": vertices
            })

            aois_file = {
                "Version": 2,
                "Tags": [],
                "Media": {
                    "MediaType": 1,
                    "Height": image_height,
                    "Width": image_width,
                    "MediaCount": 1,
                    # "DurationMicroseconds": int(total_frames / fps * 1e6)
                    "DurationMicroseconds": total_us
                },
                "Aois": [{
                    "Name": shapes[0].get('label', 'target'),
                    "Red": 212,
                    "Green": 0,
                    "Blue": 255,
                    "Tags": [],
                    "KeyFrames": AOIS_KEY_FRAMES
                }]
            }
            aois_path = os.path.join(os.path.dirname(filename), "output.aois")
            with open(aois_path, "w", encoding="utf-8") as af:
                json.dump(aois_file, af, indent=2, ensure_ascii=False)
            print(f"AOIs updated: {aois_path}")
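
The save() method above also calls self.extract_frame_number(filename), which is not shown in this excerpt. A minimal sketch that matches the frame_00000.jpg naming used here (an assumption about the real helper):

import os
import re

def extract_frame_number(self, filename):
    """Pull the trailing digits out of names like .../frame_00012.json -> 12."""
    m = re.search(r"(\d+)\.\w+$", os.path.basename(filename))
    return int(m.group(1)) if m else None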

At this point the .aois-generating changes are complete. Run the main program with python anylabeling/app.py, convert a video into an image folder (producing video_meta.json), then annotate and track the folder: besides the json files, an output.aois appears in the folder, and this .aois file can be imported directly into the software. In actual tests the import succeeded and the frames stayed in sync, which means the DurationMicroseconds and Seconds we write are correct. Notably, we also set is_active = False when the target disappears.

6. How are target "ghosts" drawn?

About why the code above contains this fragment:

if shape.get("flags", {}).get("is_ghost") is True:
    print("is ghost")
    num_active += 1

It means that if, in the json's shapes, the rectangle's "flags" is not empty but instead:

"flags": {
  "is_ghost": true
},

then this frame actually contains no target; the box coordinates are still present because they are the "target ghost". As explained in section 3 of this article ("How to interpret the contents of a .aois file? Out-of-sync frames?"), once the target disappears, if we write nothing into the json the software renders the box at the target's last position anyway, i.e. it renders the "ghost" for us. So rather than letting the software render it passively, we proactively record the last-seen box position into the json and set is_active = False.

In anylabeling\services\auto_labeling\segment_anything_2_video.py, first a digression: we split the single try that imports both torch and the sam2 model into two, and use traceback.print_exc() to print the real traceback. The reason is that when I hit errors earlier, a missing torch and a missing SAM2 raised the same error, without showing the real stack trace, so for a while I couldn't tell whether the sam2 model import or the torch import had failed.

try:
    import torch
    from sam2.build_sam import build_sam2, build_sam2_camera_predictor
    from sam2.sam2_image_predictor import SAM2ImagePredictor

    SAM2_VIDEO_AVAILABLE = True
except ImportError:
    SAM2_VIDEO_AVAILABLE = False

[Modified]
import traceback
try:
    import torch
    TORCH_AVAILABLE = True
except ImportError:
    traceback.print_exc()
    TORCH_AVAILABLE = False

try:
    from sam2.build_sam import build_sam2, build_sam2_camera_predictor
    from sam2.sam2_image_predictor import SAM2ImagePredictor
    SAM2_VIDEO_AVAILABLE = True
except ImportError:
    traceback.print_exc()
    SAM2_VIDEO_AVAILABLE = False

class SegmentAnything2Video(Model):
    def __init__(self, config_path, on_message) -> None:
        ......
        # [Added]
        if not TORCH_AVAILABLE:
            message = "torch will not be available."
            raise ImportError(message)

        if not SAM2_VIDEO_AVAILABLE:
            message = "SegmentAnything2Video model will not be available. Please install related packages and try again."
            raise ImportError(message)

Back to the main topic: in anylabeling\services\auto_labeling\segment_anything_2_video.py, find the post-processing function:

    def post_process(self, masks, index=None):
        """Post-process the masks produced by the model.

        Args:
            masks (np.array): The masks to post-process.
            index (int, optional): The index of the mask. Defaults to None.

        Returns:
            list: A list of Shape objects representing the masks.
        """
        # Convert masks to binary format
        masks[masks > 0.0] = 255
        masks[masks <= 0.0] = 0
        masks = masks.astype(np.uint8)

        # Find contours of the masks
        contours, _ = cv2.findContours(
            masks, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE
        )

        # [Added] If contours is empty and keep_ghost=True, the target is gone -> mark a ghost
        obj_id = index  # use the prompt index as the key into the ghost cache
        if not contours:
            if self.keep_ghost and obj_id in self._last_valid_boxes:
                logger.debug(f"[Ghost] obj {obj_id} disappeared, return cached")
                ghost_shapes = [s.copy() for s in self._last_valid_boxes[obj_id]]
                for shape in ghost_shapes:
                    shape.flags["is_ghost"] = True  # mark as a ghost
                return ghost_shapes
            return []

        # Refine and filter contours
        approx_contours = []
        for contour in contours:
            # Approximate contour using configurable epsilon
            epsilon = self.epsilon * cv2.arcLength(contour, True)
            approx = cv2.approxPolyDP(contour, epsilon, True)
            # [Added] If the area is below the threshold, filter the contour out,
            # i.e. treat the target as absent
            area = cv2.contourArea(approx)
            logger.debug(f"[Debug] area: {area}")
            if area < self.min_area_pixel:
                logger.debug(f"[Debug] area of obj {obj_id} < {self.min_area_pixel}")
                continue

            approx_contours.append(approx)

        # [Added] Empty after filtering -> ghost as well
        if not approx_contours:
            if self.keep_ghost and obj_id in self._last_valid_boxes:
                logger.debug(f"[Ghost] obj {obj_id} below area, return cached")
                ghost_shapes = [s.copy() for s in self._last_valid_boxes[obj_id]]
                for shape in ghost_shapes:
                    shape.flags["is_ghost"] = True  # mark as a ghost
                return ghost_shapes
            return []

        # Remove large contours (likely background)
        #if len(approx_contours) > 1:
        #    image_size = masks.shape[0] * masks.shape[1]
        #    areas = [cv2.contourArea(contour) for contour in approx_contours]
        #    filtered_approx_contours = [
        #        contour
        #        for contour, area in zip(approx_contours, areas)
        #        if area < image_size * 0.9
        #    ]

        # Remove small contours (likely noise)
        #if len(approx_contours) > 1:
        #    areas = [cv2.contourArea(contour) for contour in approx_contours]
        #    avg_area = np.mean(areas)
        #
        #    filtered_approx_contours = [
        #        contour
        #        for contour, area in zip(approx_contours, areas)
        #        if area > avg_area * 0.2
        #    ]
        #   approx_contours = filtered_approx_contours

        if len(approx_contours) < 1:
            return []

        # Convert contours to shapes
        shapes = []
        if self.output_mode == "polygon":
            for approx in approx_contours:
                # Scale points
                points = approx.reshape(-1, 2)
                points[:, 0] = points[:, 0]
                points[:, 1] = points[:, 1]
                points = points.tolist()
                if len(points) < 3:
                    continue
                points.append(points[0])
                shape = Shape(flags={})
                for point in points:
                    point[0] = int(point[0])
                    point[1] = int(point[1])
                    shape.add_point(QtCore.QPointF(point[0], point[1]))
                # Create Polygon shape
                shape.shape_type = "polygon"
                shape.group_id = (
                    self.group_ids[index] if index is not None else None
                )
                shape.closed = True
                shape.label = (
                    "AUTOLABEL_OBJECT" if index is None else self.labels[index]
                )
                shape.selected = False
                shapes.append(shape)
        elif self.output_mode == "rectangle":
            x_min = 100000000
            y_min = 100000000
            x_max = 0
            y_max = 0
            for approx in approx_contours:
                points = approx.reshape(-1, 2)
                points[:, 0] = points[:, 0]
                points[:, 1] = points[:, 1]
                points = points.tolist()
                if len(points) < 3:
                    continue

                for point in points:
                    x_min = min(x_min, point[0])
                    y_min = min(y_min, point[1])
                    x_max = max(x_max, point[0])
                    y_max = max(y_max, point[1])

            shape = Shape(flags={})
            shape.add_point(QtCore.QPointF(x_min, y_min))
            shape.add_point(QtCore.QPointF(x_max, y_min))
            shape.add_point(QtCore.QPointF(x_max, y_max))
            shape.add_point(QtCore.QPointF(x_min, y_max))
            shape.shape_type = "rectangle"
            shape.closed = True
            shape.group_id = (
                self.group_ids[index] if index is not None else None
            )
            shape.fill_color = "#000000"
            shape.line_color = "#000000"
            shape.label = (
                "AUTOLABEL_OBJECT" if index is None else self.labels[index]
            )
            shape.selected = False
            shapes.append(shape)
        elif self.output_mode == "rotation":
            shape = Shape(flags={})
            rotation_box = get_bounding_boxes(approx_contours[0])[1]
            for point in rotation_box:
                shape.add_point(QtCore.QPointF(int(point[0]), int(point[1])))
            shape.direction = calculate_rotation_theta(rotation_box)
            shape.shape_type = self.output_mode
            shape.closed = True
            shape.fill_color = "#000000"
            shape.line_color = "#000000"
            shape.label = (
                "AUTOLABEL_OBJECT" if index is None else self.labels[index]
            )
            shape.selected = False
            shapes.append(shape)
        
        # [Added] Update the ghost cache
        if self.keep_ghost:
            self._last_valid_boxes[obj_id] = shapes
        return shapes

We also need to add the following in the class initializer:

class SegmentAnything2Video(Model):
     def __init__(self, config_path, on_message) -> None:
        ......
        self.min_area_pixel = int(self.config.get("min_area_pixel", 0))  # [Added]
        self.keep_ghost = bool(self.config.get("keep_ghost", False))     # [Added] default False is assumed here
        self._last_valid_boxes = {}  # obj_id -> last Shape list

In anylabeling\configs\auto_labeling\sam2_hiera_large_video.yaml:

type: segment_anything_2_video
name: sam2_hiera_large_video-r20240901
......
min_area_pixel: 0   # [Added] a mask below this many pixels counts as "target absent"
keep_ghost: True    # [Added] allow ghosts

What is the point of all these changes? The goal is to still write a json when the target disappears, with the shape's flags carrying is_ghost: true. Originally a disappeared target meant shapes was empty; now disappeared frames also carry a box position, namely the target's last-seen position. This matches the software's rendering model: the is_ghost: true flag is what lets us set "IsActive" to false in the .aois when the target is gone, as already explained in section 3 of this article. As for min_area_pixel, the area filter is rarely useful; it is normally left at 0, i.e. the target only counts as disappeared when the mask has no area at all.

7. json2aois.py

Finally, we also want a json-to-.aois script that is independent of the software, as a standalone tool. It:

  1. Scans the given image folder and gets the total frame count from a natural sort of the file names;

  2. Uses ffprobe to read the true duration (µs) of the video (or any media file);

  3. Computes the true frame rate as FPS = 1000000 * total_frames / duration_us;

  4. Reads each image's same-named label file (extension swapped to .json) and converts all its shapes into AOI KeyFrames with the logic given in the script;

  5. Writes an output.aois into that folder (overwriting any existing one).

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
json2aois.py: convert the per-frame json files in a frames folder into one output.aois.
The output always lands inside the image folder; an existing video_meta.json is used
if present, otherwise the reference video is probed and the meta file is generated.
"""
import json
import subprocess
from pathlib import Path
from argparse import ArgumentParser, ArgumentDefaultsHelpFormatter
from collections import defaultdict
from typing import List, Tuple

try:
    from PIL import Image
except ImportError:
    raise SystemExit("pip install pillow")

IMAGE_EXTS = {'.jpg', '.jpeg', '.png', '.bmp', '.tiff', '.tif'}
META_FILE = "video_meta.json"          # stored inside the image folder


# ---------- helpers ----------
def natural_sort_key(name: str):
    import re
    return [int(t) if t.isdigit() else t.lower() for t in re.split(r'(\d+)', name)]


def get_first_frame_size(img_list: List[Path]) -> Tuple[int, int]:
    if not img_list:
        return 0, 0
    with Image.open(img_list[0]) as im:
        return im.width, im.height


def _get_duration_us(video: Path) -> int:
    cmd = ["ffprobe", "-v", "error", "-show_entries", "format=duration", "-of", "json", str(video)]
    try:
        dur = float(json.loads(subprocess.check_output(cmd))["format"]["duration"])
    except Exception as e:
        raise RuntimeError(f"ffprobe failed: {e}") from None
    return int(dur * 1_000_000)


def load_or_gen_meta(frames_dir: Path, ref_video: Path) -> Tuple[float, int, int]:
    """Return fps, total_us, total_frames."""
    meta_path = frames_dir / META_FILE
    if meta_path.exists():
        data = json.loads(meta_path.read_text(encoding="utf-8"))
        print(f"[INFO] using existing {META_FILE}")
        return data["fps"], data["total_us"], data["total_frames"]

    # scan the images
    img_list = sorted(
        [p for p in frames_dir.iterdir() if p.suffix.lower() in IMAGE_EXTS],
        key=lambda x: natural_sort_key(x.name)
    )
    if not img_list:
        raise RuntimeError("no images found")
    total_frames = len(img_list)

    # probe the video
    total_us = _get_duration_us(ref_video)
    fps = 1_000_000 * total_frames / total_us

    # write the meta back
    meta = {
        "fps": fps,
        "total_us": total_us,
        "total_frames": total_frames,
        "source_video": str(ref_video),
        "extract_stride": 1
    }
    meta_path.write_text(json.dumps(meta, indent=2), encoding="utf-8")
    print(f"[INFO] generated {META_FILE}")
    return fps, total_us, total_frames


# ---------- core logic ----------
class Json2Aois:
    def __init__(self, frames_dir: Path, ref_video: Path):
        self.frames_dir = frames_dir.resolve()
        if not self.frames_dir.is_dir():
            raise ValueError(f"frames folder does not exist: {self.frames_dir}")
        self.ref_video = ref_video.resolve()
        if not self.ref_video.is_file():
            raise ValueError(f"reference video does not exist: {self.ref_video}")

        # get fps / duration / frame count
        self.fps, self.total_us, self.total_frames = load_or_gen_meta(self.frames_dir, self.ref_video)
        print(f"[INFO] total_frames={self.total_frames}  total_us={self.total_us:,}  fps={self.fps:.6f}")

        # image list
        self.img_list = sorted(
            [p for p in self.frames_dir.iterdir() if p.suffix.lower() in IMAGE_EXTS],
            key=lambda x: natural_sort_key(x.name)
        )

        self._keyframes = defaultdict(list)

    # one frame
    def _parse_one_json(self, json_path: Path, frame_idx: int):
        if not json_path.exists():
            return
        data = json.loads(json_path.read_text(encoding="utf-8"))
        shapes = data.get("shapes", [])
        seconds = frame_idx / self.fps
        for shape in shapes:
            if shape.get("shape_type") not in {"rectangle", "polygon", "rotation"}:
                continue
            pts = shape.get("points", [])
            # note: save() stores rectangles as 4 points, so 2-point boxes are not expected here
            if len(pts) < 3:
                continue
            xs, ys = zip(*pts)
            xmin, xmax, ymin, ymax = min(xs), max(xs), min(ys), max(ys)
            vertices = [{"X": float(x), "Y": float(y)} for x, y in
                        [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]]
            is_ghost = shape.get("flags", {}).get("is_ghost", False)
            label = shape.get("label") or "target"
            self._keyframes[label].append({
                "IsActive": not is_ghost,
                "Seconds": round(seconds, 6),
                "Vertices": vertices
            })

    # main entry
    def run(self, out_name: str):
        # frame by frame
        for idx, img_path in enumerate(self.img_list, start=0):
            self._parse_one_json(img_path.with_suffix('.json'), idx)

        first_w, first_h = get_first_frame_size(self.img_list)

        aois = [
            {
                "Name": label,
                "Red": 212,
                "Green": 0,
                "Blue": 255,
                "Tags": [],
                "KeyFrames": kfs,
            }
            for label, kfs in self._keyframes.items()
        ]

        aois_file = {
            "Version": 2,
            "Tags": [],
            "Media": {
                "MediaType": 1,
                "Height": first_h,
                "Width": first_w,
                "MediaCount": 1,
                "DurationMicroseconds": self.total_us,
            },
            "Aois": aois,
        }

        # force the output path into the image folder
        out_path = self.frames_dir / out_name
        out_path.write_text(json.dumps(aois_file, ensure_ascii=False, indent=2), encoding="utf-8")
        print(f"[INFO] wrote {out_path}")


# ---------- CLI ----------
def main():
    parser = ArgumentParser(formatter_class=ArgumentDefaultsHelpFormatter)
    parser.add_argument("--frames", help="folder containing the images + json files", default=r"E:\train\data\2025eye_track\pad")
    parser.add_argument("-v", "--video", help="path to the reference video", default=r"E:\train\data\2025eye_track\pad.mp4")
    parser.add_argument("-o", "--output", default="output.aois", help="output file name (no path)")
    args = parser.parse_args()

    frames_dir = Path(args.frames)
    ref_video = Path(args.video)
    Json2Aois(frames_dir, ref_video).run(args.output)


if __name__ == "__main__":
    main()

With this standalone json-to-.aois script, as long as we have the json files we can produce a .aois without going through the software; it serves as a backup path.
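
Example invocation (the paths match the script's defaults above):

python json2aois.py --frames E:\train\data\2025eye_track\pad -v E:\train\data\2025eye_track\pad.mp4 -o output.aois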

II. Summary

At this point we can convert object-tracking results, i.e. json files, into .aois files. Next, we will look at how to package the whole Python program as an exe.
