Video Object Detection and Tracking with YOLO + DeepSort: A Complete Walkthrough

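After watching demos that "recognize every object in a video and keep tracking it", you might assume an extremely complex system sits behind them. In fact, pairing a YOLO model with the DeepSort algorithm is enough to build continuous object detection and tracking.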

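If you would rather not spend time on the up-front chores of environment setup, model downloads, and dependency installation, you can use the Coovally platform directly. It ships with mainstream and recent detection and tracking models, including YOLOv3/v4/v5/v7/v8/11, Faster R-CNN, RetinaNet, DETR, DeepSort, and Mask R-CNN, which can be loaded with one click and combined freely.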

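How do YOLO and DeepSort work together?

YOLO acts as the detector: it runs on every frame and outputs the bounding box, class, and confidence of each detected object.
DeepSort acts as the tracker: it takes YOLO's detections, combines them with the tracking history, and assigns each object a unique ID so it can be followed across frames.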

Each track contains:

A predicted bounding box (from the Kalman filter)
A unique track ID
A motion model (Kalman filter)
An appearance feature vector (embedding)

Since YOLO is already widely known, the rest of this article focuses on DeepSort.

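The core idea of DeepSort

DeepSort comes from the paper "Simple Online and Realtime Tracking with a Deep Association Metric" and is an improved version of the SORT algorithm.

SORT: relies on a Kalman filter (to predict object positions) plus the Hungarian algorithm (frame-to-frame matching based on IoU).
Weakness: ID switches occur easily under occlusion.
DeepSort's improvement: matching considers not only motion information but also appearance features (embeddings), which significantly reduces ID switches.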
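
The IoU-based matching step that SORT relies on can be sketched as follows. This is a minimal illustration, not SORT's actual code: the function names and the 0.3 threshold are assumptions made for the example, and an exhaustive search over permutations stands in for the Hungarian algorithm (it finds the same optimal assignment, but is only practical for a handful of boxes).

```python
from itertools import permutations


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_by_iou(track_boxes, det_boxes, min_iou=0.3):
    """Return (track_idx, det_idx) pairs that maximize the total IoU.

    Exhaustive search is a stand-in for the Hungarian algorithm here.
    Pairs whose IoU falls below min_iou (an assumed value) are rejected.
    """
    n_t, n_d = len(track_boxes), len(det_boxes)
    if n_t == 0 or n_d == 0:
        return []
    best_pairs, best_total = [], -1.0
    if n_t <= n_d:
        candidates = (list(zip(range(n_t), p))
                      for p in permutations(range(n_d), n_t))
    else:
        candidates = (list(zip(p, range(n_d)))
                      for p in permutations(range(n_t), n_d))
    for pairs in candidates:
        total = sum(iou(track_boxes[t], det_boxes[d]) for t, d in pairs)
        if total > best_total:
            best_pairs, best_total = pairs, total
    return [(t, d) for t, d in best_pairs
            if iou(track_boxes[t], det_boxes[d]) >= min_iou]
```

Because the cost depends only on box overlap, two objects that cross paths or reappear after an occlusion can easily swap assignments, which is the ID-switch weakness noted above.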

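Two key types of information

1. Appearance features

A pretrained CNN extracts a 128-D or 256-D vector that describes the object's appearance.
In every frame, the image crop of each detected object is fed to the CNN to extract its embedding.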
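
To make the role of the embedding concrete, here is a small sketch of how two appearance vectors might be compared. It assumes cosine distance and a per-track gallery of past embeddings; the helper names are hypothetical, and the tiny 2-D vectors in the usage note stand in for real 128-D outputs.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def cosine_distance(a, b):
    """1 - cosine similarity: 0 means same direction, 1 means orthogonal."""
    a, b = l2_normalize(a), l2_normalize(b)
    return 1.0 - sum(x * y for x, y in zip(a, b))


def nearest_gallery_distance(gallery, query):
    """Smallest cosine distance between a query embedding and any
    embedding previously stored for a track (the track's 'gallery')."""
    return min(cosine_distance(g, query) for g in gallery)
```

For example, nearest_gallery_distance([[1, 0], [0, 1]], [0, 3]) is 0 because the query points in the same direction as the second stored vector. A small distance suggests the detection looks like something the track has already seen.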

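2. Motion information

A Kalman filter predicts the object's next state (position and velocity).
Even when the object is partially occluded, its position can be estimated from the track history.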
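
The predict/correct cycle of a constant-velocity Kalman filter can be sketched in one dimension. This is a deliberately reduced illustration: DeepSort's own filter tracks the full box state, whereas this sketch follows a single coordinate, and the noise values and function names are assumptions chosen for readability.

```python
def kf_predict(state, cov, dt=1.0, process_noise=1e-2):
    """Advance [position, velocity] one step: x' = F x, P' = F P F^T + Q,
    with F = [[1, dt], [0, 1]] and Q = process_noise * I."""
    pos, vel = state
    (p00, p01), (p10, p11) = cov
    new_state = [pos + vel * dt, vel]
    new_cov = [
        [p00 + dt * (p10 + p01) + dt * dt * p11 + process_noise,
         p01 + dt * p11],
        [p10 + dt * p11, p11 + process_noise],
    ]
    return new_state, new_cov


def kf_update(state, cov, measured_pos, measurement_noise=1.0):
    """Correct the state with a position-only measurement (H = [1, 0])."""
    pos, vel = state
    (p00, p01), (p10, p11) = cov
    innovation = measured_pos - pos
    s = p00 + measurement_noise          # innovation variance
    k0, k1 = p00 / s, p10 / s            # Kalman gain
    new_state = [pos + k0 * innovation, vel + k1 * innovation]
    new_cov = [
        [(1 - k0) * p00, (1 - k0) * p01],
        [p10 - k1 * p00, p11 - k1 * p01],
    ]
    return new_state, new_cov
```

During an occlusion, kf_predict would be called on every frame without a matching kf_update: the position keeps moving along the last estimated velocity while the covariance grows, reflecting increasing uncertainty.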

Core methods of the Tracker class

In DeepSort, Tracker is the main entry class. It ties together several modules (such as detection.py, iou_matching.py, and linear_assignment.py).

predict()

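Purpose: advance the state prediction of every track by one time step. It is normally called at the start of each frame, before update.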

def predict(self):
    """Propagate track state distributions one time step forward.
    This function should be called once every time step, before `update`.
    """
    for track in self.tracks:
        track.predict(self.kf)

update(detections)

Purpose: update the track set: match detections to tracks, mark missed tracks, create new tracks, and update the distance metric.

def update(self, detections):
    # Run matching cascade.
    matches, unmatched_tracks, unmatched_detections = \
        self._match(detections)
    # Update track set.
    for track_idx, detection_idx in matches:
        self.tracks[track_idx].update(
            self.kf, detections[detection_idx])
    for track_idx in unmatched_tracks:
        self.tracks[track_idx].mark_missed()
    for detection_idx in unmatched_detections:
        self._initiate_track(detections[detection_idx])
    self.tracks = [t for t in self.tracks if not t.is_deleted()]
    # Update distance metric.
    active_targets = [t.track_id for t in self.tracks if t.is_confirmed()]
    features, targets = [], []
    for track in self.tracks:
        if not track.is_confirmed():
            continue
        features += track.features
        targets += [track.track_id for _ in track.features]
        track.features = []
    self.metric.partial_fit(
        np.asarray(features), np.asarray(targets), active_targets)
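
The calls to update, mark_missed, is_confirmed, and is_deleted above imply a small state machine per track. The sketch below is a simplified model of that lifecycle, written for illustration only: the class and its fields are hypothetical stand-ins, and the rules (confirm after n_init hits, delete a tentative track on its first miss, delete a confirmed track once it has gone unmatched for more than max_age frames) are assumptions about how such a tracker typically behaves.

```python
from enum import Enum


class TrackState(Enum):
    TENTATIVE = 1
    CONFIRMED = 2
    DELETED = 3


class SimpleTrack:
    """Hypothetical, stripped-down track that only models state changes."""

    def __init__(self, track_id, n_init=3, max_age=30):
        self.track_id = track_id
        self.n_init = n_init
        self.max_age = max_age
        self.hits = 1                 # the detection that created the track
        self.time_since_update = 0
        self.state = TrackState.TENTATIVE

    def predict(self):
        # Called once per frame before matching.
        self.time_since_update += 1

    def update(self):
        # Called when a detection was matched to this track.
        self.hits += 1
        self.time_since_update = 0
        if self.state is TrackState.TENTATIVE and self.hits >= self.n_init:
            self.state = TrackState.CONFIRMED

    def mark_missed(self):
        # Called when no detection was matched in this frame.
        if self.state is TrackState.TENTATIVE:
            self.state = TrackState.DELETED
        elif self.time_since_update > self.max_age:
            self.state = TrackState.DELETED

    def is_confirmed(self):
        return self.state is TrackState.CONFIRMED

    def is_deleted(self):
        return self.state is TrackState.DELETED
```

Requiring several consecutive hits before confirmation filters out one-off false detections, while max_age controls how long a track survives an occlusion before its ID is retired.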

_match(detections)

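Purpose: match the current detections to existing tracks. Confirmed tracks are matched first with the appearance metric gated by motion; IoU matching then handles the remaining candidates.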

def _match(self, detections):
    def gated_metric(tracks, dets, track_indices, detection_indices):
        features = np.array([dets[i].feature for i in detection_indices])
        targets = np.array([tracks[i].track_id for i in track_indices])
        cost_matrix = self.metric.distance(features, targets)
        cost_matrix = linear_assignment.gate_cost_matrix(
            self.kf, cost_matrix, tracks, dets, track_indices,
            detection_indices)
        return cost_matrix
    # Split track set into confirmed and unconfirmed tracks.
    confirmed_tracks = [
        i for i, t in enumerate(self.tracks) if t.is_confirmed()]
    unconfirmed_tracks = [
        i for i, t in enumerate(self.tracks) if not t.is_confirmed()]
    matches_a, unmatched_tracks_a, unmatched_detections = \
        linear_assignment.matching_cascade(
            gated_metric, self.metric.matching_threshold, self.max_age,
            self.tracks, detections, confirmed_tracks)
    # Associate remaining tracks together with unconfirmed tracks using IOU.
    iou_track_candidates = unconfirmed_tracks + [
        k for k in unmatched_tracks_a if
        self.tracks[k].time_since_update == 1]
    unmatched_tracks_a = [
        k for k in unmatched_tracks_a if
        self.tracks[k].time_since_update != 1]
    matches_b, unmatched_tracks_b, unmatched_detections = \
        linear_assignment.min_cost_matching(
            iou_matching.iou_cost, self.max_iou_distance, self.tracks,
            detections, iou_track_candidates, unmatched_detections)
    matches = matches_a + matches_b
    unmatched_tracks = list(set(unmatched_tracks_a + unmatched_tracks_b))
    return matches, unmatched_tracks, unmatched_detections
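
The gating step inside gated_metric can be illustrated with a simplified sketch. The idea is to invalidate appearance matches that are implausible given the predicted motion. The version below is an assumption-laden approximation, not the repository's implementation: it uses a diagonal covariance instead of the full Kalman covariance, the function names are hypothetical, and the large sentinel cost is an illustrative choice. The threshold 9.4877 is the 0.95 quantile of the chi-square distribution with 4 degrees of freedom.

```python
INFTY_COST = 1e5          # assumed sentinel meaning "match not allowed"
CHI2_95_4DOF = 9.4877     # 95% chi-square quantile for a 4-D measurement


def squared_mahalanobis(mean, variances, measurement):
    """Squared Mahalanobis distance under a diagonal covariance."""
    return sum((m - mu) ** 2 / var
               for mu, var, m in zip(mean, variances, measurement))


def gate_costs_sketch(cost_matrix, track_means, track_vars, measurements,
                      threshold=CHI2_95_4DOF):
    """Return a copy of cost_matrix (rows: tracks, columns: detections) in
    which entries that fail the motion gate are set to INFTY_COST."""
    gated = [row[:] for row in cost_matrix]
    for i, (mean, var) in enumerate(zip(track_means, track_vars)):
        for j, meas in enumerate(measurements):
            if squared_mahalanobis(mean, var, meas) > threshold:
                gated[i][j] = INFTY_COST
    return gated
```

With one track predicted at (100, 100, 0.5, 50) and two detections at (102, 101, 0.5, 51) and (300, 100, 0.5, 50), the second detection is gated out even if its appearance cost is lower, because it is far from where the motion model expects the object to be.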

_initiate_track(detection)

Purpose: initialize a new track from an unmatched detection.

def _initiate_track(self, detection):
    mean, covariance = self.kf.initiate(detection.to_xyah())
    self.tracks.append(Track(
        mean, covariance, self._next_id, self.n_init, self.max_age,
        detection.feature))
    self._next_id += 1
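
The to_xyah() call above converts a detection box into the measurement format handed to the Kalman filter. As a sketch, assuming the box is stored as (top-left x, top-left y, width, height) and the target format is (center x, center y, aspect ratio, height), the conversion and its inverse might look like this. The standalone function names are hypothetical.

```python
def tlwh_to_xyah(tlwh):
    """(top-left x, top-left y, width, height) ->
    (center x, center y, aspect ratio = width / height, height)."""
    x, y, w, h = tlwh
    if h <= 0:
        raise ValueError("box height must be positive")
    return (x + w / 2.0, y + h / 2.0, w / h, h)


def xyah_to_tlwh(xyah):
    """Inverse of tlwh_to_xyah."""
    cx, cy, a, h = xyah
    w = a * h
    return (cx - w / 2.0, cy - h / 2.0, w, h)
```

For instance, a box (10, 20, 40, 80) becomes (30.0, 60.0, 0.5, 80).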

Using YOLO + DeepSort in practice

The official DeepSort repository does not include a detector: it runs on precomputed detections for MOTChallenge sequences.

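When paired with YOLO:

YOLO detects the objects in each video frame (outputting bounding boxes, classes, and confidences).
These results are passed to DeepSort for ID assignment and track maintenance.
The result is efficient, continuous object detection and tracking.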
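
The glue between the two components can be sketched as follows. Everything here is a hedged illustration: it assumes the detector yields rows of (x1, y1, x2, y2, confidence, class_id), that class 0 is "person" as in COCO, and that the tracker exposes the predict() and update(detections) methods shown earlier. FrameDetection, to_tracker_inputs, run_tracking, detect_fn, and embed_fn are hypothetical names, not part of YOLO or DeepSort.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FrameDetection:
    """Hypothetical container for what the tracker needs per detection."""
    tlwh: Tuple[float, float, float, float]
    confidence: float
    feature: List[float]


def to_tracker_inputs(yolo_rows, embed_fn, min_conf=0.4, keep_classes=(0,)):
    """Filter detector rows and convert corner boxes to (x, y, w, h).

    embed_fn(tlwh) is assumed to return the appearance embedding of the crop.
    """
    detections = []
    for x1, y1, x2, y2, conf, cls in yolo_rows:
        if conf < min_conf or int(cls) not in keep_classes:
            continue
        tlwh = (x1, y1, x2 - x1, y2 - y1)
        detections.append(FrameDetection(tlwh, conf, embed_fn(tlwh)))
    return detections


def run_tracking(frames, detect_fn, embed_fn, tracker):
    """Per-frame loop: detect, embed, then predict and update the tracker."""
    for frame in frames:
        detections = to_tracker_inputs(
            detect_fn(frame), lambda box: embed_fn(frame, box))
        tracker.predict()            # advance all tracks one time step
        tracker.update(detections)   # associate detections with tracks
```

Filtering by confidence and class before tracking keeps weak or irrelevant detections from spawning short-lived tracks; the thresholds would need tuning for a given video.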

Common application scenarios:

Person re-identification (Re-ID)
Traffic monitoring
Smart retail
Security patrol and inspection

DeepSort's CNN feature extractor

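The network architecture is defined in deep_sort/tools/freeze_model.py (stacked convolutional layers plus residual blocks).
It outputs a 128-dimensional normalized vector (embedding).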
According to the original paper, the network was trained on MARS, a large-scale person re-identification dataset.
During training, an extra classification layer predicts identities; at inference only the embedding layer is kept for matching.

Summary

YOLO answers "what is in the frame?" (detection).

DeepSort answers "is this the same object?" (tracking).

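By combining motion prediction (Kalman filter) with appearance matching (CNN embeddings), DeepSort substantially reduces ID switches in multi-object tracking and yields more stable, smoother tracks.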

If you are already familiar with how YOLO + DeepSort works, you can try it hands-on in Coovally and turn the algorithm from "papers and code" into a ready-to-use tool.