Previously, we published detailed analyses of the YOLO series; interested readers can refer to the YOLOv8 article and others. YOLOv8 has polished its ecosystem and now supports segmentation, classification, tracking, and more, which is very convenient for us developers. Today we take a close look at the object tracking supported in YOLOv8.
Code repository: yolov8
1. Algorithm Overview
Object tracking today leans heavily on object detection results. The mainstream tracking pipeline can be summarized briefly as follows:
First, the boxes output by the detection model are the objects we track. Suppose we fix the tracked objects in frame 1, i.e. assign each detection box an id. When the detector outputs the results for frame 2, we must match frame 2's boxes against the ids of frame 1's boxes. So how does that matching work?
One might say: just match the two frames' boxes by IoU directly. Treat 1 − IoU as a cost_matrix and run the Hungarian algorithm to match ids across the two frames. In ideal conditions that works fine, but in crowded scenes, where occlusion and lighting introduce detection errors, plain IoU matching becomes unreliable.
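The IoU-plus-Hungarian idea can be sketched in a few lines. This is a toy example, not the YOLOv8 source; the `iou` helper and box layout are my own choices, with boxes in (x1, y1, x2, y2) format:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

prev_boxes = np.array([[0, 0, 10, 10], [20, 20, 30, 30]], dtype=float)  # frame 1, ids 0 and 1
curr_boxes = np.array([[21, 21, 31, 31], [1, 1, 11, 11]], dtype=float)  # frame 2

# cost_matrix[i, j] = 1 - IoU: low cost means track i and detection j overlap strongly
cost_matrix = np.array([[1 - iou(p, c) for c in curr_boxes] for p in prev_boxes])
rows, cols = linear_sum_assignment(cost_matrix)  # Hungarian algorithm
matches = [(int(r), int(c)) for r, c in zip(rows, cols)]
print(matches)  # [(0, 1), (1, 0)]: id 0 goes to the second current box, id 1 to the first
```

The assignment minimizes total cost, so each track is paired with the detection it overlaps most, even when the detection order changes between frames.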
The venerable Kalman filter works wonders here. Treat the box coordinates as the state variables; the state-transition matrix then predicts the box position in the next frame. The next frame's detected box coordinates serve as the measurement, which updates and corrects the prediction, so that the frame after that is predicted even better.
In short: use a Kalman filter to predict box positions, build a suitable cost_matrix, use the Hungarian algorithm to match tracks against the current frame's detection ids, and add a bit of bookkeeping logic on top. That is enough to build a tracker that works quite well.
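The predict/update loop can be demonstrated on a 1-D toy problem: track one box coordinate and its velocity (the real tracker keeps an 8-D state with center, size, and their velocities). This is a generic textbook Kalman filter, not the ultralytics implementation; the noise values are arbitrary illustrative choices:

```python
import numpy as np

F = np.array([[1.0, 1.0],
              [0.0, 1.0]])   # state transition: position += velocity each frame
H = np.array([[1.0, 0.0]])   # measurement matrix: we only observe the position
Q = np.eye(2) * 1e-2         # process noise (illustrative value)
R = np.array([[1.0]])        # measurement noise (illustrative value)

x = np.array([[0.0], [0.0]])  # initial state: position 0, velocity 0
P = np.eye(2)                 # state covariance

for z in [1.0, 2.0, 3.0]:     # detected box position over three frames
    # Predict: propagate state and covariance through the transition matrix
    x = F @ x
    P = F @ P @ F.T + Q
    # Update: correct the prediction with the measurement z
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # Kalman gain
    x = x + K @ (np.array([[z]]) - H @ x)
    P = (np.eye(2) - K @ H) @ P

# The filtered position follows the measurements and a positive velocity is learned
print(float(x[0, 0]), float(x[1, 0]))
```

After three frames the state has converged toward the measured positions and picked up the constant velocity, which is exactly what lets the tracker predict where a box will be next frame.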
2. Code Walkthrough
YOLOv8 implements object tracking with two algorithms proposed in 2022, BoT-SORT and ByteTrack. To try out the tracker, just run the following code.
```python
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
results = model.track(source=".avi", show=True, save=True)
```
In YOLOv8, the core code for tracking moving objects in a video is the method BYTETracker.update() in the file .\ultralytics\tracker\trackers\byte_tracker.py. Below we will not rehash the theory of the Kalman filter or the Hungarian algorithm, and instead dissect the code logic only.
Before that, there is one important class to know: class BaseTrack, the base class for tracking, which handles the basic track attributes and operations. class STrack(BaseTrack) and class BOTrack(STrack), the track classes of ByteTrack and BoT-SORT, both inherit from it. From the code we can see that it records the track id, the is_activated state, and other attributes and operations. During tracking, every detection box in every frame is assigned a track.
```python
class BaseTrack:
    """Base class for object tracking, handling basic track attributes and operations."""

    _count = 0

    track_id = 0
    is_activated = False
    state = TrackState.New
    history = OrderedDict()
    features = []
    curr_feature = None
    score = 0
    start_frame = 0
    frame_id = 0
    time_since_update = 0

    # Multi-camera
    location = (np.inf, np.inf)

    @property
    def end_frame(self):
        """Return the last frame ID of the track."""
        return self.frame_id

    @staticmethod
    def next_id():
        """Increment and return the global track ID counter."""
        BaseTrack._count += 1
        return BaseTrack._count

    def activate(self, *args):
        """Activate the track with the provided arguments."""
        raise NotImplementedError

    # ...
```
Before tracking starts, the tracker is initialized. The list self.tracked_stracks holds the trackable tracks, self.lost_stracks holds lost tracks, and self.removed_stracks holds removed tracks. A member of self.lost_stracks moves into self.removed_stracks once it meets the removal condition; if along the way it is matched by a new detection, it moves back into self.tracked_stracks. self.frame_id counts frames, and self.kalman_filter is the Kalman filter, referred to as KF below.
```python
def __init__(self, args, frame_rate=30):
    """Initialize a YOLOv8 object to track objects with given arguments and frame rate."""
    self.tracked_stracks = []  # type: list[STrack]
    self.lost_stracks = []  # type: list[STrack]
    self.removed_stracks = []  # type: list[STrack]

    self.frame_id = 0
    self.args = args
    self.max_time_lost = int(frame_rate / 30.0 * args.track_buffer)
    self.kalman_filter = self.get_kalmanfilter()
    self.reset_id()
```
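A small aside on max_time_lost: track_buffer is nominally a count of frames at 30 fps, and the division rescales it to the actual frame rate, so a lost track survives the same wall-clock time regardless of frame rate. Assuming the stock track_buffer value of 30:

```python
# Hypothetical values: a 60 fps video with the stock track_buffer of 30
frame_rate, track_buffer = 60, 30
max_time_lost = int(frame_rate / 30.0 * track_buffer)
print(max_time_lost)  # 60: at 60 fps a lost track survives twice as many frames (same 1 s)
```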
Now into the update logic proper. update runs once per detected frame, so self.frame_id is incremented by 1. activated_starcks collects the tracks activated in this frame, refind_stracks the tracks re-activated in this frame, lost_stracks the tracks lost in this frame, and removed_stracks the tracks removed in this frame. results is the YOLOv8 detection output; bboxes are the detection box coordinates, with an index column appended.
```python
def update(self, results, img=None):
    """Updates object tracker with new detections and returns tracked object bounding boxes."""
    self.frame_id += 1
    activated_starcks = []
    refind_stracks = []
    lost_stracks = []
    removed_stracks = []

    scores = results.conf
    bboxes = results.xyxy
    cls = results.cls
    # Add index
    bboxes = np.concatenate([bboxes, np.arange(len(bboxes)).reshape(-1, 1)], axis=-1)
```
byte_tracker splits the detection boxes into two tiers by confidence. Boxes with confidence above self.args.track_high_thresh (0.5) we call first-tier boxes; boxes with confidence below 0.5 but above 0.1 we call second-tier boxes.
```python
remain_inds = scores > self.args.track_high_thresh
inds_low = scores > self.args.track_low_thresh
inds_high = scores < self.args.track_high_thresh

inds_second = np.logical_and(inds_low, inds_high)
dets_second = bboxes[inds_second]
dets = bboxes[remain_inds]
scores_keep = scores[remain_inds]
scores_second = scores[inds_second]
cls_keep = cls[remain_inds]
cls_second = cls[inds_second]
```
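To make the two-tier split concrete, here is the same masking logic run on a toy score array, with the thresholds hard-coded to the 0.5 and 0.1 values mentioned above:

```python
import numpy as np

scores = np.array([0.9, 0.3, 0.05, 0.6, 0.2])   # toy confidences for five boxes
track_high_thresh, track_low_thresh = 0.5, 0.1

remain_inds = scores > track_high_thresh                  # first-tier boxes
inds_second = np.logical_and(scores > track_low_thresh,
                             scores < track_high_thresh)  # second-tier boxes

print(np.where(remain_inds)[0].tolist())   # [0, 3]: confident boxes
print(np.where(inds_second)[0].tolist())   # [1, 4]: low-confidence boxes; 0.05 is discarded
```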
Each detection box is then assigned a track; as noted above, the track class provides the tracking attributes and operations. Currently self.args.with_reid=False in YOLOv8, so the appearance-feature branch is not yet implemented.
```python
detections = self.init_track(dets, scores_keep, cls_keep, img)

def init_track(self, dets, scores, cls, img=None):
    """Initialize track with detections, scores, and classes; assigns one track per detection."""
    if len(dets) == 0:
        return []
    if self.args.with_reid and self.encoder is not None:
        features_keep = self.encoder.inference(img, dets)
        return [BOTrack(xyxy, s, c, f) for (xyxy, s, c, f) in zip(dets, scores, cls, features_keep)]  # detections
    else:
        return [BOTrack(xyxy, s, c) for (xyxy, s, c) in zip(dets, scores, cls)]  # detections
```
Step 1: associate the trackable tracks with the first-tier boxes. First, iterate over self.tracked_stracks: already-activated tracks go into tracked_stracks, unactivated ones into unconfirmed. Merge the lost tracks and the activated tracks into strack_pool, taking care to drop duplicate track_ids. Now the key part, self.multi_predict(strack_pool): using the KF, each track's previous-frame state is pushed through the state-transition matrix to predict this frame's best estimate and state covariance matrix. The distance between the tracks in the pool and this frame's detected bboxes then forms the cost_matrix, and the Hungarian algorithm performs the association.
The Hungarian algorithm yields matches (the index pairs of successfully associated tracks and detection boxes), u_track (the tracks that went unmatched), and u_detection (the detection boxes that went unmatched).
```python
unconfirmed = []
tracked_stracks = []  # type: list[STrack]
for track in self.tracked_stracks:
    if not track.is_activated:
        unconfirmed.append(track)
    else:
        tracked_stracks.append(track)

# Step 2: First association, with high score detection boxes
strack_pool = self.joint_stracks(tracked_stracks, self.lost_stracks)  # merge lost and activated tracks, dropping duplicate track_ids
# Predict the current location with KF
self.multi_predict(strack_pool)  # KF was initialized on the first frame; from frame 2 on, the transition equation predicts each track's best estimate and covariance
if hasattr(self, 'gmc') and img is not None:
    warp = self.gmc.apply(img, dets)
    STrack.multi_gmc(strack_pool, warp)
    STrack.multi_gmc(unconfirmed, warp)

dists = self.get_dists(strack_pool, detections)  # distance between pooled tracks and this frame's boxes, matched via the Hungarian algorithm
matches, u_track, u_detection = matching.linear_assignment(dists, thresh=self.args.match_thresh)
```
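The three return values of the assignment step can be illustrated with a small sketch of a thresholded Hungarian assignment. This is my own minimal re-implementation of the idea, not the ultralytics matching.linear_assignment source (which uses a different solver internally), but it returns the same kind of (matches, unmatched_rows, unmatched_cols) triple:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def gated_assignment(cost, thresh):
    """Hungarian assignment that rejects pairs whose cost exceeds thresh.
    Returns (matches, unmatched_rows, unmatched_cols)."""
    if cost.size == 0:
        return [], list(range(cost.shape[0])), list(range(cost.shape[1]))
    rows, cols = linear_sum_assignment(cost)
    matches = [(int(r), int(c)) for r, c in zip(rows, cols) if cost[r, c] <= thresh]
    matched_r = {r for r, _ in matches}
    matched_c = {c for _, c in matches}
    u_rows = [r for r in range(cost.shape[0]) if r not in matched_r]
    u_cols = [c for c in range(cost.shape[1]) if c not in matched_c]
    return matches, u_rows, u_cols

cost = np.array([[0.2, 0.9],
                 [0.95, 0.97]])
# Track 0 pairs cheaply with detection 0; the (1, 1) pair is assigned but
# its cost 0.97 exceeds the gate, so track 1 and detection 1 stay unmatched.
print(gated_assignment(cost, thresh=0.8))  # ([(0, 0)], [1], [1])
```

The gate is what makes the threshold differences between the association rounds (0.8 vs 0.5 vs 0.7) matter: a higher threshold accepts sloppier overlaps.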
Iterate over matches. An already-activated track updates its attributes from this frame's detection box: the KF corrects the best estimate using the measurement (this frame's box coordinates) and updates the state covariance matrix. An unactivated track that has associated with a detection again must be re-activated; re_activate is similar to update.
```python
for itracked, idet in matches:
    track = strack_pool[itracked]
    det = detections[idet]
    if track.state == TrackState.Tracked:  # already-activated track
        track.update(det, self.frame_id)  # on a confirmed match, the KF corrects the estimate with the measurement and updates the covariance
        activated_starcks.append(track)
    else:  # an unactivated track that matched again must be re-activated
        track.re_activate(det, self.frame_id, new_id=False)
        refind_stracks.append(track)

def update(self, new_track, frame_id):
    """
    Update a matched track
    :type new_track: STrack
    :type frame_id: int
    :return:
    """
    self.frame_id = frame_id
    self.tracklet_len += 1

    new_tlwh = new_track.tlwh
    self.mean, self.covariance = self.kalman_filter.update(self.mean, self.covariance, self.convert_coords(new_tlwh))
    self.state = TrackState.Tracked
    self.is_activated = True

    self.score = new_track.score
    self.cls = new_track.cls
    self.idx = new_track.idx

def re_activate(self, new_track, frame_id, new_id=False):
    """Reactivates a previously lost track with a new detection."""
    self.mean, self.covariance = self.kalman_filter.update(self.mean, self.covariance, self.convert_coords(new_track.tlwh))
    self.tracklet_len = 0
    self.state = TrackState.Tracked
    self.is_activated = True
    self.frame_id = frame_id
    if new_id:
        self.track_id = self.next_id()
    self.score = new_track.score
    self.cls = new_track.cls
    self.idx = new_track.idx
```
Step 2: associate the trackable tracks with the second-tier boxes. First, assign tracks to the second-tier boxes, and collect the tracks from strack_pool that went unmatched in step 1 into r_tracked_stracks. Compute the IoU between r_tracked_stracks and the second-tier boxes, use it as the cost_matrix, and run the Hungarian matching. Note that compared with step 1, the matching threshold drops from 0.8 to 0.5: the second-tier boxes have low confidence and poorly regressed coordinates, so loosening the threshold is necessary to recover more tracks. Iterate over matches: update the activated tracks and re-activate the unactivated ones. Any track still unmatched after step 2 has its state set to lost.
```python
detections_second = self.init_track(dets_second, scores_second, cls_second, img)  # assign tracks to boxes with 0.1 < confidence < 0.5
r_tracked_stracks = [strack_pool[i] for i in u_track if strack_pool[i].state == TrackState.Tracked]  # tracks unmatched in the first round
# TODO
dists = matching.iou_distance(r_tracked_stracks, detections_second)
matches, u_track, u_detection_second = matching.linear_assignment(dists, thresh=0.5)
for itracked, idet in matches:
    track = r_tracked_stracks[itracked]
    det = detections_second[idet]
    if track.state == TrackState.Tracked:
        track.update(det, self.frame_id)
        activated_starcks.append(track)
    else:
        track.re_activate(det, self.frame_id, new_id=False)
        refind_stracks.append(track)

for it in u_track:  # tracks still unmatched after the second round are marked lost
    track = r_tracked_stracks[it]
    if track.state != TrackState.Lost:
        track.mark_lost()
        lost_stracks.append(track)
```
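For reference, the IoU cost used in this second round is just a pairwise 1 − IoU matrix. The following vectorized sketch shows what such a matrix looks like; it mimics what matching.iou_distance computes but is not the ultralytics source:

```python
import numpy as np

def iou_distance(boxes_a, boxes_b):
    """Pairwise 1 - IoU cost matrix for (x1, y1, x2, y2) boxes (a sketch)."""
    a = np.asarray(boxes_a, dtype=float)[:, None, :]   # shape (N, 1, 4)
    b = np.asarray(boxes_b, dtype=float)[None, :, :]   # shape (1, M, 4)
    tl = np.maximum(a[..., :2], b[..., :2])            # intersection top-left
    br = np.minimum(a[..., 2:], b[..., 2:])            # intersection bottom-right
    wh = np.clip(br - tl, 0, None)
    inter = wh[..., 0] * wh[..., 1]
    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return 1.0 - inter / (area_a + area_b - inter + 1e-9)

tracks = [[0, 0, 10, 10]]
dets = [[0, 0, 10, 10], [100, 100, 110, 110]]
print(np.round(iou_distance(tracks, dets), 2))  # [[0. 1.]]: identical box costs 0, disjoint box costs 1
```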
u_detection holds the detection boxes that went unmatched in step 1, and unconfirmed holds the trackable tracks whose is_activated flag is still False. These unconfirmed tracks now get a matching attempt against u_detection. On success the track is updated, setting is_activated to True and updating its KF parameters; otherwise the track is removed.
```python
# Deal with unconfirmed tracks, usually tracks with only one beginning frame
detections = [detections[i] for i in u_detection]  # the first-tier boxes that went unmatched
dists = self.get_dists(unconfirmed, detections)
matches, u_unconfirmed, u_detection = matching.linear_assignment(dists, thresh=0.7)
for itracked, idet in matches:  # an unconfirmed track that associates with a leftover step-1 box is updated and activated
    unconfirmed[itracked].update(detections[idet], self.frame_id)
    activated_starcks.append(unconfirmed[itracked])
for it in u_unconfirmed:  # an unconfirmed track that still fails to associate is removed
    track = unconfirmed[it]
    track.mark_removed()
    removed_stracks.append(track)
```
Step 3: initialize new tracks. When processing the first frame, the trackable-track lists are all empty and nothing can match, so we drop straight into step 3: every track whose box confidence exceeds self.args.new_track_thresh is activated, is_activated is set to True, the KF is initialized, and the track is stored in activated_starcks, waiting to be merged into the trackable tracks self.tracked_stracks. When frame_id != 1, activate does not set is_activated to True; the unmatched tracks are still stored in activated_starcks and merged into self.tracked_stracks, but because is_activated=False they get sorted into unconfirmed in step 1 of the next frame.
```python
# On the first frame nothing can match, so we come straight here: activate the track, set is_activated
# to True, initialize the KF, and store the track in activated_starcks to be merged into self.tracked_stracks.
# When frame_id != 1, activate leaves is_activated False; such tracks become "unconfirmed" in the next frame.
for inew in u_detection:
    track = detections[inew]
    if track.score < self.args.new_track_thresh:
        continue
    track.activate(self.kalman_filter, self.frame_id)
    activated_starcks.append(track)
```
Step 4: a trackable track that still failed to associate with any detection after steps 1 and 2 has its state set to lost, is removed from the trackable tracks, and is merged into the lost tracks self.lost_stracks. If a lost track stays unmatched for more than max_time_lost frames (30 frames at the default settings), it is removed.
```python
# Step 5: Update state
for track in self.lost_stracks:  # a lost track that stays unmatched past max_time_lost is removed
    if self.frame_id - track.end_frame > self.max_time_lost:
        track.mark_removed()
        removed_stracks.append(track)
```
Step 5: update the self.tracked_stracks, self.lost_stracks, and self.removed_stracks lists. Merge activated_starcks and refind_stracks into the trackable tracks so the association continues in the next frame. From the lost list, drop every track that also appears in self.tracked_stracks (lost but found again) and every track that appears in self.removed_stracks (lost and removed). remove_duplicate_stracks de-duplicates tracks with near-identical IoU across self.tracked_stracks and self.lost_stracks, keeping the more recently seen one. Finally, the activated tracks are returned; their coordinates are the KF-corrected values.
```python
self.tracked_stracks = [t for t in self.tracked_stracks if t.state == TrackState.Tracked]
self.tracked_stracks = self.joint_stracks(self.tracked_stracks, activated_starcks)  # merge activated tracks into self.tracked_stracks
self.tracked_stracks = self.joint_stracks(self.tracked_stracks, refind_stracks)
self.lost_stracks = self.sub_stracks(self.lost_stracks, self.tracked_stracks)  # drop lost tracks that reappear in self.tracked_stracks (lost but found)
self.lost_stracks.extend(lost_stracks)
self.lost_stracks = self.sub_stracks(self.lost_stracks, self.removed_stracks)  # drop lost tracks that appear in self.removed_stracks (lost and removed)
self.tracked_stracks, self.lost_stracks = self.remove_duplicate_stracks(self.tracked_stracks, self.lost_stracks)
self.removed_stracks.extend(removed_stracks)
if len(self.removed_stracks) > 1000:
    self.removed_stracks = self.removed_stracks[-999:]  # clip remove stracks to 1000 maximum
return np.asarray(
    [x.tlbr.tolist() + [x.track_id, x.score, x.cls, x.idx] for x in self.tracked_stracks if x.is_activated],
    dtype=np.float32)
```
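The list bookkeeping above boils down to merging and subtracting by track id. The following sketch captures the spirit of the joint_stracks and sub_stracks helpers; the real versions operate on STrack objects keyed by track_id, while here plain integers stand in for tracks:

```python
def joint_stracks(tlista, tlistb):
    """Merge two track lists, keeping order and dropping duplicates (sketch)."""
    seen = set(tlista)
    return list(tlista) + [t for t in tlistb if t not in seen]

def sub_stracks(tlista, tlistb):
    """Remove from tlista every track that also appears in tlistb (sketch)."""
    ids_b = set(tlistb)
    return [t for t in tlista if t not in ids_b]

print(joint_stracks([1, 2], [2, 3]))  # [1, 2, 3]: the duplicate id 2 is kept once
print(sub_stracks([1, 2, 3], [2]))    # [1, 3]: the re-found track 2 leaves the lost list
```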
This walkthrough ran long on code and short on concise diagrams; if anything here is unclear or explained incorrectly, please point it out. Next up is a dissection of the Kalman filter used in object tracking; interested readers are welcome to follow and leave a comment.