基于YOLO5的机械臂视觉抓取实现

前言：

机器人视觉系统标定是保证机器人精确运动和控制的关键环节之一。通过对机器人的运动学进行分析，可以精确计算出机器人末端执行器的位姿信息，从而实现对目标的精准定位和控制。相机标定是计算机视觉和图像处理中的重要步骤，标定可以去除相机镜头引入的畸变，确保图像信息的准确性。本文对机器人的运动学进行详细分析，并构建其运动学模型。对相机进行标定得到相机的内外参数，并通过手眼标定建立末端执行器相对于基准坐标系的变换关系，将相机获取的目标位置坐标转换到机器人坐标系中，为机器人进行抓取操作提供位置信息。

1、机器人运动学建模与分析

2、相机标定

相机标定可以用于确定相机内外参数，以及相机与机械臂坐标系之间的变换关系。已知机械臂末端执行器的坐标系位置，可以将相机安装在机械臂末端，使其与机械臂末端执行器坐标系重合，或者通过机械臂末端执行器坐标系的变换矩阵将相机的坐标系转换到世界坐标系中。

2.1标定原理

2.2标定方法

3、手眼标定原理以及实现

本文章采用了"Eye in Hand"方式，将相机安装在机械臂的末端。这种配置下的相机具有更广泛的活动范围，能够捕获更广阔的视野。为了更精确地确定目标位置和姿态，手眼标定至关重要。相机一旦成功捕获到目标在图像中的像素位置，应用预先设定的坐标转换矩阵，将这一位置信息从相机的像素坐标系精确地转换到机器人末端执行器的空间坐标系。这一过程确保了目标位置的准确映射，为机器人的精确操控提供了重要的数据支撑。

3.1标定原理

通过机器人末端坐标系到摄像机坐标系的转换矩阵，并结合摄像机的内部和外部参数获取物体的实际坐标。由于标定板自始至终都固定在一个位置上，这个转换矩阵对每一组图像都是一样的。标定板与机械臂的基坐标系在手眼标定过程中相对固定。为了获取不同位姿的标定板样本，通过调整机械臂末端位姿，进一步调整相机位姿，捕捉末端与基础坐标系之间的空间关系，同时进行标定板图像采集。利用适当的图像分析方法，可以准确获取相机相对于标定板的姿态，进而获得机械臂末端坐标系与摄像机坐标系之间姿态变换的参数。

3.2标定方法

4、基于YOLOv5的目标定位程序实现

4.1：载入双目相机参数

4.2: 双目测距

4.2.1 消除畸变

在立体视觉系统中，在立体校正前进行畸变消除是必要的，因为相机的镜头通常会引入几何畸变，而这些畸变会影响图像的几何一致性，从而影响立体校正和后续的深度计算。

复制代码

# 消除畸变
def undistortion(image, camera_matrix, dist_coeff):
    undistortion_image = cv2.undistort(image, camera_matrix, dist_coeff)

    return undistortion_image

4.2.2 立体校正

4.2.2.1 校正目的

立体校正利用双目标定的内外参数（焦距、成像原点、畸变系数）和双目相对位置关系（旋转矩阵和平移向量），分别对左右视图进行消除畸变和行对准，使得左右视图的成像原点坐标一致、两摄像头光轴平行、左右成像平面共面、对极线行对齐。

校正前的左右相机的光心并不是平行的，两个光心的连线就叫基线，像平面与基线的交点就是极点，像点与极点所在的直线就是极线，左右极线与基线构成的平面就是空间点对应的极平面。

校正后，极点在无穷远处，两个相机的光轴平行。像点在左右图像上的高度一致。这也就是极线校正的目标。校正后做后续的立体匹配时，只需在同一行上搜索左右像平面的匹配点即可，能使效率大大提高。

4.2.2.2 校正方法

利用OpenCV中的stereoRectify()函数实现立体校正，stereoRectify()函数内部采用的是Bouguet的极线校正算法,Bouguet算法步骤：

1、将右图像平面相对于左图像平面的旋转矩阵分解成两个矩阵Rl和Rr，叫做左右相机的合成旋转矩阵

2、将左右相机各旋转一半，使得左右相机的光轴平行。此时左右相机的成像面达到平行，但是基线与成像平面不平行

3、构造变换矩阵Rrect使得基线与成像平面平行。构造的方法是通过右相机相对于左相机的偏移矩阵T完成的

4、通过合成旋转矩阵与变换矩阵相乘获得左右相机的整体旋转矩阵。左右相机坐标系乘以各自的整体旋转矩阵就可使得左右相机的主光轴平行，且像平面与基线平行

5、通过上述的两个整体旋转矩阵，就能够得到理想的平行配置的双目立体系图像。校正后根据需要对图像进行裁剪，需重新选择一个图像中心，和图像边缘从而让左、右叠加部分最大

4.2.2.3 相关代码

复制代码

def getRectifyTransform(height, width, config):
    # 读取内参和外参
    left_K = config.cam_matrix_left
    right_K = config.cam_matrix_right
    left_distortion = config.distortion_l
    right_distortion = config.distortion_r
    R = config.R
    T = config.T

    # 计算校正变换
    R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(left_K, left_distortion, right_K, right_distortion,
                                                      (width, height), R, T, alpha=0)
    map1x, map1y = cv2.initUndistortRectifyMap(left_K, left_distortion, R1, P1, (width, height), cv2.CV_32FC1)
    map2x, map2y = cv2.initUndistortRectifyMap(right_K, right_distortion, R2, P2, (width, height), cv2.CV_32FC1)

    return map1x, map1y, map2x, map2y, Q


# 畸变校正和立体校正
def rectifyImage(image1, image2, map1x, map1y, map2x, map2y):
    rectifyed_img1 = cv2.remap(image1, map1x, map1y, cv2.INTER_AREA)
    rectifyed_img2 = cv2.remap(image2, map2x, map2y, cv2.INTER_AREA)

    return rectifyed_img1, rectifyed_img2

4.2.3 立体匹配和视差计算

立体匹配也称作视差估计，立体匹配可划分为四个步骤：匹配代价计算、代价聚合、视差计算和视差优化。立体校正后的左右两幅图像得到后，匹配点是在同一行上的，可以使用OpenCV中的BM算法或者SGBM算法计算视差图。由于SGBM算法的表现要远远优于BM算法，因此采用SGBM算法获取视差图。在立体匹配生成视差图后，可以对视差图进行后处理，如滤波，空洞填充等方法，从而改善视差图的视觉效果

4.2.3.1 相关代码

复制代码

def stereoMatchSGBM(left_image, right_image, down_scale=False):
    # SGBM匹配参数设置
    if left_image.ndim == 2:
        img_channels = 1
    else:
        img_channels = 3
    blockSize = 3
    paraml = {'minDisparity': 0,
              'numDisparities': 64,
              'blockSize': blockSize,
              'P1': 8 * img_channels * blockSize ** 2,
              'P2': 32 * img_channels * blockSize ** 2,
              'disp12MaxDiff': 1,
              'preFilterCap': 63,
              'uniquenessRatio': 15,
              'speckleWindowSize': 100,
              'speckleRange': 1,
              'mode': cv2.STEREO_SGBM_MODE_SGBM_3WAY
              }

    # 构建SGBM对象
    left_matcher = cv2.StereoSGBM_create(**paraml)
    paramr = paraml
    paramr['minDisparity'] = -paraml['numDisparities']
    right_matcher = cv2.StereoSGBM_create(**paramr)

    # 计算视差图
    size = (left_image.shape[1], left_image.shape[0])
    if down_scale == False:
        disparity_left = left_matcher.compute(left_image, right_image)
        disparity_right = right_matcher.compute(right_image, left_image)
    else:
        left_image_down = cv2.pyrDown(left_image)
        right_image_down = cv2.pyrDown(right_image)
        factor = left_image.shape[1] / left_image_down.shape[1]

        disparity_left_half = left_matcher.compute(left_image_down, right_image_down)
        disparity_right_half = right_matcher.compute(right_image_down, left_image_down)
        disparity_left = cv2.resize(disparity_left_half, size, interpolation=cv2.INTER_AREA)
        disparity_right = cv2.resize(disparity_right_half, size, interpolation=cv2.INTER_AREA)
        disparity_left = factor * disparity_left
        disparity_right = factor * disparity_right

    # 真实视差（因为SGBM算法得到的视差是×16的）
    trueDisp_left = disparity_left.astype(np.float32) / 16.
    trueDisp_right = disparity_right.astype(np.float32) / 16.

    return trueDisp_left, trueDisp_right

4.2.4 深度计算

得到视差图后，计算像素深度值，公式：depth = ( f * baseline) / disp

其中，depth表示深度图；f表示归一化的焦距，也就是内参中的fx； baseline是两个相机光心之间的距离，称作基线距离；disp是视差值。直接利用opencv中的cv2.reprojectImageTo3D（）函数计算深度图，代码如下：

复制代码

def getDepthMapWithQ(disparityMap: np.ndarray, Q: np.ndarray) -> np.ndarray:
    points_3d = cv2.reprojectImageTo3D(disparityMap, Q)
    depthMap = points_3d[:, :, 2]
    reset_index = np.where(np.logical_or(depthMap < 0.0, depthMap > 65535.0))
    depthMap[reset_index] = 0
    return depthMap.astype(np.float32)

4.3 ：修改YOLOv5中detect.py中的部分代码

python 复制代码

# Run inference
model.warmup(imgsz=(1 if pt or model.triton else bs, 3, *imgsz))  # warmup
seen, windows, dt = 0, [], (Profile(), Profile(), Profile())# 立体校正配置
config = stereoconfig.stereoCamera()

# 获取立体校正变换矩阵
map1x, map1y, map2x, map2y, Q = getRectifyTransform(720, 1280, config)

# 遍历数据集
for path, im, im0s, vid_cap, s in dataset:
    # 将图像转换为模型所需的格式
    with dt[0]:
        im = torch.from_numpy(im).to(model.device)
        im = im.half() if model.fp16 else im.float()  # uint8 to fp16/32
        im /= 255  # 0 - 255 to 0.0 - 1.0
        if len(im.shape) == 3:
            im = im[None]  # 扩展为批处理维度

    # 推理过程
    with dt[1]:
        visualize = increment_path(save_dir / Path(path).stem, mkdir=True) if visualize else False
        pred = model(im, augment=augment, visualize=visualize)

    # 非极大值抑制
    with dt[2]:
        pred = non_max_suppression(pred, conf_thres, iou_thres, classes, agnostic_nms, max_det=max_det)

    # 定义CSV文件路径
    csv_path = save_dir / 'predictions.csv'

    # 写入CSV文件的函数
    def write_to_csv(image_name, prediction, confidence):
        data = {'Image Name': image_name, 'Prediction': prediction, 'Confidence': confidence}
        with open(csv_path, mode='a', newline='') as f:
            writer = csv.DictWriter(f, fieldnames=data.keys())
            if not csv_path.is_file():
                writer.writeheader()
            writer.writerow(data)

    # 处理预测结果
    for i, det in enumerate(pred):  # 每张图片
        seen += 1
        if webcam:  # batch_size >= 1
            p, im0, frame = path[i], im0s[i].copy(), dataset.count
            s += f'{i}: '
        else:
            p, im0, frame = path, im0s.copy(), getattr(dataset, 'frame', 0)

        # 启动立体视觉线程
        thread = MyThread(stereo_threading, args=(config, im0, map1x, map1y, map2x, map2y, Q))
        thread.start()
        p = Path(p)  # 转换为Path对象
        save_path = str(save_dir / p.name)  # 图片保存路径
        txt_path = str(save_dir / 'labels' / p.stem) + ('' if dataset.mode == 'image' else f'_{frame}')  # 标签保存路径
        s += '%gx%g ' % im.shape[2:]  # 打印图像尺寸
        gn = torch.tensor(im0.shape)[[1, 0, 1, 0]]  # 归一化增益
        imc = im0.copy() if save_crop else im0  # 是否保存裁剪图像
        annotator = Annotator(im0, line_width=line_thickness, example=str(names))
        if len(det):
            # 将检测框从图像尺寸缩放到原始图像尺寸
            det[:, :4] = scale_boxes(im.shape[2:], det[:, :4], im0.shape).round()

            # 打印结果
            for c in det[:, 5].unique():
                n = (det[:, 5] == c).sum()  # 每类的检测数
                s += f"{n} {names[int(c)]}{'s' * (n > 1)}, "  # 添加到字符串

            # 写入结果
            for *xyxy, conf, cls in reversed(det):
                c = int(cls)  # 类别整数
                label = names[c] if hide_conf else f'{names[c]}'
                confidence = float(conf)
                confidence_str = f'{confidence:.2f}'

                # 写入CSV文件
                if save_csv:
                    write_to_csv(p.name, label, confidence_str)

                # 写入TXT文件
                if save_txt:  # 写入文件
                    xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist()  # 归一化xywh
                    line = (cls, *xywh, conf) if save_conf else (cls, *xywh)  # 标签格式
                    with open(f'{txt_path}.txt', 'a') as f:
                        f.write(('%g ' * len(line)).rstrip() % line + '\n')

                # 在图像上添加检测框
                if save_img or save_crop or view_img:
                    c = int(cls)  # 类别整数
                    label = None if hide_labels else (names[c] if hide_conf else f'{names[c]} {conf:.2f}')
                    annotator.box_label(xyxy, label, color=colors(c, True))
                    # 双目测距代码
                    x_center = (xyxy[0] + xyxy[2]) / 2
                    y_center = (xyxy[1] + xyxy[3]) / 2
                    x_0 = int(x_center)
                    y_0 = int(y_center)
                    if 0 < x_0 < 1280:
                        x1 = xyxy[0]
                        x2 = xyxy[2]
                        y1 = xyxy[1]
                        y2 = xyxy[3]
                        thread.join()
                        points_3d = thread.get_result()
                        a = points_3d[int(y_0), int(x_0), 0] / 1000
                        b = points_3d[int(y_0), int(x_0), 1] / 1000
                        c = points_3d[int(y_0), int(x_0), 2] / 1000
                        distance = ((a ** 2 + b ** 2 + c ** 2) ** 0.5)
                        print(f'目标 {label} 的三维坐标为: ({a:.2f}, {b:.2f}, {c:.2f}) 米')
                        if distance != 0:
                            c = int(cls)  # 类别整数
                            label = None if hide_labels else (names[c] if hide_conf else f'{names[c]} {conf:.2f}')
                            annotator.box_label(xyxy, label, color=colors(c, True))
                            print('点 (%d, %d) 的 %s 距离左摄像头的相对距离为 %0.2f m' % (
                                x_center, y_center, label, distance))
                            text_dis_avg = "dis:%0.2fm" % distance
                            cv2.putText(im0, text_dis_avg, (int(x1 + (x2 - x1) + 5), int(y1 + 30)), cv2.FONT_ITALIC,
                                        1.2, (255, 255, 255), 3)
        if save_crop:
            save_one_box(xyxy, imc, file=save_dir / 'crops' / names[c] / f'{p.stem}.jpg', BGR=True)

    # 显示结果
    im0 = annotator.result()
    if view_img:
        cv2.namedWindow("Webcam", cv2.WINDOW_NORMAL)
        cv2.resizeWindow("Webcam", 1280, 480)
        cv2.moveWindow("Webcam", 0, 100)
        cv2.imshow("Webcam", im0)
        cv2.waitKey(1)

参考：

YOLOV5 + 双目相机实现三维测距（新版本）