【机器人】复现 ASGrasp 通用透明物体重建、6-DoF抓取预测

在机器人抓取领域，透明和镜面物体的抓取一直是个难题。

ASGrasp核心是两层学习型立体网络，能够同时恢复透明和镜面物体的可见部分和不可见部分的深度信息。

然后融合两层深度信息，进行重建物体，得到点云信息作为GSNet（GraspNess）的输入，进而预测出抓取位姿。

与传统方法不同，ASGrasp不依赖深度相机生成的深度图，而是直接从主动立体相机获取的原始红外图像和RGB图像入手。

RGB图像，要恢复物体的表面可见部分深度信息。
左红外图和右红外图，预测那些被遮挡或者不可直接观测到的深度信息。
使用的相机是常规的Intel® RealSense系列（d415等）

论文地址：ASGrasp: Generalizable Transparent Object Reconstruction and 6-DoF Grasp Detection from RGB-D Active Stereo Camera

代码地址：https://github.com/jun7-shi/ASGrasp

1、准备GPU加速环境

2、安装torch和cudatoolkit

1、准备GPU加速环境

推荐在Conda环境中搭建环境，方便不同项目的管理～

首先需要安装好Nvidia 显卡驱动，后面还要安装CUDA11.1

输入命令：nvidia-smi，能看到显卡信息，说明Nvidia 显卡驱动安装好了

然后需要单独安装CUDA11.1了，上面虽然安装了CUDA12.2也不影响的

各种CUDA版本：CUDA Toolkit Archive | NVIDIA Developer

CUDA11.1下载地址 ：CUDA Toolkit 11.1.0 | NVIDIA Developer

然后根据电脑的系统（Linux）、CPU类型，选择runfile方式

然后下载cuda_11.1.0_455.23.05_linux.run文件

复制代码

wget https://developer.download.nvidia.com/compute/cuda/11.1.0/local_installers/cuda_11.1.0_455.23.05_linux.run

开始安装

复制代码

sudo sh cuda_11.1.0_455.23.05_linux.run

来到下面的界面，点击"Continue"

输入"accept"

下面是关键，在455.23.05这里"回车"，取消安装；

这里X是表示需要安装的，我们只需安装CUDA11相关的即可

安装完成后，能看到/usr/local/cuda-11.1目录啦

复制代码

(base) lgp@lgp-MS-7E07:~/2024_project$ ls /usr/local/cuda-11.1
bin   EULA.txt  libnvvp                  nsight-systems-2020.3.4  nvvm    samples  targets
DOCS  extras    nsight-compute-2020.2.0  nvml                     README  src      tools

参考1：Ubuntu20.04下CUDA、cuDNN的详细安装与配置过程（图文）_ubuntu cudnn安装-CSDN博客

参考2：Ubuntu18.04安装CUDA11.0 Installation failed. See log at /var/log/cuda-installer.log for details._ $error$ : install of driver component failed.-CSDN博客

2、安装torch和cudatoolkit

首先创建一个Conda环境，名字为graspness，python版本为3.8

然后进行graspness环境

bash 复制代码

conda create -n graspness python=3.8
conda activate graspness

这里需要安装pytorch1.8.2，cudatoolkit=11.1

bash 复制代码

conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch-lts -c nvidia

pytorch1.8.2官网地址：Previous PyTorch Versions | PyTorch

3、安装Graspness相关依赖库

下载graspness代码

bash 复制代码

git clone https://github.com/rhett-chen/graspness_implementation.git
cd graspnet-graspness

编辑 requirements.txt，注释torch和MinkowskiEngine

bash 复制代码

# pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
# torch>=1.8
tensorboard==2.3
numpy==1.23.5
scipy
open3d>=0.8
Pillow
tqdm
# MinkowskiEngine==0.5.4

开始安装Graspness相关依赖库

bash 复制代码

pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple

4、安装MinkowskiEngine

在安装MinkowskiEngine之前，需要先安装相关的依赖

bash 复制代码

pip install ninja  -i https://pypi.tuna.tsinghua.edu.cn/simple

bash 复制代码

conda install openblas-devel -c anaconda

然后本地安装的流程：

bash 复制代码

export CUDA_HOME=/usr/local/cuda-11.1		# 指定CUDA_HOME为cuda-11.1
export MAX_JOBS=2		# 降低占用CPU的核心数目，避免卡死（可选）
git clone https://github.com/NVIDIA/MinkowskiEngine.git
cd MinkowskiEngine
python setup.py install --blas_include_dirs=${CONDA_PREFIX}/include --blas=openblas

等待安装完成

5、安装pointnet2和knn

这些两个的安装需要CUDA编译的，依赖于前面的export CUDA_HOME=/usr/local/cuda-11.1

首先来到graspnet-graspness工程中，安装pointnet2

bash 复制代码

cd pointnet2
python setup.py install

再安装knn

bash 复制代码

cd knn
python setup.py install

6、安装graspnetAPI

这里安装的流程如下，逐条命令执行

bash 复制代码

git clone https://github.com/graspnet/graspnetAPI.git
cd graspnetAPI
pip install . -i https://pypi.tuna.tsinghua.edu.cn/simple

成功安装后，需要再安装numpy==1.23.5

bash 复制代码

pip install numpy==1.23.5 -i https://pypi.tuna.tsinghua.edu.cn/simple

到这里安装完成啦～

7、搭建ASgrasp工程代码

首先来到ASgrasp的github地址，下载代码到本地，然后解压文件

代码地址：https://github.com/jun7-shi/ASGrasp

下载预训练权重，到checkpoints/目录下

点击下载ASgrasp预训练权重

下代码GSNet（GraspNess），到gsnet目录下

点击下载gsnet代码

8、运行推理

运行下面命令，进行推理

复制代码

python infer_mvs_2layer_gsnet.py

输入的数据：

运行效果：

如果运行代码，出现错误RuntimeError: cusolver error: 7, when calling cusolverDnCreate(handle)

需要修改src/core_multilayers/submodule.py的31～41行代码：

python 复制代码

    with torch.no_grad():
        if batch == 2:
            inv_ref_proj = []
            for i in range(batch):
                # 检查矩阵是否可逆
                det = torch.det(ref_proj[i].cpu())
                if det == 0:
                    raise ValueError(f"Matrix at index {i} is singular and cannot be inverted. Determinant is zero.")
                
                # 将矩阵移动到 CPU 进行求逆操作，然后移回原设备
                inv_ref_proj.append(torch.linalg.inv(ref_proj[i].cpu()).unsqueeze(0).to(ref_proj[i].device))
            inv_ref_proj = torch.cat(inv_ref_proj, dim=0)
            assert (not torch.isnan(inv_ref_proj).any()), "nan in inverse(ref_proj)"
            proj = torch.matmul(src_proj, inv_ref_proj)
        else:
            # 检查矩阵是否可逆
            det = torch.det(ref_proj.cpu())
            if det == 0:
                raise ValueError("Matrix is singular and cannot be inverted. Determinant is zero.")
            
            # 将矩阵移动到 CPU 进行求逆操作，然后移回原设备
            inv_ref_proj = torch.linalg.inv(ref_proj.cpu()).to(ref_proj.device)
            proj = torch.matmul(src_proj, inv_ref_proj)
            assert (not torch.isnan(proj).any()), "nan in proj"

如下面图片这样：

详细的原因分析参考我这篇博客：问题解决 RuntimeError: cusolver error: 7, when calling cusolverDnCreate(handle)-CSDN博客

输出Top1的抓取位姿，需要修改176～179行代码

python 复制代码

            gg = gg.nms()
            gg = gg.sort_by_score()
            count = gg.__len__()
            if count <1:
                return None
            if count > 1:
                count = 1
            gg = gg[:count]

输出Top10的抓取位姿，

9、可视化深度图

ASGrasp核心是两层学习型立体网络，能够同时恢复透明和镜面物体的可见部分和不可见部分的深度信息。

第一层，RGB图像要恢复物体的表面可见部分深度信息。
第二层，左红外图和右红外图预测那些被遮挡或者不可直接观测到的深度信息。

这里分别可视化两层深度图，看看效果如何

下面是保存深度图的函数

python 复制代码

def save_colored_depth_map(depth_map, save_path):
    """
    将深度图根据深度值差异彩色化并保存为图像文件。

    参数:
        depth_map (numpy.ndarray): 输入的深度图数据，形状为 (H, W)，值为深度值。
        save_path (str): 保存彩色化深度图的路径。
    """
    # 确保输出目录存在
    os.makedirs(os.path.dirname(save_path), exist_ok=True)

    # 归一化深度图到 0-1 范围
    depth_min = np.min(depth_map)
    depth_max = np.max(depth_map)
    normalized_depth = (depth_map - depth_min) / (depth_max - depth_min)

    # 使用 matplotlib 的 colormap
    cmap = plt.get_cmap('viridis')  # 你可以选择其他 colormap，如 'jet', 'turbo' 等
    colored_depth = cmap(normalized_depth)

    # 转换为 BGR 格式以便使用 OpenCV 保存
    colored_depth = (colored_depth[:, :, :3] * 255).astype(np.uint8)
    colored_depth = cv2.cvtColor(colored_depth, cv2.COLOR_RGB2BGR)

    # 保存彩色化深度图
    cv2.imwrite(save_path, colored_depth)

调用方式：

第一层深度图

第二层深度图

下面是RGB图：

完整代码：

python 复制代码

import os
import sys
import numpy as np
import torch
import cv2
import open3d as o3d
from graspnetAPI.graspnet_eval import GraspGroup
sys.path.append("./gsnet/pointnet2")
sys.path.append("./gsnet/utils")
sys.path.append("./src/core_multilayers")
from gsnet.models.graspnet import GraspNet, pred_decode
from raft_mvs_multilayers import RAFTMVS_2Layer
from dataset.graspnet_dataset import minkowski_collate_fn
from collision_detector import ModelFreeCollisionDetector
from data_utils import CameraInfo, create_point_cloud_from_depth_image, get_workspace_mask
from D415_camera import CameraMgr

import matplotlib.pyplot as plt

DEBUG_VIS = True

def save_colored_depth_map(depth_map, save_path):
    """
    将深度图根据深度值差异彩色化并保存为图像文件。

    参数:
        depth_map (numpy.ndarray): 输入的深度图数据，形状为 (H, W)，值为深度值。
        save_path (str): 保存彩色化深度图的路径。
    """
    # 确保输出目录存在
    os.makedirs(os.path.dirname(save_path), exist_ok=True)

    # 归一化深度图到 0-1 范围
    depth_min = np.min(depth_map)
    depth_max = np.max(depth_map)
    normalized_depth = (depth_map - depth_min) / (depth_max - depth_min)

    # 使用 matplotlib 的 colormap
    cmap = plt.get_cmap('viridis')  # 你可以选择其他 colormap，如 'jet', 'turbo' 等
    colored_depth = cmap(normalized_depth)

    # 转换为 BGR 格式以便使用 OpenCV 保存
    colored_depth = (colored_depth[:, :, :3] * 255).astype(np.uint8)
    colored_depth = cv2.cvtColor(colored_depth, cv2.COLOR_RGB2BGR)

    # 保存彩色化深度图
    cv2.imwrite(save_path, colored_depth)

def load_image(imfile, ratio=0.5):
    img = cv2.imread(imfile, 1)
    img = img[:, :, ::-1].copy() # reverse bgr to rgb
    img = torch.from_numpy(img).permute(2, 0, 1).float()
    return img[None].cuda()

def load_projmatrix_render_d415():
    cameraMgr = CameraMgr()
    calib_rgb = cameraMgr.sim_d415_rgb()
    calib_ir = cameraMgr.sim_d415_ir()
    pose_rgb = cameraMgr.sim_d415_rgb_pose()
    pose_ir1 = cameraMgr.sim_d415_ir1_pose()
    pose_ir2 = cameraMgr.sim_d415_ir2_pose()

    depth_min = torch.tensor(0.2)
    depth_max = torch.tensor(1.50)

    poses = [pose_rgb, pose_ir1, pose_ir2]
    intrinsics = [calib_rgb, calib_ir, calib_ir]

    poses = np.stack(poses, 0).astype(np.float32)
    intrinsics = np.stack(intrinsics, 0).astype(np.float32)

    poses = torch.from_numpy(poses)
    intrinsics = torch.from_numpy(intrinsics)

    proj = poses.clone()
    proj[:, :3, :4] = torch.matmul(intrinsics, poses[:, :3, :4])

    return proj[None].cuda(), depth_min[None].cuda(), depth_max[None].cuda()



class MVSGSNetEval():
    def __init__(self, cfgs):
        self.args = cfgs
        net = torch.nn.DataParallel(GraspNet(seed_feat_dim=cfgs.seed_feat_dim, graspness_threshold=cfgs.graspness_threshold, is_training=False))
        device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
        net.to(device)
        # 加载检查点
        checkpoint = torch.load(cfgs.checkpoint_path)
        net.load_state_dict(checkpoint['model_state_dict'])
        net = net.module
        start_epoch = checkpoint['epoch']
        print("-> 加载检查点 %s (epoch: %d)" % (cfgs.checkpoint_path, start_epoch))

        # 初始化多视图立体模型
        mvs_net = torch.nn.DataParallel(RAFTMVS_2Layer(cfgs)).cuda()
        mvs_net.load_state_dict(torch.load(cfgs.restore_ckpt))
        mvs_net = mvs_net.module

        batch_interval = 100
        mvs_net.eval()
        net.eval()

        intrinsic = CameraMgr().sim_d415_rgb()
        self.camera = CameraInfo(640.0, 360.0, intrinsic[0][0], intrinsic[1][1], intrinsic[0][2], intrinsic[1][2], 1.0)

        self.mvs_net = mvs_net
        self.gsnet = net

        self.proj_matrices, self.dmin, self.dmax = load_projmatrix_render_d415()


    def infer(self, rgb_path, ir1_path, ir2_path, mask_path=None, zrot=None):
        with torch.no_grad():
            color = load_image(rgb_path)  # 加载彩色图像
            ir1 = load_image(ir1_path)  # 加载红外图像1
            ir2 = load_image(ir2_path)  # 加载红外图像2

            depth_up = self.mvs_net(color, ir1, ir2, self.proj_matrices.clone(), self.dmin, self.dmax, iters=self.args.valid_iters, test_mode=True)
            depth_up = (depth_up).squeeze()
            depth_2layer = depth_up.detach().cpu().numpy().squeeze()

            # 保存第一层深度图
            save_colored_depth_map(depth_2layer[0], './depth_0_colored.png')  # 保存第一层彩色深度图

            # 保存第二层深度图
            save_colored_depth_map(depth_2layer[1], './depth_1_colored.png')  # 保存第二层彩色深度图

            depth = depth_2layer[0]
            cloud = create_point_cloud_from_depth_image(depth, self.camera, organized=True)  # 从深度图创建点云
            depth_mask = (depth>0.25) & (depth<1.0)  # 深度掩码

            if mask_path is None:
                seg = np.ones(depth.shape)  # 如果没有掩码路径，创建全1掩码
            else:
                seg = cv2.imread(mask_path, -1)  # 加载掩码图像

            workspace_mask = get_workspace_mask(cloud, seg, organized=True, outlier=0.02)  # 获取工作空间掩码
            mask = (depth_mask & workspace_mask)  # 合并掩码

            cloud_masked = cloud[mask]  # 应用掩码到点云
            color = (color).squeeze()
            color = color.detach().cpu().numpy().squeeze()
            color = np.transpose(color, [1,2,0])
            color_masked = color[mask]  # 应用掩码到颜色
            idxs = np.random.choice(len(cloud_masked), 25000, replace=False)  # 随机采样点云
            cloud_sampled = cloud_masked[idxs]
            color_sampled = color_masked[idxs]

            # 添加第二层
            depth1 = depth_2layer[1]
            cloud1 = create_point_cloud_from_depth_image(depth1, self.camera, organized=True)  # 从第二层深度图创建点云
            depth_mask1 = (depth1 > 0.25) & (depth1 < 1.0) & (depth1-depth>0.01)  # 第二层深度掩码
            mask1 = (depth_mask1 & workspace_mask)  # 合并掩码
            cloud_masked1 = cloud1[mask1]
            comp_num_pt = 10000
            if (len(cloud_masked1) > 0):
                print('completed_point_cloud : ', len(cloud_masked1))
                if len(cloud_masked1) >= (comp_num_pt):
                    idxs = np.random.choice(len(cloud_masked1), comp_num_pt, replace=False)
                else:
                    idxs1 = np.arange(len(cloud_masked1))
                    idxs2 = np.random.choice(len(cloud_masked1), comp_num_pt - len(cloud_masked1),
                                             replace=True)
                    idxs = np.concatenate([idxs1, idxs2], axis=0)
                completed_sampled = cloud_masked1[idxs]

                cloud_sampled = np.concatenate([cloud_sampled, completed_sampled], axis=0)
                # completion points 不能用作 graspness 点，默认设置
                objectness_label = np.concatenate([np.ones([25000, ]), (-1)*np.ones([comp_num_pt, ])], axis=0)
                cloud_masked = np.concatenate([cloud_masked, cloud_masked1], axis=0)
            else:
                objectness_label = np.ones([25000, ])

            ret_dict = {'point_clouds': cloud_sampled.astype(np.float32),
                        'coors': cloud_sampled.astype(np.float32) / 0.005,
                        'feats': np.ones_like(cloud_sampled).astype(np.float32),
                        'full_point_clouds': cloud_masked.astype(np.float32),
                        'objectness_label': objectness_label.astype(np.int32),
                        }

            batch_data = minkowski_collate_fn([ret_dict])
            for key in batch_data:
                if 'list' in key:
                    for i in range(len(batch_data[key])):
                        for j in range(len(batch_data[key][i])):
                            batch_data[key][i][j] = batch_data[key][i][j].cuda()
                else:
                    batch_data[key] = batch_data[key].cuda()
            # 前向传播
            with torch.no_grad():
                try:
                    end_points = self.gsnet(batch_data)
                except:
                    return None
                grasp_preds = pred_decode(end_points)

            torch.cuda.empty_cache()
            preds = grasp_preds[0].detach().cpu().numpy()
            gg = GraspGroup(preds)

            if self.args.collision_thresh > 0:
                cloud = ret_dict['full_point_clouds']
                cloud = cloud_masked.astype(np.float32)
                mfcdetector = ModelFreeCollisionDetector(cloud, voxel_size=self.args.voxel_size_cd)
                collision_mask = mfcdetector.detect(gg, approach_dist=0.05, collision_thresh=self.args.collision_thresh)
                gg = gg[~collision_mask]

            gg = gg.nms()
            gg = gg.sort_by_score()
            count = gg.__len__()
            if count <1:
                return None
            if count > 20:
                count = 20
            gg = gg[:count]

            # 添加点云到可视化
            if DEBUG_VIS:
                grippers = gg.to_open3d_geometry_list()
                point_cloud = o3d.geometry.PointCloud()
                point_cloud.points = o3d.utility.Vector3dVector(ret_dict['point_clouds'].astype(np.float32))
                point_cloud.colors = o3d.utility.Vector3dVector(color_sampled.astype(np.float32)/255.0)
                o3d.visualization.draw_geometries([point_cloud, *grippers])

            gg = gg[:1]
            print("抓取宽度 : ", gg.widths)
            print("抓取得分 : ", gg.scores)
            return gg

if __name__ == '__main__':
    # 参数解析
    from argparse import ArgumentParser
    parser = ArgumentParser()
    parser.add_argument('--mixed_precision', action='store_true', help='使用混合精度')
    parser.add_argument('--valid_iters', type=int, default=16, help='前向传播期间的流场更新次数')

    # 架构选择
    parser.add_argument('--hidden_dims', nargs='+', type=int, default=[128] * 3,
                        help="隐藏状态和上下文维度")
    parser.add_argument('--corr_implementation', choices=["reg", "alt", "reg_cuda", "alt_cuda"], default="reg",
                        help="相关体实现")
    parser.add_argument('--shared_backbone', action='store_true',
                        help="使用单个骨干网络")
    parser.add_argument('--corr_levels', type=int, default=4, help="相关金字塔层级数")
    parser.add_argument('--corr_radius', type=int, default=4, help="相关金字塔宽度")
    parser.add_argument('--n_downsample', type=int, default=2, help="视差场分辨率")
    parser.add_argument('--context_norm', type=str, default="batch", choices=['group', 'batch', 'instance', 'none'],
                        help="上下文编码器归一化")
    parser.add_argument('--slow_fast_gru', action='store_true', help="低分辨率 GRU 更频繁迭代")
    parser.add_argument('--n_gru_layers', type=int, default=3, help="隐藏 GRU 层数")
    parser.add_argument('--num_sample', type=int, default=96, help="深度层级数")
    parser.add_argument('--depth_min', type=float, default=0.2, help="深度最小值")
    parser.add_argument('--depth_max', type=float, default=1.5, help="深度最大值")
    parser.add_argument('--train_2layer', default=True, help="")

    parser.add_argument('--restore_ckpt',
                        default=f'./checkpoints/raftmvs_2layer.pth',
                        help="恢复检查点")

    #########################################################################
    parser.add_argument('--checkpoint_path', default='./checkpoints/minkuresunet_epoch10.tar')
    parser.add_argument('--seed_feat_dim', default=512, type=int, help='点级特征维度')
    parser.add_argument('--num_point', type=int, default=25000, help='点数')
    parser.add_argument('--batch_size', type=int, default=1, help='推理时的批量大小')
    parser.add_argument('--voxel_size', type=float, default=0.005, help='稀疏卷积体素大小')
    parser.add_argument('--collision_thresh', type=float, default=0.01,
                        help='碰撞检测阈值')
    parser.add_argument('--voxel_size_cd', type=float, default=0.01, help='碰撞检测体素大小')
    parser.add_argument('--graspness_threshold', type=float, default=0, help='抓取性阈值')

    args = parser.parse_args()
    eval = MVSGSNetEval(args)

    # 测试输入
    rgb_path = f'./test_data/00100_0000_color.png'
    ir1_path = f'./test_data/00100_0000_ir_l.png'
    ir2_path = f'./test_data/00100_0000_ir_r.png' 
    gg = eval.infer(rgb_path, ir1_path, ir2_path)

分享完成～