DOV-SG builds a dynamic 3D scene graph and uses a large language model (LLM) for task decomposition, enabling local updates to the 3D scene graph during interactive exploration.
Published in RA-L 2025, it targets long-term language-guided mobile manipulation with dynamic open-vocabulary 3D scene graphs.
Paper: Dynamic Open-Vocabulary 3D Scene Graphs for Long-term Language-Guided Mobile Manipulation
Code: https://github.com/BJHYZJ/DovSG
This post walks through reproducing DOV-SG and running model inference~
Below is a navigation example:

Navigation process (green point: current position; red point: target position; magenta: navigation trajectory)

1. Create the Conda environment
First create a Conda environment named dovsg with Python 3.9, then activate it.
The two commands are:
conda create -n dovsg python=3.9 -y
conda activate dovsg
Then clone the repository and enter the project directory:
git clone https://github.com/BJHYZJ/DovSG.git
cd DovSG
On success it looks like this:

2. Install PyTorch
Install torch==2.3.1 built against CUDA 12.1 with the following command:
pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu121
Wait for the install to finish; it prints:
Successfully installed MarkupSafe-2.1.5 filelock-3.13.1 fsspec-2024.6.1 jinja2-3.1.4 mpmath-1.3.0 networkx-3.2.1 numpy-1.26.3 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-8.9.2.26 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.20.5 nvidia-nvjitlink-cu12-12.1.105 nvidia-nvtx-cu12-12.1.105 pillow-11.0.0 sympy-1.13.3 torch-2.3.1+cu121 torchaudio-2.3.1+cu121 torchvision-0.18.1+cu121 triton-2.3.1 typing-extensions-4.12.2
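As an optional sanity check, you can confirm the CUDA build from Python (a minimal snippet; the expected values match the versions installed above):
# Optional sanity check for the PyTorch install
import torch
print(torch.__version__)          # expect 2.3.1+cu121
print(torch.version.cuda)         # expect 12.1
print(torch.cuda.is_available())  # expect True on a CUDA-capable machine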
3. Install Segment-Anything-2
The segment-anything-2 code must be pinned to commit '7e1596c' for compatibility with the dependencies installed later.
Run the following commands:
cd third_party
git clone https://github.com/facebookresearch/sam2.git segment-anything-2
cd segment-anything-2
git checkout 7e1596c
The output looks like:

Then edit setup.py; two changes are needed:
line 27: "numpy>=1.24.4" ==> "numpy>=1.23.0",
line 144: python_requires=">=3.10.0" ==> python_requires=">=3.9.0"
Then install segment-anything-2:
pip install -e ".[demo]"
Wait for the install to finish~
Attempting uninstall: SAM-2
Found existing installation: SAM-2 1.0
Uninstalling SAM-2-1.0:
Successfully uninstalled SAM-2-1.0
Successfully installed SAM-2-1.0 anyio-4.9.0 argon2-cffi-25.1.0 argon2-cffi-bindings-21.2.0
arrow-1.3.0 asttokens-3.0.0 async-lru-2.0.5 attrs-25.3.0 babel-2.17.0 beautifulsoup4-4.13.4
bleach-6.2.0 certifi-2025.6.15 cffi-1.17.1 charset_normalizer-3.4.2 comm-0.2.2 contourpy-1.3.0 cycler-0.12.1
debugpy-1.8.14 decorator-5.2.1 defusedxml-0.7.1 exceptiongroup-1.3.0 executing-2.2.0 fastjsonschema-2.21.1
fonttools-4.58.4 fqdn-1.5.1 h11-0.16.0 httpcore-1.0.9 httpx-0.28.1 idna-3.10 importlib-metadata-8.7.0
importlib-resources-6.5.2 ipykernel-6.29.5 ipython-8.18.1 ipywidgets-8.1.7 isoduration-20.11.0
jedi-0.19.2 json5-0.12.0 jsonpointer-3.0.0 jsonschema-4.24.0 jsonschema-specifications-2025.4.1
jupyter-1.1.1 jupyter-client-8.6.3 jupyter-console-6.6.3 jupyter-core-5.8.1 jupyter-events-0.12.0
jupyter-lsp-2.2.5 jupyter-server-2.16.0 jupyter-server-terminals-0.5.3 jupyterlab-4.4.4 jupyterlab-pygments-0.3.0 jupyterlab-server-2.27.3 jupyterlab_widgets-3.0.15 kiwisolver-1.4.7
matplotlib-3.9.4 matplotlib-inline-0.1.7 mistune-3.1.3 nbclient-0.10.2 nbconvert-7.16.6 nbformat-5.10.4 nest-asyncio-1.6.0 notebook-7.4.4 notebook-shim-0.2.4 opencv-python-4.11.0.86 overrides-7.7.0 pandocfilters-1.5.1 parso-0.8.4 pexpect-4.9.0 platformdirs-4.3.8
prometheus-client-0.22.1 prompt-toolkit-3.0.51 psutil-7.0.0 ptyprocess-0.7.0 pure-eval-0.2.3
pycparser-2.22 pygments-2.19.2 pyparsing-3.2.3 python-dateutil-2.9.0.post0 python-json-logger-3.3.0
pyzmq-27.0.0 referencing-0.36.2 requests-2.32.4 rfc3339-validator-0.1.4 rfc3986-validator-0.1.1
rpds-py-0.25.1 send2trash-1.8.3 six-1.17.0 sniffio-1.3.1 soupsieve-2.7 stack-data-0.6.3 terminado-0.18.1
tinycss2-1.4.0 tomli-2.2.1 tornado-6.5.1 traitlets-5.14.3 types-python-dateutil-2.9.0.20250516
uri-template-1.3.0 urllib3-2.5.0 wcwidth-0.2.13 webcolors-24.11.1 webencodings-0.5.1 websocket-client-1.8.0 widgetsnbextension-4.0.14 zipp-3.23.0
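Optionally, a minimal import smoke test (module paths follow the sam2 package layout at this commit; building an actual model additionally needs the config plus the checkpoint downloaded in step 11):
# Optional import smoke test for segment-anything-2
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor
print("SAM-2 imports OK")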
4. Install GroundingDINO
The GroundingDINO code must be pinned to commit '856dde2' for compatibility with the dependencies installed later.
Run the following commands:
cd ..
git clone https://github.com/IDEA-Research/GroundingDINO.git GroundingDINO
cd GroundingDINO/
git checkout 856dde2
The output looks like:

Then install GroundingDINO:
pip install -e .
Wait for the install to finish~
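Optionally, verify the editable install with an import smoke test (these helpers live in groundingdino.util.inference; the model itself is loaded later with the config and checkpoint from step 11):
# Optional import smoke test for GroundingDINO
from groundingdino.util.inference import load_model, load_image, predict
print("GroundingDINO imports OK")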

5. Install RAM & Tag2Text
The recognize-anything code must be pinned to commit '88c2b0c' for compatibility with the dependencies installed later.
Run the following commands:
cd ..
git clone https://github.com/xinyu1205/recognize-anything.git
cd recognize-anything/
git checkout 88c2b0c
Then run the following commands to install it:
pip install -r requirements.txt
pip install -e .
The output looks like:

Wait for the install to finish~
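Optionally, a minimal import smoke test (names follow the recognize-anything package; the ram_swin_large_14m.pth checkpoint is downloaded in step 11):
# Optional import smoke test for RAM & Tag2Text
from ram.models import ram, tag2text
from ram import inference_ram
print("RAM imports OK")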
6. Install ACE
Run the following commands:
cd ../../ace/dsacstar/
conda install opencv
python setup.py install
Wait for the install to finish~
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/ace/dsacstar$ python setup.py install
Detected active conda environment: /home/lgp/anaconda3/envs/dovsg
Assuming OpenCV dependencies in:
........
........
creating dist
creating 'dist/dsacstar-0.0.0-py3.9-linux-x86_64.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing dsacstar-0.0.0-py3.9-linux-x86_64.egg
creating /home/lgp/anaconda3/envs/dovsg/lib/python3.9/site-packages/dsacstar-0.0.0-py3.9-linux-x86_64.egg
Extracting dsacstar-0.0.0-py3.9-linux-x86_64.egg to /home/lgp/anaconda3/envs/dovsg/lib/python3.9/site-packages
Adding dsacstar 0.0.0 to easy-install.pth file
Installed /home/lgp/anaconda3/envs/dovsg/lib/python3.9/site-packages/dsacstar-0.0.0-py3.9-linux-x86_64.egg
Processing dependencies for dsacstar==0.0.0
Finished processing dependencies for dsacstar==0.0.0
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/ace/dsacstar$
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/ace/dsacstar$
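dsacstar is a compiled C++/CUDA extension, so the quickest check is that it imports (a minimal sketch):
# Optional smoke test for the dsacstar extension built above
import dsacstar
print("dsacstar loaded from:", dsacstar.__file__)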
7. Install LightGlue
The LightGlue code must be pinned to commit 'edb2b83' for compatibility with the dependencies installed later.
Run the following commands:
cd ../../third_party/
git clone https://github.com/cvg/LightGlue.git
cd LightGlue/
git checkout edb2b83
python -m pip install -e .
Wait for the install to finish~
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/third_party$ git clone https://github.com/cvg/LightGlue.git
Cloning into 'LightGlue'...
remote: Enumerating objects: 386, done.
remote: Counting objects: 100% (205/205), done.
remote: Compressing objects: 100% (119/119), done.
remote: Total 386 (delta 147), reused 86 (delta 86), pack-reused 181 (from 2)
Receiving objects: 100% (386/386), 17.43 MiB | 13.39 MiB/s, done.
Resolving deltas: 100% (236/236), done.
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/third_party$ ls
DROID-SLAM GroundingDINO LightGlue pytorch3d recognize-anything segment-anything-2
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/third_party$
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/third_party$ cd LightGlue/
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/third_party/LightGlue$
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/third_party/LightGlue$ git checkout edb2b83
Note: switching to 'edb2b83'.
..............................
HEAD is now at edb2b83 fix compilation for torch v2.2.1 (#124)
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/third_party/LightGlue$
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/third_party/LightGlue$ python -m pip install -e .
Obtaining file:///home/lgp/2025_project/DovSG/third_party/LightGlue
Installing build dependencies ... done
Checking if build backend supports build_editable ... done
................
Successfully built lightglue
Installing collected packages: kornia_rs, kornia, lightglue
Successfully installed kornia-0.8.1 kornia_rs-0.1.9 lightglue-0.0
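To confirm LightGlue works end to end, here is a small matching sketch on random tensors, following the public API in the LightGlue README (SuperPoint features; CPU is fine for a smoke test, though with random inputs few keypoints will match):
# Optional end-to-end smoke test: SuperPoint + LightGlue on random images
import torch
from lightglue import LightGlue, SuperPoint
from lightglue.utils import rbd

extractor = SuperPoint(max_num_keypoints=1024).eval()
matcher = LightGlue(features="superpoint").eval()

image0 = torch.rand(3, 480, 640)  # placeholder RGB tensors in [0, 1]
image1 = torch.rand(3, 480, 640)

feats0 = extractor.extract(image0)
feats1 = extractor.extract(image1)
matches01 = matcher({"image0": feats0, "image1": feats1})
feats0, feats1, matches01 = [rbd(x) for x in (feats0, feats1, matches01)]
print("matches:", matches01["matches"].shape)  # (K, 2) index pairs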
Then install the Faiss library:
conda install -c pytorch faiss-cpu=1.7.4 mkl=2021 blas=1.0=mkl
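A quick way to verify the faiss-cpu install is to build a tiny L2 index and query it (standard faiss API):
# Optional smoke test for faiss-cpu: exact L2 search over random vectors
import numpy as np
import faiss

xb = np.random.rand(100, 64).astype("float32")
index = faiss.IndexFlatL2(64)   # 64-dimensional exact L2 index
index.add(xb)
distances, ids = index.search(xb[:5], 3)
print(ids.shape)  # (5, 3); each query's nearest neighbor is itself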
8. Install PyTorch3D
The PyTorch3D code must be pinned to commit '05cbea1' for compatibility with the dependencies installed later.
Run the following commands:
cd ..
git clone https://github.com/facebookresearch/pytorch3d.git
cd pytorch3d/
git checkout 05cbea1
python setup.py install
Wait for the install to finish~
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/third_party/pytorch3d$
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/third_party/pytorch3d$ python setup.py install
......................
Using /home/lgp/anaconda3/envs/dovsg/lib/python3.9/site-packages
Finished processing dependencies for pytorch3d==0.7.7
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/third_party/pytorch3d$
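Optionally, a short check that PyTorch3D is usable (constructing a tiny point cloud exercises the core data structures):
# Optional smoke test for the PyTorch3D build
import torch
import pytorch3d
from pytorch3d.structures import Pointclouds

pc = Pointclouds(points=[torch.rand(100, 3)])
print(pytorch3d.__version__, pc.num_points_per_cloud())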
9. Install the remaining dependencies and the dovsg package
First install a batch of dependencies:
cd ../../
pip install ipython cmake pybind11 ninja scipy==1.10.1 scikit-learn==1.4.0 pandas==2.0.3 hydra-core opencv-python openai-clip timm matplotlib==3.7.2 imageio timm open3d numpy-quaternion more-itertools pyliblzfse einops transformers pytorch-lightning wget gdown tqdm zmq torch_geometric numpy==1.23.0 # -i https://pypi.tuna.tsinghua.edu.cn/simple
Then install protobuf, MinkowskiEngine, and the GraspNet API:
pip install protobuf==3.19.0
pip install git+https://github.com/pccws/MinkowskiEngine
pip install graspnetAPI
torch-cluster is also needed (first download the .whl file with wget, then install it with pip):
wget https://data.pyg.org/whl/torch-2.3.0%2Bcu121/torch_cluster-1.6.3%2Bpt23cu121-cp39-cp39-linux_x86_64.whl
pip install torch_cluster-1.6.3+pt23cu121-cp39-cp39-linux_x86_64.whl
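The wheel must exactly match the torch 2.3/cu121 build, so it is worth a quick functional check (knn usage per the torch-cluster README):
# Optional check that torch_cluster loads against this torch build
import torch
from torch_cluster import knn

x = torch.rand(10, 3)
edge_index = knn(x, x, 3)  # 3 nearest neighbors of each point (incl. itself)
print(edge_index.shape)    # torch.Size([2, 30])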
Install a few more dependencies:
pip install numpy==1.23.0 supervision==0.14.0 shapely alphashape
pip install pyrealsense2 open_clip_torch graphviz pyrender
pip install openai==1.56.1
pip install transforms3d==0.3.1 scikit-image==0.19.3
Finally, install dovsg:
pip install -e .
Wait for the install to finish~
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG$ pip install -e .
Obtaining file:///home/lgp/2025_project/DovSG
Preparing metadata (setup.py) ... done
Installing collected packages: dovsg
Running setup.py develop for dovsg
Successfully installed dovsg
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG$
Patch (2025/7/4): visualization requires graphviz:
sudo apt-get install graphviz
conda install -c conda-forge graphviz python-graphviz
10. Install DROID-SLAM
DROID-SLAM must be kept separate from DOV-SG: build it in a new Conda environment.
The DROID-SLAM code must be pinned to commit 8016d2b for compatibility with its dependencies. Run the following commands:
cd ./third_party/
git clone https://github.com/princeton-vl/DROID-SLAM.git
cd DROID-SLAM/
git checkout 8016d2b
Wait for the download to finish~
DROID-SLAM/thirdparty/ must contain the eigen, lietorch, tartanair_tools, and related dependencies. From the DROID-SLAM root directory, run:
git submodule update --init thirdparty/lietorch
This fetches and initializes the thirdparty/lietorch submodule; run git submodule update --init --recursive instead if you want to pull all of the submodules.
1. Create the Conda environment
First create a Conda environment named droidenv with Python 3.9, then activate it.
The two commands are:
conda create -n droidenv python=3.9 -y
conda activate droidenv
2. Install PyTorch
conda install pytorch=1.10 torchvision torchaudio cudatoolkit=11.3 -c pytorch -y
3. Install dependencies
conda install suitesparse -c conda-forge -y
pip install open3d==0.15.2 scipy opencv-python==4.7.0.72 matplotlib pyyaml==6.0.2 tensorboard # -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install evo --upgrade --no-binary evo
pip install gdown
pip install numpy==1.23.0 numpy-quaternion==2023.0.4
Wait for the installs to finish~

4. Install torch-scatter
wget https://data.pyg.org/whl/torch-1.10.0%2Bcu113/torch_scatter-2.0.9-cp39-cp39-linux_x86_64.whl
pip install torch_scatter-2.0.9-cp39-cp39-linux_x86_64.whl
5. Install DROID-SLAM
Configure gcc-10/g++-10:
sudo apt install gcc-10 g++-10
export CC=/usr/bin/gcc-10
export CXX=/usr/bin/g++-10
The system default is CUDA 12.1; temporarily switch to CUDA 11.3.
Note: the switch only affects the current shell session; after closing the terminal the original state (CUDA 12.1) is restored.
export CUDA_HOME=/usr/local/cuda-11.3
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
(droidenv) lgp@lgp-MS-7E07:~/2025_project/DovSG/third_party/DROID-SLAM$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Mar_21_19:15:46_PDT_2021
Cuda compilation tools, release 11.3, V11.3.58
Build cuda_11.3.r11.3/compiler.29745058_0
Install DROID-SLAM:
python setup.py install
Wait for the build to finish~

11. Download the model weights
The project uses seven models in total (quite a lot). Their versions and download links/methods are:
- anygrasp: once you obtain an AnyGrasp license, the checkpoint is provided along with it.
- bert-base-uncased: https://huggingface.co/google-bert/bert-base-uncased
- CLIP-ViT-H-14-laion2B-s32B-b79K: https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K
- droid-slam: https://drive.google.com/file/u/0/d/1PpqVt1H4maBa_GbPJp4NwxRsd9jk-elh/view?usp=sharing&pli=1
- GroundingDINO: https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth and https://github.com/IDEA-Research/GroundingDINO/blob/main/groundingdino/config/GroundingDINO_SwinT_OGC.py
- recognize_anything: https://huggingface.co/spaces/xinyu1205/Recognize_Anything-Tag2Text/blob/main/ram_swin_large_14m.pth
- segment-anything-2: https://github.com/facebookresearch/sam2?tab=readme-ov-file#download-checkpoints
The weights are laid out as follows:
DovSG/
├── checkpoints
│ ├── anygrasp
│ │ ├── checkpoint_detection.tar
│ │ └── checkpoint_tracking.tar
│ ├── bert-base-uncased
│ │ ├── config.json
│ │ ├── model.safetensors
│ │ ├── tokenizer_config.json
│ │ ├── tokenizer.json
│ │ └── vocab.txt
│ ├── CLIP-ViT-H-14-laion2B-s32B-b79K
│ │ └── open_clip_pytorch_model.bin
│ ├── droid-slam
│ │ └── droid.pth
│ ├── GroundingDINO
│ │ ├── groundingdino_swint_ogc.pth
│ │ └── GroundingDINO_SwinT_OGC.py
│ ├── recognize_anything
│ │ └── ram_swin_large_14m.pth
│ └── segment-anything-2
│ └── sam2_hiera_large.pt
└── license
├── licenseCfg.json
├── ZhijieYan.lic
├── ZhijieYan.public_key
└── ZhijieYan.signature
...
1. Install Git LFS
To download the large model weights, the Git LFS tool must be installed locally (it handles large files):
sudo apt-get install git-lfs
After installation, enable LFS support in the terminal:
git lfs install
2. bert-base-uncased weights
Download with the following commands:
mkdir checkpoints
cd checkpoints/
git clone https://huggingface.co/google-bert/bert-base-uncased
Wait for the download to finish~
(base) lgp@lgp-MS-7E07:~/2025_project/DovSG$ mkdir checkpoints
(base) lgp@lgp-MS-7E07:~/2025_project/DovSG$ cd checkpoints/
(base) lgp@lgp-MS-7E07:~/2025_project/DovSG/checkpoints$ git clone https://huggingface.co/google-bert/bert-base-uncased
Cloning into 'bert-base-uncased'...
remote: Enumerating objects: 85, done.
remote: Total 85 (delta 0), reused 0 (delta 0), pack-reused 85 (from 1)
Unpacking objects: 100% (85/85), 330.58 KiB | 912.00 KiB/s, done.
(base) lgp@lgp-MS-7E07:~/2025_project/DovSG/checkpoints$ ls
bert-base-uncased
Pull the actual weight files:
cd bert-base-uncased
git lfs pull
3. CLIP-ViT-H-14-laion2B-s32B-b79K weights
Download with the following commands:
cd ../
git clone https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K
Wait for the download to finish~
(base) lgp@lgp-MS-7E07:~/2025_project/DovSG/checkpoints$
(base) lgp@lgp-MS-7E07:~/2025_project/DovSG/checkpoints$ git clone https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K
Cloning into 'CLIP-ViT-H-14-laion2B-s32B-b79K'...
remote: Enumerating objects: 47, done.
remote: Counting objects: 100% (8/8), done.
remote: Compressing objects: 100% (8/8), done.
remote: Total 47 (delta 2), reused 0 (delta 0), pack-reused 39 (from 1)
Unpacking objects: 100% (47/47), 1.08 MiB | 1.64 MiB/s, done.
(base) lgp@lgp-MS-7E07:~/2025_project/DovSG/checkpoints$ ls
bert-base-uncased CLIP-ViT-H-14-laion2B-s32B-b79K
Pull the actual weight files:
cd CLIP-ViT-H-14-laion2B-s32B-b79K
git lfs pull
4. droid-slam, GroundingDINO, recognize_anything, and segment-anything-2 weights
Create a folder for each set of weights:
cd ../
mkdir droid-slam
mkdir GroundingDINO
mkdir recognize_anything
mkdir segment-anything-2
These weights can only be downloaded from the pages below, then copied into the corresponding folders:
- droid-slam: https://drive.google.com/file/u/0/d/1PpqVt1H4maBa_GbPJp4NwxRsd9jk-elh/view?usp=sharing&pli=1
- GroundingDINO: https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth and https://github.com/IDEA-Research/GroundingDINO/blob/main/groundingdino/config/GroundingDINO_SwinT_OGC.py
- recognize_anything: https://huggingface.co/spaces/xinyu1205/Recognize_Anything-Tag2Text/blob/main/ram_swin_large_14m.pth
- segment-anything-2: https://github.com/facebookresearch/sam2?tab=readme-ov-file#download-checkpoints
12. Download the dataset
Dataset download link: https://drive.google.com/drive/folders/13v5QOrqjxye__kJwDIuD7kTdeSSNfR5x

After downloading, extract it into the DovSG root directory, which creates a data_example directory.

Note: poses_droidslam is generated by a later step; ignore it for now~
13. Pose estimation with DROID-SLAM
Activate the Conda environment:
conda deactivate
conda activate droidenv
Edit the code in third_party/DROID-SLAM/droid_slam/trajectory_filler.py:
the for loop at line 90 needs the following change.
# for (tstamp, image, intrinsic) in image_stream:
for (tstamp, image, pose, intrinsic) in image_stream:
    tstamps.append(tstamp)
    images.append(image)
    intrinsics.append(intrinsic)
    if len(tstamps) == 16:
        pose_list += self.__fill(tstamps, images, intrinsics)
        tstamps, images, intrinsics = [], [], []
This is because image_stream yields four values, so the loop must unpack four: for (tstamp, image, pose, intrinsic) in image_stream.
Run pose estimation with the following command:
python dovsg/scripts/pose_estimation.py \
--datadir "data_example/room1" \
--calib "data_example/room1/calib.txt" \
--t0 0 \
--stride 1 \
--weights "checkpoints/droid-slam/droid.pth" \
--buffer 2048
After the program finishes, a new folder named poses_droidslam appears under data_example/room1, containing the poses of all viewpoints.
Run output:
Pose Estimation:: 100%|██████████████████████████████████████████████████████████| 739/739 [00:25<00:00, 29.32it/s]
################################
Global BA Iteration #1
Global BA Iteration #2
Global BA Iteration #3
Global BA Iteration #4
Global BA Iteration #5
Global BA Iteration #6
Global BA Iteration #7
################################
Global BA Iteration #1
Global BA Iteration #2
Global BA Iteration #3
Global BA Iteration #4
Global BA Iteration #5
Global BA Iteration #6
Global BA Iteration #7
Global BA Iteration #8
Global BA Iteration #9
Global BA Iteration #10
Global BA Iteration #11
Global BA Iteration #12
Result Pose Number is 739
14. Visualize the reconstructed scene
Using the poses estimated by DROID-SLAM, we can visualize the reconstructed scene.
Activate the Conda environment:
conda deactivate
conda activate dovsg
Reconstruct the 3D scene with the following command:
python dovsg/scripts/show_pointcloud.py \
--tags "room1" \
--pose_tags "poses_droidslam"
Visualization result:

15. Run DOV-SG inference
Run the following command:
python demo.py \
--tags "room1" \
--preprocess \
--debug \
--task_scene_change_level "Minor Adjustment" \
--task_description "Please move the red pepper to the plate, then move the green pepper to plate."
The overall flow of the code:
- Scan the room with a camera to collect RGB-D data.
- Estimate camera poses from the collected RGB-D data.
- Transform the coordinate frame based on the detected floor.
- Train the relocalization model (ACE) to support later operations.
- Generate the view dataset.
- Use vision-language models (VLMs) to represent real-world objects as nodes of a 3D scene graph, and extract inter-object relationships with a rule-based method.
- Extract LightGlue features to assist later relocalization.
- Feed the results into LLM task planning.
- Continuously update the 3D scene graph while executing relocalization subtasks.
Coordinate-frame transformation based on the detected floor:
get floor pcd and transform scene.: 100%|████████████████████████████████████████| 247/247 [00:41<00:00, 5.93it/s]

Train the relocalization model (ACE) to support later operations:
Train ACE
create save folder: data_example/room1/ace
filling training buffers with 1000000/8000000 samples
filling training buffers with 2000000/8000000 samples
filling training buffers with 3000000/8000000 samples
filling training buffers with 4000000/8000000 samples
filling training buffers with 5000000/8000000 samples
filling training buffers with 6000000/8000000 samples
filling training buffers with 7000000/8000000 samples
filling training buffers with 8000000/8000000 samples
Train ACE Over!
Run output:
final text_encoder_type: bert-base-uncased
==> Initializing CLIP model...
==> Done initializing CLIP model.
BertLMHeadModel has generative capabilities, as `prepare_inputs_for_generation` is explicitly defined. However, it doesn't directly inherit from `GenerationMixin`. From 👉v4.50👈 onwards, `PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.
If you're using `trust_remote_code=True`, you can get rid of this warning by loading the model with an auto class. See https://huggingface.co/docs/transformers/en/model_doc/auto#auto-classes
If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).
If you are not the owner of the model architecture class, please contact the model code owner to update it.
The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
/encoder/layer/0/crossattention/self/query is tied
/encoder/layer/0/crossattention/self/key is tied
/encoder/layer/0/crossattention/self/value is tied
/encoder/layer/0/crossattention/output/dense is tied
/encoder/layer/0/crossattention/output/LayerNorm is tied
/encoder/layer/0/intermediate/dense is tied
/encoder/layer/0/output/dense is tied
/encoder/layer/0/output/LayerNorm is tied
/encoder/layer/1/crossattention/self/query is tied
/encoder/layer/1/crossattention/self/key is tied
/encoder/layer/1/crossattention/self/value is tied
/encoder/layer/1/crossattention/output/dense is tied
/encoder/layer/1/crossattention/output/LayerNorm is tied
/encoder/layer/1/intermediate/dense is tied
/encoder/layer/1/output/dense is tied
/encoder/layer/1/output/LayerNorm is tied
checkpoints/recognize_anything/ram_swin_large_14m.pth
load checkpoint from checkpoints/recognize_anything/ram_swin_large_14m.pth
vit: swin_l
semantic meomry: 100%|███████████████████████████████████████████████████████████| 247/247 [04:15<00:00, 1.03s/it]
.........
Detected objects:

LLM task-planning output:
{'action': 'Go to', 'object1': 'red pepper', 'object2': None}, {'action': 'Pick up', 'object1': 'red pepper'}, {'action': 'Go to', 'object1': 'plate', 'object2': None}, {'action': 'Place', 'object1': 'red pepper', 'object2': 'plate'}, {'action': 'Go to', 'object1': 'green pepper', 'object2': None}, {'action': 'Pick up', 'object1': 'green pepper'}, {'action': 'Go to', 'object1': 'plate', 'object2': None}, {'action': 'Place', 'object1': 'green pepper', 'object2': 'plate'}
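The plan is a flat list of action dictionaries. A minimal, illustrative sketch of consuming such a list (not the actual DovSG executor):
# Illustrative only: walking an LLM-generated plan of {"action", "object1", "object2"} steps
plan = [
    {"action": "Go to", "object1": "red pepper", "object2": None},
    {"action": "Pick up", "object1": "red pepper"},
    {"action": "Go to", "object1": "plate", "object2": None},
    {"action": "Place", "object1": "red pepper", "object2": "plate"},
]
for step in plan:
    print(step["action"], step["object1"], step.get("object2"))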
Initializing Instance Localizer.
Data process over!
===> get observations from robot.
observation save path: data_example/room1/memory/3_0.1_0.01_True_0.2_0.5/Minor Adjustment long_term_task: Please move the red pepper to the plate, then move the green pepper to plate./step_0/observations/0_start.npy
Sampling 64 hypotheses.
Relocalization subtasks are executed via ICP matching, and the 3D scene graph is continuously updated:
IPC Number: 5182, 7589, 6760
IPC Number: 20009, 35374, 27853
IPC Number: 80797, 179609, 129217
Navigation process (green point: current position; red point: target position; magenta: navigation trajectory):
Now are in step 0
Runing Go to(red pepper, None) Task.
A is red pepper
B is None
====> A* planning.
[[2.33353067 0.83389901 3.92763996]
 [2.05       0.55       4.19324287]
 [1.85       0.2        5.09701148]]

The robot finds the objects and performs the manipulation (move the red pepper to the plate, then move the green pepper to the plate):
data_example/room1/memory/3_0.1_0.01_True_0.2_0.5/Minor Adjustment long_term_task: Please move the red pepper to the plate, then move the green pepper to plate./step_0/navigation_vis.jpg
please move the agent to target point (Press Enter).
===> get observations from robot.
observation save path: data_example/room1/memory/3_0.1_0.01_True_0.2_0.5/Minor Adjustment long_term_task: Please move the red pepper to the plate, then move the green pepper to plate./step_0/observations/1_after_Go to(red pepper, None).npy

That wraps up this walkthrough~