DOV-SG builds a dynamic 3D scene graph and uses a large language model (LLM) for task decomposition, enabling local updates to the 3D scene graph during interactive exploration.
Published in RA-L 2025, it targets long-term language-guided mobile manipulation with dynamic open-vocabulary 3D scene graphs.
Paper: Dynamic Open-Vocabulary 3D Scene Graphs for Long-term Language-Guided Mobile Manipulation
Code: https://github.com/BJHYZJ/DovSG
This post walks through reproducing DOV-SG and running model inference~
Below is a navigation example:

Navigation process (green point: current position; red point: target position; magenta: navigation trajectory)

1. Create the Conda environment
First create a Conda environment named dovsg with Python 3.9, then activate it.
The two commands are:
conda create -n dovsg python=3.9 -y
conda activate dovsg
Then clone the repository and enter the project directory:
git clone https://github.com/BJHYZJ/DovSG.git
cd DovSG
On success it looks like this:

2. Install PyTorch
Install torch==2.3.1 built against CUDA 12.1 with the following command:
pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu121
Wait for the install to finish; it prints:
Successfully installed MarkupSafe-2.1.5 filelock-3.13.1 fsspec-2024.6.1 jinja2-3.1.4 mpmath-1.3.0 networkx-3.2.1 numpy-1.26.3 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-8.9.2.26 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.20.5 nvidia-nvjitlink-cu12-12.1.105 nvidia-nvtx-cu12-12.1.105 pillow-11.0.0 sympy-1.13.3 torch-2.3.1+cu121 torchaudio-2.3.1+cu121 torchvision-0.18.1+cu121 triton-2.3.1 typing-extensions-4.12.2
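As an optional sanity check, you can confirm the CUDA build from Python (a minimal snippet; the expected values match the versions installed above):
# Optional sanity check for the PyTorch install
import torch
print(torch.__version__)          # expect 2.3.1+cu121
print(torch.version.cuda)         # expect 12.1
print(torch.cuda.is_available())  # expect True on a CUDA-capable machine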
3. Install Segment-Anything-2
The segment-anything-2 code must be pinned to commit '7e1596c' for compatibility with the dependencies installed later.
Run the following commands:
cd third_party
git clone https://github.com/facebookresearch/sam2.git segment-anything-2
cd segment-anything-2
git checkout 7e1596c
The output looks like:

Then edit setup.py; two changes are needed:
line 27: "numpy>=1.24.4" ==> "numpy>=1.23.0",
line 144: python_requires=">=3.10.0" ==> python_requires=">=3.9.0"
Then install segment-anything-2:
pip install -e ".[demo]"
Wait for the install to finish~
Attempting uninstall: SAM-2
Found existing installation: SAM-2 1.0
Uninstalling SAM-2-1.0:
Successfully uninstalled SAM-2-1.0
Successfully installed SAM-2-1.0 anyio-4.9.0 argon2-cffi-25.1.0 argon2-cffi-bindings-21.2.0
arrow-1.3.0 asttokens-3.0.0 async-lru-2.0.5 attrs-25.3.0 babel-2.17.0 beautifulsoup4-4.13.4
bleach-6.2.0 certifi-2025.6.15 cffi-1.17.1 charset_normalizer-3.4.2 comm-0.2.2 contourpy-1.3.0 cycler-0.12.1
debugpy-1.8.14 decorator-5.2.1 defusedxml-0.7.1 exceptiongroup-1.3.0 executing-2.2.0 fastjsonschema-2.21.1
fonttools-4.58.4 fqdn-1.5.1 h11-0.16.0 httpcore-1.0.9 httpx-0.28.1 idna-3.10 importlib-metadata-8.7.0
importlib-resources-6.5.2 ipykernel-6.29.5 ipython-8.18.1 ipywidgets-8.1.7 isoduration-20.11.0
jedi-0.19.2 json5-0.12.0 jsonpointer-3.0.0 jsonschema-4.24.0 jsonschema-specifications-2025.4.1
jupyter-1.1.1 jupyter-client-8.6.3 jupyter-console-6.6.3 jupyter-core-5.8.1 jupyter-events-0.12.0
jupyter-lsp-2.2.5 jupyter-server-2.16.0 jupyter-server-terminals-0.5.3 jupyterlab-4.4.4 jupyterlab-pygments-0.3.0 jupyterlab-server-2.27.3 jupyterlab_widgets-3.0.15 kiwisolver-1.4.7
matplotlib-3.9.4 matplotlib-inline-0.1.7 mistune-3.1.3 nbclient-0.10.2 nbconvert-7.16.6 nbformat-5.10.4 nest-asyncio-1.6.0 notebook-7.4.4 notebook-shim-0.2.4 opencv-python-4.11.0.86 overrides-7.7.0 pandocfilters-1.5.1 parso-0.8.4 pexpect-4.9.0 platformdirs-4.3.8
prometheus-client-0.22.1 prompt-toolkit-3.0.51 psutil-7.0.0 ptyprocess-0.7.0 pure-eval-0.2.3
pycparser-2.22 pygments-2.19.2 pyparsing-3.2.3 python-dateutil-2.9.0.post0 python-json-logger-3.3.0
pyzmq-27.0.0 referencing-0.36.2 requests-2.32.4 rfc3339-validator-0.1.4 rfc3986-validator-0.1.1
rpds-py-0.25.1 send2trash-1.8.3 six-1.17.0 sniffio-1.3.1 soupsieve-2.7 stack-data-0.6.3 terminado-0.18.1
tinycss2-1.4.0 tomli-2.2.1 tornado-6.5.1 traitlets-5.14.3 types-python-dateutil-2.9.0.20250516
uri-template-1.3.0 urllib3-2.5.0 wcwidth-0.2.13 webcolors-24.11.1 webencodings-0.5.1 websocket-client-1.8.0 widgetsnbextension-4.0.14 zipp-3.23.0
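Optionally, a minimal import smoke test (module paths follow the sam2 package layout at this commit; building an actual model additionally needs the config plus the checkpoint downloaded in step 11):
# Optional import smoke test for segment-anything-2
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor
print("SAM-2 imports OK")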
4. Install GroundingDINO
The GroundingDINO code must be pinned to commit '856dde2' for compatibility with the dependencies installed later.
Run the following commands:
cd ..
git clone https://github.com/IDEA-Research/GroundingDINO.git GroundingDINO
cd GroundingDINO/
git checkout 856dde2
The output looks like:

Then install GroundingDINO:
pip install -e .
Wait for the install to finish~
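Optionally, verify the editable install with an import smoke test (these helpers live in groundingdino.util.inference; the model itself is loaded later with the config and checkpoint from step 11):
# Optional import smoke test for GroundingDINO
from groundingdino.util.inference import load_model, load_image, predict
print("GroundingDINO imports OK")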

5. Install RAM & Tag2Text
The recognize-anything code must be pinned to commit '88c2b0c' for compatibility with the dependencies installed later.
Run the following commands:
cd ..
git clone https://github.com/xinyu1205/recognize-anything.git
cd recognize-anything/
git checkout 88c2b0c
Then run the following commands to install it:
pip install -r requirements.txt
pip install -e .
The output looks like:

Wait for the install to finish~
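Optionally, a minimal import smoke test (names follow the recognize-anything package; the ram_swin_large_14m.pth checkpoint is downloaded in step 11):
# Optional import smoke test for RAM & Tag2Text
from ram.models import ram, tag2text
from ram import inference_ram
print("RAM imports OK")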
6. Install ACE
Run the following commands:
cd ../../ace/dsacstar/
conda install opencv
python setup.py install
Wait for the install to finish~
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/ace/dsacstar$ python setup.py install
Detected active conda environment: /home/lgp/anaconda3/envs/dovsg
Assuming OpenCV dependencies in:
........
........
creating dist
creating 'dist/dsacstar-0.0.0-py3.9-linux-x86_64.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing dsacstar-0.0.0-py3.9-linux-x86_64.egg
creating /home/lgp/anaconda3/envs/dovsg/lib/python3.9/site-packages/dsacstar-0.0.0-py3.9-linux-x86_64.egg
Extracting dsacstar-0.0.0-py3.9-linux-x86_64.egg to /home/lgp/anaconda3/envs/dovsg/lib/python3.9/site-packages
Adding dsacstar 0.0.0 to easy-install.pth file
Installed /home/lgp/anaconda3/envs/dovsg/lib/python3.9/site-packages/dsacstar-0.0.0-py3.9-linux-x86_64.egg
Processing dependencies for dsacstar==0.0.0
Finished processing dependencies for dsacstar==0.0.0
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/ace/dsacstar$
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/ace/dsacstar$
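dsacstar is a compiled C++/CUDA extension, so the quickest check is that it imports (a minimal sketch):
# Optional smoke test for the dsacstar extension built above
import dsacstar
print("dsacstar loaded from:", dsacstar.__file__)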
7. Install LightGlue
The LightGlue code must be pinned to commit 'edb2b83' for compatibility with the dependencies installed later.
Run the following commands:
cd ../../third_party/
git clone https://github.com/cvg/LightGlue.git
cd LightGlue/
git checkout edb2b83
python -m pip install -e .
Wait for the install to finish~
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/third_party$ git clone https://github.com/cvg/LightGlue.git
Cloning into 'LightGlue'...
remote: Enumerating objects: 386, done.
remote: Counting objects: 100% (205/205), done.
remote: Compressing objects: 100% (119/119), done.
remote: Total 386 (delta 147), reused 86 (delta 86), pack-reused 181 (from 2)
Receiving objects: 100% (386/386), 17.43 MiB | 13.39 MiB/s, done.
Resolving deltas: 100% (236/236), done.
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/third_party$ ls
DROID-SLAM GroundingDINO LightGlue pytorch3d recognize-anything segment-anything-2
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/third_party$
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/third_party$ cd LightGlue/
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/third_party/LightGlue$
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/third_party/LightGlue$ git checkout edb2b83
Note: switching to 'edb2b83'.
..............................
HEAD is now at edb2b83 fix compilation for torch v2.2.1 (#124)
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/third_party/LightGlue$
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/third_party/LightGlue$ python -m pip install -e .
Obtaining file:///home/lgp/2025_project/DovSG/third_party/LightGlue
Installing build dependencies ... done
Checking if build backend supports build_editable ... done
................
Successfully built lightglue
Installing collected packages: kornia_rs, kornia, lightglue
Successfully installed kornia-0.8.1 kornia_rs-0.1.9 lightglue-0.0
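To confirm LightGlue works end to end, here is a small matching sketch on random tensors, following the public API in the LightGlue README (SuperPoint features; CPU is fine for a smoke test, though with random inputs few keypoints will match):
# Optional end-to-end smoke test: SuperPoint + LightGlue on random images
import torch
from lightglue import LightGlue, SuperPoint
from lightglue.utils import rbd

extractor = SuperPoint(max_num_keypoints=1024).eval()
matcher = LightGlue(features="superpoint").eval()

image0 = torch.rand(3, 480, 640)  # placeholder RGB tensors in [0, 1]
image1 = torch.rand(3, 480, 640)

feats0 = extractor.extract(image0)
feats1 = extractor.extract(image1)
matches01 = matcher({"image0": feats0, "image1": feats1})
feats0, feats1, matches01 = [rbd(x) for x in (feats0, feats1, matches01)]
print("matches:", matches01["matches"].shape)  # (K, 2) index pairs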
Then install the Faiss library:
conda install -c pytorch faiss-cpu=1.7.4 mkl=2021 blas=1.0=mkl
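A quick way to verify the faiss-cpu install is to build a tiny L2 index and query it (standard faiss API):
# Optional smoke test for faiss-cpu: exact L2 search over random vectors
import numpy as np
import faiss

xb = np.random.rand(100, 64).astype("float32")
index = faiss.IndexFlatL2(64)   # 64-dimensional exact L2 index
index.add(xb)
distances, ids = index.search(xb[:5], 3)
print(ids.shape)  # (5, 3); each query's nearest neighbor is itself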
8. Install PyTorch3D
The PyTorch3D code must be pinned to commit '05cbea1' for compatibility with the dependencies installed later.
Run the following commands:
cd ..
git clone https://github.com/facebookresearch/pytorch3d.git
cd pytorch3d/
git checkout 05cbea1
python setup.py install
Wait for the install to finish~
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/third_party/pytorch3d$
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/third_party/pytorch3d$ python setup.py install
......................
Using /home/lgp/anaconda3/envs/dovsg/lib/python3.9/site-packages
Finished processing dependencies for pytorch3d==0.7.7
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/third_party/pytorch3d$
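Optionally, a short check that PyTorch3D is usable (constructing a tiny point cloud exercises the core data structures):
# Optional smoke test for the PyTorch3D build
import torch
import pytorch3d
from pytorch3d.structures import Pointclouds

pc = Pointclouds(points=[torch.rand(100, 3)])
print(pytorch3d.__version__, pc.num_points_per_cloud())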
9. Install the remaining dependencies and the dovsg package
First install a batch of dependencies:
cd ../../
pip install ipython cmake pybind11 ninja scipy==1.10.1 scikit-learn==1.4.0 pandas==2.0.3 hydra-core opencv-python openai-clip timm matplotlib==3.7.2 imageio timm open3d numpy-quaternion more-itertools pyliblzfse einops transformers pytorch-lightning wget gdown tqdm zmq torch_geometric numpy==1.23.0 # -i https://pypi.tuna.tsinghua.edu.cn/simple
Then install protobuf, MinkowskiEngine, and the GraspNet API:
pip install protobuf==3.19.0
pip install git+https://github.com/pccws/MinkowskiEngine
pip install graspnetAPI
torch-cluster is also needed (first download the .whl file with wget, then install it with pip):
wget https://data.pyg.org/whl/torch-2.3.0%2Bcu121/torch_cluster-1.6.3%2Bpt23cu121-cp39-cp39-linux_x86_64.whl
pip install torch_cluster-1.6.3+pt23cu121-cp39-cp39-linux_x86_64.whl
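The wheel must exactly match the torch 2.3/cu121 build, so it is worth a quick functional check (knn usage per the torch-cluster README):
# Optional check that torch_cluster loads against this torch build
import torch
from torch_cluster import knn

x = torch.rand(10, 3)
edge_index = knn(x, x, 3)  # 3 nearest neighbors of each point (incl. itself)
print(edge_index.shape)    # torch.Size([2, 30])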
Install a few more dependencies:
pip install numpy==1.23.0 supervision==0.14.0 shapely alphashape
pip install pyrealsense2 open_clip_torch graphviz pyrender
pip install openai==1.56.1
pip install transforms3d==0.3.1 scikit-image==0.19.3
Finally, install dovsg:
pip install -e .
Wait for the install to finish~
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG$ pip install -e .
Obtaining file:///home/lgp/2025_project/DovSG
Preparing metadata (setup.py) ... done
Installing collected packages: dovsg
Running setup.py develop for dovsg
Successfully installed dovsg
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG$
Patch (2025/7/4): visualization requires graphviz:
sudo apt-get install graphviz
conda install -c conda-forge graphviz python-graphviz
10. Install DROID-SLAM
DROID-SLAM must be kept separate from DOV-SG: build it in a new Conda environment.
The DROID-SLAM code must be pinned to commit 8016d2b for compatibility with its dependencies. Run the following commands:
cd ./third_party/
git clone https://github.com/princeton-vl/DROID-SLAM.git
cd DROID-SLAM/
git checkout 8016d2b
Wait for the download to finish~
DROID-SLAM/thirdparty/ must contain the eigen, lietorch, tartanair_tools, and related dependencies. From the DROID-SLAM root directory, run:
git submodule update --init thirdparty/lietorch
This fetches and initializes the thirdparty/lietorch submodule; run git submodule update --init --recursive instead if you want to pull all of the submodules.
1. Create the Conda environment
First create a Conda environment named droidenv with Python 3.9, then activate it.
The two commands are:
conda create -n droidenv python=3.9 -y
conda activate droidenv
2. Install PyTorch
conda install pytorch=1.10 torchvision torchaudio cudatoolkit=11.3 -c pytorch -y
3. Install dependencies
conda install suitesparse -c conda-forge -y
pip install open3d==0.15.2 scipy opencv-python==4.7.0.72 matplotlib pyyaml==6.0.2 tensorboard # -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install evo --upgrade --no-binary evo
pip install gdown
pip install numpy==1.23.0 numpy-quaternion==2023.0.4
Wait for the installs to finish~

4. Install torch-scatter
wget https://data.pyg.org/whl/torch-1.10.0%2Bcu113/torch_scatter-2.0.9-cp39-cp39-linux_x86_64.whl
pip install torch_scatter-2.0.9-cp39-cp39-linux_x86_64.whl
5. Install DROID-SLAM
Configure gcc-10/g++-10:
sudo apt install gcc-10 g++-10
export CC=/usr/bin/gcc-10
export CXX=/usr/bin/g++-10
The system default is CUDA 12.1; temporarily switch to CUDA 11.3.
Note: the switch only affects the current shell session; after closing the terminal the original state (CUDA 12.1) is restored.
export CUDA_HOME=/usr/local/cuda-11.3
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
(droidenv) lgp@lgp-MS-7E07:~/2025_project/DovSG/third_party/DROID-SLAM$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Mar_21_19:15:46_PDT_2021
Cuda compilation tools, release 11.3, V11.3.58
Build cuda_11.3.r11.3/compiler.29745058_0
Install DROID-SLAM:
python setup.py install
Wait for the build to finish~

11. Download the model weights
The project uses seven models in total (quite a lot). Their versions and download links/methods are:
- anygrasp: once you obtain an AnyGrasp license, the checkpoint is provided along with it.
- bert-base-uncased: https://huggingface.co/google-bert/bert-base-uncased
- CLIP-ViT-H-14-laion2B-s32B-b79K: https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K
- droid-slam: https://drive.google.com/file/u/0/d/1PpqVt1H4maBa_GbPJp4NwxRsd9jk-elh/view?usp=sharing&pli=1
- GroundingDINO: https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth and https://github.com/IDEA-Research/GroundingDINO/blob/main/groundingdino/config/GroundingDINO_SwinT_OGC.py
- recognize_anything: https://huggingface.co/spaces/xinyu1205/Recognize_Anything-Tag2Text/blob/main/ram_swin_large_14m.pth
- segment-anything-2: https://github.com/facebookresearch/sam2?tab=readme-ov-file#download-checkpoints
The weights are laid out as follows:
DovSG/
├── checkpoints
│ ├── anygrasp
│ │ ├── checkpoint_detection.tar
│ │ └── checkpoint_tracking.tar
│ ├── bert-base-uncased
│ │ ├── config.json
│ │ ├── model.safetensors
│ │ ├── tokenizer_config.json
│ │ ├── tokenizer.json
│ │ └── vocab.txt
│ ├── CLIP-ViT-H-14-laion2B-s32B-b79K
│ │ └── open_clip_pytorch_model.bin
│ ├── droid-slam
│ │ └── droid.pth
│ ├── GroundingDINO
│ │ ├── groundingdino_swint_ogc.pth
│ │ └── GroundingDINO_SwinT_OGC.py
│ ├── recognize_anything
│ │ └── ram_swin_large_14m.pth
│ └── segment-anything-2
│ └── sam2_hiera_large.pt
└── license
├── licenseCfg.json
├── ZhijieYan.lic
├── ZhijieYan.public_key
└── ZhijieYan.signature
...
1. Install Git LFS
To download the large model weights, the Git LFS tool must be installed locally (it handles large files):
sudo apt-get install git-lfs
After installation, enable LFS support in the terminal:
git lfs install
2. bert-base-uncased weights
Download with the following commands:
mkdir checkpoints
cd checkpoints/
git clone https://huggingface.co/google-bert/bert-base-uncased
Wait for the download to finish~
(base) lgp@lgp-MS-7E07:~/2025_project/DovSG$ mkdir checkpoints
(base) lgp@lgp-MS-7E07:~/2025_project/DovSG$ cd checkpoints/
(base) lgp@lgp-MS-7E07:~/2025_project/DovSG/checkpoints$ git clone https://huggingface.co/google-bert/bert-base-uncased
Cloning into 'bert-base-uncased'...
remote: Enumerating objects: 85, done.
remote: Total 85 (delta 0), reused 0 (delta 0), pack-reused 85 (from 1)
Unpacking objects: 100% (85/85), 330.58 KiB | 912.00 KiB/s, done.
(base) lgp@lgp-MS-7E07:~/2025_project/DovSG/checkpoints$ ls
bert-base-uncased
Pull the actual weight files:
cd bert-base-uncased
git lfs pull
3. CLIP-ViT-H-14-laion2B-s32B-b79K weights
Download with the following commands:
cd ../
git clone https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K
Wait for the download to finish~
(base) lgp@lgp-MS-7E07:~/2025_project/DovSG/checkpoints$
(base) lgp@lgp-MS-7E07:~/2025_project/DovSG/checkpoints$ git clone https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K
Cloning into 'CLIP-ViT-H-14-laion2B-s32B-b79K'...
remote: Enumerating objects: 47, done.
remote: Counting objects: 100% (8/8), done.
remote: Compressing objects: 100% (8/8), done.
remote: Total 47 (delta 2), reused 0 (delta 0), pack-reused 39 (from 1)
Unpacking objects: 100% (47/47), 1.08 MiB | 1.64 MiB/s, done.
(base) lgp@lgp-MS-7E07:~/2025_project/DovSG/checkpoints$ ls
bert-base-uncased CLIP-ViT-H-14-laion2B-s32B-b79K
Pull the actual weight files:
cd CLIP-ViT-H-14-laion2B-s32B-b79K
git lfs pull
4. droid-slam, GroundingDINO, recognize_anything, and segment-anything-2 weights
Create a folder for each set of weights:
cd ../
mkdir droid-slam
mkdir GroundingDINO
mkdir recognize_anything
mkdir segment-anything-2
These weights can only be downloaded from the pages below, then copied into the corresponding folders:
- droid-slam: https://drive.google.com/file/u/0/d/1PpqVt1H4maBa_GbPJp4NwxRsd9jk-elh/view?usp=sharing&pli=1
- GroundingDINO: https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth and https://github.com/IDEA-Research/GroundingDINO/blob/main/groundingdino/config/GroundingDINO_SwinT_OGC.py
- recognize_anything: https://huggingface.co/spaces/xinyu1205/Recognize_Anything-Tag2Text/blob/main/ram_swin_large_14m.pth
- segment-anything-2: https://github.com/facebookresearch/sam2?tab=readme-ov-file#download-checkpoints
12. Download the dataset
Dataset download link: https://drive.google.com/drive/folders/13v5QOrqjxye__kJwDIuD7kTdeSSNfR5x

After downloading, extract it into the DovSG root directory, which creates a data_example directory.

Note: poses_droidslam is generated by a later step; ignore it for now~
13. Pose estimation with DROID-SLAM
Activate the Conda environment:
conda deactivate
conda activate droidenv
Edit the code in third_party/DROID-SLAM/droid_slam/trajectory_filler.py:
the for loop at line 90 needs the following change.
# for (tstamp, image, intrinsic) in image_stream:
for (tstamp, image, pose, intrinsic) in image_stream:
    tstamps.append(tstamp)
    images.append(image)
    intrinsics.append(intrinsic)
    if len(tstamps) == 16:
        pose_list += self.__fill(tstamps, images, intrinsics)
        tstamps, images, intrinsics = [], [], []
This is because image_stream yields four values, so the loop must unpack four: for (tstamp, image, pose, intrinsic) in image_stream.
Run pose estimation with the following command:
python dovsg/scripts/pose_estimation.py \
--datadir "data_example/room1" \
--calib "data_example/room1/calib.txt" \
--t0 0 \
--stride 1 \
--weights "checkpoints/droid-slam/droid.pth" \
--buffer 2048
After the program finishes, a new folder named poses_droidslam appears under data_example/room1, containing the poses of all viewpoints.
Run output:
Pose Estimation:: 100%|██████████████████████████████████████████████████████████| 739/739 [00:25<00:00, 29.32it/s]
################################
Global BA Iteration #1
Global BA Iteration #2
Global BA Iteration #3
Global BA Iteration #4
Global BA Iteration #5
Global BA Iteration #6
Global BA Iteration #7
################################
Global BA Iteration #1
Global BA Iteration #2
Global BA Iteration #3
Global BA Iteration #4
Global BA Iteration #5
Global BA Iteration #6
Global BA Iteration #7
Global BA Iteration #8
Global BA Iteration #9
Global BA Iteration #10
Global BA Iteration #11
Global BA Iteration #12
Result Pose Number is 739
14. Visualize the reconstructed scene
Using the poses estimated by DROID-SLAM, we can visualize the reconstructed scene.
Activate the Conda environment:
conda deactivate
conda activate dovsg
Reconstruct the 3D scene with the following command:
python dovsg/scripts/show_pointcloud.py \
--tags "room1" \
--pose_tags "poses_droidslam"
Visualization result:

15. Run DOV-SG inference
Run the following command:
python demo.py \
--tags "room1" \
--preprocess \
--debug \
--task_scene_change_level "Minor Adjustment" \
--task_description "Please move the red pepper to the plate, then move the green pepper to plate."
The overall flow of the code:
- Scan the room with a camera to collect RGB-D data.
- Estimate camera poses from the collected RGB-D data.
- Transform the coordinate frame based on the detected floor.
- Train the relocalization model (ACE) to support later operations.
- Generate the view dataset.
- Use vision-language models (VLMs) to represent real-world objects as nodes of a 3D scene graph, and extract inter-object relationships with a rule-based method.
- Extract LightGlue features to assist later relocalization.
- Feed the results into LLM task planning.
- Continuously update the 3D scene graph while executing relocalization subtasks.
Coordinate-frame transformation based on the detected floor:
get floor pcd and transform scene.: 100%|████████████████████████████████████████| 247/247 [00:41<00:00, 5.93it/s]

Train the relocalization model (ACE) to support later operations:
Train ACE
create save folder: data_example/room1/ace
filling training buffers with 1000000/8000000 samples
filling training buffers with 2000000/8000000 samples
filling training buffers with 3000000/8000000 samples
filling training buffers with 4000000/8000000 samples
filling training buffers with 5000000/8000000 samples
filling training buffers with 6000000/8000000 samples
filling training buffers with 7000000/8000000 samples
filling training buffers with 8000000/8000000 samples
Train ACE Over!
Run output:
final text_encoder_type: bert-base-uncased
==> Initializing CLIP model...
==> Done initializing CLIP model.
BertLMHeadModel has generative capabilities, as `prepare_inputs_for_generation` is explicitly defined. However, it doesn't directly inherit from `GenerationMixin`. From 👉v4.50👈 onwards, `PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.
If you're using `trust_remote_code=True`, you can get rid of this warning by loading the model with an auto class. See https://huggingface.co/docs/transformers/en/model_doc/auto#auto-classes
If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).
If you are not the owner of the model architecture class, please contact the model code owner to update it.
The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
/encoder/layer/0/crossattention/self/query is tied
/encoder/layer/0/crossattention/self/key is tied
/encoder/layer/0/crossattention/self/value is tied
/encoder/layer/0/crossattention/output/dense is tied
/encoder/layer/0/crossattention/output/LayerNorm is tied
/encoder/layer/0/intermediate/dense is tied
/encoder/layer/0/output/dense is tied
/encoder/layer/0/output/LayerNorm is tied
/encoder/layer/1/crossattention/self/query is tied
/encoder/layer/1/crossattention/self/key is tied
/encoder/layer/1/crossattention/self/value is tied
/encoder/layer/1/crossattention/output/dense is tied
/encoder/layer/1/crossattention/output/LayerNorm is tied
/encoder/layer/1/intermediate/dense is tied
/encoder/layer/1/output/dense is tied
/encoder/layer/1/output/LayerNorm is tied
checkpoints/recognize_anything/ram_swin_large_14m.pth
load checkpoint from checkpoints/recognize_anything/ram_swin_large_14m.pth
vit: swin_l
semantic meomry: 100%|███████████████████████████████████████████████████████████| 247/247 [04:15<00:00, 1.03s/it]
.........
Detected objects:

LLM task-planning output:
{'action': 'Go to', 'object1': 'red pepper', 'object2': None}, {'action': 'Pick up', 'object1': 'red pepper'}, {'action': 'Go to', 'object1': 'plate', 'object2': None}, {'action': 'Place', 'object1': 'red pepper', 'object2': 'plate'}, {'action': 'Go to', 'object1': 'green pepper', 'object2': None}, {'action': 'Pick up', 'object1': 'green pepper'}, {'action': 'Go to', 'object1': 'plate', 'object2': None}, {'action': 'Place', 'object1': 'green pepper', 'object2': 'plate'}
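The plan is a flat list of action dictionaries. A minimal, illustrative sketch of consuming such a list (not the actual DovSG executor):
# Illustrative only: walking an LLM-generated plan of {"action", "object1", "object2"} steps
plan = [
    {"action": "Go to", "object1": "red pepper", "object2": None},
    {"action": "Pick up", "object1": "red pepper"},
    {"action": "Go to", "object1": "plate", "object2": None},
    {"action": "Place", "object1": "red pepper", "object2": "plate"},
]
for step in plan:
    print(step["action"], step["object1"], step.get("object2"))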
Initializing Instance Localizer.
Data process over!
===> get observations from robot.
observation save path: data_example/room1/memory/3_0.1_0.01_True_0.2_0.5/Minor Adjustment long_term_task: Please move the red pepper to the plate, then move the green pepper to plate./step_0/observations/0_start.npy
Sampling 64 hypotheses.
Relocalization subtasks are executed via ICP matching, and the 3D scene graph is continuously updated:
IPC Number: 5182, 7589, 6760
IPC Number: 20009, 35374, 27853
IPC Number: 80797, 179609, 129217
Navigation process (green point: current position; red point: target position; magenta: navigation trajectory):
Now are in step 0
Runing Go to(red pepper, None) Task.
A is red pepper
B is None
====> A* planning.
[[2.33353067 0.83389901 3.92763996]
 [2.05       0.55       4.19324287]
 [1.85       0.2        5.09701148]]

The robot finds the objects and performs the manipulation (move the red pepper to the plate, then move the green pepper to the plate):
data_example/room1/memory/3_0.1_0.01_True_0.2_0.5/Minor Adjustment long_term_task: Please move the red pepper to the plate, then move the green pepper to plate./step_0/navigation_vis.jpg
please move the agent to target point (Press Enter).
===> get observations from robot.
observation save path: data_example/room1/memory/3_0.1_0.01_True_0.2_0.5/Minor Adjustment long_term_task: Please move the red pepper to the plate, then move the green pepper to plate./step_0/observations/1_after_Go to(red pepper, None).npy

That wraps up this walkthrough~