[Robotics] A Running Collection of Recent Embodied Navigation Papers | Vision-and-Language Navigation

This post collects papers on embodied navigation (Vision-and-Language Navigation, VLN) for reference and study, covering 2025, 2024, 2023, and earlier.

Venues covered: CVPR, IROS, ICRA, RSS, arXiv, and more.

The paper list will be updated continuously.

I. 🏠 Translated Titles

2025 😆

  • [2025] **WMNav: Integrating Vision-Language Models into World Models for Object Goal Navigation** [[Paper]](https://arxiv.org/pdf/2503.02247) [[Project]](https://b0b8k1ng.github.io/WMNav/) [[GitHub]](https://github.com/B0B8K1ng/WMNavigation)

  • [2025] **UniGoal: Towards Universal Zero-shot Goal-oriented Navigation** [[Paper]](https://arxiv.org/pdf/2503.10630) [[Project]](https://bagh2178.github.io/UniGoal/) [[GitHub]](https://github.com/bagh2178/UniGoal)

  • [2025] CityNavAgent: Aerial Vision-and-Language Navigation with Hierarchical Semantic Planning and Global Memory [[Paper]](https://arxiv.org/pdf/2505.05622) [[GitHub]](https://github.com/VinceOuti/CityNavAgent)

  • [2025] VL-Nav: Real-time Vision-Language Navigation with Spatial Reasoning [[Paper]](https://arxiv.org/pdf/2502.00931)

  • [2025] **HA-VLN: A Benchmark for Human-Aware Navigation in Discrete-Continuous Environments with Dynamic Multi-Human Interactions, Real-World Validation, and an Open Leaderboard** [[Paper]](https://arxiv.org/pdf/2503.14229) [[Project]](https://ha-vln-project.vercel.app/) [[GitHub]](https://github.com/F1y1113/HA-VLN)

  • [2025] FlexVLN: Flexible Adaptation for Diverse Vision-and-Language Navigation Tasks [[Paper]](https://arxiv.org/pdf/2503.13966)

  • [2025] 3D-Mem: 3D Scene Memory for Embodied Exploration and Reasoning [[Paper]](https://arxiv.org/pdf/2411.17735) [[Project]](https://umass-embodied-agi.github.io/3D-Mem/) [[GitHub]](https://github.com/UMass-Embodied-AGI/3D-Mem)

  • [2025] EfficientEQA: An Efficient Approach for Open Vocabulary Embodied Question Answering [[Paper]](https://arxiv.org/pdf/2410.20263)

  • [2025] Learned Perceptive Forward Dynamics Model for Safe and Platform-aware Robotic Navigation [[Paper]](https://arxiv.org/pdf/2504.19322) [[GitHub]](https://github.com/leggedrobotics/fdm)

  • [2025] Semantic Mapping in Indoor Embodied AI: A Comprehensive Survey and Future Directions [[Paper]](https://arxiv.org/pdf/2501.05750)

  • [2025] TRAVEL: Training-Free Retrieval and Alignment for Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2502.07306)

  • [2025] VR-Robo: A Real-to-Sim-to-Real Framework for Visual Robot Navigation and Locomotion [[Paper]](https://arxiv.org/pdf/2502.01536)

  • [2025] NavigateDiff: Visual Predictors are Zero-Shot Navigation Assistants [[Paper]](https://arxiv.org/pdf/2502.13894)

  • [2025] MapNav: A Novel Memory Representation via Annotated Semantic Maps for VLM-based Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2502.13451)

  • [2025] OpenFly: A Versatile Toolchain and Large-scale Benchmark for Aerial Vision-Language Navigation [[Paper]](https://arxiv.org/pdf/2502.18041)

  • [2025] Ground-level Viewpoint Vision-and-Language Navigation in Continuous Environments [[Paper]](https://arxiv.org/pdf/2502.19024)

  • [2025] Dynamic Path Navigation for Motion Agents with LLM Reasoning [[Paper]](https://arxiv.org/pdf/2503.07323)

  • [2025] SmartWay: Enhanced Waypoint Prediction and Backtracking for Zero-Shot Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2503.10069)

  • [2025] Vi-LAD: Vision-Language Attention Distillation for Socially-Aware Robot Navigation in Dynamic Environments [[Paper]](https://arxiv.org/pdf/2503.09820)

  • [2025] PanoGen++: Domain-Adapted Text-Guided Panoramic Environment Generation for Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2503.09938)

  • [2025] Do Visual Imaginations Improve Vision-and-Language Navigation Agents? [[Paper]](https://arxiv.org/pdf/2503.16394) [[Project]](https://www.akhilperincherry.com/VLN-Imagine-website/)

  • [2025] P3Nav: A Unified Framework for Embodied Navigation Integrating Perception, Planning, and Prediction [[Paper]](https://arxiv.org/pdf/2503.18525)

  • [2025] Unseen from Seen: Rewriting Observation-Instruction Using Foundation Models for Augmenting Vision-Language Navigation [[Paper]](https://arxiv.org/pdf/2503.18065) [[GitHub]](https://github.com/SaDil13/VLN-RAM)

  • [2025] COSMO: Combination of Selective Memorization for Low-cost Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2503.24065)

  • [2025] ForesightNav: Learning Scene Imagination for Efficient Exploration [[Paper]](https://arxiv.org/pdf/2504.16062) [[GitHub]](https://github.com/uzh-rpg/foresight-nav)

  • [2025] NavDP: Learning Sim-to-Real Navigation Diffusion Policy with Privileged Information Guidance [[Paper]](https://arxiv.org/pdf/2505.08712)

  • [2025] VISTA: Generative Visual Imagination for Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2505.07868)

  • [2025] Dynam3D: Dynamic Layered 3D Tokens Empower VLM for Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2505.11383) [[GitHub]](https://github.com/MrZihan/Dynam3D)

  • [2025] Aux-Think: Exploring Reasoning Strategies for Data-Efficient Vision-Language Navigation [[Paper]](https://arxiv.org/pdf/2505.11886)

2024 😄

  • [2024] E2Map: Experience-and-Emotion Map for Self-Reflective Robot Navigation with Language Models [[Paper]](https://arxiv.org/pdf/2409.10027) [[GitHub]](https://github.com/knwoo/e2map)

  • [2024] Bridging Zero-shot Object Navigation and Foundation Models through Pixel-Guided Navigation Skill [[Paper]](https://arxiv.org/pdf/2309.10309) [[GitHub]](https://github.com/wzcai99/Pixel-Navigator)

  • [2024] **NaVILA: Legged Robot Vision-Language-Action Model for Navigation** [[Paper]](https://arxiv.org/pdf/2412.04453) [[Project]](https://navila-bot.github.io/)

  • [2024] Aim My Robot: Precision Local Navigation to Any Object [[Paper]](https://arxiv.org/pdf/2411.14770)

  • [2024] MapGPT: Map-Guided Prompting with Adaptive Path Planning for Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2401.07314) [[GitHub]](https://github.com/chen-judge/MapGPT/)

  • [2024] VLFM: Vision-Language Frontier Maps for Zero-Shot Semantic Navigation [[Paper]](https://arxiv.org/pdf/2312.03275) [[GitHub]](https://github.com/bdaiinstitute/vlfm)

  • [2024] Planning from Imagination: Episodic Simulation and Episodic Memory for Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2412.01857)

  • [2024] Continual Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2403.15049)

  • [2024] Find Everything: A General Vision Language Model Approach to Multi-Object Search [[Paper]](https://arxiv.org/pdf/2410.00388) [[Project]](https://find-all-my-things.github.io/)

  • [2024] NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models [[Paper]](https://arxiv.org/pdf/2407.12366) [[GitHub]](https://github.com/GengzeZhou/NavGPT-2)

  • [2024] Sim-to-Real Transfer via 3D Feature Fields for Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2406.09798) [[GitHub]](https://github.com/MrZihan/Sim2Real-VLN-3DFF)

  • [2024] Building Cooperative Embodied Agents Modularly with Large Language Models [[Paper]](https://openreview.net/pdf?id=EnXJfQqy0K) [[GitHub]](https://github.com/UMass-Foundation-Model/Co-LLM-Agents)

  • [2024] The One RING: A Robotic Indoor Navigation Generalist [[Paper]](https://arxiv.org/pdf/2412.14401)

  • [2024] Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs [[Paper]](https://arxiv.org/pdf/2407.07775)

2023 😲

  • [2023] Bridging Zero-shot Object Navigation and Foundation Models through Pixel-Guided Navigation Skill [[Paper]](https://arxiv.org/pdf/2309.10309)

  • [2023] Frontier Semantic Exploration for Visual Target Navigation [[Paper]](https://arxiv.org/pdf/2304.05506) [[GitHub]](https://github.com/ybgdgh/Frontier-Semantic-Exploration)

  • [2023] LANA: A Language-Capable Navigator for Instruction Following and Generation [[Paper]](https://openaccess.thecvf.com/content/CVPR2023/papers/Wang_LANA_A_Language-Capable_Navigator_for_Instruction_Following_and_Generation_CVPR_2023_paper.pdf) [[GitHub]](https://github.com/wxh1996/LANA-VLN)

  • [2023] A2Nav: Action-Aware Zero-Shot Robot Navigation by Exploiting Vision-and-Language Ability of Foundation Models [[Paper]](https://arxiv.org/pdf/2308.07997)

II. 🔄 Original English Titles

2025 🐻

  • [2025] 3D-Mem: 3D Scene Memory for Embodied Exploration and Reasoning [[Paper]](https://arxiv.org/pdf/2411.17735) [[Project]](https://umass-embodied-agi.github.io/3D-Mem/)

  • [2025] EfficientEQA: An Efficient Approach for Open Vocabulary Embodied Question Answering [[Paper]](https://arxiv.org/pdf/2410.20263)

  • [2025] Learned Perceptive Forward Dynamics Model for Safe and Platform-aware Robotic Navigation [[Paper]](https://arxiv.org/pdf/2504.19322) [[GitHub]](https://github.com/leggedrobotics/fdm)

  • [2025] Semantic Mapping in Indoor Embodied AI: A Comprehensive Survey and Future Directions [[Paper]](https://arxiv.org/pdf/2501.05750)

  • [2025] VL-Nav: Real-time Vision-Language Navigation with Spatial Reasoning [[Paper]](https://arxiv.org/pdf/2502.00931)

  • [2025] TRAVEL: Training-Free Retrieval and Alignment for Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2502.07306)

  • [2025] VR-Robo: A Real-to-Sim-to-Real Framework for Visual Robot Navigation and Locomotion [[Paper]](https://arxiv.org/pdf/2502.01536)

  • [2025] NavigateDiff: Visual Predictors are Zero-Shot Navigation Assistants [[Paper]](https://arxiv.org/pdf/2502.13894)

  • [2025] MapNav: A Novel Memory Representation via Annotated Semantic Maps for VLM-based Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2502.13451)

  • [2025] OpenFly: A Versatile Toolchain and Large-scale Benchmark for Aerial Vision-Language Navigation [[Paper]](https://arxiv.org/pdf/2502.18041)

  • [2025] Ground-level Viewpoint Vision-and-Language Navigation in Continuous Environments [[Paper]](https://arxiv.org/pdf/2502.19024)

  • [2025] WMNav: Integrating Vision-Language Models into World Models for Object Goal Navigation [[Paper]](https://arxiv.org/pdf/2503.02247) [[Project]](https://b0b8k1ng.github.io/WMNav/)

  • [2025] Dynamic Path Navigation for Motion Agents with LLM Reasoning [[Paper]](https://arxiv.org/pdf/2503.07323)

  • [2025] SmartWay: Enhanced Waypoint Prediction and Backtracking for Zero-Shot Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2503.10069)

  • [2025] Vi-LAD: Vision-Language Attention Distillation for Socially-Aware Robot Navigation in Dynamic Environments [[Paper]](https://arxiv.org/pdf/2503.09820)

  • [2025] UniGoal: Towards Universal Zero-shot Goal-oriented Navigation [[Paper]](https://arxiv.org/pdf/2503.10630) [[Project]](https://bagh2178.github.io/UniGoal/)

  • [2025] PanoGen++: Domain-Adapted Text-Guided Panoramic Environment Generation for Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2503.09938)

  • [2025] Do Visual Imaginations Improve Vision-and-Language Navigation Agents? [[Paper]](https://arxiv.org/pdf/2503.16394) [[Project]](https://www.akhilperincherry.com/VLN-Imagine-website/)

  • [2025] HA-VLN: A Benchmark for Human-Aware Navigation in Discrete-Continuous Environments with Dynamic Multi-Human Interactions, Real-World Validation, and an Open Leaderboard [[Paper]](https://arxiv.org/pdf/2503.14229) [[Project]](https://ha-vln-project.vercel.app/)

  • [2025] FlexVLN: Flexible Adaptation for Diverse Vision-and-Language Navigation Tasks [[Paper]](https://arxiv.org/pdf/2503.13966)

  • [2025] P3Nav: A Unified Framework for Embodied Navigation Integrating Perception, Planning, and Prediction [[Paper]](https://arxiv.org/pdf/2503.18525)

  • [2025] Unseen from Seen: Rewriting Observation-Instruction Using Foundation Models for Augmenting Vision-Language Navigation [[Paper]](https://arxiv.org/pdf/2503.18065) [[GitHub]](https://github.com/SaDil13/VLN-RAM)

  • [2025] COSMO: Combination of Selective Memorization for Low-cost Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2503.24065)

  • [2025] ForesightNav: Learning Scene Imagination for Efficient Exploration [[Paper]](https://arxiv.org/pdf/2504.16062) [[GitHub]](https://github.com/uzh-rpg/foresight-nav)

  • [2025] CityNavAgent: Aerial Vision-and-Language Navigation with Hierarchical Semantic Planning and Global Memory [[Paper]](https://arxiv.org/pdf/2505.05622) [[GitHub]](https://github.com/VinceOuti/CityNavAgent)

  • [2025] NavDP: Learning Sim-to-Real Navigation Diffusion Policy with Privileged Information Guidance [[Paper]](https://arxiv.org/pdf/2505.08712)

  • [2025] VISTA: Generative Visual Imagination for Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2505.07868)

  • [2025] Dynam3D: Dynamic Layered 3D Tokens Empower VLM for Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2505.11383) [[GitHub]](https://github.com/MrZihan/Dynam3D)

  • [2025] Aux-Think: Exploring Reasoning Strategies for Data-Efficient Vision-Language Navigation [[Paper]](https://arxiv.org/pdf/2505.11886)

2024 🐵

  • [2024] [RSS 24] NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2402.15852)

  • [2024] [RSS 24] NaVILA: Legged Robot Vision-Language-Action Model for Navigation [[Paper]](https://arxiv.org/pdf/2412.04453)

  • [2024] The One RING: A Robotic Indoor Navigation Generalist [[Paper]](https://arxiv.org/pdf/2412.14401)

  • [2024] Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs [[Paper]](https://arxiv.org/pdf/2407.07775)

  • E2Map: Experience-and-Emotion Map for Self-Reflective Robot Navigation with Language Models [[Paper]](https://arxiv.org/pdf/2409.10027) [[GitHub]](https://github.com/knwoo/e2map)
  • Autonomous Exploration and Semantic Updating of Large-Scale Indoor Environments with Mobile Robots [Paper] [GitHub]
  • Bridging Zero-shot Object Navigation and Foundation Models through Pixel-Guided Navigation Skill [[Paper]](https://arxiv.org/pdf/2309.10309) [[GitHub]](https://github.com/wzcai99/Pixel-Navigator)
  • InstructNav: Zero-shot System for Generic Instruction Navigation in Unexplored Environment [Paper] [GitHub]
  • ReMEmbR: Building and Reasoning Over Long-Horizon Spatio-Temporal Memory for Robot Navigation [Paper] [GitHub]
  • Aim My Robot: Precision Local Navigation to Any Object [[Paper]](https://arxiv.org/pdf/2411.14770)
  • Tag Map: A Text-Based Map for Spatial Reasoning and Navigation with Large Language Models [Paper] [Project Page]
  • Adaptive Zone-aware Hierarchical Planner for Vision-Language Navigation [Paper] [GitHub]
  • MapGPT: Map-Guided Prompting with Adaptive Path Planning for Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2401.07314) [[GitHub]](https://github.com/chen-judge/MapGPT/)
  • CANVAS: Commonsense-Aware Navigation System for Intuitive Human-Robot Interaction [Paper] [GitHub]
  • VLFM: Vision-Language Frontier Maps for Zero-Shot Semantic Navigation [[Paper]](https://arxiv.org/pdf/2312.03275) [[GitHub]](https://github.com/bdaiinstitute/vlfm)
  • Mind the Error! Detection and Localization of Instruction Errors in Vision-and-Language Navigation [Paper] [GitHub]
  • Planning from Imagination: Episodic Simulation and Episodic Memory for Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2412.01857)
  • MC-GPT: Empowering Vision-and-Language Navigation with Memory Map and Reasoning Chains [Paper]
  • Continual Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2403.15049)
  • Open-Nav: Exploring Zero-Shot Vision-and-Language Navigation in Continuous Environment with Open-Source LLMs [Paper]
  • Find Everything: A General Vision Language Model Approach to Multi-Object Search [[Paper]](https://arxiv.org/pdf/2410.00388) [[Project]](https://find-all-my-things.github.io/)
  • NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models [Paper] [GitHub]
  • NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models [[Paper]](https://arxiv.org/pdf/2407.12366) [[GitHub]](https://github.com/GengzeZhou/NavGPT-2)
  • Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation [Paper] [GitHub]
  • Sim-to-Real Transfer via 3D Feature Fields for Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2406.09798) [[GitHub]](https://github.com/MrZihan/Sim2Real-VLN-3DFF)
  • LangNav: Language as a Perceptual Representation for Navigation [Paper] [GitHub]
  • Building Cooperative Embodied Agents Modularly with Large Language Models [[Paper]](https://openreview.net/pdf?id=EnXJfQqy0K) [[GitHub]](https://github.com/UMass-Foundation-Model/Co-LLM-Agents)

2023 🦆

  • [2023] Bridging Zero-shot Object Navigation and Foundation Models through Pixel-Guided Navigation Skill [[Paper]](https://arxiv.org/pdf/2309.10309)

  • [2023] Frontier Semantic Exploration for Visual Target Navigation [[Paper]](https://arxiv.org/pdf/2304.05506) [[GitHub]](https://github.com/ybgdgh/Frontier-Semantic-Exploration)

  • [2023] LANA: A Language-Capable Navigator for Instruction Following and Generation [[Paper]](https://openaccess.thecvf.com/content/CVPR2023/papers/Wang_LANA_A_Language-Capable_Navigator_for_Instruction_Following_and_Generation_CVPR_2023_paper.pdf) [[GitHub]](https://github.com/wxh1996/LANA-VLN)

  • [2023] A2Nav: Action-Aware Zero-Shot Robot Navigation by Exploiting Vision-and-Language Ability of Foundation Models [[Paper]](https://arxiv.org/pdf/2308.07997)

That's all for now~
