[Robotics] A Running Collection of Recent Embodied Navigation Papers | Vision-and-Language Navigation

This post collects papers on embodied navigation (Vision-and-Language Navigation, VLN) for reference and study, covering 2025, 2024, 2023, and earlier.

Venues covered: CVPR, IROS, ICRA, RSS, arXiv, and more.

The paper list will be updated continuously.

I. 🏠 Translated Titles

2025 😆

  • [2025] **WMNav: Integrating Vision-Language Models into World Models for Object Goal Navigation** [[Paper]](https://arxiv.org/pdf/2503.02247) [[Project]](https://b0b8k1ng.github.io/WMNav/) [[GitHub]](https://github.com/B0B8K1ng/WMNavigation)

  • [2025] **UniGoal: Towards Universal Zero-shot Goal-oriented Navigation** [[Paper]](https://arxiv.org/pdf/2503.10630) [[Project]](https://bagh2178.github.io/UniGoal/) [[GitHub]](https://github.com/bagh2178/UniGoal)

  • [2025] CityNavAgent: Aerial Vision-and-Language Navigation with Hierarchical Semantic Planning and Global Memory [[Paper]](https://arxiv.org/pdf/2505.05622) [[GitHub]](https://github.com/VinceOuti/CityNavAgent)

  • [2025] VL-Nav: Real-time Vision-Language Navigation with Spatial Reasoning [[Paper]](https://arxiv.org/pdf/2502.00931)

  • [2025] **HA-VLN: A Benchmark for Human-Aware Navigation in Discrete-Continuous Environments with Dynamic Multi-Human Interactions, Real-World Validation, and an Open Leaderboard** [[Paper]](https://arxiv.org/pdf/2503.14229) [[Project]](https://ha-vln-project.vercel.app/) [[GitHub]](https://github.com/F1y1113/HA-VLN)

  • [2025] FlexVLN: Flexible Adaptation for Diverse Vision-and-Language Navigation Tasks [[Paper]](https://arxiv.org/pdf/2503.13966)

  • [2025] 3D-Mem: 3D Scene Memory for Embodied Exploration and Reasoning [[Paper]](https://arxiv.org/pdf/2411.17735) [[Project]](https://umass-embodied-agi.github.io/3D-Mem/) [[GitHub]](https://github.com/UMass-Embodied-AGI/3D-Mem)

  • [2025] EfficientEQA: An Efficient Approach for Open Vocabulary Embodied Question Answering [[Paper]](https://arxiv.org/pdf/2410.20263)

  • [2025] Learned Perceptive Forward Dynamics Model for Safe and Platform-aware Robotic Navigation [[Paper]](https://arxiv.org/pdf/2504.19322) [[GitHub]](https://github.com/leggedrobotics/fdm)

  • [2025] Semantic Mapping in Indoor Embodied AI: A Comprehensive Survey and Future Directions [[Paper]](https://arxiv.org/pdf/2501.05750)

  • [2025] TRAVEL: Training-Free Retrieval and Alignment for Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2502.07306)

  • [2025] VR-Robo: A Real-to-Sim-to-Real Framework for Visual Robot Navigation and Locomotion [[Paper]](https://arxiv.org/pdf/2502.01536)

  • [2025] NavigateDiff: Visual Predictors are Zero-Shot Navigation Assistants [[Paper]](https://arxiv.org/pdf/2502.13894)

  • [2025] MapNav: A Novel Memory Representation via Annotated Semantic Maps for VLM-based Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2502.13451)

  • [2025] OpenFly: A Versatile Toolchain and Large-scale Benchmark for Aerial Vision-Language Navigation [[Paper]](https://arxiv.org/pdf/2502.18041)

  • [2025] Ground-level Viewpoint Vision-and-Language Navigation in Continuous Environments [[Paper]](https://arxiv.org/pdf/2502.19024)

  • [2025] Dynamic Path Navigation for Motion Agents with LLM Reasoning [[Paper]](https://arxiv.org/pdf/2503.07323)

  • [2025] SmartWay: Enhanced Waypoint Prediction and Backtracking for Zero-Shot Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2503.10069)

  • [2025] Vi-LAD: Vision-Language Attention Distillation for Socially-Aware Robot Navigation in Dynamic Environments [[Paper]](https://arxiv.org/pdf/2503.09820)

  • [2025] PanoGen++: Domain-Adapted Text-Guided Panoramic Environment Generation for Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2503.09938)

  • [2025] Do Visual Imaginations Improve Vision-and-Language Navigation Agents? [[Paper]](https://arxiv.org/pdf/2503.16394) [[Project]](https://www.akhilperincherry.com/VLN-Imagine-website/)

  • [2025] P3Nav: A Unified Framework for Embodied Navigation Integrating Perception, Planning, and Prediction [[Paper]](https://arxiv.org/pdf/2503.18525)

  • [2025] Unseen from Seen: Rewriting Observation-Instruction Using Foundation Models for Augmenting Vision-Language Navigation [[Paper]](https://arxiv.org/pdf/2503.18065) [[GitHub]](https://github.com/SaDil13/VLN-RAM)

  • [2025] COSMO: Combination of Selective Memorization for Low-cost Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2503.24065)

  • [2025] ForesightNav: Learning Scene Imagination for Efficient Exploration [[Paper]](https://arxiv.org/pdf/2504.16062) [[GitHub]](https://github.com/uzh-rpg/foresight-nav)

  • [2025] NavDP: Learning Sim-to-Real Navigation Diffusion Policy with Privileged Information Guidance [[Paper]](https://arxiv.org/pdf/2505.08712)

  • [2025] VISTA: Generative Visual Imagination for Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2505.07868)

  • [2025] Dynam3D: Dynamic Layered 3D Tokens Empower VLM for Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2505.11383) [[GitHub]](https://github.com/MrZihan/Dynam3D)

  • [2025] Aux-Think: Exploring Reasoning Strategies for Data-Efficient Vision-Language Navigation [[Paper]](https://arxiv.org/pdf/2505.11886)

2024 😄

  • [2024] E2Map: Experience-and-Emotion Map for Self-Reflective Robot Navigation with Language Models [[Paper]](https://arxiv.org/pdf/2409.10027) [[GitHub]](https://github.com/knwoo/e2map)

  • [2024] Bridging Zero-shot Object Navigation and Foundation Models through Pixel-Guided Navigation Skill [[Paper]](https://arxiv.org/pdf/2309.10309) [[GitHub]](https://github.com/wzcai99/Pixel-Navigator)

  • [2024] **NaVILA: Legged Robot Vision-Language-Action Model for Navigation** [[Paper]](https://arxiv.org/pdf/2412.04453) [[Project]](https://navila-bot.github.io/)

  • [2024] Aim My Robot: Precision Local Navigation to Any Object [[Paper]](https://arxiv.org/pdf/2411.14770)

  • [2024] MapGPT: Map-Guided Prompting with Adaptive Path Planning for Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2401.07314) [[GitHub]](https://github.com/chen-judge/MapGPT/)

  • [2024] VLFM: Vision-Language Frontier Maps for Zero-Shot Semantic Navigation [[Paper]](https://arxiv.org/pdf/2312.03275) [[GitHub]](https://github.com/bdaiinstitute/vlfm)

  • [2024] Planning from Imagination: Episodic Simulation and Episodic Memory for Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2412.01857)

  • [2024] Continual Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2403.15049)

  • [2024] Find Everything: A General Vision Language Model Approach to Multi-Object Search [[Paper]](https://arxiv.org/pdf/2410.00388) [[Project]](https://find-all-my-things.github.io/)

  • [2024] NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models [[Paper]](https://arxiv.org/pdf/2407.12366) [[GitHub]](https://github.com/GengzeZhou/NavGPT-2)

  • [2024] Sim-to-Real Transfer via 3D Feature Fields for Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2406.09798) [[GitHub]](https://github.com/MrZihan/Sim2Real-VLN-3DFF)

  • [2024] Building Cooperative Embodied Agents Modularly with Large Language Models [[Paper]](https://openreview.net/pdf?id=EnXJfQqy0K) [[GitHub]](https://github.com/UMass-Foundation-Model/Co-LLM-Agents)

  • [2024] The One RING: A Robotic Indoor Navigation Generalist [[Paper]](https://arxiv.org/pdf/2412.14401)

  • [2024] Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs [[Paper]](https://arxiv.org/pdf/2407.07775)

2023 😲

  • [2023] Bridging Zero-shot Object Navigation and Foundation Models through Pixel-Guided Navigation Skill [[Paper]](https://arxiv.org/pdf/2309.10309)

  • [2023] Frontier Semantic Exploration for Visual Target Navigation [[Paper]](https://arxiv.org/pdf/2304.05506) [[GitHub]](https://github.com/ybgdgh/Frontier-Semantic-Exploration)

  • [2023] LANA: A Language-Capable Navigator for Instruction Following and Generation [[Paper]](https://openaccess.thecvf.com/content/CVPR2023/papers/Wang_LANA_A_Language-Capable_Navigator_for_Instruction_Following_and_Generation_CVPR_2023_paper.pdf) [[GitHub]](https://github.com/wxh1996/LANA-VLN)

  • [2023] A2Nav: Action-Aware Zero-Shot Robot Navigation by Exploiting Vision-and-Language Ability of Foundation Models [[Paper]](https://arxiv.org/pdf/2308.07997)

II. 🔄 Original English Titles

2025 🐻

  • [2025] 3D-Mem: 3D Scene Memory for Embodied Exploration and Reasoning [[Paper]](https://arxiv.org/pdf/2411.17735) [[Project]](https://umass-embodied-agi.github.io/3D-Mem/)

  • [2025] EfficientEQA: An Efficient Approach for Open Vocabulary Embodied Question Answering [[Paper]](https://arxiv.org/pdf/2410.20263)

  • [2025] Learned Perceptive Forward Dynamics Model for Safe and Platform-aware Robotic Navigation [[Paper]](https://arxiv.org/pdf/2504.19322) [[GitHub]](https://github.com/leggedrobotics/fdm)

  • [2025] Semantic Mapping in Indoor Embodied AI: A Comprehensive Survey and Future Directions [[Paper]](https://arxiv.org/pdf/2501.05750)

  • [2025] VL-Nav: Real-time Vision-Language Navigation with Spatial Reasoning [[Paper]](https://arxiv.org/pdf/2502.00931)

  • [2025] TRAVEL: Training-Free Retrieval and Alignment for Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2502.07306)

  • [2025] VR-Robo: A Real-to-Sim-to-Real Framework for Visual Robot Navigation and Locomotion [[Paper]](https://arxiv.org/pdf/2502.01536)

  • [2025] NavigateDiff: Visual Predictors are Zero-Shot Navigation Assistants [[Paper]](https://arxiv.org/pdf/2502.13894)

  • [2025] MapNav: A Novel Memory Representation via Annotated Semantic Maps for VLM-based Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2502.13451)

  • [2025] OpenFly: A Versatile Toolchain and Large-scale Benchmark for Aerial Vision-Language Navigation [[Paper]](https://arxiv.org/pdf/2502.18041)

  • [2025] Ground-level Viewpoint Vision-and-Language Navigation in Continuous Environments [[Paper]](https://arxiv.org/pdf/2502.19024)

  • [2025] WMNav: Integrating Vision-Language Models into World Models for Object Goal Navigation [[Paper]](https://arxiv.org/pdf/2503.02247) [[Project]](https://b0b8k1ng.github.io/WMNav/)

  • [2025] Dynamic Path Navigation for Motion Agents with LLM Reasoning [[Paper]](https://arxiv.org/pdf/2503.07323)

  • [2025] SmartWay: Enhanced Waypoint Prediction and Backtracking for Zero-Shot Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2503.10069)

  • [2025] Vi-LAD: Vision-Language Attention Distillation for Socially-Aware Robot Navigation in Dynamic Environments [[Paper]](https://arxiv.org/pdf/2503.09820)

  • [2025] UniGoal: Towards Universal Zero-shot Goal-oriented Navigation [[Paper]](https://arxiv.org/pdf/2503.10630) [[Project]](https://bagh2178.github.io/UniGoal/)

  • [2025] PanoGen++: Domain-Adapted Text-Guided Panoramic Environment Generation for Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2503.09938)

  • [2025] Do Visual Imaginations Improve Vision-and-Language Navigation Agents? [[Paper]](https://arxiv.org/pdf/2503.16394) [[Project]](https://www.akhilperincherry.com/VLN-Imagine-website/)

  • [2025] HA-VLN: A Benchmark for Human-Aware Navigation in Discrete-Continuous Environments with Dynamic Multi-Human Interactions, Real-World Validation, and an Open Leaderboard [[Paper]](https://arxiv.org/pdf/2503.14229) [[Project]](https://ha-vln-project.vercel.app/)

  • [2025] FlexVLN: Flexible Adaptation for Diverse Vision-and-Language Navigation Tasks [[Paper]](https://arxiv.org/pdf/2503.13966)

  • [2025] P3Nav: A Unified Framework for Embodied Navigation Integrating Perception, Planning, and Prediction [[Paper]](https://arxiv.org/pdf/2503.18525)

  • [2025] Unseen from Seen: Rewriting Observation-Instruction Using Foundation Models for Augmenting Vision-Language Navigation [[Paper]](https://arxiv.org/pdf/2503.18065) [[GitHub]](https://github.com/SaDil13/VLN-RAM)

  • [2025] COSMO: Combination of Selective Memorization for Low-cost Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2503.24065)

  • [2025] ForesightNav: Learning Scene Imagination for Efficient Exploration [[Paper]](https://arxiv.org/pdf/2504.16062) [[GitHub]](https://github.com/uzh-rpg/foresight-nav)

  • [2025] CityNavAgent: Aerial Vision-and-Language Navigation with Hierarchical Semantic Planning and Global Memory [[Paper]](https://arxiv.org/pdf/2505.05622) [[GitHub]](https://github.com/VinceOuti/CityNavAgent)

  • [2025] NavDP: Learning Sim-to-Real Navigation Diffusion Policy with Privileged Information Guidance [[Paper]](https://arxiv.org/pdf/2505.08712)

  • [2025] VISTA: Generative Visual Imagination for Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2505.07868)

  • [2025] Dynam3D: Dynamic Layered 3D Tokens Empower VLM for Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2505.11383) [[GitHub]](https://github.com/MrZihan/Dynam3D)

  • [2025] Aux-Think: Exploring Reasoning Strategies for Data-Efficient Vision-Language Navigation [[Paper]](https://arxiv.org/pdf/2505.11886)

2024 🐵

  • [2024] [RSS 24] NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2402.15852)

  • [2024] [RSS 24] NaVILA: Legged Robot Vision-Language-Action Model for Navigation [[Paper]](https://arxiv.org/pdf/2412.04453)

  • [2024] The One RING: A Robotic Indoor Navigation Generalist [[Paper]](https://arxiv.org/pdf/2412.14401)

  • [2024] Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs [[Paper]](https://arxiv.org/pdf/2407.07775)

  • E2Map: Experience-and-Emotion Map for Self-Reflective Robot Navigation with Language Models [[Paper]](https://arxiv.org/pdf/2409.10027) [[GitHub]](https://github.com/knwoo/e2map)
  • Autonomous Exploration and Semantic Updating of Large-Scale Indoor Environments with Mobile Robots [Paper] [GitHub]
  • Bridging Zero-shot Object Navigation and Foundation Models through Pixel-Guided Navigation Skill [[Paper]](https://arxiv.org/pdf/2309.10309) [[GitHub]](https://github.com/wzcai99/Pixel-Navigator)
  • InstructNav: Zero-shot System for Generic Instruction Navigation in Unexplored Environment [Paper] [GitHub]
  • ReMEmbR: Building and Reasoning Over Long-Horizon Spatio-Temporal Memory for Robot Navigation [Paper] [GitHub]
  • Aim My Robot: Precision Local Navigation to Any Object [[Paper]](https://arxiv.org/pdf/2411.14770)
  • Tag Map: A Text-Based Map for Spatial Reasoning and Navigation with Large Language Models [Paper] [Project Page]
  • Adaptive Zone-aware Hierarchical Planner for Vision-Language Navigation [Paper] [GitHub]
  • MapGPT: Map-Guided Prompting with Adaptive Path Planning for Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2401.07314) [[GitHub]](https://github.com/chen-judge/MapGPT/)
  • CANVAS: Commonsense-Aware Navigation System for Intuitive Human-Robot Interaction [Paper] [GitHub]
  • VLFM: Vision-Language Frontier Maps for Zero-Shot Semantic Navigation [[Paper]](https://arxiv.org/pdf/2312.03275) [[GitHub]](https://github.com/bdaiinstitute/vlfm)
  • Mind the Error! Detection and Localization of Instruction Errors in Vision-and-Language Navigation [Paper] [GitHub]
  • Planning from Imagination: Episodic Simulation and Episodic Memory for Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2412.01857)
  • MC-GPT: Empowering Vision-and-Language Navigation with Memory Map and Reasoning Chains [Paper]
  • Continual Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2403.15049)
  • Open-Nav: Exploring Zero-Shot Vision-and-Language Navigation in Continuous Environment with Open-Source LLMs [Paper]
  • Find Everything: A General Vision Language Model Approach to Multi-Object Search [[Paper]](https://arxiv.org/pdf/2410.00388) [[Project]](https://find-all-my-things.github.io/)
  • NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models [Paper] [GitHub]
  • NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models [[Paper]](https://arxiv.org/pdf/2407.12366) [[GitHub]](https://github.com/GengzeZhou/NavGPT-2)
  • Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation [Paper] [GitHub]
  • Sim-to-Real Transfer via 3D Feature Fields for Vision-and-Language Navigation [[Paper]](https://arxiv.org/pdf/2406.09798) [[GitHub]](https://github.com/MrZihan/Sim2Real-VLN-3DFF)
  • LangNav: Language as a Perceptual Representation for Navigation [Paper] [GitHub]
  • Building Cooperative Embodied Agents Modularly with Large Language Models [[Paper]](https://openreview.net/pdf?id=EnXJfQqy0K) [[GitHub]](https://github.com/UMass-Foundation-Model/Co-LLM-Agents)

2023 🦆

  • [2023] Bridging Zero-shot Object Navigation and Foundation Models through Pixel-Guided Navigation Skill [[Paper]](https://arxiv.org/pdf/2309.10309)

  • [2023] Frontier Semantic Exploration for Visual Target Navigation [[Paper]](https://arxiv.org/pdf/2304.05506) [[GitHub]](https://github.com/ybgdgh/Frontier-Semantic-Exploration)

  • [2023] LANA: A Language-Capable Navigator for Instruction Following and Generation [[Paper]](https://openaccess.thecvf.com/content/CVPR2023/papers/Wang_LANA_A_Language-Capable_Navigator_for_Instruction_Following_and_Generation_CVPR_2023_paper.pdf) [[GitHub]](https://github.com/wxh1996/LANA-VLN)

  • [2023] A2Nav: Action-Aware Zero-Shot Robot Navigation by Exploiting Vision-and-Language Ability of Foundation Models [[Paper]](https://arxiv.org/pdf/2308.07997)

That's all for now~
