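[晓理紫] Daily Paper Digest (with Chinese abstracts and source code or project links): Reinforcement Learning, Imitation Learning, Robotics, Open Vocabulary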

Domain-Specific Paper Subscription

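Follow {晓理紫|小李子} for daily paper updates. If you find these papers interesting, please share them with classmates who need them. Thank you for your support.
If this helps you, please follow me; the latest papers are posted for you on schedule every day.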

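Categories:

Large Language Models (LLM)

Vision Models (VLM)

Diffusion Models

Vision-Language Navigation (VLN)

[Reinforcement Learning (RL)](#强化学习 RL)

[Imitation Learning (IL)](#模仿学习 IL)

Robotics

Open Vocabulary, Detection and Segmentation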

== RL ==

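Title: Beyond Task Performance: Evaluating and Reducing the Flaws of Large Multimodal Models with In-Context Learning

Authors: Mustafa Shukor, Alexandre Rame, Corentin Dancette

PubTime: 2024-01-22

Downlink: http://arxiv.org/abs/2310.00647v2

Project: https://evalign-icl.github.io/

GitHub: https://github.com/mshukor/EvALign-ICL

Chinese abstract (translated): Following the success of large language models (LLMs), large multimodal models (LMMs) such as the Flamingo model and its subsequent competitors have begun to emerge as a natural step toward generalist agents. However, interacting with recent LMMs reveals major limitations that current evaluation benchmarks hardly capture. Indeed, task performance (e.g., VQA accuracy) alone does not give enough clues to understand their real capabilities and limitations, or to what extent these models align with human expectations. To refine our understanding of these flaws, we depart from the current evaluation paradigm and (1) evaluate 10 recent open-source LMMs, from 3B to 80B parameters, along 5 different axes: hallucination, abstention, compositionality, explainability, and instruction following. Our evaluation along these axes reveals major flaws in LMMs. While the current go-to solution for aligning these models is training-based, such as instruction tuning or RLHF, we instead (2) explore training-free in-context learning (ICL) as a solution and study how it affects these limitations. Based on our ICL study, (3) we push ICL further and propose new multimodal ICL variants: Multitask-ICL, Chain-of-Hindsight-ICL, and Self-Correcting-ICL. Our findings are as follows. (1) Despite their success, LMMs have flaws that scaling alone does not solve. (2) The effect of ICL on LMM flaws is nuanced: although ICL is effective for improving explainability and answer abstention, it only slightly improves instruction following, does not improve compositional abilities, and actually even amplifies hallucinations. (3) The proposed ICL variants are promising as post-hoc approaches for efficiently addressing some of these flaws. Code is available at: https://github.com/mshukor/EvALign-ICL.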

Abstract: Following the success of Large Language Models (LLMs), Large Multimodal Models (LMMs), such as the Flamingo model and its subsequent competitors, have started to emerge as natural steps towards generalist agents. However, interacting with recent LMMs reveals major limitations that are hardly captured by the current evaluation benchmarks. Indeed, task performances (e.g., VQA accuracy) alone do not provide enough clues to understand their real capabilities, limitations, and to what extent such models are aligned to human expectations. To refine our understanding of those flaws, we deviate from the current evaluation paradigm, and (1) evaluate 10 recent open-source LMMs from 3B up to 80B parameter scale, on 5 different axes: hallucinations, abstention, compositionality, explainability and instruction following. Our evaluation on these axes reveals major flaws in LMMs. While the current go-to solution to align these models is based on training, such as instruction tuning or RLHF, we rather (2) explore the training-free in-context learning (ICL) as a solution, and study how it affects these limitations. Based on our ICL study, (3) we push ICL further and propose new multimodal ICL variants such as Multitask-ICL, Chain-of-Hindsight-ICL, and Self-Correcting-ICL. Our findings are as follows. (1) Despite their success, LMMs have flaws that remain unsolved with scaling alone. (2) The effect of ICL on LMMs' flaws is nuanced; despite its effectiveness for improving explainability and answer abstention, ICL only slightly improves instruction following, does not improve compositional abilities, and actually even amplifies hallucinations. (3) The proposed ICL variants are promising as post-hoc approaches to efficiently tackle some of those flaws. The code is available here: https://github.com/mshukor/EvALign-ICL.

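Title: Getting the Ball Rolling: Learning a Dexterous Policy for a Biomimetic Tendon-Driven Hand with Rolling Contact Joints

Authors: Yasunori Toshimitsu, Benedek Forrai, Barnabas Gavin Cangan

PubTime: 2024-01-22

Downlink: http://arxiv.org/abs/2308.02453v3

Project: https://srl-ethz.github.io/get-ball-rolling/, https://youtu.be/YahsMhqNU8o

GitHub: https://github.com/srl-ethz/faive_gym_oss

Chinese abstract (translated): Biomimetic, dexterous robotic hands have the potential to replicate many of the tasks humans can perform and to become a general-purpose manipulation platform. Recent advances in reinforcement learning (RL) frameworks have achieved remarkable performance in quadrupedal locomotion and dexterous manipulation tasks. Combined with highly parallelized GPU-based simulation that can simulate thousands of robots in parallel, RL-based controllers have become more scalable and accessible. However, to bring RL-trained policies into the real world, we need training frameworks that output policies able to work with physical actuators and sensors, as well as a hardware platform that can be manufactured from accessible materials yet is robust enough to run interactive policies. This work introduces the biomimetic tendon-driven Faive Hand and its system architecture, which uses tendon-driven rolling contact joints to achieve a 3D-printable, robust, high-DoF hand design. We model each element of the hand, integrate it into a GPU simulation environment, train a policy with RL, and achieve zero-shot transfer of a dexterous in-hand sphere rotation skill to the physical robot hand.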

Abstract: Biomimetic, dexterous robotic hands have the potential to replicate many of the tasks that a human can do, and to achieve status as a general manipulation platform. Recent advances in reinforcement learning (RL) frameworks have achieved remarkable performance in quadrupedal locomotion and dexterous manipulation tasks. Combined with GPU-based highly parallelized simulations capable of simulating thousands of robots in parallel, RL-based controllers have become more scalable and approachable. However, in order to bring RL-trained policies to the real world, we require training frameworks that output policies that can work with physical actuators and sensors as well as a hardware platform that can be manufactured with accessible materials yet is robust enough to run interactive policies. This work introduces the biomimetic tendon-driven Faive Hand and its system architecture, which uses tendon-driven rolling contact joints to achieve a 3D printable, robust high-DoF hand design. We model each element of the hand and integrate it into a GPU simulation environment to train a policy with RL, and achieve zero-shot transfer of a dexterous in-hand sphere rotation skill to the physical robot hand.

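Title: Training Diffusion Models with Reinforcement Learning

Authors: Kevin Black, Michael Janner, Yilun Du

PubTime: 2024-01-04

Downlink: http://arxiv.org/abs/2305.13301v4

Project: http://rl-diffusion.github.io

Chinese abstract (translated): Diffusion models are a class of flexible generative models whose training approximates a log-likelihood objective. However, most use cases of diffusion models are not concerned with likelihoods. They care instead about downstream objectives such as human-perceived image quality or drug effectiveness. In this paper, we study reinforcement learning methods for directly optimizing diffusion models for such objectives. We describe how posing denoising as a multi-step decision-making problem enables a class of policy gradient algorithms, which we call denoising diffusion policy optimization (DDPO), that are more effective than alternative reward-weighted likelihood approaches. Empirically, DDPO can adapt text-to-image diffusion models to objectives that are hard to express via prompting, such as image compressibility, and to objectives derived from human feedback, such as aesthetic quality. Finally, we show that DDPO can improve prompt-image alignment using feedback from a vision-language model, without additional data collection or human annotation. The project website is at http://rl-diffusion.github.io.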

Abstract: Diffusion models are a class of flexible generative models trained with an approximation to the log-likelihood objective. However, most use cases of diffusion models are not concerned with likelihoods, but instead with downstream objectives such as human-perceived image quality or drug effectiveness. In this paper, we investigate reinforcement learning methods for directly optimizing diffusion models for such objectives. We describe how posing denoising as a multi-step decision-making problem enables a class of policy gradient algorithms, which we refer to as denoising diffusion policy optimization (DDPO), that are more effective than alternative reward-weighted likelihood approaches. Empirically, DDPO is able to adapt text-to-image diffusion models to objectives that are difficult to express via prompting, such as image compressibility, and those derived from human feedback, such as aesthetic quality. Finally, we show that DDPO can improve prompt-image alignment using feedback from a vision-language model without the need for additional data collection or human annotation. The project's website can be found at http://rl-diffusion.github.io.
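The idea of treating denoising as a multi-step decision problem can be shown with a toy sketch in Python. This is not the DDPO implementation. The toy is a 1-D chain of four "denoising" steps, and a shared drift `mu` is the only policy parameter. It is trained with a REINFORCE-style score-function gradient: every step's log-probability contributes to the gradient, but the reward only sees the final sample. The chain, the reward `-(x_0 - target)**2`, and all names and hyperparameters are assumptions made for illustration.

```python
import random


def train_toy_ddpo(target=2.0, steps=4, sigma=0.5, lr=0.005, iters=300, batch=64, seed=0):
    """REINFORCE-style policy gradient over a toy 1-D "denoising" chain (hypothetical).

    Each denoising step is one action of a policy with a single parameter mu:
        x_{t-1} = x_t + mu + sigma * eps,   eps ~ N(0, 1)
    so d/dmu log p(x_{t-1} | x_t) = (x_{t-1} - x_t - mu) / sigma**2 = eps / sigma.
    The reward is computed only on the final sample x_0.
    """
    rng = random.Random(seed)
    mu = 0.0
    for _ in range(iters):
        rewards, scores = [], []
        for _ in range(batch):
            x = rng.gauss(0.0, 1.0)  # x_T: start from pure noise
            score = 0.0  # sum of per-step log-prob gradients along the chain
            for _ in range(steps):
                eps = rng.gauss(0.0, 1.0)
                x = x + mu + sigma * eps
                score += eps / sigma
            rewards.append(-(x - target) ** 2)  # downstream "reward" on x_0 only
            scores.append(score)
        baseline = sum(rewards) / batch  # mean-reward baseline to reduce variance
        grad = sum((r - baseline) * s for r, s in zip(rewards, scores)) / batch
        mu += lr * grad  # gradient ascent on expected reward
    return mu


mu = train_toy_ddpo()
print(f"mu = {mu:.3f}, mean final sample = {4 * mu:.3f} (target 2.0)")
```

The mean of the final sample is `steps * mu`, so training should push `4 * mu` toward 2.0. In a real text-to-image model, the per-step log-probabilities would come from the Gaussian reverse-diffusion transitions. This toy says nothing about the paper's comparison against reward-weighted likelihood methods.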

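Title: Bridging the Gap Between Target Networks and Functional Regularization

Authors: Alexandre Piche, Valentin Thomas, Joseph Marino

PubTime: 2024-01-03

Downlink: http://arxiv.org/abs/2210.12282v2

Project: https://openreview.net/forum?id=BFvoemrmqX

Chinese abstract (translated): Bootstrapping is behind many of the successes of deep reinforcement learning. However, learning value functions via bootstrapping often leads to unstable training because the target values change quickly. Target networks are used to stabilize training by estimating the target values with an additional set of lagging parameters. Despite their popularity, their effect on optimization is still misunderstood. In this work, we show that they act as an implicit regularizer. This regularizer has drawbacks such as being inflexible and non-convex. To overcome these issues, we propose an explicit functional regularization, which is a convex regularizer in function space and is easy to tune. We analyze the convergence of our method theoretically and show empirically that replacing target networks with the more theoretically grounded functional regularization approach leads to better sample efficiency and improved performance.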

Abstract: Bootstrapping is behind many of the successes of Deep Reinforcement Learning. However, learning the value function via bootstrapping often leads to unstable training due to fast-changing target values. Target Networks are employed to stabilize training by using an additional set of lagging parameters to estimate the target values. Despite the popularity of Target Networks, their effect on the optimization is still misunderstood. In this work, we show that they act as an implicit regularizer. This regularizer has disadvantages such as being inflexible and non-convex. To overcome these issues, we propose an explicit Functional Regularization that is a convex regularizer in function space and can easily be tuned. We analyze the convergence of our method theoretically and empirically demonstrate that replacing Target Networks with the more theoretically grounded Functional Regularization approach leads to better sample efficiency and performance improvements.
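Here is a minimal tabular sketch of the idea, with expected (full-width) updates on a small Markov chain. It assumes the regularized objective takes this form: a TD error whose bootstrapped target comes from the online estimate, plus an explicit penalty `kappa * (q - q_lag)**2` that pulls the estimate toward a lagging copy. The abstract does not give the exact loss, and the paper works with deep networks. `evaluate_with_fr`, the chain, and the hyperparameters are hypothetical.

```python
def evaluate_with_fr(P, r, gamma=0.9, kappa=1.0, lr=0.1, iters=20000, sync_every=50):
    """Tabular policy evaluation: TD loss plus an explicit functional regularizer (sketch).

    Per-state loss, with targets treated as constants when differentiating:
        0.5 * (q - (r + gamma * P q))**2 + 0.5 * kappa * (q - q_lag)**2
    The bootstrapped target uses the *online* estimate q; the lagging copy q_lag
    appears only in the regularizer, which is a convex quadratic in q.
    """
    n = len(r)
    q = [0.0] * n
    q_lag = list(q)
    for k in range(iters):
        if k % sync_every == 0:
            q_lag = list(q)  # refresh the lagging copy every `sync_every` updates
        target = [r[i] + gamma * sum(P[i][j] * q[j] for j in range(n)) for i in range(n)]
        grad = [(q[i] - target[i]) + kappa * (q[i] - q_lag[i]) for i in range(n)]
        q = [q[i] - lr * grad[i] for i in range(n)]
    return q


# Hypothetical 3-state chain: transition matrix P (rows sum to 1) and rewards r.
P = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]]
r = [1.0, 0.0, 2.0]
print(evaluate_with_fr(P, r))
```

At convergence, `q` satisfies the Bellman equation `q = r + gamma * P q` and coincides with `q_lag`, so the regularizer vanishes at the fixed point. `kappa` sets how strongly each update is held near the lagging copy.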

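Title: Understanding the Effects of RLHF on LLM Generalisation and Diversity

Authors: Robert Kirk, Ishita Mediratta, Christoforos Nalmpantis

PubTime: 2024-01-03

Downlink: http://arxiv.org/abs/2310.06452v2

GitHub: https://github.com/facebookresearch/rlfh-gen-div

Chinese abstract (translated): Large language models (LLMs) fine-tuned with reinforcement learning from human feedback (RLHF) have been used in some of the most widely deployed AI models to date, such as OpenAI's ChatGPT and Anthropic's Claude. Although significant work has gone into developing these methods, our understanding of the benefits and drawbacks of each stage of RLHF remains limited. To fill this gap, we present an extensive analysis of how each stage of the process (supervised fine-tuning (SFT), reward modelling, and RLHF) affects two key properties: out-of-distribution (OOD) generalisation and output diversity. OOD generalisation is crucial given the wide range of real-world scenarios in which these models are used. Output diversity refers to a model's ability to generate varied outputs and matters for many use cases. We perform our analysis across two base models on summarisation and instruction-following tasks, the latter being highly relevant to current LLM use cases. We find that RLHF generalises better than SFT to new inputs, especially as the distribution shift between training and test grows. However, compared with SFT, RLHF significantly reduces output diversity across a variety of measures, implying a tradeoff between generalisation and diversity in current LLM fine-tuning methods. Our results provide guidance on which fine-tuning method to use depending on the application, and show that more research is needed to improve the tradeoff between generalisation and diversity.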

Abstract: Large language models (LLMs) fine-tuned with reinforcement learning from human feedback (RLHF) have been used in some of the most widely deployed AI models to date, such as OpenAI's ChatGPT or Anthropic's Claude. While there has been significant work developing these methods, our understanding of the benefits and downsides of each stage in RLHF is still limited. To fill this gap, we present an extensive analysis of how each stage of the process (i.e., supervised fine-tuning (SFT), reward modelling, and RLHF) affects two key properties: out-of-distribution (OOD) generalisation and output diversity. OOD generalisation is crucial given the wide range of real-world scenarios in which these models are being used, while output diversity refers to the model's ability to generate varied outputs and is important for a variety of use cases. We perform our analysis across two base models on both summarisation and instruction following tasks, the latter being highly relevant for current LLM use cases. We find that RLHF generalises better than SFT to new inputs, particularly as the distribution shift between train and test becomes larger. However, RLHF significantly reduces output diversity compared to SFT across a variety of measures, implying a tradeoff in current LLM fine-tuning methods between generalisation and diversity. Our results provide guidance on which fine-tuning method should be used depending on the application, and show that more research is needed to improve the tradeoff between generalisation and diversity.
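Distinct-n is one common, simple way to quantify "output diversity". It is the share of unique n-grams among all n-grams across a set of sampled outputs. The abstract only says diversity was measured "across a variety of measures" and does not name them. Treat this Python sketch as an illustration of the kind of metric involved, not as the paper's evaluation code. Whitespace tokenization is an assumption.

```python
def distinct_n(outputs, n=2):
    """Fraction of n-grams across all outputs that are unique (distinct-n)."""
    total, unique = 0, set()
    for text in outputs:
        tokens = text.split()  # naive whitespace tokenization (assumption)
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(grams)
        unique.update(grams)
    return len(unique) / total if total else 0.0


varied = ["the cat sat on the mat", "a dog ran in the park"]
collapsed = ["the cat sat on the mat", "the cat sat on the mat"]
print(distinct_n(varied))     # 1.0
print(distinct_n(collapsed))  # 0.5
```

A model that keeps returning near-identical samples for the same prompt scores low, which is the kind of drop the abstract reports for RLHF relative to SFT.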

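Title: Memory Gym: Towards Endless Tasks to Benchmark Memory Capabilities of Agents

Authors: Marco Pleines, Matthias Pallasch, Frank Zimmer

PubTime: 2024-01-03

Downlink: http://arxiv.org/abs/2309.17207v3

GitHub: https://github.com/MarcoMeter/endless-memory-gym/

Chinese abstract (translated): Memory Gym provides a suite of 2D partially observable environments, namely Mortar Mayhem, Mystery Path, and Searing Spotlights, designed to benchmark the memory capabilities of decision-making agents. These environments, originally with finite tasks, are extended into innovative, endless formats that mirror the escalating challenges of cumulative memory games such as "I packed my bag". This progression in task design shifts the focus from merely assessing sample efficiency to also probing memory effectiveness in dynamic, prolonged scenarios. To address the gap in available memory-based deep reinforcement learning baselines, we introduce an implementation that integrates Transformer-XL (TrXL) with Proximal Policy Optimization. This approach uses TrXL as a form of episodic memory with a sliding-window technique. Our comparative study of the Gated Recurrent Unit (GRU) and TrXL reveals varied performance across settings. In the finite environments, TrXL shows superior sample efficiency in Mystery Path and outperforms in Mortar Mayhem. However, GRU is more efficient in Searing Spotlights. Most notably, in all endless tasks, GRU makes a remarkable comeback, consistently outperforming TrXL by significant margins. Website and source code: https://github.com/MarcoMeter/endless-memory-gym/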

Abstract: Memory Gym presents a suite of 2D partially observable environments, namely Mortar Mayhem, Mystery Path, and Searing Spotlights, designed to benchmark memory capabilities in decision-making agents. These environments, originally with finite tasks, are expanded into innovative, endless formats, mirroring the escalating challenges of cumulative memory games such as "I packed my bag". This progression in task design shifts the focus from merely assessing sample efficiency to also probing the levels of memory effectiveness in dynamic, prolonged scenarios. To address the gap in available memory-based Deep Reinforcement Learning baselines, we introduce an implementation that integrates Transformer-XL (TrXL) with Proximal Policy Optimization. This approach utilizes TrXL as a form of episodic memory, employing a sliding window technique. Our comparative study between the Gated Recurrent Unit (GRU) and TrXL reveals varied performances across different settings. TrXL, on the finite environments, demonstrates superior sample efficiency in Mystery Path and outperforms in Mortar Mayhem. However, GRU is more efficient on Searing Spotlights. Most notably, in all endless tasks, GRU makes a remarkable resurgence, consistently outperforming TrXL by significant margins. Website and Source Code: https://github.com/MarcoMeter/endless-memory-gym/
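The "sliding window" form of episodic memory can be sketched as a fixed-size buffer of recent step embeddings that a Transformer-XL-style policy would attend over. This is a hypothetical Python illustration, not the repository's implementation. The class name, the string "embeddings", and the choice to include the current step in the returned context are assumptions.

```python
from collections import deque


class SlidingWindowMemory:
    """Keeps the most recent `window` step embeddings of one episode (hypothetical).

    An attention-based policy would attend over the returned context at each
    step; anything older than `window` steps is no longer available to it.
    """

    def __init__(self, window):
        self.window = window
        self.buffer = deque(maxlen=window)  # oldest entries are evicted automatically

    def reset(self):
        # Called at episode start: this memory does not persist across episodes.
        self.buffer.clear()

    def step(self, embedding):
        """Add the current step's embedding and return the attendable context."""
        self.buffer.append(embedding)
        return list(self.buffer)  # ordered oldest -> newest


mem = SlidingWindowMemory(window=3)
for t in range(5):
    context = mem.step(f"embedding_{t}")
print(context)  # ['embedding_2', 'embedding_3', 'embedding_4']
```

In this design, the window size is a hard limit on how far back the policy can look. A recurrent state such as a GRU's has no such fixed cutoff.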

== Imitation Learning ==

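Title: FMB: a Functional Manipulation Benchmark for Generalizable Robotic Learning

Authors: Jianlan Luo, Charles Xu, Fangchen Liu

PubTime: 2024-01-16

Downlink: http://arxiv.org/abs/2401.08553v1

Project: https://functional-manipulation-benchmark.github.io

Chinese abstract (translated): In this paper, we propose a real-world benchmark for studying robot learning in the context of functional manipulation: the robot must accomplish complex, long-horizon behaviors by composing individual manipulation skills in functionally relevant ways. The core design principles of our Functional Manipulation Benchmark (FMB) emphasize a harmonious balance between complexity and accessibility. The tasks are deliberately narrow in scope, so that models and datasets of manageable scale can be used effectively to track progress. At the same time, they are diverse enough to pose a significant generalization challenge. Furthermore, the benchmark is designed to be easy to replicate and includes all essential hardware and software components. To this end, FMB consists of a variety of 3D-printed objects designed to be replicated easily and accurately by other researchers. The objects are procedurally generated, providing a principled framework for studying generalization in a controlled way. We focus on fundamental manipulation skills, including grasping, repositioning, and a range of assembly behaviors. FMB can be used to evaluate methods for acquiring individual skills, as well as methods for combining and sequencing these skills to solve complex, multi-stage manipulation tasks. We also provide an imitation learning framework that includes a suite of policies trained to solve the proposed tasks. This allows researchers to use our tasks as a versatile toolkit for examining various parts of the pipeline. For example, researchers could propose a better design for a grasping controller and evaluate it together with our baseline reorientation and assembly policies, as part of a pipeline for solving multi-stage tasks. Our dataset, object CAD files, code, and evaluation videos are available on our project website: https://functional-manipulation-benchmark.github.io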

摘要: In this paper, we propose a real-world benchmark for studying robotic learning in the context of functional manipulation: a robot needs to accomplish complex long-horizon behaviors by composing individual manipulation skills in functionally relevant ways. The core design principles of our Functional Manipulation Benchmark (FMB) emphasize a harmonious balance between complexity and accessibility. Tasks are deliberately scoped to be narrow, ensuring that models and datasets of manageable scale can be utilized effectively to track progress. Simultaneously, they are diverse enough to pose a significant generalization challenge. Furthermore, the benchmark is designed to be easily replicable, encompassing all essential hardware and software components. To achieve this goal, FMB consists of a variety of 3D-printed objects designed for easy and accurate replication by other researchers. The objects are procedurally generated, providing a principled framework to study generalization in a controlled fashion. We focus on fundamental manipulation skills, including grasping, repositioning, and a range of assembly behaviors. The FMB can be used to evaluate methods for acquiring individual skills, as well as methods for combining and ordering such skills to solve complex, multi-stage manipulation tasks. We also offer an imitation learning framework that includes a suite of policies trained to solve the proposed tasks. This enables researchers to utilize our tasks as a versatile toolkit for examining various parts of the pipeline. For example, researchers could propose a better design for a grasping controller and evaluate it in combination with our baseline reorientation and assembly policies as part of a pipeline for solving multi-stage tasks. Our dataset, object CAD files, code, and evaluation videos can be found on our project website: https://functional-manipulation-benchmark.github.io

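Title: Multi-task robot data for dual-arm fine manipulation

Authors: Heecheol Kim, Yoshiyuki Ohmura, Yasuo Kuniyoshi

PubTime: 2024-01-15

Downlink: http://arxiv.org/abs/2401.07603v1

Project: https://sites.google.com/view/multi-task-fine

Chinese abstract (translated): In robotic manipulation, deep imitation learning is regarded as a promising approach for acquiring manipulation skills. In addition, learning from diverse robot datasets is considered a viable way to achieve versatility and adaptability. In such research, robots achieved generality across multiple objects by learning a variety of tasks. However, such multi-task robot datasets have mainly focused on relatively imprecise single-arm tasks. They do not address the fine-grained object manipulation that robots are expected to perform in the real world. This paper introduces a dataset of diverse object manipulations that includes dual-arm tasks and/or tasks requiring fine manipulation. To this end, we generated a dataset of 224k episodes (150 hours, 1,104 language instructions) that includes dual-arm fine tasks such as moving a bowl, opening a pencil case, or peeling a banana, and this data is publicly available. The dataset also includes visual attention signals, dual-action labels (a signal that separates actions into a robust reaching trajectory and precise interaction with objects), and language instructions, to achieve robust and precise object manipulation. We applied the dataset to our Dual-Action and Attention (DAA) model, which is designed for fine-grained dual-arm manipulation tasks and is robust to covariate shift. The model was tested in over 7k total trials on real robot manipulation tasks, demonstrating its fine manipulation capability.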

Abstract: In the field of robotic manipulation, deep imitation learning is recognized as a promising approach for acquiring manipulation skills. Additionally, learning from diverse robot datasets is considered a viable method to achieve versatility and adaptability. In such research, by learning various tasks, robots achieved generality across multiple objects. However, such multi-task robot datasets have mainly focused on single-arm tasks that are relatively imprecise, not addressing the fine-grained object manipulation that robots are expected to perform in the real world. This paper introduces a dataset of diverse object manipulations that includes dual-arm tasks and/or tasks requiring fine manipulation. To this end, we have generated a dataset with 224k episodes (150 hours, 1,104 language instructions) which includes dual-arm fine tasks such as bowl-moving, pencil-case opening or banana-peeling, and this data is publicly available. Additionally, this dataset includes visual attention signals as well as dual-action labels, a signal that separates actions into a robust reaching trajectory and precise interaction with objects, and language instructions to achieve robust and precise object manipulation. We applied the dataset to our Dual-Action and Attention (DAA), a model designed for fine-grained dual-arm manipulation tasks and robust against covariate shifts. The model was tested with over 7k total trials in real robot manipulation tasks, demonstrating its capability in fine manipulation.

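Title: Residual Q-Learning: Offline and Online Policy Customization without Value

Authors: Chenran Li, Chen Tang, Haruki Nishimura

PubTime: 2024-01-15

Downlink: http://arxiv.org/abs/2306.09526v3

Project: https://sites.google.com/view/residualq-learning

Chinese abstract (translated): Imitation learning (IL) is a widely used framework for learning imitative behavior from demonstrations. It is particularly appealing for solving complex real-world tasks where handcrafting a reward function is difficult, or when the goal is to mimic human expert behavior. However, a learned imitation policy can only follow the behavior in the demonstrations. When applying an imitation policy, we may need to customize its behavior to meet different requirements from diverse downstream tasks. At the same time, we still want the customized policy to retain its imitative nature. To this end, we formulate a new problem setting called policy customization. It defines the learning task as training a policy that inherits the characteristics of a prior policy while satisfying some additional requirements imposed by a target downstream task. We propose a novel, principled approach to interpreting and determining the trade-off between the two task objectives. Specifically, we formulate the customization problem as a Markov decision process (MDP) whose reward function combines 1) the inherent reward of the demonstrations and 2) an add-on reward specified by the downstream task. We propose a novel framework, residual Q-learning, which can solve the formulated MDP by leveraging the prior policy without knowing its inherent reward or value function. We derive a family of residual Q-learning algorithms that can realize offline and online policy customization, and show that the proposed algorithms can effectively accomplish policy customization tasks in various environments. Demo videos and code are available on our website: https://sites.google.com/view/residualq-learning.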

Abstract: Imitation Learning (IL) is a widely used framework for learning imitative behavior from demonstrations. It is especially appealing for solving complex real-world tasks where handcrafting a reward function is difficult, or when the goal is to mimic human expert behavior. However, the learned imitative policy can only follow the behavior in the demonstration. When applying the imitative policy, we may need to customize the policy behavior to meet different requirements coming from diverse downstream tasks. Meanwhile, we still want the customized policy to maintain its imitative nature. To this end, we formulate a new problem setting called policy customization. It defines the learning task as training a policy that inherits the characteristics of the prior policy while satisfying some additional requirements imposed by a target downstream task. We propose a novel and principled approach to interpret and determine the trade-off between the two task objectives. Specifically, we formulate the customization problem as a Markov Decision Process (MDP) with a reward function that combines 1) the inherent reward of the demonstration; and 2) the add-on reward specified by the downstream task. We propose a novel framework, Residual Q-learning, which can solve the formulated MDP by leveraging the prior policy without knowing the inherent reward or value function of the prior policy. We derive a family of residual Q-learning algorithms that can realize offline and online policy customization, and show that the proposed algorithms can effectively accomplish policy customization tasks in various environments. Demo videos and code are available on our website: https://sites.google.com/view/residualq-learning.
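A one-step (bandit) simplification shows why the prior policy can stand in for its unknown reward. Suppose the prior policy is the maximum-entropy solution `pi_prior ∝ exp(r_prior / alpha)`. Then the max-entropy policy for the combined reward `omega * r_prior + r_add` is proportional to `pi_prior**omega * exp(r_add / alpha)`, and `r_prior` never has to be known. This Python sketch assumes a single step and the same temperature `alpha` for both policies. It is not the paper's residual Q-learning algorithm, which handles sequential MDPs with Q-functions, and all names are hypothetical.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def customize_policy(prior_probs, add_on_reward, omega=1.0, alpha=1.0):
    """Max-entropy policy for reward omega*r_prior + r_add in a one-step setting,
    computed from the prior policy's probabilities only (r_prior is never used)."""
    logits = [omega * math.log(p) + r / alpha for p, r in zip(prior_probs, add_on_reward)]
    return softmax(logits)


alpha = 0.5
r_prior = [1.0, 0.0, -1.0]  # hidden from the customizer; used here only to build the prior
prior = softmax([r / alpha for r in r_prior])
r_add = [0.0, 1.5, 0.0]  # hypothetical add-on reward favouring action 1
custom = customize_policy(prior, r_add, omega=1.0, alpha=alpha)
direct = softmax([(rp + ra) / alpha for rp, ra in zip(r_prior, r_add)])
print(custom, direct)  # the two distributions agree
```

The agreement holds because `omega * log(pi_prior)` differs from `omega * r_prior / alpha` only by a constant, and softmax ignores constant shifts.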

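Title: Multi-Stage Cable Routing through Hierarchical Imitation Learning

Authors: Jianlan Luo, Charles Xu, Xinyang Geng

PubTime: 2024-01-13

Downlink: http://arxiv.org/abs/2307.08927v5

Project: https://sites.google.com/view/cablerouting

Chinese abstract (translated): We study the problem of learning to perform multi-stage robotic manipulation tasks, with an application to cable routing, where the robot must route a cable through a series of clips. This setting presents challenges representative of complex multi-stage robotic manipulation scenarios: handling deformable objects, closing the loop on visual perception, and handling extended behaviors made of multiple steps that must all be executed successfully to complete the whole task. In such settings, it is impractical to learn an individual primitive for each stage that succeeds often enough to complete the full temporally extended task. If every stage must be completed successfully and each has a non-negligible probability of failure, the likelihood of completing the entire task becomes negligible. A successful controller for such multi-stage tasks must therefore be able to recover from failures. It must compensate for imperfections in low-level controllers by intelligently choosing which controller to trigger at any given time, retrying, or taking corrective action as needed. To this end, we describe an imitation learning system that uses vision-based policies trained from demonstrations at both the lower (motor control) and the higher (sequencing) level. We present a system that instantiates this method to learn the cable routing task, and perform evaluations showing strong performance in generalizing to very challenging clip placement variations. Supplementary videos, datasets, and code are available at https://sites.google.com/view/cablerouting.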

摘要: We study the problem of learning to perform multi-stage robotic manipulation tasks, with applications to cable routing, where the robot must route a cable through a series of clips. This setting presents challenges representative of complex multi-stage robotic manipulation scenarios: handling deformable objects, closing the loop on visual perception, and handling extended behaviors consisting of multiple steps that must be executed successfully to complete the entire task. In such settings, learning individual primitives for each stage that succeed with a high enough rate to perform a complete temporally extended task is impractical: if each stage must be completed successfully and has a non-negligible probability of failure, the likelihood of successful completion of the entire task becomes negligible. Therefore, successful controllers for such multi-stage tasks must be able to recover from failure and compensate for imperfections in low-level controllers by smartly choosing which controllers to trigger at any given time, retrying, or taking corrective action as needed. To this end, we describe an imitation learning system that uses vision-based policies trained from demonstrations at both the lower (motor control) and the upper (sequencing) level, present a system for instantiating this method to learn the cable routing task, and perform evaluations showing great performance in generalizing to very challenging clip placement variations. Supplementary videos, datasets, and code can be found at https://sites.google.com/view/cablerouting.
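The compounding-failure argument in this abstract is easy to check numerically. This Python sketch assumes independent attempts and that a failed attempt can simply be retried. The 90% per-stage success rate and the ten stages are made-up numbers for illustration, not results from the paper.

```python
def chain_success(stage_success, num_stages, attempts_per_stage=1):
    """Probability that every stage of a sequential task eventually succeeds.

    Assumes independent attempts and that a failed attempt can simply be
    retried (up to `attempts_per_stage` times) without making things worse.
    """
    per_stage = 1.0 - (1.0 - stage_success) ** attempts_per_stage
    return per_stage ** num_stages


print(round(chain_success(0.9, 10), 4))                        # 0.3487
print(round(chain_success(0.9, 10, attempts_per_stage=3), 4))  # 0.99
```

With a 90% per-stage success rate over ten stages, the chain completes only about 35% of the time. Allowing up to three attempts per stage raises this to about 99%. This is why the abstract stresses recovery, retries, and corrective actions chosen by a higher-level policy.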

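Title: Crossway Diffusion: Improving Diffusion-based Visuomotor Policy via Self-supervised Learning

Authors: Xiang Li, Varun Belagali, Jinghuan Shang

PubTime: 2024-01-11

Downlink: http://arxiv.org/abs/2307.01849v3

Project: https://youtu.be/9deKHueZBuk

GitHub: https://github.com/LostXine/crossway_diffusion

Chinese abstract (translated): Sequence modeling approaches have shown promising results in robot imitation learning. Recently, diffusion models have been adopted for behavior cloning in a sequence-modeling fashion, benefiting from their exceptional ability to model complex data distributions. A standard diffusion-based policy iteratively generates action sequences from random noise, conditioned on the input states. Nonetheless, the model for diffusion policy can be further improved in terms of visual representations. In this work, we propose Crossway Diffusion, a simple yet effective method that enhances diffusion-based visuomotor policy learning through a carefully designed state decoder and an auxiliary self-supervised learning (SSL) objective. The state decoder reconstructs raw image pixels and other state information from the intermediate representations of the reverse diffusion process. The whole model is jointly optimized with the SSL objective and the original diffusion loss. Our experiments demonstrate the effectiveness of Crossway Diffusion on a variety of simulated and real-world robot tasks, confirming its consistent advantage over the standard diffusion-based policy and substantial improvements over the baselines.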

摘要: Sequence modeling approaches have shown promising results in robot imitation learning. Recently, diffusion models have been adopted for behavioral cloning in a sequence modeling fashion, benefiting from their exceptional capabilities in modeling complex data distributions. The standard diffusion-based policy iteratively generates action sequences from random noise conditioned on the input states. Nonetheless, the model for diffusion policy can be further improved in terms of visual representations. In this work, we propose Crossway Diffusion, a simple yet effective method to enhance diffusion-based visuomotor policy learning via a carefully designed state decoder and an auxiliary self-supervised learning (SSL) objective. The state decoder reconstructs raw image pixels and other state information from the intermediate representations of the reverse diffusion process. The whole model is jointly optimized by the SSL objective and the original diffusion loss. Our experiments demonstrate the effectiveness of Crossway Diffusion in various simulated and real-world robot tasks, confirming its consistent advantages over the standard diffusion-based policy and substantial improvements over the baselines.
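The abstract says the model is "jointly optimized by the SSL objective and the original diffusion loss". One plausible way to write such a joint objective is shown below. Here $A^k$ is the noised action sequence at diffusion step $k$, $O$ is the observation (image pixels and other state), $h^k$ is an intermediate representation of the reverse-diffusion network, $D_\phi$ is the state decoder, and $\lambda$ is a weighting term. The $\epsilon$-prediction form, the plain squared reconstruction error, and $\lambda$ are assumptions, not details confirmed by the abstract.

```latex
\mathcal{L}(\theta, \phi) =
  \underbrace{\mathbb{E}_{k,\,\epsilon}\!\left[\lVert \epsilon - \epsilon_\theta(A^{k}, k, O) \rVert^2\right]}_{\text{diffusion loss}}
  + \lambda\,\underbrace{\lVert D_\phi(h^{k}) - O \rVert^2}_{\text{SSL reconstruction loss}}
```

Both terms backpropagate into the shared network that produces $h^k$. This is how the auxiliary reconstruction can shape the visual representation used for denoising actions.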

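Title: Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint

Authors: Zhipeng Chen, Kun Zhou, Wayne Xin Zhao

PubTime: 2024-01-11

Downlink: http://arxiv.org/abs/2401.06081v1

GitHub: https://github.com/RUCAIBox/RLMEC

Chinese abstract (translated): Reinforcement learning (RL) has been widely used to train large language models to prevent unexpected outputs, e.g., to reduce harmfulness and errors. However, most existing RL methods adopt instance-level rewards. These cannot provide fine-grained supervision for complex reasoning tasks, nor can they focus on the few key tokens that lead to errors. To address this, we propose a new RL method named RLMEC, which incorporates a generative model as the reward model. This model is trained on an erroneous-solution rewriting task under a minimum editing constraint and can produce token-level rewards for RL training. Based on the generative reward model, we design a token-level RL objective for training and an imitation-based regularization for stabilizing the RL process. Both objectives focus on learning the key tokens of erroneous solutions, reducing the influence of other, unimportant tokens. Experimental results on mathematical tasks and question-answering tasks demonstrate the effectiveness of our approach. Our code and data are available at https://github.com/RUCAIBox/RLMEC.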

Abstract: Reinforcement learning (RL) has been widely used in training large language models (LLMs) for preventing unexpected outputs, e.g., reducing harmfulness and errors. However, existing RL methods mostly adopt the instance-level reward, which is unable to provide fine-grained supervision for complex reasoning tasks, and cannot focus on the few key tokens that lead to the incorrectness. To address it, we propose a new RL method named RLMEC that incorporates a generative model as the reward model, which is trained by the erroneous solution rewriting task under the minimum editing constraint, and can produce token-level rewards for RL training. Based on the generative reward model, we design the token-level RL objective for training and an imitation-based regularization for stabilizing the RL process. Both objectives focus on the learning of the key tokens for the erroneous solution, reducing the effect of other unimportant tokens. The experimental results on mathematical tasks and question-answering tasks have demonstrated the effectiveness of our approach. Our code and data are available at https://github.com/RUCAIBox/RLMEC.
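Token-level rewards derived from a minimal edit can be illustrated without a learned model. Here the standard library's `difflib` stands in for RLMEC's generative reward model, which the paper trains instead. The sketch aligns an erroneous solution with a minimally corrected one, gives 0.0 to tokens the correction keeps, and gives a penalty to tokens it deletes or replaces. The function name, the penalty value, and whitespace tokenization are hypothetical.

```python
import difflib


def token_level_rewards(erroneous_tokens, corrected_tokens, penalty=-1.0):
    """Assign a reward to each token of an erroneous solution (illustrative stand-in).

    Tokens kept unchanged by the minimally edited correction get 0.0; tokens the
    correction deletes or replaces get `penalty`, marking them as the key tokens
    behind the error. Pure insertions have no erroneous token to mark, so they
    are skipped here.
    """
    rewards = [0.0] * len(erroneous_tokens)
    matcher = difflib.SequenceMatcher(a=erroneous_tokens, b=corrected_tokens, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            for i in range(i1, i2):
                rewards[i] = penalty
    return rewards


wrong = "3 + 4 = 8 so the answer is 8".split()
fixed = "3 + 4 = 7 so the answer is 7".split()
print(token_level_rewards(wrong, fixed))
# [0.0, 0.0, 0.0, 0.0, -1.0, 0.0, 0.0, 0.0, 0.0, -1.0]
```

Only the two wrong tokens receive a learning signal, while the correct reasoning tokens are left alone. An instance-level reward would instead spread one scalar across the whole sequence.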

== Robotic Agent ==

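Title: The Conversation is the Command: Interacting with Real-World Autonomous Robot Through Natural Language

Authors: Linus Nwankwo, Elmar Rueckert

PubTime: 2024-01-22

Downlink: http://arxiv.org/abs/2401.11838v1

Project: https://osf.io/wzyf6

GitHub: https://github.com/LinusNEP/TCC_IRoNL.git

Chinese abstract (translated): In recent years, autonomous agents have proliferated in real-world environments such as our homes, offices, and public spaces. However, natural human-robot interaction remains a key challenge. In this paper, we introduce an approach that synergistically leverages the capabilities of large language models (LLMs) and multimodal vision-language models (VLMs) to let humans interact naturally with autonomous robots through conversation. We use LLMs to decode high-level natural language instructions from humans and abstract them into precise robot-actionable commands or queries. In addition, we use VLMs to provide visual and semantic understanding of the robot's task environment. Our 99.13% command recognition accuracy and 97.96% command execution success rate show that our approach can enhance human-robot interaction in real-world applications. Video demonstrations for this paper are at https://osf.io/wzyf6, and the code is in our GitHub repository (https://github.com/LinusNEP/TCC_IRoNL.git).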

Abstract: In recent years, autonomous agents have surged in real-world environments such as our homes, offices, and public spaces. However, natural human-robot interaction remains a key challenge. In this paper, we introduce an approach that synergistically exploits the capabilities of large language models (LLMs) and multimodal vision-language models (VLMs) to enable humans to interact naturally with autonomous robots through conversational dialogue. We leveraged the LLMs to decode the high-level natural language instructions from humans and abstract them into precise robot actionable commands or queries. Further, we utilised the VLMs to provide a visual and semantic understanding of the robot's task environment. Our results with 99.13% command recognition accuracy and 97.96% command execution success show that our approach can enhance human-robot interaction in real-world applications. The video demonstrations of this paper can be found at https://osf.io/wzyf6 and the code is available at our GitHub repository (https://github.com/LinusNEP/TCC_IRoNL.git).
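The step of abstracting free-form instructions into "precise robot actionable commands or queries" usually needs a validation layer between the LLM and the robot. The abstract does not describe the paper's command format. This Python sketch assumes a hypothetical JSON schema with `type`, `action`, and `target` fields and a hypothetical whitelist of actions. Its point is only that LLM output should be parsed and checked before anything is executed.

```python
import json

ALLOWED_ACTIONS = {"navigate_to", "stop", "report_location"}  # hypothetical action set


def parse_llm_command(llm_output):
    """Turn an LLM reply into a validated robot command dict, or raise ValueError.

    Assumes (hypothetically) that the LLM was prompted to answer with JSON like
    {"type": "command" | "query", "action": "...", "target": "..."}.
    """
    try:
        msg = json.loads(llm_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(msg, dict):
        raise ValueError("expected a JSON object")
    kind = msg.get("type")
    if kind not in ("command", "query"):
        raise ValueError(f"unknown message type: {kind!r}")
    action = msg.get("action")
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"action not supported by this robot: {action!r}")
    return {"type": kind, "action": action, "target": msg.get("target")}


print(parse_llm_command('{"type": "command", "action": "navigate_to", "target": "kitchen"}'))
# {'type': 'command', 'action': 'navigate_to', 'target': 'kitchen'}
```

Rejecting unknown actions here keeps a hallucinated or malformed LLM reply from reaching the motion stack. A real system would ask the user to clarify instead of failing silently.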

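Title: Augmented Reality User Interface for Command, Control, and Supervision of Large Multi-Agent Teams

Authors: Frank Regal, Chris Suarez, Fabian Parra

PubTime: 2024-01-11

Downlink: http://arxiv.org/abs/2401.05665v1

Project: https://sites.google.com/view/xr-robotics-iros2023/home?authuser=0

Chinese abstract (translated): Multi-agent human-robot teams can gather information about diverse environments more efficiently by exploiting and combining the strengths of humans and robots. In industries such as defense, search and rescue, and first response, heterogeneous human-robot teams promise to accelerate data collection and improve team safety by removing humans from unknown and potentially hazardous situations. This work builds on AugRE, an augmented reality (AR) based, scalable human-robot teaming framework. It enables users to localize and communicate with 50+ autonomous agents. Through our efforts, users can command, control, and supervise agents in large teams, both within and beyond line of sight. They can do so without modifying the environment beforehand and without using typical hardware (e.g., joysticks, keyboards, laptops, tablets) in the field. The demonstrated work gives early indications that combining these AR-HMD-based user interaction modalities for command, control, and supervision will help improve human-robot team collaboration, robustness, and trust.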

Abstract: Multi-agent human-robot teaming allows for the potential to gather information about various environments more efficiently by exploiting and combining the strengths of humans and robots. In industries like defense, search and rescue, first-response, and others alike, heterogeneous human-robot teams show promise to accelerate data collection and improve team safety by removing humans from unknown and potentially hazardous situations. This work builds upon AugRE, an Augmented Reality (AR) based scalable human-robot teaming framework. It enables users to localize and communicate with 50+ autonomous agents. Through our efforts, users are able to command, control, and supervise agents in large teams, both line-of-sight and non-line-of-sight, without the need to modify the environment beforehand and without requiring users to use typical hardware (e.g., joysticks, keyboards, laptops, tablets) in the field. The demonstrated work shows early indications that combining these AR-HMD-based user interaction modalities for command, control, and supervision will help improve human-robot team collaboration, robustness, and trust.

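Title: Unified Learning from Demonstrations, Corrections, and Preferences during Physical Human-Robot Interaction

Authors: Shaunak A. Mehta, Dylan P. Losey

PubTime: 2024-01-09

Downlink: http://arxiv.org/abs/2207.03395v2

Project: https://youtu.be/FSUJsTYvEKU

Chinese abstract (translated): Humans can use physical interaction to teach robot arms. This physical interaction takes many forms, depending on the task, the user, and what the robot has learned so far. State-of-the-art approaches focus on learning from a single modality, or combine multiple interaction types by assuming the robot has prior information about the human's intended task. In contrast, in this paper we introduce an algorithmic formalism that unifies learning from demonstrations, corrections, and preferences. Our approach makes no assumptions about the tasks the human wants to teach the robot. Instead, we learn a reward model from scratch by comparing the human's inputs with nearby alternatives. We first derive a loss function that trains an ensemble of reward models to match the human's demonstrations, corrections, and preferences. The type and order of feedback are up to the human teacher: we let the robot collect this feedback passively or actively. We then apply constrained optimization to convert our learned reward into a desired robot trajectory. Through simulations and a user study, we show that our proposed approach learns manipulation tasks from physical human interaction more accurately than existing baselines, especially when the robot faces new or unexpected objectives. Videos of our user study are available at: https://youtu.be/FSUJsTYvEKU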

摘要: Humans can leverage physical interaction to teach robot arms. This physical interaction takes multiple forms depending on the task, the user, and what the robot has learned so far. State-of-the-art approaches focus on learning from a single modality, or combine multiple interaction types by assuming that the robot has prior information about the human's intended task. By contrast, in this paper we introduce an algorithmic formalism that unites learning from demonstrations, corrections, and preferences. Our approach makes no assumptions about the tasks the human wants to teach the robot; instead, we learn a reward model from scratch by comparing the human's inputs to nearby alternatives. We first derive a loss function that trains an ensemble of reward models to match the human's demonstrations, corrections, and preferences. The type and order of feedback is up to the human teacher: we enable the robot to collect this feedback passively or actively. We then apply constrained optimization to convert our learned reward into a desired robot trajectory. Through simulations and a user study we demonstrate that our proposed approach more accurately learns manipulation tasks from physical human interaction than existing baselines, particularly when the robot is faced with new or unexpected objectives. Videos of our user study are available at: https://youtu.be/FSUJsTYvEKU
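The idea of learning a reward "by comparing the human's inputs to nearby alternatives" can be sketched with a standard Boltzmann-rational choice model. Under this model, the human's input is more likely the higher its reward relative to the alternatives, and the reward weights are fit by maximizing that likelihood. This Python sketch uses a single linear reward over hand-made feature vectors. The paper instead trains an ensemble with its own derived loss. The features, `beta`, and all names are assumptions.

```python
import math


def fit_reward(feedback, dim, beta=1.0, lr=0.1, iters=200):
    """Fit linear reward weights theta by gradient ascent on
    sum log [exp(beta*R(human)) / sum_{x in {human} + alternatives} exp(beta*R(x))],
    where R(x) = theta . x.

    feedback: list of (human_features, [alternative_features, ...]).
    """
    theta = [0.0] * dim
    for _ in range(iters):
        grad = [0.0] * dim
        for human, alternatives in feedback:
            options = [human] + alternatives
            scores = [beta * sum(t * f for t, f in zip(theta, x)) for x in options]
            m = max(scores)
            weights = [math.exp(s - m) for s in scores]
            z = sum(weights)
            probs = [w / z for w in weights]
            for d in range(dim):
                # gradient = beta * (human features - expected features under the model)
                expected = sum(p * x[d] for p, x in zip(probs, options))
                grad[d] += beta * (human[d] - expected)
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta


# Hypothetical 2-D features; the human's input always has more of feature 0 and
# less of feature 1 than the nearby alternatives it is compared against.
feedback = [
    ([1.0, 0.2], [[0.5, 0.2], [1.0, 0.8]]),  # e.g. a demonstration vs. perturbed versions
    ([0.9, 0.1], [[0.4, 0.1], [0.9, 0.6]]),  # e.g. a correction vs. the robot's original
]
print(fit_reward(feedback, dim=2))
```

Demonstrations, corrections, and preferences all fit the same template here. Only the way the alternatives are generated changes, which is the sense in which such a formalism can unify them.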

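Title: StROL: Stabilized and Robust Online Learning from Humans

Authors: Shaunak A. Mehta, Forrest Meng, Andrea Bajcsy

PubTime: 2024-01-04

Downlink: http://arxiv.org/abs/2308.09863v2

GitHub: https://github.com/VT-Collab/StROL_RAL

Chinese abstract (translated): Robots often need to learn a human's reward function online, during the current interaction. This real-time learning requires fast but approximate learning rules. When the human's behavior is noisy or suboptimal, current approximations can make the robot's learning unstable. Accordingly, in this paper we seek to improve the robustness and convergence of gradient descent learning rules when inferring the human's reward parameters. We model the robot's learning algorithm as a dynamical system over the human's preference parameters, where the human's true (but unknown) preference is the equilibrium point. This lets us perform Lyapunov stability analysis to derive the conditions under which the robot's learning dynamics converge. Our proposed algorithm (StROL) uses these conditions to learn robust-by-design learning rules. Given the original learning dynamics, StROL outputs a modified learning rule that now converges to the human's true parameters under a larger set of human inputs. In practice, these autonomously generated learning rules can correctly infer what the human is trying to convey, even when the human is noisy, biased, and suboptimal. Across simulations and a user study, we find that StROL yields more accurate estimates and less regret than state-of-the-art online reward learning methods. Videos and code: https://github.com/VT-Collab/StROL_RAL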

Abstract: Robots often need to learn the human's reward function online, during the current interaction. This real-time learning requires fast but approximate learning rules: when the human's behavior is noisy or suboptimal, current approximations can result in unstable robot learning. Accordingly, in this paper we seek to enhance the robustness and convergence properties of gradient descent learning rules when inferring the human's reward parameters. We model the robot's learning algorithm as a dynamical system over the human preference parameters, where the human's true (but unknown) preferences are the equilibrium point. This enables us to perform Lyapunov stability analysis to derive the conditions under which the robot's learning dynamics converge. Our proposed algorithm (StROL) uses these conditions to learn robust-by-design learning rules: given the original learning dynamics, StROL outputs a modified learning rule that now converges to the human's true parameters under a larger set of human inputs. In practice, these autonomously generated learning rules can correctly infer what the human is trying to convey, even when the human is noisy, biased, and suboptimal. Across simulations and a user study we find that StROL results in a more accurate estimate and less regret than state-of-the-art approaches for online reward learning. See videos and code here: https://github.com/VT-Collab/StROL_RAL
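The Lyapunov argument can be illustrated with a deliberately simple, hypothetical learning rule: the robot moves its estimate θ toward a human signal h_t = θ* + noise with step size α. Take V(θ) = ‖θ − θ*‖² as the Lyapunov candidate. The noise-free update gives θ_{t+1} − θ* = (1 − α)(θ_t − θ*), so V_{t+1} = (1 − α)² V_t, and V decreases exactly when 0 < α < 2. This is not StROL's learned modification of the rule. It only shows the kind of convergence condition such an analysis derives.

```python
import random


def lyapunov(theta, theta_star):
    """V(theta) = ||theta - theta*||^2, which is zero only at the equilibrium."""
    return sum((a - b) ** 2 for a, b in zip(theta, theta_star))


def run_learning_rule(theta0, theta_star, alpha, steps, noise=0.0, seed=0):
    """Hypothetical rule: theta <- theta + alpha * (h_t - theta), h_t = theta* + noise."""
    rng = random.Random(seed)
    theta = list(theta0)
    values = [lyapunov(theta, theta_star)]
    for _ in range(steps):
        h = [s + rng.gauss(0.0, noise) for s in theta_star]  # (noisy) human signal
        theta = [t + alpha * (hi - t) for t, hi in zip(theta, h)]
        values.append(lyapunov(theta, theta_star))
    return theta, values


theta_star = [1.0, -0.5]
_, stable = run_learning_rule([0.0, 0.0], theta_star, alpha=0.5, steps=20)
_, unstable = run_learning_rule([0.0, 0.0], theta_star, alpha=2.5, steps=20)
```

With `noise > 0`, V stops decreasing monotonically and fluctuates around a neighborhood of θ*. Biased or noisy human inputs can break such guarantees, and this is the failure mode the abstract says StROL's modified rules are designed to withstand.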

Title: Sample-efficient Reinforcement Learning in Robotic Table Tennis

Authors: Jonas Tebbe, Lukas Krauch, Yapeng Gao

PubTime: 2024-01-04

Downlink: http://arxiv.org/abs/2011.03275v4

Project: https://youtu.be/uRAtdoL6Wpw

Abstract: Reinforcement learning (RL) has achieved some impressive recent successes in various computer games and simulations. Most of these successes rely on large numbers of episodes from which the agent can learn. In typical robotic applications, however, the number of feasible attempts is very limited. In this paper we present a sample-efficient RL algorithm applied to the example of a table tennis robot. In table tennis every stroke is different, with varying placement, speed and spin. An accurate return therefore has to be found depending on a high-dimensional continuous state space. To make learning in few trials possible, the method is embedded into our robot system. In this way we can use a one-step environment. The state space depends on the ball at hitting time (position, velocity, spin) and the action is the racket state (orientation, velocity) at hitting. An actor-critic based deterministic policy gradient algorithm was developed for accelerated learning. Our approach performs competitively both in simulation and on the real robot in a number of challenging scenarios. Accurate results are obtained without pre-training in under 200 episodes of training. The video presenting our experiments is available at https://youtu.be/uRAtdoL6Wpw.
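In a one-step environment the critic only has to regress the immediate return, since there is no bootstrapping. The actor can then be updated with the deterministic policy gradient ∇_params μ(s) · ∂Q/∂a. The toy below makes this concrete in one dimension. Every modelling choice in it is a hypothetical stand-in for the paper's ball and racket states, not the authors' algorithm: the scalar state, the reward −(a − 2s)², the quadratic critic features fitted by least squares, and the linear actor.

```python
import random


def features(s, a):
    """Quadratic critic features: Q(s, a) = w . f(s, a) (assumed form)."""
    return [1.0, s, a, s * s, s * a, a * a]


def accumulate(ata, atb, s, a, r):
    """Add one (s, a, r) sample to the least-squares normal equations."""
    f = features(s, a)
    for i in range(len(f)):
        atb[i] += f[i] * r
        for j in range(len(f)):
            ata[i][j] += f[i] * f[j]


def solve(matrix, vector):
    """Gaussian elimination with partial pivoting (inputs are not modified)."""
    n = len(vector)
    m = [row[:] + [v] for row, v in zip(matrix, vector)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def toy_reward(s, a):
    """Hypothetical return quality: the best action for state s is a = 2s."""
    return -(a - 2.0 * s) ** 2


def train(iterations=100, batch=16, lr=0.5, noise=0.3, ridge=1e-6, seed=0):
    rng = random.Random(seed)
    k, b = 0.0, 0.0  # deterministic linear actor: mu(s) = k * s + b
    ata = [[ridge if i == j else 0.0 for j in range(6)] for i in range(6)]
    atb = [0.0] * 6
    for _ in range(iterations):
        states = [rng.uniform(-1.0, 1.0) for _ in range(batch)]
        for s in states:  # one-step episodes with Gaussian exploration
            a = k * s + b + rng.gauss(0.0, noise)
            accumulate(ata, atb, s, a, toy_reward(s, a))
        w = solve(ata, atb)  # critic refit on all data collected so far
        gk = gb = 0.0
        for s in states:
            dq_da = w[2] + w[4] * s + 2.0 * w[5] * (k * s + b)  # dQ/da at a = mu(s)
            gk += dq_da * s / batch  # d mu / dk = s
            gb += dq_da / batch      # d mu / db = 1
        k, b = k + lr * gk, b + lr * gb
    return k, b


k, b = train()
```

Refitting the critic on every sample collected so far suits a low-sample regime: each episode keeps informing the critic after the policy has moved on.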

Title: Motion Control of Interactive Robotic Arms Based on Mixed Reality Development

Authors: Hanxiao Chen

PubTime: 2024-01-03

Downlink: http://arxiv.org/abs/2401.01644v1

Project: http://www.icca.net/

Abstract: Mixed Reality (MR) is constantly evolving to inspire new patterns of robot manipulation for more advanced Human-Robot Interaction under the 4th Industrial Revolution paradigm. Since Mixed Reality aims to connect the physical and digital worlds to provide special immersive experiences, it is necessary to establish an information exchange platform and robot control systems within the developed MR scenarios. In this work, we mainly present multiple effective motion control methods applied to different interactive robotic arms (e.g., UR5, UR5e, myCobot) for the Unity-based development of MR applications, including a GUI control panel, a text input control panel, end-effector object dynamic tracking, and a ROS-Unity digital-twin connection.
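A text input control panel has to turn a typed command into a validated joint target before anything reaches the arm or its digital twin. The parser below is a hypothetical sketch of that step. The `J<index> <angle>` syntax, the six-joint layout and the ±180° limits are all assumptions for illustration. They are not values taken from the paper or from UR5/myCobot documentation.

```python
from dataclasses import dataclass

JOINT_LIMITS_DEG = [(-180.0, 180.0)] * 6  # hypothetical limits for a 6-joint arm


@dataclass
class JointCommand:
    joint: int        # 0-based joint index
    angle_deg: float  # target angle in degrees


def parse_command(text):
    """Parse commands such as 'J3 45' or 'j1 -30.5' into a JointCommand.

    Raises ValueError on malformed or out-of-range input, so the MR front end
    can show an error instead of moving the robot.
    """
    parts = text.strip().split()
    if len(parts) != 2 or not parts[0].upper().startswith("J"):
        raise ValueError(f"expected 'J<index> <angle>', got {text!r}")
    try:
        joint = int(parts[0][1:]) - 1  # user-facing joint numbers are 1-based
        angle = float(parts[1])
    except ValueError as exc:
        raise ValueError(f"cannot parse {text!r}") from exc
    if not 0 <= joint < len(JOINT_LIMITS_DEG):
        raise ValueError(f"joint index out of range in {text!r}")
    low, high = JOINT_LIMITS_DEG[joint]
    if not low <= angle <= high:  # also rejects nan and inf
        raise ValueError(f"angle {angle} outside [{low}, {high}] for joint {joint + 1}")
    return JointCommand(joint, angle)


def apply_command(current_deg, command):
    """Return a new joint-target list with one joint updated."""
    target = list(current_deg)
    target[command.joint] = command.angle_deg
    return target
```

Publishing the resulting target to ROS over the ROS-Unity connection is left out, because the abstract does not specify the message types used.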

== Object Detection ==

Title: OMG-Seg: Is One Model Good Enough For All Segmentation?

Authors: Xiangtai Li, Haobo Yuan, Wei Li

PubTime: 2024-01-18

Downlink: http://arxiv.org/abs/2401.10229v1

Project: https://lxtgh.github.io/project/omg_seg/

GitHub: https://github.com/lxtGH/OMG-Seg

Abstract: In this work, we address various segmentation tasks, each traditionally tackled by distinct or partially unified models. We propose OMG-Seg, One Model that is Good enough to efficiently and effectively handle all the segmentation tasks, including image semantic, instance, and panoptic segmentation, as well as their video counterparts, open-vocabulary settings, prompt-driven interactive segmentation like SAM, and video object segmentation. To our knowledge, this is the first model to handle all these tasks in one model and achieve satisfactory performance. We show that OMG-Seg, a transformer-based encoder-decoder architecture with task-specific queries and outputs, can support over ten distinct segmentation tasks and yet significantly reduce computational and parameter overhead across various tasks and datasets. We rigorously evaluate the inter-task influences and correlations during co-training. Code and models are available at https://github.com/lxtGH/OMG-Seg.
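Query-based decoders in the MaskFormer/Mask2Former family turn each query into a mask by taking its dot product with per-pixel embeddings. Changing which queries are fed in is what allows one decoder to serve several tasks. The snippet is a generic, hypothetical illustration of that mechanism with tiny hand-written embeddings. It is not OMG-Seg's code.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_masks(queries, pixel_embeddings, threshold=0.5):
    """One binary mask per query: sigmoid(q . e_pixel) > threshold.

    queries: list of D-dim vectors (one per object or task query)
    pixel_embeddings: H x W grid of D-dim vectors from the pixel decoder
    """
    masks = []
    for q in queries:
        mask = [[sigmoid(sum(qi * ei for qi, ei in zip(q, e))) > threshold
                 for e in row]
                for row in pixel_embeddings]
        masks.append(mask)
    return masks


# Hypothetical 2x3 feature map: [3, -3] marks object A pixels, [-3, 3] object B pixels.
pixels = [
    [[3.0, -3.0], [3.0, -3.0], [-3.0, 3.0]],
    [[3.0, -3.0], [-3.0, 3.0], [-3.0, 3.0]],
]
queries = [[1.0, 0.0], [0.0, 1.0]]  # one query per object
masks = decode_masks(queries, pixels)
```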

Title: RAP-SAM: Towards Real-Time All-Purpose Segment Anything

Authors: Shilin Xu, Haobo Yuan, Qingyu Shi

PubTime: 2024-01-18

Downlink: http://arxiv.org/abs/2401.10228v1

Project: https://xushilin1.github.io/rap_sam/

GitHub: https://github.com/xushilin1/RAP-SAM/

Abstract: Advanced by the transformer architecture, vision foundation models (VFMs) have made remarkable progress in performance and generalization ability. The Segment Anything Model (SAM) is one remarkable model that can achieve generalized segmentation. However, most VFMs cannot run in real time, which makes them difficult to deploy in products. On the other hand, current real-time segmentation methods mainly serve a single purpose, such as semantic segmentation of driving scenes. We argue that diverse outputs are needed for real applications. Thus, this work explores a new real-time segmentation setting, named all-purpose segmentation in real time, to transfer VFMs to real-time deployment. It contains three different tasks: interactive segmentation, panoptic segmentation, and video segmentation. We aim to use one model to achieve the above tasks in real time. We first benchmark several strong baselines. Then, we present Real-Time All Purpose SAM (RAP-SAM). It contains an efficient encoder and an efficient decoupled decoder to perform prompt-driven decoding. Moreover, we explore different training strategies and tuning methods to further boost co-training performance. Our code and model are available at https://github.com/xushilin1/RAP-SAM/.
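For interactive segmentation, a point prompt has to be turned into something the decoder can use. One simple way, assumed here purely for illustration because the abstract does not describe RAP-SAM's prompt encoder, is to take the pixel embedding under the click as the query. The mask then keeps the pixels whose embeddings have high cosine similarity to it. The threshold and embeddings are hypothetical.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def point_prompt_mask(pixel_embeddings, click, threshold=0.8):
    """Hypothetical prompt-driven decoding from a single (row, col) click."""
    row, col = click
    query = pixel_embeddings[row][col]  # prompt embedding = feature under the click
    return [[cosine(query, e) >= threshold for e in r] for r in pixel_embeddings]


embeddings = [
    [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0]],
    [[1.0, 0.2], [0.1, 1.0], [0.0, 0.9]],
]
mask = point_prompt_mask(embeddings, click=(0, 0))
```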

Title: Adversarial Supervision Makes Layout-to-Image Diffusion Models Thrive

Authors: Yumeng Li, Margret Keuper, Dan Zhang

PubTime: 2024-01-16

Downlink: http://arxiv.org/abs/2401.08815v1

Project: https://yumengli007.github.io/ALDM/

GitHub: https://github.com/boschresearch/ALDM

Abstract: Despite the recent advances in large-scale diffusion models, little progress has been made on the layout-to-image (L2I) synthesis task. Current L2I models either suffer from poor editability via text or weak alignment between the generated image and the input layout. This limits their usability in practice. To mitigate this, we propose to integrate adversarial supervision into the conventional training pipeline of L2I diffusion models (ALDM). Specifically, we employ a segmentation-based discriminator which provides explicit feedback to the diffusion generator on the pixel-level alignment between the denoised image and the input layout. To encourage consistent adherence to the input layout over the sampling steps, we further introduce the multistep unrolling strategy. Instead of looking at a single timestep, we unroll a few steps recursively to imitate the inference process, and ask the discriminator to assess the alignment of denoised images with the layout over a certain time window. Our experiments show that ALDM enables layout faithfulness of the generated images, while allowing broad editability via text prompts. Moreover, we showcase its usefulness for practical applications: by synthesizing target distribution samples via text control, we improve domain generalization of semantic segmentation models by a large margin (~12 mIoU points).
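The generator-side feedback described above can be written as a per-pixel cross-entropy. The segmentation discriminator predicts a class distribution for each pixel of a denoised image. The generator is penalized when those predictions disagree with the input layout, averaged over the K unrolled steps. The sketch computes only that loss term from given discriminator outputs and omits the diffusion model itself. The nested-list shapes, the plain average over steps and the function name are assumptions.

```python
import math


def adversarial_layout_loss(disc_probs_per_step, layout):
    """Mean per-pixel cross-entropy between discriminator outputs and the layout.

    disc_probs_per_step: K unrolled steps, each an H x W grid of class-probability
        lists from a (hypothetical) segmentation discriminator.
    layout: H x W grid of integer class labels (the input layout).
    """
    total, count = 0.0, 0
    for probs in disc_probs_per_step:  # one entry per unrolled denoising step
        for prob_row, label_row in zip(probs, layout):
            for p, label in zip(prob_row, label_row):
                total += -math.log(max(p[label], 1e-12))  # clamp to avoid log(0)
                count += 1
    return total / count
```

A lower value means the discriminator finds the denoised images more faithful to the layout across the whole unrolled window, not just at one timestep.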

Title: LESEN: Label-Efficient deep learning for Multi-parametric MRI-based Visual Pathway Segmentation

Authors: Alou Diakite, Cheng Li, Lei Xie

PubTime: 2024-01-03

Downlink: http://arxiv.org/abs/2401.01654v1

GitHub: https://github.com/aldiak/Semi-Supervised-Multimodal-Visual-Pathway-

Abstract: Recent research has shown the potential of deep learning in multi-parametric MRI-based visual pathway (VP) segmentation. However, obtaining labeled data for training is laborious and time-consuming. Therefore, it is crucial to develop effective algorithms for settings with limited labeled samples. In this work, we propose a label-efficient deep learning method with self-ensembling (LESEN). LESEN incorporates supervised and unsupervised losses, enabling the student and teacher models to learn from each other and forming a self-ensembling mean teacher framework. Additionally, we introduce a reliable unlabeled sample selection (RUSS) mechanism to further enhance LESEN's effectiveness. Our experiments on the Human Connectome Project (HCP) dataset demonstrate the superior performance of our method compared to state-of-the-art techniques, advancing multimodal VP segmentation for comprehensive analysis in clinical and research settings. The implementation code will be available at: https://github.com/aldiak/Semi-Supervised-Multimodal-Visual-Pathway-Delineation.
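In a mean teacher framework, the teacher's weights are an exponential moving average (EMA) of the student's. The unsupervised loss asks the student to match the teacher's predictions on unlabeled images. The sketch shows those two pieces on flat weight lists and probability vectors. The confidence filter is a hypothetical stand-in for RUSS, whose actual selection criterion the abstract does not specify.

```python
def ema_update(teacher, student, decay=0.99):
    """Teacher <- decay * teacher + (1 - decay) * student, element-wise."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def consistency_loss(student_probs, teacher_probs):
    """Unsupervised term: mean squared difference between the two predictions."""
    return sum((s - t) ** 2 for s, t in zip(student_probs, teacher_probs)) / len(student_probs)


def select_reliable(teacher_predictions, min_confidence=0.9):
    """Hypothetical RUSS stand-in: keep unlabeled samples whose teacher
    prediction is confident (max class probability >= threshold)."""
    return [i for i, probs in enumerate(teacher_predictions) if max(probs) >= min_confidence]


# The full objective would be: supervised loss + weight * consistency loss
# on the selected unlabeled samples.
teacher = [0.0, 0.0]
student = [1.0, -1.0]  # pretend the student weights stay fixed here
for _ in range(500):
    teacher = ema_update(teacher, student)
```

Because the teacher averages the student over many steps, its predictions are smoother than any single student snapshot. That is what makes it a useful target for the consistency loss.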

Title: S3Net: Innovating Stereo Matching and Semantic Segmentation with a Single-Branch Semantic Stereo Network in Satellite Epipolar Imagery

Authors: Qingyuan Yang, Guanzhou Chen, Xiaoliang Tan

PubTime: 2024-01-03

Downlink: http://arxiv.org/abs/2401.01643v1

GitHub: https://github.com/CVEO/S3Net

Abstract: Stereo matching and semantic segmentation are significant tasks in binocular satellite 3D reconstruction. However, previous studies primarily view these as independent parallel tasks, lacking an integrated multitask learning framework. This work introduces a solution, the Single-branch Semantic Stereo Network (S3Net), which innovatively combines semantic segmentation and stereo matching using Self-Fuse and Mutual-Fuse modules. Unlike preceding methods that utilize semantic or disparity information independently, our method identifies and leverages the intrinsic link between these two tasks, leading to a more accurate understanding of semantic information and disparity estimation. Comparative testing on the US3D dataset proves the effectiveness of our S3Net. Our model improves the mIoU in semantic segmentation from 61.38 to 67.39, and reduces the D1-Error and average endpoint error (EPE) in disparity estimation from 10.051 to 9.579 and from 1.439 to 1.403, respectively, surpassing existing competitive methods. Our code is available at: https://github.com/CVEO/S3Net.
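The metrics quoted above can be computed as follows. EPE is the mean absolute disparity error, and D1 is the fraction of "bad" pixels. The sketch assumes the KITTI-style rule for D1 (error > 3 px and > 5% of the ground-truth disparity). The abstract does not state the thresholds used on US3D, so both are parameters. The reported D1 and mIoU figures are percentages, i.e. these fractions × 100.

```python
def epe(pred, gt):
    """End-point error: mean |d_pred - d_gt| over (flattened) valid pixels."""
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(gt)


def d1_error(pred, gt, abs_thresh=3.0, rel_thresh=0.05):
    """Fraction of pixels whose error exceeds both thresholds (assumed KITTI-style rule).

    abs(g) is used because satellite epipolar disparities can be negative.
    """
    bad = sum(
        1 for p, g in zip(pred, gt)
        if abs(p - g) > abs_thresh and abs(p - g) > rel_thresh * abs(g)
    )
    return bad / len(gt)


def mean_iou(pred_labels, gt_labels, num_classes):
    """Mean IoU over classes present in the prediction or the ground truth."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred_labels, gt_labels) if p == c and g == c)
        union = sum(1 for p, g in zip(pred_labels, gt_labels) if p == c or g == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious)
```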

Title: Context-Aware Interaction Network for RGB-T Semantic Segmentation

Authors: Ying Lv, Zhi Liu, Gongyang Li

PubTime: 2024-01-03

Downlink: http://arxiv.org/abs/2401.01624v1

GitHub: https://github.com/YingLv1106/CAINet

Abstract: RGB-T semantic segmentation is a key technique for autonomous driving scene understanding. However, existing RGB-T semantic segmentation methods do not effectively exploit the complementary relationship between different modalities during multi-level information interaction. To address this issue, we propose the Context-Aware Interaction Network (CAINet) for RGB-T semantic segmentation, which constructs an interaction space that exploits auxiliary tasks and global context for explicitly guided learning. Specifically, we propose a Context-Aware Complementary Reasoning (CACR) module aimed at establishing the complementary relationship between multimodal features with the long-term context in both the spatial and channel dimensions. Further, considering the importance of global contextual and detailed information, we propose the Global Context Modeling (GCM) module and the Detail Aggregation (DA) module, and we introduce specific auxiliary supervision to explicitly guide the context interaction and refine the segmentation map. Extensive experiments on two benchmark datasets, MFNet and PST900, demonstrate that the proposed CAINet achieves state-of-the-art performance. The code is available at https://github.com/YingLv1106/CAINet.
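One common way to let two modalities complement each other along the channel dimension is a per-channel gate. The gate decides how much of each output channel comes from the RGB feature versus the thermal one, based on global (pooled) context. The snippet is a hypothetical, SE-style illustration of channel-wise RGB-T fusion with fixed gate parameters. It is not CAINet's CACR module, whose internals the abstract does not describe.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def global_average(feature_map):
    """Per-channel mean over a C x N feature map (N = flattened spatial positions)."""
    return [sum(channel) / len(channel) for channel in feature_map]


def gated_fusion(rgb, thermal, gate_weights, gate_bias):
    """Fuse C x N RGB and thermal maps with a per-channel gate g_c in (0, 1).

    g_c = sigmoid(w_c * (mean_rgb_c - mean_thermal_c) + b_c)
    out_c = g_c * rgb_c + (1 - g_c) * thermal_c
    """
    rgb_ctx, th_ctx = global_average(rgb), global_average(thermal)
    fused = []
    for c, (r_ch, t_ch) in enumerate(zip(rgb, thermal)):
        g = sigmoid(gate_weights[c] * (rgb_ctx[c] - th_ctx[c]) + gate_bias[c])
        fused.append([g * r + (1.0 - g) * t for r, t in zip(r_ch, t_ch)])
    return fused


rgb = [[1.0, 3.0], [0.0, 0.0]]      # channel 0 is strong in RGB (e.g. daylight)
thermal = [[0.0, 0.0], [2.0, 4.0]]  # channel 1 is strong in thermal (e.g. at night)
```

With zero gate parameters the fusion is a plain average. With large positive weights each channel follows whichever modality carries more signal there, which is the kind of complementary behavior a learned gate is meant to provide.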
