offline rl

MoonOut · 5 days ago
offline rl·meta-rl
offline meta-RL | quick-read notes on recent work. See also: offline meta-RL | quick-read notes on classic papers. Main content: Result 1: random intents do produce diverse, high-quality behaviors. Experiments show that the behavior policies extracted by UBER: …
MoonOut · 11 days ago
offline rl·meta rl
offline meta RL | paper quick-read notes. Main content: Result 1: random intents do produce diverse, high-quality behaviors; experiments show that the behavior policies extracted by UBER: … Result 2: online learning is significantly accelerated; on MuJoCo locomotion tasks, compared with baseline methods, UBER: …
MoonOut · 1 year ago
offline rl·pbrl
offline RL · PbRL | LiRE: construct ranked lists (RLT) of the form A>B>C to obtain more preference data. In Reinforcement Learning (RL), designing precise reward functions remains a challenge, particularly when aligning with human intent. Preference-based RL (PbRL) was introduced to address this problem by learning reward models from human feedback. Ho…
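The point of a Ranked List of Trajectories (RLT) is that one list of length n implies n(n-1)/2 pairwise preferences rather than the n-1 you get from adjacent comparisons. A minimal sketch of that expansion (function and variable names are illustrative, not from the paper):

```python
from itertools import combinations

def ranked_list_to_preferences(ranked_segments):
    """Expand a ranked list such as A > B > C into all implied
    (winner, loser) preference pairs.

    `ranked_segments` is ordered from most to least preferred, so a
    list of length n yields n * (n - 1) / 2 labeled pairs for reward
    learning instead of only n - 1 adjacent comparisons.
    """
    return list(combinations(ranked_segments, 2))

# Example: one list A > B > C gives 3 labeled pairs.
print(ranked_list_to_preferences(["A", "B", "C"]))
# [('A', 'B'), ('A', 'C'), ('B', 'C')]
```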
MoonOut · 2 years ago
offline rl
offline RL | D4RL: one of the most commonly used offline datasets
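For reference, a typical way to pull one of these datasets, assuming the standard d4rl package and the v2 MuJoCo dataset names; this is a usage sketch, not taken from the post:

```python
import gym
import d4rl  # registers the offline environments with gym on import

# Load one of the standard locomotion datasets (name assumes the v2 release).
env = gym.make("halfcheetah-medium-v2")
dataset = d4rl.qlearning_dataset(env)  # dict of (s, a, r, s', done) arrays

print({k: v.shape for k, v in dataset.items()})
# e.g. observations (N, 17), actions (N, 6), rewards (N,), terminals (N,)
```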
MoonOut · 2 years ago
offline rl
offline RL | Reading Decision Transformer (the famous GPT stands for Generative Pre-trained Transformer). Learning the Transformer: …
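For context, Decision Transformer treats offline RL as sequence modeling: a GPT-style causal transformer reads interleaved (return-to-go, state, action) tokens and predicts the next action. A toy sketch of that token packing, with illustrative names:

```python
def pack_tokens(returns_to_go, states, actions):
    """Interleave a trajectory as (R_1, s_1, a_1, R_2, s_2, a_2, ...),
    the ordering Decision Transformer conditions on; a causal
    transformer then predicts a_t from all tokens up to (R_t, s_t)."""
    tokens = []
    for rtg, s, a in zip(returns_to_go, states, actions):
        tokens.extend([("rtg", rtg), ("state", s), ("action", a)])
    return tokens

# Toy 2-step trajectory with rewards [1, 2]: returns-to-go are [3, 2].
print(pack_tokens([3, 2], ["s1", "s2"], ["a1", "a2"]))
```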
MoonOut · 2 years ago
offline rl
offline 2 online | Cal-QL: calibrate the Q values produced by conservative offline training so that they are on the same scale as the true rewards. A compelling use case of offline reinforcement learning (RL) is to obtain an effective policy initialization from existing datasets, which allows efficient fine-tuning with limited amounts of active online interaction in the environment. Many existing off…
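As I understand the calibration idea, the conservative push-down term is clipped from below by a Monte Carlo return estimate of a reference policy, so offline training cannot drive Q far below the real reward scale. The sketch below is illustrative only; the names (q_pi, mc_return, q_data) are not from the paper:

```python
import torch

def calibrated_conservative_term(q_pi, mc_return, q_data):
    """Hedged sketch of a Cal-QL-style regularizer (illustrative only).

    CQL pushes down Q on policy actions (q_pi) and pushes up Q on
    dataset actions (q_data). Clipping q_pi from below by a Monte
    Carlo return estimate of a reference policy (mc_return) keeps the
    learned Q on the scale of real returns, which is what makes
    subsequent online fine-tuning stable.
    """
    pushed_down = torch.maximum(q_pi, mc_return)  # never go below the MC return
    return (pushed_down - q_data).mean()

# Toy tensors standing in for batched Q estimates and MC returns.
q_pi = torch.tensor([0.1, -5.0, 2.0])
mc_return = torch.tensor([1.0, 1.0, 1.0])
q_data = torch.tensor([1.5, 1.2, 2.5])
print(calibrated_conservative_term(q_pi, mc_return, q_data))
```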
MoonOut · 2 years ago
offline rl
offline 2 online | importance sampling: turning offline + online data into on-policy samples. Recent advances in deep offline reinforcement learning (RL) have made it possible to train strong robotic agents from offline datasets. However, depending on the quality of the trained agents and the application being considered, it is often desirable to fi…
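One way to read "turn offline + online data into on-policy samples" is to reweight each transition by the density ratio between the current policy and the policy that collected it. A hedged sketch in that spirit; the paper's actual estimator may differ, and the function name is illustrative:

```python
import numpy as np

def importance_weights(logp_current, logp_behavior, clip=10.0):
    """Per-sample importance weights pi(a|s) / pi_b(a|s), computed in
    log space and clipped for variance control, so transitions
    collected offline (or by stale online policies) can be reweighted
    to look approximately on-policy for the current policy."""
    w = np.exp(np.clip(logp_current - logp_behavior, -clip, clip))
    return w / w.mean()  # self-normalize to keep the effective scale stable

# Toy example: 3 transitions with log-probs under current / behavior policy.
print(importance_weights(np.array([-1.0, -2.0, -0.5]),
                         np.array([-1.5, -1.0, -0.5])))
```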
MoonOut · 2 years ago
offline rl
offline 2 online | AWAC: AWR-based policy update + supplementing the dataset with online data. Reinforcement learning (RL) provides an appealing formalism for learning control policies from experience. However, the classic active formulation of RL necessitates a lengthy active exploration process for each behavior, making it difficult to apply in r…
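The AWR-style update amounts to behavior cloning of dataset actions reweighted by exp(advantage / temperature), so the policy stays close to the data while favoring high-advantage actions. A minimal PyTorch sketch under that reading (tensor names are illustrative):

```python
import torch

def awac_policy_loss(log_prob, advantage, temperature=1.0):
    """Advantage-weighted policy loss in the spirit of AWR / AWAC:
    clone replay-buffer actions, reweighted by exp(A / lambda).
    Advantages are detached because the critic is trained separately;
    the weight clamp is a common variance-control trick, not canon.
    """
    weights = torch.exp(advantage.detach() / temperature)
    weights = torch.clamp(weights, max=100.0)
    return -(log_prob * weights).mean()

# Toy batch: log pi(a|s) for sampled dataset actions and their advantages.
log_prob = torch.tensor([-0.3, -1.2, -0.7], requires_grad=True)
advantage = torch.tensor([0.5, -0.2, 1.0])
print(awac_policy_loss(log_prob, advantage))
```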
MoonOut · 2 years ago
offline rl
offline RL | ABM: extract a prior policy from the good transitions of the offline dataset. A learned prior for offline off-policy RL from imperfect data.
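A hedged reading of "extract a prior policy from the good transitions": clone only the dataset actions whose estimated advantage is non-negative and use the result as the prior that the main policy is kept close to. The sketch below is illustrative; the original ABM may use a softer (e.g. exponential) weighting:

```python
import torch

def abm_prior_loss(log_prob, advantage):
    """Advantage-filtered behavior cloning for a prior policy:
    only transitions with non-negative estimated advantage contribute,
    so the prior imitates the 'good' parts of an imperfect dataset."""
    keep = (advantage >= 0).float()
    return -(log_prob * keep).sum() / keep.sum().clamp(min=1.0)

# Toy batch: two of the three dataset actions beat the baseline value.
log_prob = torch.tensor([-0.4, -2.0, -0.9], requires_grad=True)
advantage = torch.tensor([0.3, -0.5, 0.1])
print(abm_prior_loss(log_prob, advantage))
```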