offline rl

MoonOut, 8 months ago
offline RL | D4RL: one of the most commonly used offline datasets
MoonOut, 9 months ago
offline RL | Reading Decision Transformer
(The well-known GPT stands for Generative Pre-trained Transformer.) Learning about Transformers:
MoonOut, 10 months ago
offline 2 online | Cal-QL: calibrating the Q-values learned by conservative offline training so that they are comparable in scale to the true reward
A compelling use case of offline reinforcement learning (RL) is to obtain an effective policy initialization from existing datasets, which allows efficient fine-tuning with limited amounts of active online interaction in the environment. Many existing off
MoonOut, 10 months ago
offline 2 online | Importance sampling: turning offline + online data into on-policy samples
Recent advance in deep offline reinforcement learning (RL) has made it possible to train strong robotic agents from offline datasets. However, depending on the quality of the trained agents and the application being considered, it is often desirable to fi
MoonOut, 10 months ago
offline 2 online | AWAC: an AWR-based policy update + supplementing the dataset with online data
Reinforcement learning (RL) provides an appealing formalism for learning control policies from experience. However, the classic active formulation of RL necessitates a lengthy active exploration process for each behavior, making it difficult to apply in r
MoonOut, 10 months ago
offline RL | ABM: extracting a prior policy from the good transitions in an offline dataset
A learned prior for offline off-policy RL from imperfect data.