offline rl

MoonOut · 5 days ago
offline rl·meta-rl
offline meta-RL | quick-read notes on recent work. See also: offline meta-RL | quick-read notes on classic papers. Main content: Result 1: random intents do produce diverse, high-quality behaviors. Experiments show that the behavior policies extracted by UBER: …
MoonOut · 11 days ago
offline rl·meta rl
offline meta RL | paper quick-read notes. Main content: Result 1: random intents do produce diverse, high-quality behaviors; experiments show that the behavior policies extracted by UBER: … Result 2: online learning is significantly accelerated; on MuJoCo locomotion tasks, compared with baseline methods, UBER: …
MoonOut · 1 year ago
offline rl·pbrl
offline RL · PbRL | LiRE: construct ranked lists (RLT) of the form A>B>C to obtain more preference data. In Reinforcement Learning (RL), designing precise reward functions remains a challenge, particularly when aligning with human intent. Preference-based RL (PbRL) was introduced to address this problem by learning reward models from human feedback. Ho…
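The point of a Ranked List of Trajectories (RLT) is that one list of length n implies n(n-1)/2 pairwise preferences rather than the n-1 you get from adjacent comparisons. A minimal sketch of that expansion (function and variable names are illustrative, not from the paper):

```python
from itertools import combinations

def ranked_list_to_preferences(ranked_segments):
    """Expand a ranked list such as A > B > C into all implied
    (winner, loser) preference pairs.

    `ranked_segments` is ordered from most to least preferred, so a
    list of length n yields n * (n - 1) / 2 labeled pairs for reward
    learning instead of only n - 1 adjacent comparisons.
    """
    return list(combinations(ranked_segments, 2))

# Example: one list A > B > C gives 3 labeled pairs.
print(ranked_list_to_preferences(["A", "B", "C"]))
# [('A', 'B'), ('A', 'C'), ('B', 'C')]
```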
MoonOut · 2 years ago
offline rl
offline RL | D4RL: one of the most commonly used offline datasets
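For reference, a typical way to pull one of these datasets, assuming the standard d4rl package and the v2 MuJoCo dataset names; this is a usage sketch, not taken from the post:

```python
import gym
import d4rl  # registers the offline environments with gym on import

# Load one of the standard locomotion datasets (name assumes the v2 release).
env = gym.make("halfcheetah-medium-v2")
dataset = d4rl.qlearning_dataset(env)  # dict of (s, a, r, s', done) arrays

print({k: v.shape for k, v in dataset.items()})
# e.g. observations (N, 17), actions (N, 6), rewards (N,), terminals (N,)
```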
MoonOut · 2 years ago
offline rl
offline RL | Reading Decision Transformer (the famous GPT stands for Generative Pre-trained Transformer). Learning the Transformer: …
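For context, Decision Transformer treats offline RL as sequence modeling: a GPT-style causal transformer reads interleaved (return-to-go, state, action) tokens and predicts the next action. A toy sketch of that token packing, with illustrative names:

```python
def pack_tokens(returns_to_go, states, actions):
    """Interleave a trajectory as (R_1, s_1, a_1, R_2, s_2, a_2, ...),
    the ordering Decision Transformer conditions on; a causal
    transformer then predicts a_t from all tokens up to (R_t, s_t)."""
    tokens = []
    for rtg, s, a in zip(returns_to_go, states, actions):
        tokens.extend([("rtg", rtg), ("state", s), ("action", a)])
    return tokens

# Toy 2-step trajectory with rewards [1, 2]: returns-to-go are [3, 2].
print(pack_tokens([3, 2], ["s1", "s2"], ["a1", "a2"]))
```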
MoonOut · 2 years ago
offline rl
offline 2 online | Cal-QL: calibrate the Q values produced by conservative offline training so that they are on the same scale as the true rewards. A compelling use case of offline reinforcement learning (RL) is to obtain an effective policy initialization from existing datasets, which allows efficient fine-tuning with limited amounts of active online interaction in the environment. Many existing off…
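As I understand the calibration idea, the conservative push-down term is clipped from below by a Monte Carlo return estimate of a reference policy, so offline training cannot drive Q far below the real reward scale. The sketch below is illustrative only; the names (q_pi, mc_return, q_data) are not from the paper:

```python
import torch

def calibrated_conservative_term(q_pi, mc_return, q_data):
    """Hedged sketch of a Cal-QL-style regularizer (illustrative only).

    CQL pushes down Q on policy actions (q_pi) and pushes up Q on
    dataset actions (q_data). Clipping q_pi from below by a Monte
    Carlo return estimate of a reference policy (mc_return) keeps the
    learned Q on the scale of real returns, which is what makes
    subsequent online fine-tuning stable.
    """
    pushed_down = torch.maximum(q_pi, mc_return)  # never go below the MC return
    return (pushed_down - q_data).mean()

# Toy tensors standing in for batched Q estimates and MC returns.
q_pi = torch.tensor([0.1, -5.0, 2.0])
mc_return = torch.tensor([1.0, 1.0, 1.0])
q_data = torch.tensor([1.5, 1.2, 2.5])
print(calibrated_conservative_term(q_pi, mc_return, q_data))
```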
MoonOut · 2 years ago
offline rl
offline 2 online | importance sampling: turning offline + online data into on-policy samples. Recent advances in deep offline reinforcement learning (RL) have made it possible to train strong robotic agents from offline datasets. However, depending on the quality of the trained agents and the application being considered, it is often desirable to fi…
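One way to read "turn offline + online data into on-policy samples" is to reweight each transition by the density ratio between the current policy and the policy that collected it. A hedged sketch in that spirit; the paper's actual estimator may differ, and the function name is illustrative:

```python
import numpy as np

def importance_weights(logp_current, logp_behavior, clip=10.0):
    """Per-sample importance weights pi(a|s) / pi_b(a|s), computed in
    log space and clipped for variance control, so transitions
    collected offline (or by stale online policies) can be reweighted
    to look approximately on-policy for the current policy."""
    w = np.exp(np.clip(logp_current - logp_behavior, -clip, clip))
    return w / w.mean()  # self-normalize to keep the effective scale stable

# Toy example: 3 transitions with log-probs under current / behavior policy.
print(importance_weights(np.array([-1.0, -2.0, -0.5]),
                         np.array([-1.5, -1.0, -0.5])))
```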
MoonOut · 2 years ago
offline rl
offline 2 online | AWAC: AWR-based policy update + supplementing the dataset with online data. Reinforcement learning (RL) provides an appealing formalism for learning control policies from experience. However, the classic active formulation of RL necessitates a lengthy active exploration process for each behavior, making it difficult to apply in r…
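The AWR-style update amounts to behavior cloning of dataset actions reweighted by exp(advantage / temperature), so the policy stays close to the data while favoring high-advantage actions. A minimal PyTorch sketch under that reading (tensor names are illustrative):

```python
import torch

def awac_policy_loss(log_prob, advantage, temperature=1.0):
    """Advantage-weighted policy loss in the spirit of AWR / AWAC:
    clone replay-buffer actions, reweighted by exp(A / lambda).
    Advantages are detached because the critic is trained separately;
    the weight clamp is a common variance-control trick, not canon.
    """
    weights = torch.exp(advantage.detach() / temperature)
    weights = torch.clamp(weights, max=100.0)
    return -(log_prob * weights).mean()

# Toy batch: log pi(a|s) for sampled dataset actions and their advantages.
log_prob = torch.tensor([-0.3, -1.2, -0.7], requires_grad=True)
advantage = torch.tensor([0.5, -0.2, 1.0])
print(awac_policy_loss(log_prob, advantage))
```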
MoonOut · 2 years ago
offline rl
offline RL | ABM: extract a prior policy from the good transitions of the offline dataset. A learned prior for offline off-policy RL from imperfect data.
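A hedged reading of "extract a prior policy from the good transitions": clone only the dataset actions whose estimated advantage is non-negative and use the result as the prior that the main policy is kept close to. The sketch below is illustrative; the original ABM may use a softer (e.g. exponential) weighting:

```python
import torch

def abm_prior_loss(log_prob, advantage):
    """Advantage-filtered behavior cloning for a prior policy:
    only transitions with non-negative estimated advantage contribute,
    so the prior imitates the 'good' parts of an imperfect dataset."""
    keep = (advantage >= 0).float()
    return -(log_prob * keep).sum() / keep.sum().clamp(min=1.0)

# Toy batch: two of the three dataset actions beat the baseline value.
log_prob = torch.tensor([-0.4, -2.0, -0.9], requires_grad=True)
advantage = torch.tensor([0.3, -0.5, 0.1])
print(abm_prior_loss(log_prob, advantage))
```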