TorchRL-MADDPG

MARL

MPE

MADDPG

off-policy

Tutorial

The workflow is the same as before:

1.Dependencies
2.Hyperparameters
3.Environment

In other words, the agents are organized into groups.

The group map shows which agents belong to which group.
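As a rough illustration, a group map can be pictured as a dictionary from group name to the list of agent names in that group. The sketch below is plain Python, not TorchRL code: the agent names and the `group_shapes` helper are hypothetical. The batch size of 10 and the group sizes (2 adversaries, 1 agent) are taken from the spec printout in this post.

```python
# Hypothetical group map: group name -> names of the agents in that group.
# The agent names are assumptions made up for illustration.
group_map = {
    "adversary": ["adversary_0", "adversary_1"],
    "agent": ["agent_0"],
}


def group_shapes(group_map, batch_size):
    """Return the (batch, n_agents) shape that each group's data would carry."""
    return {group: (batch_size, len(names)) for group, names in group_map.items()}


shapes = group_shapes(group_map, batch_size=10)
print(shapes)
```

These per-group shapes match the `shape=torch.Size([10, 2])` and `shape=torch.Size([10, 1])` entries that appear for the adversary and agent groups in the specs.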

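Also as before, an environment is defined not only by its base simulator and transforms but also by a set of metadata (the specs) that describe what can be expected during execution. For efficiency, TorchRL is quite strict about environment specs, but you can easily check whether your environment's specs are adequate.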

action_spec: Composite(
    adversary: Composite(
        action: BoundedContinuous(
            shape=torch.Size([10, 2, 2]),
            space=ContinuousBox(
                low=Tensor(shape=torch.Size([10, 2, 2]), device=cuda:0, dtype=torch.float32, contiguous=True),
                high=Tensor(shape=torch.Size([10, 2, 2]), device=cuda:0, dtype=torch.float32, contiguous=True)),
            device=cuda:0,
            dtype=torch.float32,
            domain=continuous),
        device=cuda:0,
        shape=torch.Size([10, 2]),
        data_cls=None),
    agent: Composite(
        action: BoundedContinuous(
            shape=torch.Size([10, 1, 2]),
            space=ContinuousBox(
                low=Tensor(shape=torch.Size([10, 1, 2]), device=cuda:0, dtype=torch.float32, contiguous=True),
                high=Tensor(shape=torch.Size([10, 1, 2]), device=cuda:0, dtype=torch.float32, contiguous=True)),
            device=cuda:0,
            dtype=torch.float32,
            domain=continuous),
        device=cuda:0,
        shape=torch.Size([10, 1]),
        data_cls=None),
    device=cuda:0,
    shape=torch.Size([10]),
    data_cls=None)
reward_spec: Composite(
    adversary: Composite(
        reward: UnboundedContinuous(
            shape=torch.Size([10, 2, 1]),
            space=ContinuousBox(
                low=Tensor(shape=torch.Size([10, 2, 1]), device=cuda:0, dtype=torch.float32, contiguous=True),
                high=Tensor(shape=torch.Size([10, 2, 1]), device=cuda:0, dtype=torch.float32, contiguous=True)),
            device=cuda:0,
            dtype=torch.float32,
            domain=continuous),
        device=cuda:0,
        shape=torch.Size([10, 2]),
        data_cls=None),
    agent: Composite(
        reward: UnboundedContinuous(
            shape=torch.Size([10, 1, 1]),
            space=ContinuousBox(
                low=Tensor(shape=torch.Size([10, 1, 1]), device=cuda:0, dtype=torch.float32, contiguous=True),
                high=Tensor(shape=torch.Size([10, 1, 1]), device=cuda:0, dtype=torch.float32, contiguous=True)),
            device=cuda:0,
            dtype=torch.float32,
            domain=continuous),
        device=cuda:0,
        shape=torch.Size([10, 1]),
        data_cls=None),
    device=cuda:0,
    shape=torch.Size([10]),
    data_cls=None)
done_spec: Composite(
    done: Categorical(
        shape=torch.Size([10, 1]),
        space=CategoricalBox(n=2),
        device=cuda:0,
        dtype=torch.bool,
        domain=discrete),
    terminated: Categorical(
        shape=torch.Size([10, 1]),
        space=CategoricalBox(n=2),
        device=cuda:0,
        dtype=torch.bool,
        domain=discrete),
    device=cuda:0,
    shape=torch.Size([10]),
    data_cls=None)
observation_spec: Composite(
    adversary: Composite(
        observation: UnboundedContinuous(
            shape=torch.Size([10, 2, 14]),
            space=ContinuousBox(
                low=Tensor(shape=torch.Size([10, 2, 14]), device=cuda:0, dtype=torch.float32, contiguous=True),
                high=Tensor(shape=torch.Size([10, 2, 14]), device=cuda:0, dtype=torch.float32, contiguous=True)),
            device=cuda:0,
            dtype=torch.float32,
            domain=continuous),
        device=cuda:0,
        shape=torch.Size([10, 2]),
        data_cls=None),
    agent: Composite(
        observation: UnboundedContinuous(
            shape=torch.Size([10, 1, 12]),
            space=ContinuousBox(
                low=Tensor(shape=torch.Size([10, 1, 12]), device=cuda:0, dtype=torch.float32, contiguous=True),
                high=Tensor(shape=torch.Size([10, 1, 12]), device=cuda:0, dtype=torch.float32, contiguous=True)),
            device=cuda:0,
            dtype=torch.float32,
            domain=continuous),
        device=cuda:0,
        shape=torch.Size([10, 1]),
        data_cls=None),
    device=cuda:0,
    shape=torch.Size([10]),
    data_cls=None)

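This is essentially just a higher-level wrapper around the environment.

This part is no different from the earlier description.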

4.Policy

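Workflow:

Build the raw network that outputs the distribution parameters.

Build a distribution from those parameters and sample an action.

Add noise to the action to encourage exploration.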
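The three steps above can be sketched conceptually in plain Python. This is not the TorchRL API: the toy linear "network", the tanh squashing, the `[-1, 1]` action bounds, and all function names are assumptions chosen only to make the flow concrete.

```python
import math
import random


def raw_network(obs, weights, bias):
    """Toy linear layer: maps an observation to one raw parameter per action dim."""
    return [sum(w * o for w, o in zip(row, obs)) + b for row, b in zip(weights, bias)]


def deterministic_action(params, low=-1.0, high=1.0):
    """Squash raw parameters with tanh and rescale them into [low, high].

    This acts like a point-mass distribution: "sampling" returns its location.
    """
    return [low + (math.tanh(p) + 1.0) * 0.5 * (high - low) for p in params]


def add_exploration_noise(action, sigma, rng, low=-1.0, high=1.0):
    """Add Gaussian noise to each action dim and clip back into the bounds."""
    return [min(high, max(low, a + rng.gauss(0.0, sigma))) for a in action]


rng = random.Random(0)
obs = [0.5, -0.2, 0.1]
weights = [[0.1, 0.2, -0.3], [0.0, -0.5, 0.4]]  # 2 action dims, 3 observation dims
bias = [0.0, 0.1]

params = raw_network(obs, weights, bias)        # step 1: distribution parameters
action = deterministic_action(params)           # step 2: build distribution, sample
noisy_action = add_exploration_noise(action, sigma=0.1, rng=rng)  # step 3: explore
print(action, noisy_action)
```

Clipping after adding noise keeps the explored action inside the bounds the environment expects. In practice the noise scale is often annealed toward zero as training progresses.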

5.Critic
6.Collector
7.Replay buffer

Kept strictly separate.

8.utils

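Expand done and terminated to the group level rather than leaving them global, because the value (V) computation that follows must be able to tell correctly whether each entry is done.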
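A minimal sketch of that expansion, using nested Python lists instead of tensors so it runs without any dependencies. The `expand_to_group` helper is hypothetical. In the specs, done has shape `[10, 1]` while the adversary reward has shape `[10, 2, 1]`, so the idea is to insert an agent dimension and repeat the flag for every agent in the group.

```python
def expand_to_group(flags, n_agents):
    """Copy a per-environment flag of shape [batch, 1] to shape [batch, n_agents, 1]."""
    return [[[row[0]] for _ in range(n_agents)] for row in flags]


# Global done flags for a batch of 3 environments, shape [3, 1].
done = [[False], [True], [False]]

# The adversary group has 2 agents, so the result has shape [3, 2, 1].
group_done = expand_to_group(done, n_agents=2)
print(group_done)
```

With tensors, the same effect would typically come from adding a dimension and broadcasting to the reward's shape, so that done, terminated, and reward line up element by element.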

9.Train_loop