TorchRL-MADDPG

MARL

MPE

MADDPG

off-policy

教程

流程还是一样;

1.依赖项
2.超参数
3.环境

就是说智能体被分组了;

可以看到分组的map;

此外,还跟以前一样,说明ENV不仅包括了基本simulator和transforms,这些元数据描述了执行过程中可能出现的情况。为了提高效率,TorchRL 对环境规范的要求相当严格,但您可以轻松检查您的环境规范是否足够。

python 复制代码
action_spec: Composite(
    adversary: Composite(
        action: BoundedContinuous(
            shape=torch.Size([10, 2, 2]),
            space=ContinuousBox(
                low=Tensor(shape=torch.Size([10, 2, 2]), device=cuda:0, dtype=torch.float32, contiguous=True),
                high=Tensor(shape=torch.Size([10, 2, 2]), device=cuda:0, dtype=torch.float32, contiguous=True)),
            device=cuda:0,
            dtype=torch.float32,
            domain=continuous),
        device=cuda:0,
        shape=torch.Size([10, 2]),
        data_cls=None),
    agent: Composite(
        action: BoundedContinuous(
            shape=torch.Size([10, 1, 2]),
            space=ContinuousBox(
                low=Tensor(shape=torch.Size([10, 1, 2]), device=cuda:0, dtype=torch.float32, contiguous=True),
                high=Tensor(shape=torch.Size([10, 1, 2]), device=cuda:0, dtype=torch.float32, contiguous=True)),
            device=cuda:0,
            dtype=torch.float32,
            domain=continuous),
        device=cuda:0,
        shape=torch.Size([10, 1]),
        data_cls=None),
    device=cuda:0,
    shape=torch.Size([10]),
    data_cls=None)
reward_spec: Composite(
    adversary: Composite(
        reward: UnboundedContinuous(
            shape=torch.Size([10, 2, 1]),
            space=ContinuousBox(
                low=Tensor(shape=torch.Size([10, 2, 1]), device=cuda:0, dtype=torch.float32, contiguous=True),
                high=Tensor(shape=torch.Size([10, 2, 1]), device=cuda:0, dtype=torch.float32, contiguous=True)),
            device=cuda:0,
            dtype=torch.float32,
            domain=continuous),
        device=cuda:0,
        shape=torch.Size([10, 2]),
        data_cls=None),
    agent: Composite(
        reward: UnboundedContinuous(
            shape=torch.Size([10, 1, 1]),
            space=ContinuousBox(
                low=Tensor(shape=torch.Size([10, 1, 1]), device=cuda:0, dtype=torch.float32, contiguous=True),
                high=Tensor(shape=torch.Size([10, 1, 1]), device=cuda:0, dtype=torch.float32, contiguous=True)),
            device=cuda:0,
            dtype=torch.float32,
            domain=continuous),
        device=cuda:0,
        shape=torch.Size([10, 1]),
        data_cls=None),
    device=cuda:0,
    shape=torch.Size([10]),
    data_cls=None)
done_spec: Composite(
    done: Categorical(
        shape=torch.Size([10, 1]),
        space=CategoricalBox(n=2),
        device=cuda:0,
        dtype=torch.bool,
        domain=discrete),
    terminated: Categorical(
        shape=torch.Size([10, 1]),
        space=CategoricalBox(n=2),
        device=cuda:0,
        dtype=torch.bool,
        domain=discrete),
    device=cuda:0,
    shape=torch.Size([10]),
    data_cls=None)
observation_spec: Composite(
    adversary: Composite(
        observation: UnboundedContinuous(
            shape=torch.Size([10, 2, 14]),
            space=ContinuousBox(
                low=Tensor(shape=torch.Size([10, 2, 14]), device=cuda:0, dtype=torch.float32, contiguous=True),
                high=Tensor(shape=torch.Size([10, 2, 14]), device=cuda:0, dtype=torch.float32, contiguous=True)),
            device=cuda:0,
            dtype=torch.float32,
            domain=continuous),
        device=cuda:0,
        shape=torch.Size([10, 2]),
        data_cls=None),
    agent: Composite(
        observation: UnboundedContinuous(
            shape=torch.Size([10, 1, 12]),
            space=ContinuousBox(
                low=Tensor(shape=torch.Size([10, 1, 12]), device=cuda:0, dtype=torch.float32, contiguous=True),
                high=Tensor(shape=torch.Size([10, 1, 12]), device=cuda:0, dtype=torch.float32, contiguous=True)),
            device=cuda:0,
            dtype=torch.float32,
            domain=continuous),
        device=cuda:0,
        shape=torch.Size([10, 1]),
        data_cls=None),
    device=cuda:0,
    shape=torch.Size([10]),
    data_cls=None)

其实就是对环境做上层的包装

这段和之前的描述无区别

4.Policy

流程:

构造原始网络得到分布参数

根据参数构造分布,采样动作

add噪声,增加探索;

5.Critic
6.Collector
7.Replybuffer

严格分离

8.utils

将done和terminated扩展到group而不再是全局,因为后面的V需要能被正确识别是否是done;

9.Train_loop
相关推荐
猿界零零七7 小时前
pip install mxnet 报错解决方案
python·pip·mxnet
不只会拍照的程序猿9 小时前
《嵌入式AI筑基笔记02:Python数据类型01,从C的“硬核”到Python的“包容”》
人工智能·笔记·python
Jay_Franklin9 小时前
Quarto与Python集成使用
开发语言·python·markdown
吴佳浩 Alben10 小时前
GPU 生产环境实践:硬件拓扑、显存管理与完整运维体系
运维·人工智能·pytorch·语言模型·transformer·vllm
Oueii10 小时前
Django全栈开发入门:构建一个博客系统
jvm·数据库·python
2401_8318249611 小时前
使用Fabric自动化你的部署流程
jvm·数据库·python
njidf11 小时前
Python日志记录(Logging)最佳实践
jvm·数据库·python
@我漫长的孤独流浪11 小时前
Python编程核心知识点速览
开发语言·数据库·python
宇擎智脑科技11 小时前
A2A Python SDK 源码架构解读:一个请求是如何被处理的
人工智能·python·架构·a2a
2401_8512729911 小时前
实战:用Python分析某电商销售数据
jvm·数据库·python