TorchRL-MADDPG

MARL

MPE

MADDPG

off-policy

教程

流程还是一样;

1.依赖项
2.超参数
3.环境

就是说智能体被分组了;

可以看到分组的map;

此外,还跟以前一样,说明ENV不仅包括了基本simulator和transforms,这些元数据描述了执行过程中可能出现的情况。为了提高效率,TorchRL 对环境规范的要求相当严格,但您可以轻松检查您的环境规范是否足够。

python 复制代码
action_spec: Composite(
    adversary: Composite(
        action: BoundedContinuous(
            shape=torch.Size([10, 2, 2]),
            space=ContinuousBox(
                low=Tensor(shape=torch.Size([10, 2, 2]), device=cuda:0, dtype=torch.float32, contiguous=True),
                high=Tensor(shape=torch.Size([10, 2, 2]), device=cuda:0, dtype=torch.float32, contiguous=True)),
            device=cuda:0,
            dtype=torch.float32,
            domain=continuous),
        device=cuda:0,
        shape=torch.Size([10, 2]),
        data_cls=None),
    agent: Composite(
        action: BoundedContinuous(
            shape=torch.Size([10, 1, 2]),
            space=ContinuousBox(
                low=Tensor(shape=torch.Size([10, 1, 2]), device=cuda:0, dtype=torch.float32, contiguous=True),
                high=Tensor(shape=torch.Size([10, 1, 2]), device=cuda:0, dtype=torch.float32, contiguous=True)),
            device=cuda:0,
            dtype=torch.float32,
            domain=continuous),
        device=cuda:0,
        shape=torch.Size([10, 1]),
        data_cls=None),
    device=cuda:0,
    shape=torch.Size([10]),
    data_cls=None)
reward_spec: Composite(
    adversary: Composite(
        reward: UnboundedContinuous(
            shape=torch.Size([10, 2, 1]),
            space=ContinuousBox(
                low=Tensor(shape=torch.Size([10, 2, 1]), device=cuda:0, dtype=torch.float32, contiguous=True),
                high=Tensor(shape=torch.Size([10, 2, 1]), device=cuda:0, dtype=torch.float32, contiguous=True)),
            device=cuda:0,
            dtype=torch.float32,
            domain=continuous),
        device=cuda:0,
        shape=torch.Size([10, 2]),
        data_cls=None),
    agent: Composite(
        reward: UnboundedContinuous(
            shape=torch.Size([10, 1, 1]),
            space=ContinuousBox(
                low=Tensor(shape=torch.Size([10, 1, 1]), device=cuda:0, dtype=torch.float32, contiguous=True),
                high=Tensor(shape=torch.Size([10, 1, 1]), device=cuda:0, dtype=torch.float32, contiguous=True)),
            device=cuda:0,
            dtype=torch.float32,
            domain=continuous),
        device=cuda:0,
        shape=torch.Size([10, 1]),
        data_cls=None),
    device=cuda:0,
    shape=torch.Size([10]),
    data_cls=None)
done_spec: Composite(
    done: Categorical(
        shape=torch.Size([10, 1]),
        space=CategoricalBox(n=2),
        device=cuda:0,
        dtype=torch.bool,
        domain=discrete),
    terminated: Categorical(
        shape=torch.Size([10, 1]),
        space=CategoricalBox(n=2),
        device=cuda:0,
        dtype=torch.bool,
        domain=discrete),
    device=cuda:0,
    shape=torch.Size([10]),
    data_cls=None)
observation_spec: Composite(
    adversary: Composite(
        observation: UnboundedContinuous(
            shape=torch.Size([10, 2, 14]),
            space=ContinuousBox(
                low=Tensor(shape=torch.Size([10, 2, 14]), device=cuda:0, dtype=torch.float32, contiguous=True),
                high=Tensor(shape=torch.Size([10, 2, 14]), device=cuda:0, dtype=torch.float32, contiguous=True)),
            device=cuda:0,
            dtype=torch.float32,
            domain=continuous),
        device=cuda:0,
        shape=torch.Size([10, 2]),
        data_cls=None),
    agent: Composite(
        observation: UnboundedContinuous(
            shape=torch.Size([10, 1, 12]),
            space=ContinuousBox(
                low=Tensor(shape=torch.Size([10, 1, 12]), device=cuda:0, dtype=torch.float32, contiguous=True),
                high=Tensor(shape=torch.Size([10, 1, 12]), device=cuda:0, dtype=torch.float32, contiguous=True)),
            device=cuda:0,
            dtype=torch.float32,
            domain=continuous),
        device=cuda:0,
        shape=torch.Size([10, 1]),
        data_cls=None),
    device=cuda:0,
    shape=torch.Size([10]),
    data_cls=None)

其实就是对环境做上层的包装

这段和之前的描述无区别

4.Policy

流程:

构造原始网络得到分布参数

根据参数构造分布,采样动作

add噪声,增加探索;

5.Critic
6.Collector
7.Replybuffer

严格分离

8.utils

将done和terminated扩展到group而不再是全局,因为后面的V需要能被正确识别是否是done;

9.Train_loop
相关推荐
vvoennvv3 小时前
【Python TensorFlow】 TCN-LSTM时间序列卷积长短期记忆神经网络时序预测算法(附代码)
python·神经网络·机器学习·tensorflow·lstm·tcn
nix.gnehc3 小时前
PyTorch
人工智能·pytorch·python
小殊小殊3 小时前
【论文笔记】视频RAG-Vgent:基于图结构的视频检索推理框架
论文阅读·人工智能·深度学习
z***3354 小时前
SQL Server2022版+SSMS安装教程(保姆级)
后端·python·flask
I'm Jie4 小时前
从零开始学习 TOML,配置文件的新选择
python·properties·yaml·toml
hans汉斯4 小时前
【数据挖掘】基于深度学习的生产车间智能管控研究
人工智能·深度学习·数据挖掘
brave and determined4 小时前
可编程逻辑器件学习(day34):半导体编年史:从法拉第的意外发现到塑造现代文明的硅基浪潮
人工智能·深度学习·fpga开发·verilog·fpga·设计规范·嵌入式设计
二川bro4 小时前
Scikit-learn全流程指南:Python机器学习项目实战
python·机器学习·scikit-learn
代码的乐趣5 小时前
支持selenium的chrome driver更新到142.0.7444.175
chrome·python·selenium