0. Preface
0.1 Reinforcement learning surveys / resources (ongoing)
Peng Cheng Laboratory: Chinese report: 学术分享丨具身智能综述:鹏城实验室&中大调研近400篇文献; English original: Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI
The survey compares the features of several existing reinforcement-learning platforms, including whether each supports: a physics simulator, graphics rendering, robot libraries, deep-learning support, large-scale parallel computing, reinforcement learning, large-scale parallel simulation, multi-robot systems, and robot simulation.
0.2 GitHub code repositories (ongoing)
1. Gymnasium
Official course / learning resources:
Tutorials from bloggers:
Written tutorial: 【学习笔记】Gymnasium入门
Video tutorial: Install Gymnasium (OpenAI Gym) on Windows
The front page of the (OpenAI) Gym documentation, Gym Documentation, states:
All development of Gym has been moved to Gymnasium, a new package in the Farama Foundation that is maintained by the same team of developers who have maintained Gym for the past 18 months. If you are already using the latest release of Gym (v0.26.2), you can switch to v0.27.0 of Gymnasium by simply replacing import gym with import gymnasium as gym, with no additional steps. Gym will not receive any future updates or bug fixes, and no further changes will be made to the core API in Gymnasium.
Gym Documentation
Gymnasium Documentation
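As a quick sanity check after switching the import, a minimal Gymnasium loop looks like this (a sketch using the standard Gymnasium API; CartPole-v1 ships with the classic-control environments):
python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=42)
for _ in range(200):
    action = env.action_space.sample()                 # random policy, just to exercise the API
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:                        # Gymnasium splits "done" into two flags
        obs, info = env.reset()
env.close()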
2. Isaac Gym (successful working example)
Installation tutorial: Ubuntu22.04 1650显卡4G安装isaacgym&legged_gym (installing isaacgym & legged_gym on Ubuntu 22.04 with a 4 GB GTX 1650)
IsaacGymEnvs is an open-source Python library of environments built on NVIDIA Isaac Gym, providing efficient simulation environments for robot training. Isaac Gym itself is a high-performance physics simulation engine developed by NVIDIA for robotics and reinforcement learning; it uses GPU acceleration and supports massively parallel simulation, which greatly speeds up training for multi-agent RL and other machine-learning tasks.
2.1 Running training
Set the number of environments to 1024 and run headless (with the viewer enabled it throws an out-of-memory error, presumably because the GPU memory is too small):
c
python legged_gym/scripts/train.py --task=anymal_c_flat --num_envs=1024 --headless
Error: ValueError: too many values to unpack (expected 2)
c
(gym) xj@xj:~/isaacgym/legged_gym$ python legged_gym/scripts/train.py --task=anymal_c_flat --num_envs=1024 --headless
Importing module 'gym_38' (/home/xj/isaacgym/python/isaacgym/_bindings/linux-x86_64/gym_38.so)
Setting GYM_USD_PLUG_INFO_PATH to /home/xj/isaacgym/python/isaacgym/_bindings/linux-x86_64/usd/plugInfo.json
PyTorch version 2.2.2
Device count 1
/home/xj/isaacgym/python/isaacgym/_bindings/src/gymtorch
Using /home/xj/.cache/torch_extensions/py38_cu121 as PyTorch extensions root...
Emitting ninja build file /home/xj/.cache/torch_extensions/py38_cu121/gymtorch/build.ninja...
Building extension module gymtorch...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module gymtorch...
Setting seed: 1
Not connected to PVD
+++ Using GPU PhysX
Physics Engine: PhysX
Physics Device: cuda:0
GPU Pipeline: enabled
/home/xj/anaconda3/envs/gym/lib/python3.8/site-packages/torch/functional.py:507: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1711403382592/work/aten/src/ATen/native/TensorShape.cpp:3549.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Traceback (most recent call last):
File "legged_gym/scripts/train.py", line 47, in <module>
train(args)
File "legged_gym/scripts/train.py", line 42, in train
ppo_runner, train_cfg = task_registry.make_alg_runner(env=env, name=args.task, args=args)
File "/home/xj/isaacgym/legged_gym/legged_gym/utils/task_registry.py", line 147, in make_alg_runner
runner = OnPolicyRunner(env, train_cfg_dict, log_dir, device=args.rl_device)
File "/home/xj/isaacgym/rsl_rl/rsl_rl/runners/on_policy_runner.py", line 29, in __init__
obs, extras = self.env.get_observations()
ValueError: too many values to unpack (expected 2)
(gym) xj@xj:~/isaacgym/legged_gym$
【Solved】When installing rsl_rl, after git clone you must run git checkout v1.0.2 to switch to version v1.0.2:
c
(gym) xj@xj:~/isaacgym/legged_gym$ cd ..
(gym) xj@xj:~/isaacgym$ cd rsl_rl/
(gym) xj@xj:~/isaacgym/rsl_rl$ git checkout v1.0.2
Note: switching to 'v1.0.2'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:
git switch -c <new-branch-name>
Or undo this operation with:
git switch -
Turn off this advice by setting config variable advice.detachedHead to false
HEAD is now at 2ad79cf Update README.md
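For context, here is a guess at why the unpack error happens, as a hedged illustration rather than the actual library code: legged_gym's environment returns a single observation tensor, while newer rsl_rl versions expect get_observations() to return an (obs, extras) pair, so Python tries to unpack the tensor along its first dimension (the 1024 environments) instead of into two values.
python
import torch

obs = torch.zeros(1024, 48)   # stand-in for the env's return value: one tensor of shape (num_envs, obs_dim)
try:
    obs, extras = obs         # newer rsl_rl unpacks two values; a tensor unpacks along dim 0
except ValueError as e:
    print(e)                  # "too many values to unpack (expected 2)" -- 1024 rows, not 2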
After that, training runs normally.
Training finished.
2.2 Testing the trained policy
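To play back the trained policy, legged_gym provides a play script (the command below follows the legged_gym README; flags such as --load_run can select a specific checkpoint):
c
python legged_gym/scripts/play.py --task=anymal_c_flat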
It really does walk quite well~
Terminal:
Isaac Gym window:
The viewer stopped by itself:
2.3 Fixing IsaacGymEnvs errors (python train.py task=Ant)
Source code: IsaacGymEnvs
The training command is:
c
python train.py task=Ant
Error:
c
(gym) xj@xj:~/isaacgym/IsaacGymEnvs/isaacgymenvs$ python train.py task=Ant
The following error is not a big deal; ignore it for now...
c
Error: FBX library failed to load - importing FBX data will not succeed. Message: No module named 'fbx'
FBX tools must be installed from https://help.autodesk.com/view/FBX/2020/ENU/?guid=FBX_Developer_Help_scripting_with_python_fbx_installing_python_fbx_html
How to install fbxsdk_python (this can also be ignored).
The error that actually needs fixing is this one:
c
AttributeError: module 'numpy' has no attribute 'int'.
`np.int` was a deprecated alias for the builtin `int`. To avoid this error in existing code, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
Solution:
Following the post AttributeError: module numpy has no attribute int .报错解决方案: numpy.int was deprecated in NumPy 1.20 and removed in NumPy 1.24.
Option 1: reinstall numpy, downgrading to an earlier version
c
pip uninstall numpy
pip install numpy==1.22
Option 2: edit the code yourself (not recommended); a sketch is below.
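The code change is just swapping the removed alias for a concrete type; the array below is a made-up example, the point is only the substitution:
python
import numpy as np

arr = np.array([1.7, 2.3, 3.9])

# Before (fails on NumPy >= 1.24, where the np.int alias was removed):
# idx = arr.astype(np.int)

# After: use the builtin int, or an explicit width if the precision matters
idx = arr.astype(int)   # or np.int32 / np.int64
print(idx)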
After the fix, training runs normally and the terminal prints the training statistics:
2.4 Example walkthrough: FrankaCubeStack
The principles, training results, and performance of the example tasks shipped with Isaac Gym are explained in detail in the paper Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning; alternatively, see the Chinese tutorial 第2章 IsaacGymEnvs安装, which covers the meaning of the parameter settings, training caveats, and more, quite thoroughly.
Configuration file paths:
python
# Shared configuration for all tasks
isaacgymenvs/cfg/config.yaml
# Task-specific configuration for FrankaCubeStack
isaacgymenvs/cfg/task/FrankaCubeStack.yaml
With the fix above, there are no more errors~
2.4.1. Training:
Train from a randomly initialized model:
c
python train.py task=FrankaCubeStack
Train headless (no graphical viewer):
c
python train.py task=FrankaCubeStack headless=True
If your GPU is not great: to keep rendering from eating VRAM, reduce the number of environments to num_envs=64 or even fewer. The default is 8192, set in cfg/task/FrankaCubeStack.yaml by the line numEnvs: ${resolve_default:8192,${...num_envs}}:
c
python train.py task=FrankaCubeStack headless=True num_envs=64
2.4.2. Resuming after an interrupted run
If training was interrupted, load the saved checkpoint and continue training:
c
python train.py task=FrankaCubeStack checkpoint=runs/FrankaCubeStack_09-12-14-22/nn/last_FrankaCubeStack_ep_500_rew_773.4196.pth headless=True
2.4.3. Testing
Suppose training is done and we want to load the trained model and run inference only (no training) to evaluate it. What is the command? Testing uses the same train.py script: just set test=True; to visualize the environment, set headless=False:
c
python train.py task=FrankaCubeStack test=True checkpoint=runs/FrankaCubeStack_09-12-14-22/nn/last_FrankaCubeStack_ep_500_rew_773.4196.pth headless=False num_envs=64
2.4.4. Capturing video
Isaac Gym implements the standard env.render(mode='rgb_array') Gym API to expose images from the simulator viewer. In addition, gym.wrappers.RecordVideo can be used to record a video of the agent running. The official example command is below; it should write a video into the videos folder.
c
python train.py task=FrankaCubeStack test=True checkpoint=runs/FrankaCubeStack_09-12-14-22/nn/last_FrankaCubeStack_ep_500_rew_773.4196.pth headless=False num_envs=64 capture_video=True
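For reference, the wrapper-based recording that capture_video=True builds on looks roughly like this (a hedged sketch of standard gym.wrappers.RecordVideo usage on a stock environment, not IsaacGymEnvs' internal code; the environment id and folder are placeholders):
python
import gymnasium as gym

env = gym.make("CartPole-v1", render_mode="rgb_array")   # the env must produce rgb arrays
env = gym.wrappers.RecordVideo(
    env,
    video_folder="videos",                 # output directory for the .mp4 files
    episode_trigger=lambda ep: ep == 0,    # record only the first episode
)
obs, info = env.reset()
done = False
while not done:
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    done = terminated or truncated
env.close()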
For the errors and how I fixed them, see my other post: 解决:ubuntu22.04中IsaacGymEnv保存视频报错的问题.
Video written successfully:
rl-video-step-0
2.5 The next step is to read the code closely and study the methods.
3. Isaac Sim (laptop hardware not supported; on hold)
3.1 Installation and checks
Chinese tutorial: 【Isaac Sim】Ubuntu 22.04 安装 Isaac Sim 详细教程
- Isaac Sim is an application built on top of the Omniverse platform, so Omniverse must be installed first. (The Omniverse Launcher seems to be something like an app store?)
- Check the computer's configuration and its compatibility with Isaac Sim: NVIDIA Omniverse provides the Isaac Sim Compatibility Checker tool for this.
- If the system is supported, search for Isaac Sim in the Exchange.
3.2 First, check whether my system supports Isaac Sim
The requirements are graded into four levels: dark green (excellent), light green (good), orange (sufficient, higher recommended), and red (insufficient / unsupported).
My laptop's RAM and GPU fall short of the official minimum requirements of 32GB* RAM and a GeForce RTX 3070 GPU (see the table below):
Element | Minimum Spec | Good | Ideal |
---|---|---|---|
OS | Ubuntu 20.04/22.04 Windows 10/11 | Ubuntu 20.04/22.04 Windows 10/11 | Ubuntu 20.04/22.04 Windows 10/11 |
CPU | Intel Core i7 (7th Generation) AMD Ryzen 5 | Intel Core i7 (9th Generation) AMD Ryzen 7 | Intel Core i9, X-series or higher AMD Ryzen 9, Threadripper or higher |
Cores | 4 | 8 | 16 |
RAM | 32GB* | 64GB* | 64GB* |
Storage | 50GB SSD | 500GB SSD | 1TB NVMe SSD |
GPU | GeForce RTX 3070 | GeForce RTX 4080 | RTX Ada 6000 |
VRAM | 8GB* | 16GB* | 48GB* |
3.3 What happens if Isaac Sim is installed on unsupported hardware?
Does it just run slowly, or not run at all?
So far it installs and launches; next I'll try an example program to see whether it actually runs...
Written tutorial: 【具身智能利器】NVIDIA Isaac Sim 仿真平台体验测评.
Judging from the result, something is wrong: I created a ground plane and a capsule in the scene, but they do not show up, unlike in the tutorial above...
Some messages from the terminal that I can't make sense of yet:
c
2025-01-08 08:14:12 [1,116ms] [Error] [gpu.foundation.plugin] No device could be created. Some known system issues:
- The driver is not installed properly and requires a clean re-install.
- Your GPUs do not support RayTracing: DXR or Vulkan ray_tracing, or hardware is excluded due to performance.
- The driver cannot enumerate any GPU: driver, display, TCC mode or a docker issue. For Vulkan, test it with Vulkaninfo tool from Vulkan SDK, instead of nvidia-smi.
- For Ubuntu, it requires server-xorg-core 1.20.7+ and a display to work without --no-window.
- For Linux dockers, the setup is not complete. Install the latest driver, xServer and NVIDIA container runtime.
2025-01-08 08:14:24 [12,926ms] [Warning] [omni.hydra.rtx] HydraEngine rtx failed creating scene renderer.
4. PyBullet
4.1 Installation
Written tutorials: Ubuntu安装PyBullet | PyBullet 导入Ur5 | PyBullet简单使用 | 关于机械臂强化学习仿真引擎的选择 | PyBullet入门操作
Official website:
Installation itself is straightforward:
c
pip3 install pybullet
Running the official example fails:
c
python -m pybullet_robots.panda.loadpanda
Error:
c
(gym) xj@xj:~$ python -m pybullet_robots.panda.loadpanda
pybullet build time: Nov 28 2023 23:51:11
RuntimeError: module compiled against API version 0x10 but this version of numpy is 0xf . Check the section C-API incompatibility at the Troubleshooting ImportError section at https://numpy.org/devdocs/user/troubleshooting-importerror.html#c-api-incompatibility for indications on how to solve this problem .
Traceback (most recent call last):
File "/home/xj/anaconda3/envs/gym/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/xj/anaconda3/envs/gym/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/xj/anaconda3/envs/gym/lib/python3.8/site-packages/pybullet_robots/panda/loadpanda.py", line 1, in <module>
import pybullet as p
ImportError: numpy.core.multiarray failed to import
One blogger attributes this to a version conflict between opencv and numpy; see 解决numpy.core.multiarray failed to import(numpy不降级方案) for a fix that does not downgrade numpy.
After reinstalling in another environment and running the example again, a different error appears:
c
(gymenv) xj@xj:~$ python -m pybullet_robots.panda.loadpanda
pybullet build time: Nov 28 2023 23:48:36
A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.2.1 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.
If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.
Traceback (most recent call last): File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/home/xj/anaconda3/envs/gymenv/lib/python3.11/site-packages/pybullet_robots/panda/loadpanda.py", line 1, in <module>
import pybullet as p
Traceback (most recent call last):
File "/home/xj/anaconda3/envs/gymenv/lib/python3.11/site-packages/numpy/core/_multiarray_umath.py", line 44, in __getattr__
raise ImportError(msg)
ImportError:
A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.2.1 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.
If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/home/xj/anaconda3/envs/gymenv/lib/python3.11/site-packages/pybullet_robots/panda/loadpanda.py", line 1, in <module>
import pybullet as p
ImportError: numpy.core.multiarray failed to import
This means the installed numpy is 2.x while the module was compiled against NumPy 1.x. The fix is either to rebuild the affected module or to downgrade numpy; downgrading is the easiest. In this environment my numpy version is 2.2.1. Since versions below 1.22 did not work earlier, this time I tried a slightly higher one, conda install numpy==1.26, and then ran python -m pybullet_robots.panda.loadpanda again.
This time the example opened successfully in the other environment (figure below), confirming that the installation works.
Meanwhile the terminal prints:
c
(gymenv) xj@xj:~$ python -m pybullet_robots.panda.loadpanda
pybullet build time: Nov 28 2023 23:48:36
startThreads creating 1 threads.
starting thread 0
started thread 0
argc=2
argv[0] = --unused
argv[1] = --start_demo_name=Physics Server
ExampleBrowserThreadFunc started
X11 functions dynamically loaded using dlopen/dlsym OK!
X11 functions dynamically loaded using dlopen/dlsym OK!
Creating context
Created GL 3.3 context
Direct GLX rendering context obtained
Making context current
GL_VENDOR=NVIDIA Corporation
GL_RENDERER=NVIDIA GeForce GTX 1650/PCIe/SSE2
GL_VERSION=3.3.0 NVIDIA 550.120
GL_SHADING_LANGUAGE_VERSION=3.30 NVIDIA via Cg compiler
pthread_getconcurrency()=0
Version = 3.3.0 NVIDIA 550.120
Vendor = NVIDIA Corporation
Renderer = NVIDIA GeForce GTX 1650/PCIe/SSE2
b3Printf: Selected demo: Physics Server
startThreads creating 1 threads.
starting thread 0
started thread 0
MotionThreadFunc thread started
ven = NVIDIA Corporation
ven = NVIDIA Corporation
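With the installation confirmed, a minimal standalone PyBullet script looks like this (a sketch using the standard pybullet / pybullet_data API; the Panda URDF ships with pybullet_data):
python
import time
import pybullet as p
import pybullet_data

p.connect(p.GUI)                                          # open the GUI; use p.DIRECT for headless
p.setAdditionalSearchPath(pybullet_data.getDataPath())    # locate the bundled URDF assets
p.setGravity(0, 0, -9.8)
p.loadURDF("plane.urdf")                                  # ground plane
robot = p.loadURDF("franka_panda/panda.urdf", useFixedBase=True)

for _ in range(1000):                                     # step the simulation at 240 Hz
    p.stepSimulation()
    time.sleep(1.0 / 240.0)
p.disconnect()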
4.2 Trying an RL repository: panda-gym
panda-gym is built on the PyBullet engine and wraps six tasks around the Panda arm: reach, push, slide, pick & place, stack, and flip. It is largely inspired by OpenAI's Fetch environments and was published at a NeurIPS 2021 workshop. This section uses v2.0.0 as the example; installation is equally simple:
python
pip install panda-gym==2.0.0
To modify the existing code or define additional tasks, clone the source and install it in editable (-e) mode:
python
git clone https://github.com/qgallouedec/panda-gym.git
pip install -e panda-gym
After installation, run examples/reach.py to check that everything works.
For this section my setup is Windows 11 + Python 3.8.
4.2.1 Installation and testing
For example, 机械臂强化学习实战(stable baselines3+panda-gym), a repost of the Zhihu article of the same name; the former is my study notes including the errors I hit and how I fixed them, while the Zhihu original is more polished.
Official panda-gym documentation: https://panda-gym.readthedocs.io/en/latest/index.html
panda-gym on GitHub: https://github.com/qgallouedec/panda-gym
Running pip install panda-gym also installs pybullet and gymnasium:
Code that uses import gym fails with:
c
No registered env with id: PandaReach-v2
KeyError: 'PandaReach-v2'
Replace it with the following code from the official docs and it works:
c
import gymnasium as gym
import panda_gym

env = gym.make('PandaReach-v3', render_mode="human")
observation, info = env.reset()

for _ in range(1000):
    action = env.action_space.sample()  # random action
    observation, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        observation, info = env.reset()

env.close()
A viewer window pops up:
Obviously, since the actions are sampled at random, no learning is happening yet. For the learning part, see the usage chapter Train with stable-baselines3.
4.2.2 Errors and fixes
gymnasium and gym register different sets of environments; gym.make raises an error if the requested environment does not exist.
4.2.2.1 Working version combinations
Summary:
c
gym 0.17.3               # for PandaReach-v2
gymnasium 1.0.0          # for PandaReach-v3
stable-baselines3 1.3.0
panda-gym 2.0.0          # version the docs say is trainable, compatible with stable-baselines3
panda-gym 3.0.7          # version installed automatically, incompatible with stable-baselines3
For example:
gymnasium is the newer package; its registry includes "PandaReach-v3":
c
import gymnasium as gym
import panda_gym  # explicitly import panda_gym; without this import the environments are not registered
env = gym.make("PandaReach-v3")
gym is the older package; its registry includes "PandaReach-v2":
c
import gym
import panda_gym  # explicitly import panda_gym; without this import the environments are not registered
env = gym.make("PandaReach-v2")
In my tests, the gym-based version only runs from VS Code; the very same environment errors out in PyCharm, where only the v3 setup runs correctly.
All registered environments can be listed with the following code:
c
import gym        # old gym: the registry exposes .all(); with gymnasium, the registry is a dict
import panda_gym  # registers the Panda environments
all_envs = gym.envs.registry.all()
for env_spec in all_envs:
    print(env_spec.id)
# With gymnasium, iterate the dict instead:
# for env_id in gym.envs.registry.keys():
#     print(env_id)
4.2.2.2 Version combinations that fail
Reference: gym.error.UnregisteredEnv: No registered env with id: PandaReach-v2或类似env丢失问题解决.
If the environment is not registered for the package you imported (gymnasium vs gym), gym.make raises an error. For example:
Training code:
c
import gymnasium as gym
import panda_gym
from stable_baselines3 import DDPG
env = gym.make("PandaReach-v2")
model = DDPG(policy="MultiInputPolicy", env=env)
model.learn(30_000)
Error:
c
Exception has occurred: DeprecatedEnv
Environment version v2 for `PandaReach` is deprecated. Please use `PandaReach-v3` instead.
File "C:\CodeFeng\VALIDATE_PANDA_GYM\train_with_sb3.py", line 5, in <module>
env = gym.make("PandaReach-v2")
gymnasium.error.DeprecatedEnv: Environment version v2 for `PandaReach` is deprecated. Please use `PandaReach-v3` instead.
SB3 is not compatible with panda-gym v3 for the moment. (See SB3/PR#780). The following documentation is therefore not yet valid. To use panda-gym with SB3, you will have to use panda-gym==2.0.0.
The official docs say SB3 is not yet compatible with panda-gym v3 and that panda-gym==2.0.0 must be used, so install it with pip install panda-gym==2.0.0 and train again.
But a new error appears: the environment PandaReach doesn't exist...
c
Exception has occurred: NameNotFound
Environment `PandaReach` doesn't exist.
File "C:\CodeFeng\VALIDATE_PANDA_GYM\train_with_sb3.py", line 5, in <module>
env = gym.make("PandaReach-v2")
gymnasium.error.NameNotFound: Environment `PandaReach` doesn't exist.
As noted above, gymnasium registers "PandaReach-v3" while gym registers "PandaReach-v2", and the docs state that training with SB3 works with PandaReach-v2. So replace import gymnasium as gym with import gym:
c
import gym
import panda_gym
from stable_baselines3 import DDPG
env = gym.make("PandaReach-v2")
model = DDPG(policy="MultiInputPolicy", env=env)
model.learn(30_000)
Training succeeds and the terminal shows:
4.2.3 RL training
Using the PandaReach-v2 task as an example, train DDPG/TD3/SAC with HER so the algorithms can be compared side by side.
The reach task is fairly simple: the arm's end-effector must reach a target position, and being within a small tolerance counts as success; by default the reward is sparse.
c
import gym
from stable_baselines3 import DDPG, TD3, SAC, HerReplayBuffer
env = gym.make("PandaReach-v2")
log_dir = './panda_reach_v2_tensorboard/'
# DDPG
model = DDPG(policy="MultiInputPolicy", env=env, buffer_size=100000, replay_buffer_class=HerReplayBuffer, verbose=1, tensorboard_log=log_dir)
model.learn(total_timesteps=20000)
model.save("ddpg_panda_reach_v2")
# TD3
model = TD3(policy="MultiInputPolicy", env=env, buffer_size=100000, replay_buffer_class=HerReplayBuffer, verbose=1, tensorboard_log=log_dir)
model.learn(total_timesteps=20000)
model.save("td3_panda_reach_v2")
# SAC
model = SAC(policy="MultiInputPolicy", env=env, buffer_size=100000, replay_buffer_class=HerReplayBuffer, verbose=1, tensorboard_log=log_dir)
model.learn(total_timesteps=20000)
model.save("sac_panda_reach_v2")
Rendering is off by default during training to speed things up. Each algorithm takes roughly 10 minutes to train; the trained models are saved as ddpg/td3/sac_panda_reach_v2.zip and the training curves in ./panda_reach_v2_tensorboard.
4.2.4 Evaluating the results
Training curves: first start TensorBoard from the command line:
c
tensorboard --logdir ./panda_reach_v2_tensorboard/
TensorBoard's default port is 6006; open http://localhost:6006/ in a browser to view the curves.
Judging from the training curves, on PandaReach-v2 DDPG learns slightly faster but is less stable than TD3 and SAC. All three algorithms reach a success rate close to 100%. Next, let's look at the actual rollout.
Actual behavior
Seeing is believing for the arm's actual motion; the code is as follows:
c
import gym
# import panda_gym  # not in the original article, but omitting it causes an error
from stable_baselines3 import DDPG, TD3, SAC, HerReplayBuffer

env = gym.make("PandaReach-v2", render=True)
model = DDPG.load('ddpg_panda_reach_v2', env=env)
# model = TD3.load('td3_panda_reach_v2', env=env)
# model = SAC.load('sac_panda_reach_v2', env=env)

obs = env.reset()
for i in range(1000):
    action, _state = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
    env.render()
    if done:
        print('Done')
        obs = env.reset()
Error:
Uncommenting the import panda_gym line in the code above fixes it:
4.2.5 Postscript: the PandaReach-v3 environment
Everything above was run and debugged in VS Code without problems.
Since I'm more used to PyCharm, I tried running the same code in the same environment from PyCharm, but there it errored out: gym-robotics is missing.
Without installing or removing anything, the very same environment runs fine from VS Code... Why is that?
Installing gym-robotics as the error suggests pulls in gym-robotics 1.0.1, but that requires gym>=0.26:
c
gym-robotics 1.0.1 requires gym>=0.26, but you have gym 0.17.3 which is incompatible.
But after installing gym-0.26.2, pip then complains that it does not match stable-baselines3:
c
stable-baselines3 1.3.0 requires gym<0.20,>=0.17, but you have gym 0.26.2 which is incompatible.
Successfully installed gym-0.26.2 gym-robotics-1.0.1
gym-robotics 1.0.1 & gym 0.17.3 also errors:
After testing, the package combination that is compatible with gymnasium, runs in PyCharm, and supports PandaReach-v3 is:
c
Package Version Editable project location
----------------------- ----------- -------------------------
gymnasium 1.0.0
mujoco 3.2.6
numpy 1.26.4
panda_gym 3.0.8 C:\CodeFeng\panda-gym
pybullet 3.2.6
pygame 2.6.1
stable-baselines3 2.4.1
test_with_sb3.py errors again:
c
import gymnasium as gym
import panda_gym
from stable_baselines3 import DDPG, TD3, SAC, HerReplayBuffer

env = gym.make("PandaReach-v3")
model = DDPG.load('ddpg_panda_reach_v3', env=env)
# model = TD3.load('td3_panda_reach_v3', env=env)
# model = SAC.load('sac_panda_reach_v3', env=env)

obs = env.reset()
for i in range(1000):
    print(i)
    action, _state = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
    env.render()
    if done:
        print('Done')
        obs = env.reset()
c
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
After iterating with ChatGPT, the working test code test_with_sb3.py is as follows:
c
import gymnasium as gym
import panda_gym
from stable_baselines3 import TD3
from stable_baselines3.common.vec_env import DummyVecEnv

# Create the PandaReach-v3 environment with rendering enabled
def make_env():
    env = gym.make("PandaReach-v3", render_mode="human")  # render_mode="human" shows the GUI
    return env

# Wrap the environment in a DummyVecEnv
env = DummyVecEnv([make_env])

# Load the trained model
model = TD3.load('td3_panda_reach_v3', env=env)

# Reset the environment
obs = env.reset()

# Run the learned policy
for i in range(1000):
    action, _state = model.predict(obs, deterministic=True)  # predict an action with the model
    obs, reward, done, info = env.step(action)                # step the environment
    env.envs[0].render()                                      # render the first env inside the DummyVecEnv
    if done:
        print('Episode finished')
        obs = env.reset()                                     # reset the environment
5. Some questions and answers
What is the relationship between the Python packages gymnasium and gym, and Isaac Gym? The names are very similar... a bit confusing.
Why, after importing gymnasium or gym, do we still have to explicitly import panda_gym?
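A plausible answer, as a minimal sketch (the comments describe the usual Gym/Gymnasium plugin pattern; the exact registration calls inside panda_gym are an assumption): importing panda_gym executes its package __init__.py, which registers each task id with the registry, and only then does gym.make() know the names.
python
import gymnasium as gym
import panda_gym  # side effect: registers "PandaReach-v3", "PandaPush-v3", ... with gymnasium

# In gymnasium the registry is a dict keyed by environment id,
# so the Panda ids only appear after the import above has run.
print("PandaReach-v3" in gym.envs.registry)   # True
env = gym.make("PandaReach-v3")
env.close()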
5.1 panda-gym: an open-source goal-conditioned robot learning environment
panda-gym is an open-source collection of robot reinforcement-learning environments built on the PyBullet physics engine and the Gymnasium framework. Developed by Quentin Gallouédec and colleagues, it aims to provide a flexible, easy-to-use simulation platform for robot-learning research. At its core, panda-gym simulates Franka Emika's Panda robot and offers a series of classic manipulation tasks.
Related paper: arXiv:2106.13687
Project features
panda-gym has the following main features:
- Built on the open-source PyBullet physics engine, ensuring accurate and extensible simulation.
- Compatible with the OpenAI Gym interface, so it plugs into all kinds of RL algorithms.
- Provides several classic manipulation tasks such as grasping, pushing, and flipping.
- Supports goal-conditioned learning, suitable for multi-goal RL research.
- Clean code structure, easy to extend and to customize with new tasks.
Environments
panda-gym currently includes the following six classic task environments:
- Reach: move the arm's end-effector to a target position
- Push: push an object to a target position
- Slide: slide an object to a target position
- Pick and Place: grasp an object and place it at a target position
- Stack: stack two objects at a target position
- Flip: flip an object to a target orientation
Custom environments
A major strength of panda-gym is its extensibility. Researchers can easily build custom robots or tasks on top of the existing code (a sketch follows below):
- Custom robot: subclass the Robot base class and implement the robot's kinematics and dynamics.
- Custom task: subclass the Task base class and define the task's goal, reward function, and so on.
- Compose a new environment: combine the custom robot and task and register it as a new Gym environment.
This modular design makes panda-gym well suited to a wide range of robot-learning research.
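A rough illustration of the custom-task route (the class and method names follow panda-gym's documented Task interface, but the exact signatures and everything inside the method bodies are assumptions for illustration; check the version you have installed):
python
import numpy as np
from panda_gym.envs.core import Task  # base class provided by panda-gym (per its docs)

class ReachAnywhere(Task):
    """Hypothetical task: reach a goal sampled in a small box above the table."""

    def __init__(self, sim):
        super().__init__(sim)
        self.goal = np.zeros(3)

    def reset(self):
        # sample a new goal each episode
        self.goal = np.random.uniform(-0.1, 0.1, size=3) + np.array([0.0, 0.0, 0.2])

    def get_obs(self):
        return np.array([])  # this toy task adds no task-specific observation

    def get_achieved_goal(self):
        # a real task would query the simulator for the end-effector position here
        return np.zeros(3)

    def is_success(self, achieved_goal, desired_goal):
        return np.linalg.norm(achieved_goal - desired_goal) < 0.05

    def compute_reward(self, achieved_goal, desired_goal, info):
        # sparse reward: 0 on success, -1 otherwise
        return -float(np.linalg.norm(achieved_goal - desired_goal) >= 0.05)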
Baseline results
The panda-gym developers also provide baseline results obtained with state-of-the-art RL algorithms, available in the rl-baselines3-zoo project. The pretrained models have been uploaded to the Hugging Face Hub, so researchers can use them directly or compare against them.
This is what the Hugging Face Hub page looks like:
Surprise! Does the Hub really only have the v1 versions of panda-gym!?