EasyCarla-RL 架构设计文档
目录
训练脚本架构
整体架构
train_offline_bc_planning.py
├── 模型定义
│ ├── PlanningBCNetwork (支持轨迹预测的BC网络)
│ └── Diffusion_BC_Planning (Diffusion BC + 轨迹预测)
├── 数据加载
│ ├── load_dataset() (从HDF5加载)
│ ├── extract_trajectory_from_obs() (提取轨迹目标)
│ └── ReplayBuffer (经验回放缓冲区)
├── 训练配置 (config)
└── 训练主循环
├── 损失计算 (BC损失 + 轨迹损失)
├── 反向传播
└── 模型保存
核心组件
1. PlanningBCNetwork
python
class PlanningBCNetwork(nn.Module):
"""
支持轨迹预测的BC网络
输入: (noise_action, time, state)
输出: 动作(3维) + 轨迹(20个点,每个点4维: x, y, yaw, speed)
"""
网络结构:
Input: noise_action (3) + time_embed (16) + state (307) = 326维
↓
Time Embedding
- SinusoidalPosEmb (16维)
- Linear (16 → 32)
- Mish
- Linear (32 → 16)
↓
Shared Backbone
- Linear (326 → 512)
- Mish
- Linear (512 → 512)
- Mish
- Linear (512 → 256)
- Mish
↓
├─→ Action Head
│ - Linear (256 → 128)
│ - Mish
│ - Linear (128 → 3) [Throttle, Steer, Brake]
│
└─→ Trajectory Head
- Linear (256 → 256)
- Mish
- Linear (256 → 80) # 20点 × 4维
- Reshape: (B, 20, 4) [x, y, yaw, speed]
2. Diffusion_BC_Planning
python
class Diffusion_BC_Planning(Diffusion_BC):
"""
支持轨迹预测的Diffusion BC
继承自Diffusion_BC,但使用PlanningBCNetwork替代MLP
"""
训练流程:
1. 从ReplayBuffer采样 (state, action, next_state, reward, not_done, trajectory_target)
2. 随机采样时间步 t
3. 添加噪声: x_noisy = q_sample(action, t, noise)
4. 模型前向传播: pred_action, pred_trajectory = model(x_noisy, t, state)
5. 计算损失:
- BC损失 = MSE(pred_action, noise/action)
- 轨迹损失 = MSE(pred_trajectory, trajectory_target)
- 总损失 = BC损失 + 0.5 * 轨迹损失
6. 反向传播并优化
推理流程:
1. 使用Diffusion采样获取action (100步去噪)
2. 使用采样得到的action + t=0 获取trajectory
3. 返回: action (3维), trajectory (20×4维)
数据集加载
HDF5数据结构
python
data = {
'observations': (N, 307), # 观测向量
'actions': (N, 3), # 动作 [throttle, steer, brake]
'rewards': (N, 1), # 奖励
'next_observations': (N, 307), # 下一观测
'dones': (N, 1), # 是否结束
}
观测向量结构 (307维)
[0-8] ego_state (9维)
- ego_state[0]: 位置 x (全局坐标系)
- ego_state[1]: 位置 y (全局坐标系)
- ego_state[2]: 航向角 yaw (弧度)
- ego_state[3]: 速度 (m/s)
- ego_state[4]: 角速度 z (rad/s)
- ego_state[5]: 加速度 x (m/s²)
- ego_state[6]: 加速度 y (m/s²)
- ego_state[7]: 前方距离 (m)
- ego_state[8]: 前车速度差 (m/s)
[9-10] lane_info (2维)
- lane_info[0]: 车道宽度
- lane_info[1]: 横向偏移
[11-250] lidar (240维)
- 激光雷达探测距离
[251-270] nearby_vehicles (20维)
- 5个周围车辆 × 4维 [dist, speed, dx, dy]
[271-306] waypoints (36维)
- 12个路径点 × 3维 [local_x, local_y, local_yaw]
轨迹目标提取
python
def extract_trajectory_from_obs(obs, trajectory_len=20, point_dim=4):
"""
从观测中提取轨迹目标
输入: (B, 307) 观测
输出: (B, 20, 4) 轨迹 [x, y, yaw, speed]
流程:
1. 提取waypoints (最后36维) → (B, 12, 3)
2. 前12个点使用真实waypoint
3. 目标速度使用自车速度 (obs[:, 3])
4. 后8个点通过外推获得 (沿最后有效waypoint方向延伸)
"""
模型输出
训练阶段输出
python
# 训练时的模型输出
pred_action: (B, 3) # [throttle, steer, brake]
pred_trajectory: (B, 20, 4) # [x, y, yaw, speed]
推理阶段输出
python
# 推理时的模型输出 (通过select_action)
action: (3,) # [throttle, steer, brake]
trajectory: (20, 4) # [x, y, yaw, speed]
训练配置
python
config = {
'state_dim': 307,
'action_dim': 3,
'max_action': 1.0,
'trajectory_len': 20,
'point_dim': 4,
'trajectory_loss_weight': 0.5,
'epochs': 200,
'iters_per_epoch': 1000,
'batch_size': 128,
'lr': 2e-4,
'n_timesteps': 100,
'beta_schedule': 'linear',
'dataset_path': 'easycarla_offline_dataset.hdf5',
'save_dir': 'params_bc_planning',
'save_interval': 25,
}
测试脚本架构
整体架构
run_bc_in_carla.py
├── 模型定义
│ ├── PlanningBCNetwork (与训练脚本一致)
│ └── Diffusion_BC_Planning (与训练脚本一致)
├── 速度规划
│ ├── idm_acceleration() (IDM加速度计算)
│ ├── idm_action() (加速度转动作)
│ └── smooth_speed_planning() (平滑速度规划)
├── 辅助函数
│ └── convert_obs_dict_to_vector() (观测字典转向量)
├── 环境配置 (carla_params)
└── 主测试流程
├── 连接CARLA环境
├── 加载模型
├── 仿真循环
│ ├── 获取观测
│ ├── BC推理
│ ├── IDM速度调整
│ ├── 执行动作
│ └── 可视化
└── 统计输出
核心组件
1. IDM速度规划
python
def idm_acceleration(v, v_lead, s, v0=8.0, a_max=2.0, b=1.5, T=1.5, s0=4.0, delta=4.0):
"""
IDM(Intelligent Driver Model)核心公式计算加速度
参数:
v: 当前自车速度 (m/s)
v_lead: 前车速度 (m/s)
s: 当前与前车的距离 (m)
v0: 期望速度 (m/s)
a_max: 最大加速度 (m/s²)
b: 舒适减速度 (m/s²)
T: 时间头距 (s)
s0: 最小安全距离 (m)
delta: 加速度指数
返回:
acceleration: 计算得到的加速度 (m/s²)
"""
# 动态安全距离计算
if v_lead < 0.5:
dynamic_s0 = s0 + v * 1.5 + 2.0 # 静止前车
elif v_lead < v0 * 0.5:
dynamic_s0 = s0 + v * T * 0.5 # 中低速
else:
dynamic_s0 = s0 + v * T * 0.3 # 高速
delta_v = v - v_lead
s_star = dynamic_s0 + v * T + (v * delta_v) / (2 * np.sqrt(a_max * b))
if s <= 0.0:
return -10.0 # 即将碰撞
elif s < s_star:
s_ratio = s_star / s
else:
s_ratio = 1.0
term1 = 1.0 - (v / v0) ** delta
term2 = s_ratio ** 2
acceleration = a_max * (term1 - term2)
acceleration = max(acceleration, -5.0) # 限制最大减速度
return acceleration
2. 平滑速度规划
python
def smooth_speed_planning(current_speed, desired_speed, trajectory,
comfort_accel=2.0, dt=0.1):
"""
平滑速度规划:根据当前车速、舒适加速度,规划到目标车速
输入:
current_speed: 当前自车速度 (m/s)
desired_speed: 目标速度 (m/s)
trajectory: 预测轨迹 (N, 4)
输出:
updated_trajectory: 更新目标速度后的轨迹
"""
speed_diff = desired_speed - current_speed
accel_time = abs(speed_diff) / comfort_accel
for i in range(n_points):
t = (i + 1) * dt
if t <= accel_time:
ratio = t / accel_time
smooth_ratio = ratio * ratio * (3 - 2 * ratio) # S曲线
new_speed = current_speed + speed_diff * smooth_ratio
else:
new_speed = desired_speed
updated_trajectory[i, 3] = max(0.0, new_speed)
return updated_trajectory
环境配置
python
carla_params = {
'number_of_vehicles': 100,
'number_of_walkers': 100,
'dt': 0.2,
'ego_vehicle_filter': 'vehicle.tesla.model3',
'surrounding_vehicle_spawned_randomly': True,
'port': 2000,
'town': 'Town03',
'max_time_episode': 3000,
'max_waypoints': 12,
'visualize_waypoints': True,
'desired_speed': 8,
'max_ego_spawn_times': 200,
'view_mode': 'top',
'traffic': 'off',
'lidar_max_range': 50.0,
'max_nearby_vehicles': 5,
# 控制模式:'direct' 或 'trajectory'
'control_mode': 'trajectory',
# PID控制器参数
'steer_kp': 0.5,
'steer_kd': 0.1,
'speed_kp': 3.0,
'speed_ki': 0.1,
'speed_kd': 1.0,
}
测试流程
1. 连接CARLA环境
└─> gym.make('carla-v0', params=carla_params)
2. 加载BC规划模型
└─> Diffusion_BC_Planning(trajectory_len=20, point_dim=4)
└─> load_model('params_bc_planning', id=200)
3. 环境重置
└─> env.reset()
└─> 5次tick激活物理引擎
4. 仿真循环 (max_steps=3000)
│
├─> 获取观测
│ ├─> convert_obs_dict_to_vector(obs) → state (307维)
│ ├─> current_speed = obs['ego_state'][3]
│ └─> front_distance = obs['ego_state'][7]
│
├─> 寻找前车
│ └─> 遍历nearby_vehicles,找到前方车道内最近车辆
│ └─> lead_vehicle_dist, lead_vehicle_speed
│
├─> BC推理
│ ├─> action, trajectory = agent.select_action(state)
│ └─> action: (3,), trajectory: (20, 4)
│
├─> IDM速度调整
│ ├─> if lead_vehicle_dist < 50 and lead_vehicle_dist > 0:
│ │ ├─> 计算IDM加速度
│ │ ├─> 逐点更新轨迹速度
│ │ └─> final_trajectory = adjusted_trajectory
│ └─> else:
│ ├─> smooth_speed_planning()
│ └─> final_trajectory = adjusted_trajectory
│
├─> 执行动作
│ ├─> action_dict = {'action': action, 'trajectory': final_trajectory}
│ └─> env.step(action_dict)
│
├─> 可视化
│ ├─> 绿色:环境参考轨迹 (12点)
│ ├─> 蓝色:模型预测轨迹 (20点)
│ └─> 红色:前车标记
│
├─> 统计
│ ├─> 碰撞次数
│ ├─> 偏离车道次数
│ ├─> 总行驶距离
│ └─> 实时控制输出
│
└─> if done:
└─> break
5. 仿真完成
└─> env.close()
数据集加载
训练数据集
HDF5文件格式
python
# 文件: easycarla_offline_dataset.hdf5
with h5py.File('easycarla_offline_dataset.hdf5', 'r') as f:
observations = f['observations'][:] # (N, 307)
actions = f['actions'][:] # (N, 3)
rewards = f['rewards'][:] # (N, 1)
next_observations = f['next_observations'][:] # (N, 307)
dones = f['done'][:] # (N, 1)
数据采样
python
class ReplayBuffer:
def sample(self, batch_size):
idx = np.random.randint(0, self.n_samples, batch_size)
state = torch.FloatTensor(self.data['observations'][idx])
action = torch.FloatTensor(self.data['actions'][idx])
next_state = torch.FloatTensor(self.data['next_observations'][idx])
reward = torch.FloatTensor(self.data['rewards'][idx]).unsqueeze(1)
not_done = torch.FloatTensor(1 - self.data['dones'][idx]).unsqueeze(1)
# 提取轨迹目标
trajectory_target = torch.FloatTensor(
extract_trajectory_from_obs(self.data['observations'][idx])
)
return state, action, next_state, reward, not_done, trajectory_target
测试数据集
实时观测获取
python
# CARLA环境观测字典
obs = {
'ego_state': (9,), # 自车状态
'lane_info': (2,), # 车道信息
'lidar': (240,), # 激光雷达
'nearby_vehicles': (20,), # 周围车辆
'waypoints': (36,), # 路径点
}
# 转换为训练格式
state = convert_obs_dict_to_vector(obs)
# state: (307,) 与训练时一致
模型输出
训练阶段
前向传播
python
# 模型输入
x_noisy: (B, 3) # 带噪声的动作
t: (B,) # 时间步
state: (B, 307) # 状态
# 模型输出
pred_action: (B, 3) # 预测动作 [throttle, steer, brake]
pred_trajectory: (B, 20, 4) # 预测轨迹 [x, y, yaw, speed]
损失计算
python
# BC损失
if predict_epsilon:
bc_loss = F.mse_loss(pred_action, noise)
else:
bc_loss = F.mse_loss(pred_action, action)
# 轨迹损失
trajectory_loss = F.mse_loss(pred_trajectory, trajectory_target)
# 总损失
total_loss = bc_loss + 0.5 * trajectory_loss
推理阶段
Diffusion采样
python
# 采样流程 (100步去噪)
action = sample(state):
x_t = torch.randn(B, 3)
for t in reversed(range(n_timesteps)):
x_t = p_sample(x_t, t, state)
return x_t
轨迹获取
python
# 获取轨迹 (t=0时噪声最小)
t = torch.tensor([0])
_, trajectory = model(action, t, state)
# 输出
action: (3,) # [throttle, steer, brake]
trajectory: (20, 4) # [x, y, yaw, speed]
控制输出
直接控制模式
python
action_dict = {
'action': [throttle, steer, brake],
}
env.step(action_dict)
轨迹控制模式
python
action_dict = {
'action': [throttle, steer, brake], # fallback
'trajectory': (N, 4) # [x, y, yaw, speed]
}
env.step(action_dict)
# 环境内部使用PID控制器跟踪轨迹
代码解读
训练脚本关键代码
1. 网络定义
python
class PlanningBCNetwork(nn.Module):
def forward(self, x, time, state):
# 时间嵌入
t = self.time_mlp(time)
# 拼接输入
x = torch.cat([x, t, state], dim=1) # (B, 326)
# 共享特征提取
features = self.shared_backbone(x)
# 双头输出
action = self.action_head(features)
trajectory_flat = self.trajectory_head(features)
trajectory = trajectory_flat.view(-1, 20, 4)
return action, trajectory
2. 轨迹外推
python
def extract_trajectory_from_obs(obs, trajectory_len=20, point_dim=4):
# 提取waypoints
waypoints_raw = obs[:, -36:].reshape(batch_size, 12, 3)
# 创建20点轨迹
trajectory = np.zeros((batch_size, trajectory_len, point_dim))
# 前12个点使用真实waypoint
trajectory[:, :12, :3] = waypoints_raw
# 目标速度使用自车速度
ego_speed = obs[:, 3:4]
trajectory[:, :, 3] = ego_speed[:, np.newaxis, 0]
# 后8个点通过外推
for i in range(batch_size):
# 找到最后一个有效waypoint
last_idx = find_last_valid_waypoint(waypoints_raw[i])
# 计算方向向量
dir_vec = waypoints_raw[i, last_idx, :2] - waypoints_raw[i, last_idx-1, :2]
dir_vec = dir_vec / np.linalg.norm(dir_vec)
# 外推
for j in range(12, trajectory_len):
offset = (j - last_idx) * 2.0
trajectory[i, j, 0] = waypoints_raw[i, last_idx, 0] + dir_vec[0] * offset
trajectory[i, j, 1] = waypoints_raw[i, last_idx, 1] + dir_vec[1] * offset
trajectory[i, j, 2] = waypoints_raw[i, last_idx, 2]
return trajectory
测试脚本关键代码
1. 前车检测
python
# 寻找前方最近车辆
nearby_vehicles = obs['nearby_vehicles'] # (20,) = 5辆车 × 4维
lead_vehicle_dist = float('inf')
lead_vehicle_speed = -1.0
for i in range(5):
start = i * 4
dist = nearby_vehicles[start]
speed = nearby_vehicles[start + 1]
dx = nearby_vehicles[start + 2]
dy = nearby_vehicles[start + 3]
if dist > 0 and dist < lead_vehicle_dist and dx > 0:
if abs(dy) < 1.85: # 车道宽度的一半
lead_vehicle_dist = dist
lead_vehicle_speed = speed
2. IDM速度调整
python
if lead_vehicle_dist < 50.0 and lead_vehicle_dist > 0:
adjusted_trajectory = predicted_trajectory.copy()
# 计算IDM加速度
v_lead = max(0.0, lead_vehicle_speed)
idm_accel = idm_acceleration(
v=current_speed,
v_lead=v_lead,
s=lead_vehicle_dist,
v0=8.0
)
# 逐点更新轨迹速度
for i in range(len(adjusted_trajectory)):
dt = 0.1
point_dist = max(0.0, lead_vehicle_dist - i * current_speed * dt)
local_idm = idm_acceleration(
v=adjusted_trajectory[i-1, 3] if i > 0 else current_speed,
v_lead=v_lead,
s=point_dist,
v0=8.0
)
prev_speed = adjusted_trajectory[i-1, 3] if i > 0 else current_speed
new_speed = prev_speed + local_idm * dt
adjusted_trajectory[i, 3] = max(0.0, new_speed)
final_trajectory = adjusted_trajectory
3. 平滑速度规划
python
else:
final_trajectory = smooth_speed_planning(
current_speed=current_speed,
desired_speed=8.0,
trajectory=predicted_trajectory,
comfort_accel=4.0,
dt=0.1
)
4. 轨迹可视化
python
# 转换到全局坐标
for i in range(len(predicted_trajectory)):
lx, ly, lyaw, lspd = predicted_trajectory[i]
gx = np.cos(ego_yaw_rad) * lx - np.sin(ego_yaw_rad) * ly + ego_x
gy = np.sin(ego_yaw_rad) * lx + np.cos(ego_yaw_rad) * ly + ego_y
env.world.debug.draw_point(
carla.Location(x=gx, y=gy, z=0.5),
size=0.1,
color=carla.Color(r=0, g=0, b=255),
life_time=traj_life_time
)
架构对比
| 组件 | 训练脚本 | 测试脚本 |
|---|---|---|
| 模型架构 | PlanningBCNetwork (共享backbone + 双头) | 与训练脚本一致 |
| 模型输出 | action (3维) + trajectory (20×4维) | action (3维) + trajectory (20×4维) |
| 数据来源 | HDF5离线数据集 | CARLA实时环境 |
| 观测格式 | 307维向量 | 字典 → 307维向量 |
| 损失函数 | BC损失 + 0.5 × 轨迹损失 | 无 (推理阶段) |
| 优化器 | Adam (lr=2e-4) | 无 (推理阶段) |
| 采样策略 | Diffusion (100步去噪) | Diffusion (100步去噪) |
| 速度规划 | 无 | IDM + 平滑规划 |
| 控制模式 | 无 | 轨迹控制 (PID) |
| 可视化 | 无 | 轨迹点显示 |
数据流对比
训练数据流
HDF5文件
↓
load_dataset()
↓
ReplayBuffer.sample()
↓
(state, action, next_state, reward, not_done, trajectory_target)
↓
Diffusion_BC_Planning.train()
├─> q_sample() → x_noisy
├─> model(x_noisy, t, state) → pred_action, pred_trajectory
├─> bc_loss = MSE(pred_action, noise/action)
├─> trajectory_loss = MSE(pred_trajectory, trajectory_target)
└─> total_loss.backward()
测试数据流
CARLA环境
↓
env.reset() → obs (dict)
↓
convert_obs_dict_to_vector(obs) → state (307)
↓
agent.select_action(state)
├─> Diffusion.sample(state) → action
└─> model(action, t=0, state) → trajectory
↓
IDM速度调整 / 平滑规划
↓
final_trajectory
↓
env.step({'action': action, 'trajectory': final_trajectory})
↓
可视化 / 统计
关键差异
| 方面 | 训练 | 测试 |
|---|---|---|
| 轨迹目标 | 从观测waypoints提取 | 模型预测 |
| 速度规划 | 静态 (使用当前速度) | 动态 (IDM + 平滑) |
| 前车处理 | 无 | IDM跟车 |
| 控制执行 | 无 | PID跟踪轨迹 |
| 可视化 | 无 | 轨迹点显示 |
总结
训练脚本特点
- 纯模仿学习:直接从专家数据学习动作和轨迹
- Diffusion架构:使用扩散模型进行动作采样
- 多任务学习:同时预测动作和轨迹,共享特征提取
- 轨迹外推:从12个waypoint外推到20个轨迹点
- 加权损失:BC损失 + 0.5 × 轨迹损失
测试脚本特点
- 实时推理:在CARLA仿真环境中实时运行
- IDM集成:使用IDM模型进行跟车和速度规划
- 平滑规划:S曲线平滑过渡速度
- 轨迹控制:使用PID控制器跟踪预测轨迹
- 可视化支持:显示参考轨迹和预测轨迹
架构优势
- 端到端训练:直接从原始观测学习控制策略
- 轨迹预测:提供未来轨迹信息,支持更安全的控制
- Diffusion采样:提高动作多样性和鲁棒性
- IDM集成:融合经典跟车模型,提高安全性
- 模块化设计:训练和测试代码分离,易于维护
文件路径
- 训练脚本:
EasyCarla-RL/example/train_offline_bc_planning.py - 测试脚本:
EasyCarla-RL/example/run_bc_in_carla.py - 数据集:
EasyCarla-RL/example/easycarla_offline_dataset.hdf5 - 模型保存:
EasyCarla-RL/example/params_bc_planning/
文档版本 : 1.0
更新时间: 2026-06-28