This post reproduces Realtime-VLA V2 and records the process for reference:
Source code: https://github.com/dexmal/realtime-vla-v2
Paper: Realtime-VLA V2: Learning to Run VLAs Fast, Smooth, and Accurate
1. Download the code
Run the following command to clone the code:
bash
git clone https://github.com/dexmal/realtime-vla-v2.git
Enter the code directory:
bash
cd realtime-vla-v2
2. Set up the conda environment
Run the following command to create the environment:
bash
conda create -n realtime-vla-v2 python=3.10 -y
Once it finishes, activate the realtime-vla-v2 environment:
bash
conda activate realtime-vla-v2
3. Install dependencies
Run the following command to upgrade pip:
bash
python -m pip install --upgrade pip
Then edit requirements.txt and comment out acados_template, which will be installed from source later:
numpy
pyyaml
requests
opencv-python
pillow
fastapi
uvicorn
scipy
osqp
transformers
torch
triton
casadi
# acados_template
rerun-sdk
pyrealsense2
Run the following command to install the dependencies listed in requirements.txt:
bash
pip install -r requirements.txt -i https://mirrors.cloud.tencent.com/pypi/simple/
4. Install acados_template
4.1. Clone the acados repository (which contains acados_template)
bash
git clone --recursive https://github.com/acados/acados.git
cd acados
4.2. Build and install the C core library
bash
mkdir -p build && cd build
cmake .. -DACADOS_WITH_QPOASES=ON -DACADOS_WITH_OSQP=ON
make -j$(nproc)
sudo make install
4.3. Install the Python interface (acados_template lives in the interfaces/acados_template directory)
bash
cd ../interfaces/acados_template
pip install .
Output:
Downloading contourpy-1.3.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (325 kB)
Downloading cycler-0.12.1-py3-none-any.whl (8.3 kB)
Downloading fonttools-4.63.0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (4.9 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.9/4.9 MB 90.9 MB/s 0:00:00
Downloading kiwisolver-1.5.0-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 73.6 MB/s 0:00:00
Downloading pyparsing-3.3.2-py3-none-any.whl (122 kB)
Downloading python_dateutil-2.9.0.post0-py2.py3-none-any.whl (229 kB)
Downloading six-1.17.0-py2.py3-none-any.whl (11 kB)
Building wheels for collected packages: acados_template
Building wheel for acados_template (pyproject.toml) ... done
Created wheel for acados_template: filename=acados_template-0.5.1-py3-none-any.whl size=424204 sha256=4929e4fc66cb051321e4ac264c76d0b2850a3e08dcfa1e62a6372bc7de57a9eb
Stored in directory: /tmp/pip-ephem-wheel-cache-nr33iee_/wheels/55/23/3d/aaa0df53ea7235a6723701327f8df34ad0587f16c154f7164b
Successfully built acados_template
Installing collected packages: wrapt, six, pyparsing, kiwisolver, fonttools, cython, cycler, contourpy, python-dateutil, Deprecated, matplotlib, acados_template
Successfully installed Deprecated-1.3.1 acados_template-0.5.1 contourpy-1.3.2 cycler-0.12.1 cython-3.2.5 fonttools-4.63.0 kiwisolver-1.5.0 matplotlib-3.10.9 pyparsing-3.3.2 python-dateutil-2.9.0.post0 six-1.17.0 wrapt-2.2.1
After installation, acados_template shows up in pip list:
Package                Version
---------------------- ------------
acados_template        0.5.1
annotated-doc          0.0.4
annotated-types        0.7.0
anyio                  4.13.0
attrs                  26.1.0
casadi                 3.7.2
certifi                2026.5.20
charset-normalizer     3.4.7
.....
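Building acados_template is only half of the setup: at runtime it also has to find the compiled acados C libraries. The acados installation docs describe setting ACADOS_SOURCE_DIR to the acados root and adding <acados_root>/lib to LD_LIBRARY_PATH (for example in ~/.bashrc); please confirm against the docs for your acados version. The helper below is a hypothetical sanity check, not part of either repository; it only inspects environment variables and whether the package is importable.

```python
import importlib.util
import os
from pathlib import Path


def check_acados_env(env=None) -> dict:
    """Report whether acados_template is importable and whether the runtime env vars look set up.

    Assumption: the layout from the acados installation docs, i.e. the shared
    libraries live in $ACADOS_SOURCE_DIR/lib and that directory is on LD_LIBRARY_PATH.
    """
    env = os.environ if env is None else env
    report = {
        "acados_template_importable": importlib.util.find_spec("acados_template") is not None,
        "ACADOS_SOURCE_DIR_set": bool(env.get("ACADOS_SOURCE_DIR")),
    }
    src = env.get("ACADOS_SOURCE_DIR", "")
    lib_dir = os.path.normpath(str(Path(src) / "lib")) if src else ""
    # Normalize entries so a trailing slash does not cause a false "missing".
    ld_paths = [os.path.normpath(p) for p in env.get("LD_LIBRARY_PATH", "").split(os.pathsep) if p]
    report["lib_dir_on_LD_LIBRARY_PATH"] = bool(lib_dir) and lib_dir in ld_paths
    return report


if __name__ == "__main__":
    for key, ok in check_acados_env().items():
        print(f"{key:28s} {'OK' if ok else 'MISSING'}")
```

Running it inside the realtime-vla-v2 environment before starting the server makes a missing library path show up early, instead of as a load error deep inside solver creation.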
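5. Run the examples
Below are three common Realtime-VLA V2 task examples: cloth folding, chip placement, and box placement.
Open two terminal windows with the same conda environment activated, and start the server first, then the client.
To switch tasks, just swap in the corresponding config files (config_*.yaml).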
5.1 Cloth Folding
Server launch command:
bash
python server/infer_server.py --config server/config_cloth.yaml
Client launch command:
bash
python client/local_client.py --config client/config_cloth.yaml
5.2 Chip Placement
Server launch command:
bash
python server/infer_server.py --config server/config_chip.yaml
Client launch command:
bash
python client/local_client.py --config client/config_chip.yaml
5.3 Box Placement
Server launch command:
bash
python server/infer_server.py --config server/config_box.yaml
Client launch command:
bash
python client/local_client.py --config client/config_box.yaml
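If you switch tasks often, the "server first, then client" order can also be scripted. The sketch below is a hypothetical helper (build_commands and launch are not part of the repo); it assumes it is run from the repository root inside the activated conda environment, and it simply waits a fixed number of seconds for the server to warm up instead of checking that the port is open.

```python
import subprocess
import sys
import time


def build_commands(task: str, python: str = sys.executable) -> tuple[list[str], list[str]]:
    """Build the server and client command lines for server/config_<task>.yaml and client/config_<task>.yaml."""
    server_cmd = [python, "server/infer_server.py", "--config", f"server/config_{task}.yaml"]
    client_cmd = [python, "client/local_client.py", "--config", f"client/config_{task}.yaml"]
    return server_cmd, client_cmd


def launch(task: str, server_warmup_s: float = 10.0) -> None:
    """Start the server in the background, give it time to warm up, then run the client in the foreground."""
    server_cmd, client_cmd = build_commands(task)
    server = subprocess.Popen(server_cmd)
    try:
        # Fixed wait: a guess at how long model loading and warmup take; polling the port would be more robust.
        time.sleep(server_warmup_s)
        subprocess.run(client_cmd, check=True)
    finally:
        # Stop the server once the client exits or fails.
        server.terminate()
        server.wait()
```

For example, launch("cloth") runs the cloth-folding pair. Two separate terminals remain the simpler choice when you want to watch the server and client logs side by side.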
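6. Hands-on example
Task: have the robot pick up a bottle. On the server side, create a new config file server/config_bottle.yaml.
Prerequisites:
- Model inference weights (already converted for accelerated inference in realtime-vla): pi05_droid_finetune_low_mem_converted.pkl
- Base weights for tokenizer_path: for the pi0.5 series, this defaults to paligemma-3b-pt-224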
yaml
server:
  host: "0.0.0.0"
  port: 8321
  endpoint: "/infer"
model:
  adapter: "openpi_rtc_triton"
  config_name: "pi05_droid"
  checkpoint: "/home/liguopu/lgp_dev/project/realtime-vla/pi05_droid_finetune_low_mem_converted.pkl"
  prompt: "Pick up the bottle."
  adarms_knob: 0
  valid_action_num: 15 # 15
  action_horizon: 15 # 15
  action_type: "joint"
  image_size: [640, 480]
  tokenizer_path: "/home/liguopu/lgp_dev/project/realtime-vla/paligemma-3b-pt-224"
  norm_stats_dir: "/home/liguopu/lgp_dev/project/openpi/checkpoints/pi05_droid_finetune_low_mem/my_experiment/1999/assets/droid"
  discrete_state_input: true
  state_dim: 14
  action_dim: 14
  noise_seed: null
inference:
  optimizer: "timeaxis_smooth"
  timeaxis_dt_ref_s: 0.01
  timeaxis_dt_min_s: 0.008
  timeaxis_dt_max_s: 0.016
  timeaxis_lambda_acc: 10.0
  timeaxis_lambda_time: 1.0
  timeaxis_stride: 15
  timeaxis_optdims: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
  timeaxis_v_max: null
  timeaxis_lambda_v: 10.0
  timeaxis_horizon: 15 # 15
  timeaxis_logging: false
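Several fields in this config describe the same horizon or dimension and presumably have to agree: valid_action_num, action_horizon and timeaxis_horizon are all 15, and action_dim matches the 14 entries of timeaxis_optdims. The check below encodes my reading of those relationships; the rules are assumptions inferred from the field names, not documented constraints of the repo.

```python
def check_server_config(cfg: dict) -> list[str]:
    """Return a list of suspected inconsistencies; an empty list means none were detected."""
    problems = []
    model, inf = cfg["model"], cfg["inference"]
    horizon = model["action_horizon"]
    if model["valid_action_num"] > horizon:
        problems.append("valid_action_num exceeds action_horizon")
    if inf["timeaxis_horizon"] > horizon:
        problems.append("timeaxis_horizon exceeds action_horizon")
    bad_dims = [d for d in inf["timeaxis_optdims"] if not 0 <= d < model["action_dim"]]
    if bad_dims:
        problems.append(f"timeaxis_optdims out of range: {bad_dims}")
    if not inf["timeaxis_dt_min_s"] <= inf["timeaxis_dt_ref_s"] <= inf["timeaxis_dt_max_s"]:
        problems.append("expected timeaxis_dt_min_s <= timeaxis_dt_ref_s <= timeaxis_dt_max_s")
    return problems


# Values copied from server/config_bottle.yaml above.
bottle_cfg = {
    "model": {"valid_action_num": 15, "action_horizon": 15, "action_dim": 14},
    "inference": {
        "timeaxis_horizon": 15,
        "timeaxis_optdims": list(range(14)),
        "timeaxis_dt_min_s": 0.008,
        "timeaxis_dt_ref_s": 0.01,
        "timeaxis_dt_max_s": 0.016,
    },
}
print(check_server_config(bottle_cfg))  # → []
```

To check the real file instead of a hand-copied dict, yaml.safe_load(open("server/config_bottle.yaml")) (PyYAML is in requirements.txt) would produce the input, assuming server, model and inference are top-level sections.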
Server launch command:
bash
python server/infer_server.py --config server/config_bottle.yaml
Server output:
Warmup: compiling prefill lengths 1..15
Warmup complete in 0.59s
[infer_server] listening on 0.0.0.0:8321
INFO: Started server process [3476300]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8321 (Press CTRL+C to quit)
/home/liguopu/miniconda3/envs/realtime-vla-v2/lib/python3.10/site-packages/osqp/interface.py:229: UserWarning: Converting sparse A to a CSC matrix. This may take a while...
warnings.warn('Converting sparse A to a CSC matrix. This may take a while...')
On the client side, create a new config file client/config_bottle.yaml. Note that it uses the mock observer (observer.name: "mock", enable_cameras: false), so it runs without real camera input.
yaml
client:
  infer_url: "http://192.188.xxx.xxx:8321"
  endpoint: "/infer"
  timeout_s: 0.2
  run_duration_s: 72000
observer:
  name: "mock"
  image_size: [640, 480]
  fps: 30
  state_dim: 14
  airbot_host: ""
  airbot_left_port: 0
  airbot_right_port: 0
  top_camera_id: ""
  left_camera_id: ""
  right_camera_id: ""
  enable_cameras: false
executor:
  name: "raw_action"
  enable_init_action: true
  init_action: [0.0, 0.0, 0.0, 1.57, 0.0, -1.57, 0.0, 0.0, 0.0, 0.0, -1.57, -0.0, 1.57, 0.0]
  init_steps: 100
  init_sleep_s: 0.01
  control_dt_s: 0.01
  obs_image_delay_ms: 55.0
  state_delay_s: 0.05
  max_prefill_states: 15
  heartbeat_history_len: 200
  airbot_host: ""
  airbot_left_port: 0
  airbot_right_port: 0
  left_gripper_bias: 0.0
  right_gripper_bias: 0.0
  infer_fixed_dims: [7, 8, 9, 10, 11, 12]
  infer_fixed_values: [0.0, 0.0, 0.0, -1.57, 0.0, 1.57]
  command_fixed_dims: [7, 8, 9, 10, 11, 12, 13]
  command_fixed_values: [0.0, 0.0, 0.0, -1.57, 0.0, 1.57, 0.0]
  action_interval_ms: 10.0
  action_speed_limit_per_s: 10.0
  action_speed_limit_dims: [0, 1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12]
  enable_servo_interpolation: true
  servo_interval_ms: 10.0
  savgol_window_length: 1
  savgol_polyorder: 3
  forward_track_alpha: 1.0
  forward_track_delay_cnt: 5
  forward_track_lead_s: 0.15
visualization:
  output_dir: "./output_bottle_mock"
  enable_recording: True
  record_videos: True
  record_rerun: True
  max_pending_video_frames: 32
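The client config mixes milliseconds, seconds and step counts. Expressing the durations in control periods makes them easier to compare with the roughly 0.05 s inference latency in the client log. With control_dt_s = 0.01 s (100 Hz, the same 10 ms as action_interval_ms and servo_interval_ms), obs_image_delay_ms, state_delay_s, forward_track_lead_s and timeout_s come out at 5.5, 5, 15 and 20 control steps. Reading the init phase as init_steps × init_sleep_s (about 1 s) is my assumption about how the executor uses those two fields.

```python
def to_steps(duration_s: float, control_dt_s: float) -> float:
    """Express a duration as a number of control periods, rounded to hide float noise."""
    return round(duration_s / control_dt_s, 6)


# Values copied from client/config_bottle.yaml above (obs_image_delay_ms converted to seconds).
control_dt_s = 0.01
durations_s = {
    "obs_image_delay": 55.0 / 1000.0,
    "state_delay_s": 0.05,
    "forward_track_lead_s": 0.15,
    "timeout_s": 0.2,
}
for name, seconds in durations_s.items():
    print(f"{name:22s} {seconds:.3f} s = {to_steps(seconds, control_dt_s):g} control steps")

# Assumption: the init phase sends one init action every init_sleep_s.
init_steps, init_sleep_s = 100, 0.01
print(f"init phase ~ {init_steps * init_sleep_s:.2f} s")
```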
Client launch command:
bash
python client/local_client.py --config client/config_bottle.yaml
Client output:
Infer latency=0.050s queue_len=12 onlyinfer_s=0.048s
Infer latency=0.049s queue_len=12 onlyinfer_s=0.048s
Infer latency=0.049s queue_len=12 onlyinfer_s=0.048s
Infer latency=0.049s queue_len=12 onlyinfer_s=0.048s
Infer latency=0.050s queue_len=12 onlyinfer_s=0.048s
Infer latency=0.050s queue_len=12 onlyinfer_s=0.048s
Infer latency=0.049s queue_len=12 onlyinfer_s=0.048s
Infer latency=0.049s queue_len=12 onlyinfer_s=0.048s
Infer latency=0.050s queue_len=12 onlyinfer_s=0.048s
Infer latency=0.049s queue_len=12 onlyinfer_s=0.048s
Infer latency=0.049s queue_len=12 onlyinfer_s=0.048s
Infer latency=0.050s queue_len=12 onlyinfer_s=0.048s
Infer latency=0.050s queue_len=12 onlyinfer_s=0.048s
Infer latency=0.050s queue_len=10 onlyinfer_s=0.048s
Infer latency=0.049s queue_len=10 onlyinfer_s=0.048s
......
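The log shows an end-to-end latency of 0.049 to 0.050 s against an onlyinfer_s of 0.048 s. If onlyinfer_s measures only the model forward pass, roughly 1 to 2 ms goes to transport and pre/post-processing, and queue_len staying at 10 to 12 suggests the action queue does not run dry. To summarize a longer run, a small parser like the following works on saved log lines; the regex is written against the exact line format above, and the interpretation of the fields is my assumption.

```python
import re
from statistics import mean

# Written against the exact line format printed by the client above.
LOG_RE = re.compile(r"Infer latency=([\d.]+)s queue_len=(\d+) onlyinfer_s=([\d.]+)s")


def summarize_infer_log(lines) -> dict:
    """Mean end-to-end latency, mean onlyinfer_s, their mean gap, and the smallest queue_len seen."""
    rows = [m.groups() for m in map(LOG_RE.search, lines) if m]
    if not rows:
        return {"n": 0}
    latency = [float(r[0]) for r in rows]
    onlyinfer = [float(r[2]) for r in rows]
    return {
        "n": len(rows),
        "mean_latency_s": mean(latency),
        "mean_onlyinfer_s": mean(onlyinfer),
        # Time outside onlyinfer_s, assumed to be transport plus pre/post-processing.
        "mean_overhead_s": mean(a - b for a, b in zip(latency, onlyinfer)),
        "min_queue_len": min(int(r[1]) for r in rows),
    }
```

Lines that do not match (such as "......") are skipped, so the function can be fed a whole captured console log.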
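7. Speed-adaptive model (aligning execution rhythm with latency)
Overall idea:
Execution speed is not governed by a separate "speed controller". Instead, inference time, control period, and action-queue length are all mapped onto the same timeline. The server first uses prefill to align the state history with the model input; the client then uses the measured latency to predict which action step should be executing when the result arrives, which keeps the actions from lagging half a beat behind.
Notes:
- The server handles "history alignment": it clips the state history and feeds it in as prefill.
- The client handles "rhythm alignment": it predicts how many steps will have executed from the latency, and pads the tail of the queue when it runs short.
- Together, the action stream follows the actual execution time rather than the time the request was sent.
Code location: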
- server/model.py(infer_actions)
- client/executor.py(_predict_steps_with_history、prepare_infer_context)
Code example 1: server-side prefill alignment (server/model.py)
python
import numpy as np


def infer_actions(self, state: dict) -> list:
    state_sequence = self.process_state_sequence_for_model(_extract_state_sequence(state))
    if state_sequence.ndim == 1:
        state_sequence = state_sequence[None, :]
    obs_state = state_sequence[0]
    prefill_len = state_sequence.shape[0]
    if self._action_horizon is not None:
        prefill_len = min(prefill_len, int(self._action_horizon))
    # Clip the state history to the horizon and feed it to the model as prefill actions.
    prefill_actions = self._pad_prefill_actions(state_sequence[:prefill_len])
    inputs = {
        **self.inp_images,
        "state": obs_state,
        "prompt": self.prompt,
        "adarms_knob": self._adarms_knob,
    }
    if prefill_len > 0:
        inputs["actions"] = prefill_actions
        inputs["action_prefill_len"] = np.int32(prefill_len)
    result = self._policy.infer(inputs)["actions"]
    # After inference, drop the prefix that served as prefill and keep only future actions.
    result = result[min(prefill_len, result.shape[0]):]
    return self.process_actions_for_robot(result).tolist()
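To see what the slicing in infer_actions does to shapes, here is a self-contained toy version. pad_prefill (repeat the last state) and toy_policy (keep the prefix, extrapolate linearly) are hypothetical stand-ins for _pad_prefill_actions and the real pi0.5 policy; only the clip, prefill and strip-prefix logic mirrors the code above.

```python
import numpy as np


def pad_prefill(states: np.ndarray, horizon: int) -> np.ndarray:
    """Hypothetical stand-in for _pad_prefill_actions: repeat the last row until there are `horizon` rows."""
    pad = np.repeat(states[-1:], horizon - states.shape[0], axis=0)
    return np.concatenate([states, pad], axis=0)


def toy_policy(actions: np.ndarray, prefill_len: int) -> np.ndarray:
    """Toy model: keeps the prefilled prefix and extrapolates its last step linearly."""
    out = actions.copy()
    step = actions[prefill_len - 1] - actions[prefill_len - 2] if prefill_len >= 2 else 0.0
    for k in range(prefill_len, out.shape[0]):
        out[k] = out[k - 1] + step
    return out


def infer_with_prefill(state_history: np.ndarray, horizon: int) -> np.ndarray:
    """Same slicing as infer_actions: clip history to the horizon, prefill, then drop the prefix."""
    state_history = np.atleast_2d(np.asarray(state_history, dtype=np.float64))
    if state_history.shape[0] == 0:
        raise ValueError("need at least one state")
    prefill_len = min(state_history.shape[0], horizon)
    prefill = pad_prefill(state_history[:prefill_len], horizon)
    result = toy_policy(prefill, prefill_len)
    return result[min(prefill_len, result.shape[0]):]
```

For a 3-step history [[0, 0], [0.1, 0.2], [0.2, 0.4]] and horizon=5, the toy policy returns 2 future steps, approximately [[0.3, 0.6], [0.4, 0.8]]. With horizon=3 the same history leaves an empty (0, 2) array, so if the real model returns exactly action_horizon actions, a history at least that long would leave nothing after the prefix is stripped; this is worth checking against max_prefill_states in your own setup.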
Code example 2: client-side latency alignment (client/executor.py)
python
import math


def _predict_steps_with_history(
    self,
    latency_s: float | None,
    request_time: float | None,
    history: list[dict] | None,
    future_queue: list[list[float]] | None,
    control_dt_s: float,
) -> int:
    if latency_s is None:
        latency_s = 0.0
    control_dt_s = max(1e-6, float(control_dt_s))
    # With no queued actions (or no request time), estimate the steps to advance as latency / control period.
    if request_time is None or not future_queue:
        return max(1, int(math.ceil(float(latency_s) / control_dt_s)))
    target_time = float(request_time) + max(0.0, float(latency_s))
    prev_time = float(history[-1]["timestamp"]) if history else float(request_time)
    future_times: list[float] = []
    for _action in future_queue:
        prev_time = prev_time + control_dt_s
        future_times.append(float(prev_time))
    predicted_idx = None
    for idx, cur_time in enumerate(future_times):
        if cur_time >= target_time:
            predicted_idx = idx
            break
    if predicted_idx is None:
        extra = int(math.ceil((target_time - future_times[-1]) / control_dt_s))
        predicted_idx = len(future_times) - 1 + max(0, extra)
    # Predict at least 1 step ahead so execution does not stay on the current step and lag behind.
    return int(max(1, predicted_idx))
python
import time


def prepare_infer_context(self, latency_s: float, current_state: list[float] | None = None, image_timestamp: float | None = None):
    with self._action_queue_lock:
        queue_snapshot = [list(a) for a in self._future_actions]
        predicted_steps = self._predict_steps_with_history(
            latency_s=latency_s,
            request_time=time.time(),
            history=[{"timestamp": float(h["timestamp"]), "action": list(h["action"])} for h in self._heartbeat_action_history],
            future_queue=queue_snapshot,
            control_dt_s=self.get_control_dt_s(),
        )
        pad_action = list(queue_snapshot[-1]) if queue_snapshot else (list(current_state) if current_state is not None else None)
        # When the latency reaches past the end of the queue, pad with the last action to keep the timeline continuous.
        while len(self._future_actions) <= predicted_steps and pad_action is not None:
            self._future_actions.append(list(pad_action))
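The same step prediction can be written as free functions to try with numbers. This is a restatement, not the repo's code: the history is reduced to its last timestamp, queued action i is placed at last_history_time + (i + 1) * control_dt_s (computed by multiplication rather than the running sum above, which can differ in the last float bits), and pad_queue mirrors the padding loop in prepare_infer_context. With the bottle config (control_dt_s = 0.01 s) and the 0.049 to 0.050 s latency from the log, the no-queue branch gives ceil(latency / control_dt_s), i.e. about 5 steps.

```python
from __future__ import annotations

import math


def predict_steps(latency_s: float | None, request_time: float | None, last_history_time: float | None,
                  queue_len: int, control_dt_s: float) -> int:
    """Index of the queued action expected to be executing when a result with this latency arrives."""
    latency_s = max(0.0, float(latency_s or 0.0))
    control_dt_s = max(1e-6, float(control_dt_s))
    if request_time is None or queue_len == 0:
        # Nothing queued: count how many control periods the latency spans.
        return max(1, math.ceil(latency_s / control_dt_s))
    target_time = request_time + latency_s
    start = last_history_time if last_history_time is not None else request_time
    for idx in range(queue_len):
        # Queued action idx is assumed to run at start + (idx + 1) * control_dt_s.
        if start + (idx + 1) * control_dt_s >= target_time:
            return max(1, idx)
    # The latency reaches past the queue: extrapolate beyond the last queued action.
    last_time = start + queue_len * control_dt_s
    extra = math.ceil((target_time - last_time) / control_dt_s)
    return max(1, queue_len - 1 + max(0, extra))


def pad_queue(queue: list[list[float]], predicted_steps: int, fallback: list[float] | None = None) -> list[list[float]]:
    """Append copies of the last action (or `fallback` if the queue is empty) until len(queue) > predicted_steps."""
    pad = list(queue[-1]) if queue else (list(fallback) if fallback is not None else None)
    while pad is not None and len(queue) <= predicted_steps:
        queue.append(list(pad))
    return queue
```

For example, with control_dt_s = 0.25, a 1.0 s latency and 8 queued actions, predict_steps returns 3: the fourth queued action is due exactly at the target time. With only 2 queued actions it also returns 3, and pad_queue then extends the queue to 4 entries by repeating the last action.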
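8. Temporal optimization and acceleration smoothing
Overall idea:
This part is two-stage smoothing. The first stage penalizes first- and second-order differences directly in the optimizer's objective, so the trajectory is more continuous in time; the second stage applies a local filter right before the command is sent, further suppressing short-term jitter.
Notes:
- dy is the first difference of the command (roughly a per-step velocity) and ddy the second difference (roughly a per-step acceleration); penalizing dy keeps per-step motion small, and penalizing ddy keeps the velocity from changing abruptly.
- MPC smoothing favors consistency across the whole horizon; Savitzky-Golay (SG) filtering favors local executability.
- With both layers enabled, motion is usually more stable than with only one of them.
Code location: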
- client/executor.py(AcadosPlanner._build_solver、_savgol_smooth_action)
Code example 1: MPC temporal smoothing terms (client/executor.py)
python
e_track = sqrt_w_track * (x_q - r_ref)
e_cmd = ca.sqrt(cfg.w_cmd) * (y_cmd - r_ref)
e_yx = ca.sqrt(cfg.w_yx) * (y_cmd - x_q)
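# Penalize the first difference dy (roughly velocity) and the second difference ddy (roughly acceleration) to keep the command smooth.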
dy = y_cmd - y_prev
ddy = y_cmd - 2.0 * y_prev + y_prev2
e_dy = ca.sqrt(cfg.w_dy) * dy
e_ddy = ca.sqrt(cfg.w_ddy) * ddy
y_expr = ca.vertcat(e_track, e_cmd, e_yx, e_dy, e_ddy)
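To make the effect of the dy and ddy terms concrete, the function below evaluates the same kind of residuals as e_cmd, e_dy and e_ddy over a whole 1-D trajectory and sums their squares (the tracking terms e_track and e_yx are left out). This is a plain NumPy illustration, not the acados cost: the weights are made up, and acados applies its own scaling to least-squares costs.

```python
import numpy as np


def smoothness_cost(y: np.ndarray, ref: np.ndarray, w_cmd: float, w_dy: float, w_ddy: float) -> float:
    """Sum of squared e_cmd, e_dy and e_ddy residuals over a trajectory of shape (T, nq)."""
    e_cmd = np.sqrt(w_cmd) * (y - ref)
    # Same finite differences as the MPC model: dy = y_cmd - y_prev, ddy = y_cmd - 2 y_prev + y_prev2.
    e_dy = np.sqrt(w_dy) * (y[1:] - y[:-1])
    e_ddy = np.sqrt(w_ddy) * (y[2:] - 2.0 * y[1:-1] + y[:-2])
    return float(np.sum(e_cmd**2) + np.sum(e_dy**2) + np.sum(e_ddy**2))


step_ref = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])[:, None]  # jerky reference (a unit step)
ramp = np.array([0.0, 0.0, 0.25, 0.5, 0.75, 1.0])[:, None]    # smoother candidate
for w in (0.0, 10.0):
    print(f"w_dy = w_ddy = {w}: exact step {smoothness_cost(step_ref, step_ref, 1.0, w, w):.3f}, "
          f"ramp {smoothness_cost(ramp, step_ref, 1.0, w, w):.3f}")
```

With w_cmd = 1 and w_dy = w_ddy = 10, following the step exactly costs 30 while the ramp costs 3.5; with w_dy = w_ddy = 0 the exact step wins (0 versus 0.375). That trade-off between tracking and smoothness is what the weights tune.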
Code example 2: smoothing right before sending (client/executor.py)
python
import numpy as np


def _savgol_smooth_action(
    action: np.ndarray | None,
    history_actions: list[np.ndarray],
    future_actions: list[np.ndarray],
    weights: np.ndarray | None,
    window_length: int,
) -> np.ndarray | None:
    if action is None:
        return None
    action_arr = np.asarray(action, dtype=np.float32).reshape(-1)
    if weights is None or window_length <= 1:
        return action_arr
    # Take half a window of past and future actions, stack them into a local window, then apply the SG weights.
    half = window_length // 2
    past = [np.asarray(item, dtype=np.float32).reshape(-1) for item in history_actions][-half:]
    future = [np.asarray(item, dtype=np.float32).reshape(-1) for item in future_actions][:half]
    if len(past) < half:
        pad_val = past[0] if past else action_arr
        past = [pad_val.copy() for _ in range(half - len(past))] + past
    if len(future) < half:
        pad_val = future[-1] if future else action_arr
        future = future + [pad_val.copy() for _ in range(half - len(future))]
    window = np.vstack(past + [action_arr] + future)
    if window.shape[0] != window_length or window.shape[1] != action_arr.shape[0]:
        return action_arr
    return weights @ window
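_savgol_smooth_action receives precomputed weights, and how they are built is not shown in the excerpt. One standard construction (the same idea behind scipy.signal.savgol_coeffs) fits a polynomial of degree polyorder by least squares over the window and evaluates it at the center, which reduces to one row of a pseudo-inverse. Two observations on the bottle config: with savgol_window_length: 1 the function above returns the action unchanged, so this second smoothing layer is effectively off; and Savitzky-Golay filtering needs polyorder < window_length, so keeping savgol_polyorder: 3 would require an odd window of at least 5. The helper below is a hypothetical sketch, not the repo's code.

```python
import numpy as np


def savgol_center_weights(window_length: int, polyorder: int) -> np.ndarray:
    """Weights w such that w @ window is the Savitzky-Golay smoothed value at the window center.

    A degree-`polyorder` polynomial is fitted by least squares over offsets -half..half;
    its value at offset 0 is the constant coefficient, i.e. row 0 of the pseudo-inverse.
    """
    if window_length % 2 == 0 or polyorder >= window_length:
        raise ValueError("window_length must be odd and greater than polyorder")
    half = window_length // 2
    offsets = np.arange(-half, half + 1, dtype=np.float64)
    vander = np.vander(offsets, polyorder + 1, increasing=True)  # columns: 1, t, t^2, ...
    return np.linalg.pinv(vander)[0].astype(np.float32)
```

For window_length=5 and polyorder=2 this yields the classic coefficients [-3, 12, 17, 12, -3] / 35. weights @ window then smooths a (window_length, action_dim) stack exactly as in the last line of _savgol_smooth_action, and any polynomial of degree up to polyorder passes through unchanged.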
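9. Spatial optimization (joint-space bounds and rate-of-change constraints)
Overall idea:
The core of spatial optimization is "define the feasible region first, then track". Rather than tracking the reference trajectory freely, the system first restricts position, velocity, one-step change, and two-step change to what the arm can actually execute, and then picks the best command inside that region.
Notes:
- q_min/q_max directly set the lower and upper bounds of the joint command.
- e_max = tau * v_max is a velocity-equivalent bound: it limits how far the command y_cmd may move away from the current joint state x_q, i.e. the displacement over a short interval.
- dy_max and ddy_max keep action changes continuous and reduce mechanical shock.
Code location: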
- client/executor.py(AcadosPlanner._build_solver、_set_h_bounds)
Code example 1: modeling the spatial constraints (client/executor.py)
python
# Lower/upper bounds on the joint command, applied directly to the control u (i.e. y_cmd).
ocp.constraints.idxbu = np.arange(nu, dtype=np.int64)
ocp.constraints.lbu = cfg.q_min.copy()
ocp.constraints.ubu = cfg.q_max.copy()
# The h-constraint jointly bounds the position error, the first-order change, and the second-order change.
h_expr = ca.vertcat((y_cmd - x_q), (y_cmd - y_prev), (y_cmd - 2.0 * y_prev + y_prev2))
ocp.model.con_h_expr = h_expr
e_max = cfg.tau * cfg.v_max
lh = np.concatenate([-e_max, -cfg.dy_max, -cfg.ddy_max])
uh = np.concatenate([+e_max, +cfg.dy_max, +cfg.ddy_max])
ocp.constraints.lh = lh
ocp.constraints.uh = uh
Code example 2: dynamically tightening the constraints in contact (client/executor.py)
python
import numpy as np


def _set_h_bounds(self, contact_mode: bool):
    # In contact mode, shrink v_max to lower the risk of collision and overshoot.
    scale = float(self.cfg.contact_v_scale) if contact_mode else 1.0
    e_max = self.cfg.tau * (self.cfg.v_max * scale)
    lh = np.concatenate([-e_max, -self.cfg.dy_max, -self.cfg.ddy_max])
    uh = np.concatenate([+e_max, +self.cfg.dy_max, +self.cfg.ddy_max])
    for k in self._h_bound_stages:
        self._set_stage_h_bounds(k, lh, uh)
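The bound computation and a feasibility check for a single candidate command fit in a few lines of NumPy. BoundsCfg, h_bounds and h_feasible are hypothetical names; the field names mirror the cfg attributes used above, and the numbers in the usage note are made up. Note what the code above implies: contact mode shrinks only the e_max block (via contact_v_scale), while dy_max and ddy_max stay the same.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class BoundsCfg:
    """Hypothetical container; field names mirror the cfg attributes in the excerpts above."""
    tau: float
    v_max: np.ndarray
    dy_max: np.ndarray
    ddy_max: np.ndarray
    contact_v_scale: float = 1.0


def h_bounds(cfg: BoundsCfg, contact_mode: bool) -> tuple[np.ndarray, np.ndarray]:
    """lh/uh for h = [y - x_q, y - y_prev, y - 2*y_prev + y_prev2], following _set_h_bounds."""
    scale = cfg.contact_v_scale if contact_mode else 1.0
    e_max = cfg.tau * (cfg.v_max * scale)  # only this block is scaled in contact mode
    lh = np.concatenate([-e_max, -cfg.dy_max, -cfg.ddy_max])
    uh = np.concatenate([e_max, cfg.dy_max, cfg.ddy_max])
    return lh, uh


def h_feasible(y, x_q, y_prev, y_prev2, cfg: BoundsCfg, contact_mode: bool = False) -> bool:
    """True if the candidate command y satisfies the stacked h-constraint."""
    h = np.concatenate([y - x_q, y - y_prev, y - 2.0 * y_prev + y_prev2])
    lh, uh = h_bounds(cfg, contact_mode)
    return bool(np.all(h >= lh) and np.all(h <= uh))
```

With tau = 0.05, v_max = 1, dy_max = 0.05, ddy_max = 0.04 and contact_v_scale = 0.5, a 0.03 step from rest is feasible normally (e_max = 0.05) but not in contact mode (e_max = 0.025).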
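10. MPC tracking and hardware constraints (online solving + fallback limiting)
Overall idea:
Here MPC is an online receding-horizon solver: each control cycle executes only the first control input, then the next cycle immediately replans. This lets it keep absorbing the latest observations and constraint changes. If the solver fails, the code switches at once to rate-limiting fallback logic so the command still stays within hardware bounds.
Notes:
- Normal path: solve over the prediction horizon and send u0.
- Contact mode: the reference is conservative (the current state repeated), prioritizing steady state.
- Failure path: position, dy, ddy, and velocity bounds are clamped one after another to keep the command safe.
Code location: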
- client/executor.py(AcadosPlanner.solve、_rate_limit_fallback)
Code example 1: online MPC tracking (client/executor.py)
python
import numpy as np


def solve(self, q_hat: np.ndarray, ai_future: np.ndarray, contact_mode: bool = False) -> PlannerOutput:
    if not self.initialized:
        self.reset(q_hat)
    if contact_mode:
        # Contact mode prioritizes steady state: the reference degenerates to the current state repeated.
        ai_use = np.repeat(q_hat.reshape(1, -1), self.cfg.N, axis=0)
    else:
        ai_use = ai_future
    # Inject the reference trajectory and tracking weight stage by stage.
    for k in range(self.cfg.N):
        r_k = ai_use[k]
        p_k = np.concatenate([r_k, [self.cfg.stage_sqrt_w(k)]])
        self.solver.set(k, "p", p_k)
    status = int(self.solver.solve())
    if status != 0:
        # If the solve fails, switch to the safe fallback.
        y_cmd = self._rate_limit_fallback(q_hat=q_hat, y_des=q_hat, contact_mode=contact_mode)
        return PlannerOutput(y_cmd=y_cmd, alpha=0.0)
    # Receding horizon: execute only the current-step control u0.
    u0 = self.solver.get(0, "u")
    y_cmd = np.asarray(u0[0:self.nq]).copy()
    self.x_aug = self.solver.get(1, "x").copy()
    return PlannerOutput(y_cmd=y_cmd, alpha=(0.0 if contact_mode else 1.0))
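The per-stage parameter vector p_k = [r_k, stage_sqrt_w(k)] can be built up front, which makes the contact-mode switch easy to inspect. build_stage_params is a hypothetical helper, and decaying_sqrt_w is a made-up weight schedule, since the excerpt does not show cfg.stage_sqrt_w.

```python
import numpy as np


def decaying_sqrt_w(k: int, w0: float = 1.0, gamma: float = 0.9) -> float:
    """Made-up stage weight schedule (cfg.stage_sqrt_w is not shown in the excerpt)."""
    return w0 * gamma**k


def build_stage_params(q_hat, ai_future, N, stage_sqrt_w, contact_mode=False) -> np.ndarray:
    """Row k is p_k = [r_k, stage_sqrt_w(k)], the vector solve() passes to stage k."""
    q_hat = np.asarray(q_hat, dtype=float).reshape(-1)
    if contact_mode:
        # Contact mode: the reference is the current state repeated over the horizon.
        ai_use = np.repeat(q_hat.reshape(1, -1), N, axis=0)
    else:
        ai_use = np.asarray(ai_future, dtype=float)
        if ai_use.shape[0] < N:
            raise ValueError("ai_future must have at least N rows")
        ai_use = ai_use[:N]
    weights = np.array([stage_sqrt_w(k) for k in range(N)], dtype=float)
    return np.concatenate([ai_use, weights[:, None]], axis=1)
```

In a receding-horizon loop, this array would be rebuilt every control cycle from the newest ai_future, each row passed via solver.set(k, "p", row), and only u0 sent to the robot. The weights stay the same in contact mode; only the reference rows change.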
Code example 2: hardware-constraint fallback (client/executor.py)
python
import numpy as np


def _rate_limit_fallback(self, q_hat: np.ndarray, y_des: np.ndarray, contact_mode: bool) -> np.ndarray:
    cfg = self.cfg
    nq = self.nq
    y_prev = self.x_aug[nq : 2 * nq].copy()
    y_prev2 = self.x_aug[2 * nq : 3 * nq].copy()
    # First clamp to the position bounds.
    y = np.clip(y_des, cfg.q_min, cfg.q_max)
    # Then limit the first-order change dy.
    dy = np.clip(y - y_prev, -cfg.dy_max, cfg.dy_max)
    y = y_prev + dy
    # Then limit the second-order change ddy (per-step acceleration).
    d2 = np.clip(y - 2.0 * y_prev + y_prev2, -cfg.ddy_max, cfg.ddy_max)
    y = 2.0 * y_prev - y_prev2 + d2
    # Finally, apply the velocity-equivalent bound e_max around the measured state q_hat.
    scale = float(cfg.contact_v_scale) if contact_mode else 1.0
    e_max = cfg.tau * (cfg.v_max * scale)
    e = np.clip(y - q_hat, -e_max, +e_max)
    return np.clip(q_hat + e, cfg.q_min, cfg.q_max)
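Pulled out as a free function, the fallback is easy to test. This standalone version takes y_prev, y_prev2 and e_max as arguments instead of reading self.x_aug and cfg, which is a restructuring for illustration only. Testing it shows that not every clamp is guaranteed on the output: the final e_max and q_min/q_max clamps always hold when q_hat lies within [q_min, q_max], but the final e_max clamp can override the dy limit when the measured position is far from the previous command.

```python
import numpy as np


def rate_limit_fallback(q_hat, y_des, y_prev, y_prev2, q_min, q_max, dy_max, ddy_max, e_max):
    """Standalone restatement of _rate_limit_fallback with its state and bounds passed in explicitly.

    e_max stands for cfg.tau * cfg.v_max * scale, already computed by the caller.
    """
    y = np.clip(y_des, q_min, q_max)                              # position bounds
    y = y_prev + np.clip(y - y_prev, -dy_max, dy_max)             # first-order change dy
    d2 = np.clip(y - 2.0 * y_prev + y_prev2, -ddy_max, ddy_max)   # second-order change ddy
    y = 2.0 * y_prev - y_prev2 + d2
    e = np.clip(y - q_hat, -e_max, e_max)                         # velocity-equivalent bound around q_hat
    return np.clip(q_hat + e, q_min, q_max)
```

For example, with y_prev = y_prev2 = 0.5, q_hat = y_des = 0.0, dy_max = 0.1, ddy_max = 0.05 and e_max = 0.2, the dy and ddy steps bring the command to 0.45, and the final clamp then moves it to 0.2, a step of 0.3 from y_prev. This ordering gives staying near the measured state priority over the rate limits.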
That's all for this walkthrough~