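VLA Series: Reproducing the Realtime-VLA V2 Paper | Accelerated Inference

This post documents my reproduction of Realtime-VLA V2, for anyone who wants to follow along.

Source code: https://github.com/dexmal/realtime-vla-v2

Paper: Realtime-VLA V2: Learning to Run VLAs Fast, Smooth, and Accurate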

1. Download the code

Run the following command to clone the repository:

git clone https://github.com/dexmal/realtime-vla-v2.git

Enter the code directory:

cd realtime-vla-v2

2. Set up the conda environment

Create the environment with:

conda create -n realtime-vla-v2 python=3.10 -y

Once it has been created, activate the realtime-vla-v2 environment:

conda activate realtime-vla-v2

3. Install dependencies

First upgrade pip:

python -m pip install --upgrade pip

Then edit requirements.txt and comment out acados_template; it is installed from source in step 4:

numpy
pyyaml
requests
opencv-python
pillow
fastapi
uvicorn
scipy
osqp
transformers
torch
triton
casadi
# acados_template
rerun-sdk
pyrealsense2

Install the dependencies listed in requirements.txt (the -i option selects a PyPI mirror and can be dropped):

pip install -r requirements.txt -i https://mirrors.cloud.tencent.com/pypi/simple/

4. Install acados_template

4.1 Clone the acados repository (it contains acados_template)

git clone --recursive https://github.com/acados/acados.git
cd acados

4.2 Build and install the C core library

mkdir -p build && cd build
cmake .. -DACADOS_WITH_QPOASES=ON -DACADOS_WITH_OSQP=ON
make -j$(nproc)
sudo make install

4.3 Install the Python interface (acados_template lives in interfaces/acados_template)

cd ../interfaces/acados_template
pip install .

Output:

Downloading contourpy-1.3.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (325 kB)
Downloading cycler-0.12.1-py3-none-any.whl (8.3 kB)
Downloading fonttools-4.63.0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (4.9 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.9/4.9 MB 90.9 MB/s  0:00:00
Downloading kiwisolver-1.5.0-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.6 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 73.6 MB/s  0:00:00
Downloading pyparsing-3.3.2-py3-none-any.whl (122 kB)
Downloading python_dateutil-2.9.0.post0-py2.py3-none-any.whl (229 kB)
Downloading six-1.17.0-py2.py3-none-any.whl (11 kB)
Building wheels for collected packages: acados_template
  Building wheel for acados_template (pyproject.toml) ... done
  Created wheel for acados_template: filename=acados_template-0.5.1-py3-none-any.whl size=424204 sha256=4929e4fc66cb051321e4ac264c76d0b2850a3e08dcfa1e62a6372bc7de57a9eb
  Stored in directory: /tmp/pip-ephem-wheel-cache-nr33iee_/wheels/55/23/3d/aaa0df53ea7235a6723701327f8df34ad0587f16c154f7164b
Successfully built acados_template
Installing collected packages: wrapt, six, pyparsing, kiwisolver, fonttools, cython, cycler, contourpy, python-dateutil, Deprecated, matplotlib, acados_template
Successfully installed Deprecated-1.3.1 acados_template-0.5.1 contourpy-1.3.2 cycler-0.12.1 cython-3.2.5 fonttools-4.63.0 kiwisolver-1.5.0 matplotlib-3.10.9 pyparsing-3.3.2 python-dateutil-2.9.0.post0 six-1.17.0 wrapt-2.2.1

After installation, acados_template shows up in the output of pip list:

Package                Version
---------------------- ------------
acados_template        0.5.1
annotated-doc          0.0.4
annotated-types        0.7.0
anyio                  4.13.0
attrs                  26.1.0
casadi                 3.7.2
certifi                2026.5.20
charset-normalizer     3.4.7
.....
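The acados installation guide also asks for two environment variables, ACADOS_SOURCE_DIR and LD_LIBRARY_PATH, before the Python interface can generate and load solvers. The check below is a minimal sketch for confirming that setup. The helper name check_acados_env is made up for this post, and it assumes the shared libraries ended up in the lib subdirectory of the cloned repository, which may not hold for a custom install prefix.

```python
import importlib.util
import os


def check_acados_env(env=None):
    """Return a list of human-readable problems; an empty list means the basics look fine."""
    env = os.environ if env is None else env
    problems = []

    # 1) The Python interface must be importable in the active environment.
    if importlib.util.find_spec("acados_template") is None:
        problems.append("acados_template is not importable in this environment")

    # 2) ACADOS_SOURCE_DIR should point at the cloned acados repository root.
    source_dir = env.get("ACADOS_SOURCE_DIR", "")
    if not source_dir:
        problems.append("ACADOS_SOURCE_DIR is not set")
    elif not os.path.isdir(source_dir):
        problems.append(f"ACADOS_SOURCE_DIR does not exist: {source_dir}")

    # 3) The shared libraries must be on the loader path (assumed location: <root>/lib).
    lib_dir = os.path.join(source_dir, "lib") if source_dir else ""
    loader_paths = [p for p in env.get("LD_LIBRARY_PATH", "").split(os.pathsep) if p]
    known = [os.path.normpath(p) for p in loader_paths]
    if lib_dir and os.path.normpath(lib_dir) not in known:
        problems.append(f"LD_LIBRARY_PATH does not contain {lib_dir}")

    return problems
```

Calling check_acados_env() with no arguments inspects the current shell environment; passing a dictionary makes it easy to test a planned setup before exporting anything.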

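5. Run the examples

Realtime-VLA V2 comes with three common task examples: cloth folding, chip placement, and box placement.

Open two terminals in the same conda environment and start the server first, then the client.

To switch tasks, swap in the matching config file (config_*.yaml).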
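The "server first, then client" order can be enforced in a script instead of by hand. The helper below is a small sketch that polls a TCP port until something is listening. The name wait_for_port is my own, and port 8321 in the usage note is taken from the section 6 config, so the bundled task configs may use a different one.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=30.0, poll_s=0.2):
    """Return True once a TCP connection to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means some process is accepting on the port.
            with socket.create_connection((host, port), timeout=poll_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll_s)
```

A launcher script could call wait_for_port("127.0.0.1", 8321) and only start client/local_client.py when it returns True. An open port only shows that the HTTP server is accepting connections; it says nothing about whether the model loaded correctly.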

5.1 Cloth Folding

Start the server:

python server/infer_server.py --config server/config_cloth.yaml

Start the client:

python client/local_client.py --config client/config_cloth.yaml

5.2 Chip Placement

Start the server:

python server/infer_server.py --config server/config_chip.yaml

Start the client:

python client/local_client.py --config client/config_chip.yaml

5.3 Box Placement

Start the server:

python server/infer_server.py --config server/config_box.yaml

Start the client:

python client/local_client.py --config client/config_box.yaml

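6. Hands-on example

To have the robot pick up a bottle, create a new config file on the server side, server/config_bottle.yaml.

Prerequisites:

Inference weights (already converted for acceleration in realtime-vla): pi05_droid_finetune_low_mem_converted.pkl
Base weights: the directory that tokenizer_path points to; for the pi0.5 family the default is paligemma-3b-pt-224

The absolute paths below are from my machine and need to be adjusted to your own setup.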

server:
  host: "0.0.0.0"
  port: 8321
  endpoint: "/infer"

model:
  adapter: "openpi_rtc_triton"
  config_name: "pi05_droid"
  checkpoint: "/home/liguopu/lgp_dev/project/realtime-vla/pi05_droid_finetune_low_mem_converted.pkl"
  prompt: "Pick up the bottle."
  adarms_knob: 0
  valid_action_num: 15 # 15
  action_horizon: 15 # 15
  action_type: "joint"
  image_size: [640, 480]
  tokenizer_path: "/home/liguopu/lgp_dev/project/realtime-vla/paligemma-3b-pt-224"
  norm_stats_dir: "/home/liguopu/lgp_dev/project/openpi/checkpoints/pi05_droid_finetune_low_mem/my_experiment/1999/assets/droid"
  discrete_state_input: true
  state_dim: 14
  action_dim: 14
  noise_seed: null

inference:
  optimizer: "timeaxis_smooth"
  timeaxis_dt_ref_s: 0.01
  timeaxis_dt_min_s: 0.008
  timeaxis_dt_max_s: 0.016
  timeaxis_lambda_acc: 10.0
  timeaxis_lambda_time: 1.0
  timeaxis_stride: 15
  timeaxis_optdims: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
  timeaxis_v_max: null
  timeaxis_lambda_v: 10.0
  timeaxis_horizon: 15 # 15
  timeaxis_logging: false

Start the server:

python server/infer_server.py --config server/config_bottle.yaml

Server output:

Warmup: compiling prefill lengths 1..15
Warmup complete in 0.59s
[infer_server] listening on 0.0.0.0:8321
INFO:     Started server process [3476300]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8321 (Press CTRL+C to quit)
/home/liguopu/miniconda3/envs/realtime-vla-v2/lib/python3.10/site-packages/osqp/interface.py:229: UserWarning: Converting sparse A to a CSC matrix. This may take a while...
  warnings.warn('Converting sparse A to a CSC matrix. This may take a while...')

On the client side, create a new config file, client/config_bottle.yaml, and point infer_url at the machine running the server:

client:
  infer_url: "http://192.188.xxx.xxx:8321"
  endpoint: "/infer"
  timeout_s: 0.2
  run_duration_s: 72000

observer:
  name: "mock"
  image_size: [640, 480]
  fps: 30
  state_dim: 14
  airbot_host: ""
  airbot_left_port: 0
  airbot_right_port: 0
  top_camera_id: ""
  left_camera_id: ""
  right_camera_id: ""
  enable_cameras: false

executor:
  name: "raw_action"
  enable_init_action: true
  init_action: [0.0, 0.0, 0.0, 1.57, 0.0, -1.57, 0.0, 0.0, 0.0, 0.0, -1.57, -0.0, 1.57, 0.0]
  init_steps: 100
  init_sleep_s: 0.01
  control_dt_s: 0.01
  obs_image_delay_ms: 55.0
  state_delay_s: 0.05
  max_prefill_states: 15
  heartbeat_history_len: 200
  airbot_host: ""
  airbot_left_port: 0
  airbot_right_port: 0
  left_gripper_bias: 0.0
  right_gripper_bias: 0.0
  infer_fixed_dims: [7, 8, 9, 10, 11, 12]
  infer_fixed_values: [0.0, 0.0, 0.0, -1.57, 0.0, 1.57]
  command_fixed_dims: [7, 8, 9, 10, 11, 12, 13]
  command_fixed_values: [0.0, 0.0, 0.0, -1.57, 0.0, 1.57, 0.0]
  action_interval_ms: 10.0
  action_speed_limit_per_s: 10.0
  action_speed_limit_dims: [0, 1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12]
  enable_servo_interpolation: true
  servo_interval_ms: 10.0
  savgol_window_length: 1
  savgol_polyorder: 3
  forward_track_alpha: 1.0
  forward_track_delay_cnt: 5
  forward_track_lead_s: 0.15

visualization:
  output_dir: "./output_bottle_mock"
  enable_recording: True
  record_videos: True
  record_rerun: True
  max_pending_video_frames: 32
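Because the server and client configs are edited separately, it is easy for them to drift apart. The function below is a rough sketch of a cross-check over the two files once they are loaded as dictionaries (for example with yaml.safe_load from pyyaml). The rules are my own reading of the values shown here, not something documented by the repository, and check_bottle_configs is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def check_bottle_configs(server_cfg, client_cfg):
    """Collect mismatches between the server and client config dictionaries."""
    model = server_cfg["model"]
    executor = client_cfg["executor"]
    issues = []

    # The client must talk to the port the server listens on.
    url_port = urlsplit(client_cfg["client"]["infer_url"]).port
    server_port = server_cfg["server"]["port"]
    if url_port != server_port:
        issues.append(f"client targets port {url_port}, server listens on {server_port}")

    # State and action sizes should agree on both sides (assumed rule).
    if client_cfg["observer"]["state_dim"] != model["state_dim"]:
        issues.append("observer.state_dim differs from model.state_dim")
    if len(executor["init_action"]) != model["action_dim"]:
        issues.append("init_action length differs from model.action_dim")

    # Each *_fixed_dims list needs one value per index, and valid indices.
    for prefix in ("infer_fixed", "command_fixed"):
        dims = executor[f"{prefix}_dims"]
        values = executor[f"{prefix}_values"]
        if len(dims) != len(values):
            issues.append(f"{prefix}_dims and {prefix}_values have different lengths")
        if any(d < 0 or d >= model["action_dim"] for d in dims):
            issues.append(f"{prefix}_dims has an index outside 0..{model['action_dim'] - 1}")

    # The server caps the prefill at action_horizon, so longer histories are cut.
    if executor["max_prefill_states"] > model["action_horizon"]:
        issues.append("max_prefill_states exceeds action_horizon; extra states are truncated")

    return issues
```

With the two configs from this section the function returns an empty list. One more thing worth noticing by eye: savgol_window_length is 1 here, and the smoothing function shown in section 8 returns the action unchanged when window_length <= 1, so the Savitzky-Golay stage is effectively off in this config.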

Start the client:

python client/local_client.py --config client/config_bottle.yaml

Client output:

Infer latency=0.050s queue_len=12 onlyinfer_s=0.048s
Infer latency=0.049s queue_len=12 onlyinfer_s=0.048s
Infer latency=0.049s queue_len=12 onlyinfer_s=0.048s
Infer latency=0.049s queue_len=12 onlyinfer_s=0.048s
Infer latency=0.050s queue_len=12 onlyinfer_s=0.048s
Infer latency=0.050s queue_len=12 onlyinfer_s=0.048s
Infer latency=0.049s queue_len=12 onlyinfer_s=0.048s
Infer latency=0.049s queue_len=12 onlyinfer_s=0.048s
Infer latency=0.050s queue_len=12 onlyinfer_s=0.048s
Infer latency=0.049s queue_len=12 onlyinfer_s=0.048s
Infer latency=0.049s queue_len=12 onlyinfer_s=0.048s
Infer latency=0.050s queue_len=12 onlyinfer_s=0.048s
Infer latency=0.050s queue_len=12 onlyinfer_s=0.048s
Infer latency=0.050s queue_len=10 onlyinfer_s=0.048s
Infer latency=0.049s queue_len=10 onlyinfer_s=0.048s
......
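To turn a log like this into numbers, the lines can be parsed and averaged. The snippet below is a minimal sketch that assumes the exact line format printed above and assumes that onlyinfer_s is the model-only time, so that latency minus onlyinfer_s approximates transport and serialization overhead. The function name summarize_infer_log is made up for illustration.

```python
import re
import statistics

LOG_PATTERN = re.compile(
    r"Infer latency=([\d.]+)s queue_len=(\d+) onlyinfer_s=([\d.]+)s"
)


def summarize_infer_log(lines):
    """Aggregate latency statistics from client log lines in the format shown above."""
    rows = [m.groups() for m in map(LOG_PATTERN.search, lines) if m]
    if not rows:
        raise ValueError("no 'Infer latency=...' lines found")
    latency = [float(r[0]) for r in rows]
    infer_only = [float(r[2]) for r in rows]
    return {
        "samples": len(rows),
        "mean_latency_s": statistics.fmean(latency),
        # Round trip minus model time: transport, serialization, queuing (assumed meaning).
        "mean_overhead_s": statistics.fmean(latency) - statistics.fmean(infer_only),
        "min_queue_len": min(int(r[1]) for r in rows),
    }
```

For the excerpt above, the round trip sits around 49 to 50 ms against 48 ms of reported inference time, so the overhead outside the model is on the order of 1 to 2 ms.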

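7. Speed-adaptive model (aligning the execution cadence with latency)

Overall idea:

Execution speed is not controlled by a standalone "speed governor". Instead, inference time, control period, and action-queue length are all mapped onto one shared time axis. The server first uses prefill to align the state history with the model input; the client then uses the measured latency to predict which action step should be consumed next, so that actions do not lag half a beat behind.

Additional notes:

The server handles "history alignment": it truncates the state history and feeds it into prefill.
The client handles "cadence alignment": it predicts the number of executed steps from the latency and pads the tail of the queue when it is too short.
Combined, the two make the action stream track the moment of actual execution rather than the moment the request was sent.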
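The core arithmetic behind "cadence alignment" is small enough to show on its own. This is a simplified sketch of the case where only latency and control period are known; steps_elapsed is an illustrative name, not a function from the repository.

```python
import math


def steps_elapsed(latency_s, control_dt_s):
    """Control steps that pass while one inference request is in flight (at least 1)."""
    control_dt_s = max(1e-6, float(control_dt_s))
    return max(1, math.ceil(max(0.0, float(latency_s)) / control_dt_s))


# With a 10 ms control period, a 49 ms round trip spans 5 control steps,
# so the first action worth executing from the new chunk is 5 steps ahead.
print(steps_elapsed(0.049, 0.01))
# Very small latencies still advance by one step.
print(steps_elapsed(0.003, 0.01))
```

This is why the latency numbers in the client log matter: at control_dt_s = 0.01, every extra 10 ms of round-trip time shifts the consumption point by one more action.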

Code locations:

server/model.py (infer_actions)
client/executor.py (_predict_steps_with_history, prepare_infer_context)

Code example 1: server-side prefill alignment (server/model.py)

def infer_actions(self, state: dict) -> list:
  state_sequence = self.process_state_sequence_for_model(_extract_state_sequence(state))
  if state_sequence.ndim == 1:
    state_sequence = state_sequence[None, :]

  obs_state = state_sequence[0]
  prefill_len = state_sequence.shape[0]
  if self._action_horizon is not None:
    prefill_len = min(prefill_len, int(self._action_horizon))

  # Truncate the state history to the horizon and feed it to the model as prefill actions.
  prefill_actions = self._pad_prefill_actions(state_sequence[:prefill_len])

  inputs = {
    **self.inp_images,
    "state": obs_state,
    "prompt": self.prompt,
    "adarms_knob": self._adarms_knob,
  }
  if prefill_len > 0:
    inputs["actions"] = prefill_actions
    inputs["action_prefill_len"] = np.int32(prefill_len)

  result = self._policy.infer(inputs)["actions"]

  # After inference, drop the prefix that was supplied as prefill and keep only future actions.
  result = result[min(prefill_len, result.shape[0]):]
  return self.process_actions_for_robot(result).tolist()
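The slicing in infer_actions is easier to follow with concrete shapes. The toy below is a sketch that mirrors only the two slicing steps, with a fake array standing in for the policy output; split_prefill and the array contents are invented for illustration.

```python
import numpy as np


def split_prefill(state_sequence, model_output, action_horizon):
    """Mirror the excerpt's slicing: cap the prefill, then drop it from the output."""
    state_sequence = np.atleast_2d(np.asarray(state_sequence, dtype=np.float32))
    prefill_len = min(state_sequence.shape[0], int(action_horizon))
    prefill = state_sequence[:prefill_len]
    future = model_output[min(prefill_len, model_output.shape[0]):]
    return prefill, future


history = np.zeros((5, 14), dtype=np.float32)  # 5 past states, 14 dims
fake_output = np.arange(15 * 14, dtype=np.float32).reshape(15, 14)  # stand-in for the policy
prefill, future = split_prefill(history, fake_output, action_horizon=15)
print(prefill.shape, future.shape)
```

With 5 history states and a 15-step output, 10 future actions remain. By the same slicing, if the history is as long as the output, nothing is left to return, so the number of states sent per request directly trades off against the number of fresh actions received.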

Code example 2: client-side latency alignment (client/executor.py)

import math


def _predict_steps_with_history(
  self,
  latency_s: float | None,
  request_time: float | None,
  history: list[dict] | None,
  future_queue: list[list[float]] | None,
  control_dt_s: float,
) -> int:
  if latency_s is None:
    latency_s = 0.0
  control_dt_s = max(1e-6, float(control_dt_s))

  # Without a request time or a queued future, estimate the step count as latency / control period (at least 1).
  if request_time is None or not future_queue:
    return max(1, int(math.ceil(float(latency_s) / control_dt_s)))

  target_time = float(request_time) + max(0.0, float(latency_s))
  prev_time = float(history[-1]["timestamp"]) if history else float(request_time)

  future_times: list[float] = []
  for _action in future_queue:
    prev_time = prev_time + control_dt_s
    future_times.append(float(prev_time))

  predicted_idx = None
  for idx, cur_time in enumerate(future_times):
    if cur_time >= target_time:
      predicted_idx = idx
      break

  if predicted_idx is None:
    extra = int(math.ceil((target_time - future_times[-1]) / control_dt_s))
    predicted_idx = len(future_times) - 1 + max(0, extra)

  # Predict at least one step ahead so execution does not stall on the current step.
  return int(max(1, predicted_idx))


def prepare_infer_context(self, latency_s: float, current_state: list[float] | None = None, image_timestamp: float | None = None):
  with self._action_queue_lock:
    queue_snapshot = [list(a) for a in self._future_actions]
    predicted_steps = self._predict_steps_with_history(
      latency_s=latency_s,
      request_time=time.time(),
      history=[{"timestamp": float(h["timestamp"]), "action": list(h["action"])} for h in self._heartbeat_action_history],
      future_queue=queue_snapshot,
      control_dt_s=self.get_control_dt_s(),
    )

    pad_action = list(queue_snapshot[-1]) if queue_snapshot else (list(current_state) if current_state is not None else None)

    # If the predicted step lies beyond the queue, pad with the last action so the time axis stays continuous.
    while len(self._future_actions) <= predicted_steps and pad_action is not None:
      self._future_actions.append(list(pad_action))

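8. Temporal optimization and acceleration smoothing

Overall idea:

This part is a "two-level smoothing" scheme. The first level penalizes the first and second differences directly in the optimizer's objective, which makes the trajectory more continuous along the time axis. The second level applies a local filter just before the command is sent, which further suppresses short-term jitter.

Additional notes:

dy is the first difference of the command (a per-step velocity), so penalizing it discourages large steps; ddy is the second difference (a per-step acceleration), so penalizing it discourages abrupt velocity changes.
MPC smoothing leans toward "global consistency"; the SG (Savitzky-Golay) filter leans toward "local executability".
Enabling both levels is usually more stable than enabling only one.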
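A quick numeric comparison shows what the two penalty terms reward. This sketch assumes the residuals are squared and summed, as in a least-squares cost, which would also explain why the solver code multiplies each residual by the square root of its weight. The weights and trajectories are made-up values.

```python
def smoothness_cost(trajectory, w_dy=1.0, w_ddy=1.0):
    """Weighted sum of squared first and second differences of a 1-D command sequence."""
    dy = [b - a for a, b in zip(trajectory, trajectory[1:])]
    ddy = [c - 2.0 * b + a
           for a, b, c in zip(trajectory, trajectory[1:], trajectory[2:])]
    return w_dy * sum(d * d for d in dy) + w_ddy * sum(d * d for d in ddy)


ramp = [0.0, 0.1, 0.2, 0.3, 0.4]    # constant velocity: second differences vanish
jitter = [0.0, 0.2, 0.1, 0.4, 0.4]  # same endpoints, uneven steps

print(smoothness_cost(ramp))
print(smoothness_cost(jitter))
```

Both sequences travel from 0.0 to 0.4, but the jittery one costs roughly 0.48 against roughly 0.04 for the ramp, so an optimizer carrying these terms prefers the even motion whenever tracking accuracy allows it.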

Code locations:

client/executor.py (AcadosPlanner._build_solver, _savgol_smooth_action)

Code example 1: MPC temporal smoothing terms (client/executor.py)

e_track = sqrt_w_track * (x_q - r_ref)
e_cmd = ca.sqrt(cfg.w_cmd) * (y_cmd - r_ref)
e_yx = ca.sqrt(cfg.w_yx) * (y_cmd - x_q)

# Penalize the first difference dy (per-step velocity) and the second difference ddy (per-step acceleration).
dy = y_cmd - y_prev
ddy = y_cmd - 2.0 * y_prev + y_prev2
e_dy = ca.sqrt(cfg.w_dy) * dy
e_ddy = ca.sqrt(cfg.w_ddy) * ddy

y_expr = ca.vertcat(e_track, e_cmd, e_yx, e_dy, e_ddy)

Code example 2: smoothing before sending (client/executor.py)

import numpy as np


def _savgol_smooth_action(
  action: np.ndarray | None,
  history_actions: list[np.ndarray],
  future_actions: list[np.ndarray],
  weights: np.ndarray | None,
  window_length: int,
) -> np.ndarray | None:
  if action is None:
    return None
  action_arr = np.asarray(action, dtype=np.float32).reshape(-1)
  if weights is None or window_length <= 1:
    return action_arr

  # Take half a window on each side, assemble the local window, then apply the SG weights to get the smoothed action.
  half = window_length // 2
  past = [np.asarray(item, dtype=np.float32).reshape(-1) for item in history_actions][-half:]
  future = [np.asarray(item, dtype=np.float32).reshape(-1) for item in future_actions][:half]

  if len(past) < half:
    pad_val = past[0] if past else action_arr
    past = [pad_val.copy() for _ in range(half - len(past))] + past
  if len(future) < half:
    pad_val = future[-1] if future else action_arr
    future = future + [pad_val.copy() for _ in range(half - len(future))]

  window = np.vstack(past + [action_arr] + future)
  if window.shape[0] != window_length or window.shape[1] != action_arr.shape[0]:
    return action_arr
  return weights @ window
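The function above receives weights ready-made and the excerpt does not show how they are built. One standard construction, sketched here with plain NumPy, fits a polynomial to the window by least squares and reads off its value at the center sample. Whether the repository computes its weights this way is an assumption on my part, and savgol_center_weights is an illustrative name.

```python
import numpy as np


def savgol_center_weights(window_length, polyorder):
    """Weights w such that w @ window is the least-squares polynomial fit at the window center."""
    if window_length % 2 == 0 or polyorder >= window_length:
        raise ValueError("window_length must be odd and larger than polyorder")
    half = window_length // 2
    offsets = np.arange(-half, half + 1, dtype=np.float64)
    # Columns are offsets**0, offsets**1, ..., offsets**polyorder.
    design = np.vander(offsets, polyorder + 1, increasing=True)
    # Row 0 of the pseudo-inverse maps the samples to the fitted constant term,
    # which is the polynomial's value at offset 0, i.e. at the center sample.
    return np.linalg.pinv(design)[0]


weights = savgol_center_weights(5, 2)
window = np.linspace(0.0, 1.0, 5)[:, None] * np.ones((1, 14))  # 5 steps, 14 joints
smoothed = weights @ window  # same shape convention as the excerpt: (window, dims)
```

For a 5-point window and a quadratic fit this reproduces the textbook coefficients (-3, 12, 17, 12, -3) / 35, and a linear ramp passes through unchanged. The construction needs polyorder < window_length, which is another way to see that a config with savgol_window_length: 1 and savgol_polyorder: 3 cannot be smoothing anything.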

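9. Spatial optimization (joint-space bounds and rate-of-change constraints)

Overall idea:

The core of the spatial optimization is "define the feasible set first, then track". The system does not track the reference trajectory unconstrained. It first restricts position, velocity, one-step change, and two-step change to what the arm can actually execute, and then picks the best command inside that set.

Additional notes:

q_min / q_max directly bound the joint command from below and above.
e_max = tau * v_max is a velocity-equivalent bound: it limits how far the command may move away from the current joint position x_q within a short time.
dy_max and ddy_max keep successive commands continuous and reduce mechanical shock.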

Code locations:

client/executor.py (AcadosPlanner._build_solver, _set_h_bounds)

Code example 1: modeling the spatial constraints (client/executor.py)

# Joint command bounds, applied directly to the control u (that is, y_cmd).
ocp.constraints.idxbu = np.arange(nu, dtype=np.int64)
ocp.constraints.lbu = cfg.q_min.copy()
ocp.constraints.ubu = cfg.q_max.copy()

# The h constraint jointly bounds the position error, the first difference, and the second difference.
h_expr = ca.vertcat((y_cmd - x_q), (y_cmd - y_prev), (y_cmd - 2.0 * y_prev + y_prev2))
ocp.model.con_h_expr = h_expr

e_max = cfg.tau * cfg.v_max
lh = np.concatenate([-e_max, -cfg.dy_max, -cfg.ddy_max])
uh = np.concatenate([+e_max, +cfg.dy_max, +cfg.ddy_max])
ocp.constraints.lh = lh
ocp.constraints.uh = uh
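To see what the stacked bounds mean for a single command, the same three expressions can be evaluated by hand. The following sketch checks a candidate y_cmd against lh and uh for a two-joint toy arm; all limit values are invented, and the optional v_scale argument mimics the contact-mode scaling of v_max.

```python
import numpy as np


def h_violations(y_cmd, x_q, y_prev, y_prev2, tau, v_max, dy_max, ddy_max, v_scale=1.0):
    """Boolean mask over the stacked h vector; True marks an entry outside [lh, uh]."""
    h = np.concatenate([y_cmd - x_q, y_cmd - y_prev, y_cmd - 2.0 * y_prev + y_prev2])
    e_max = tau * v_max * v_scale
    uh = np.concatenate([e_max, dy_max, ddy_max])  # lh is simply -uh
    return np.abs(h) > uh


zeros = np.zeros(2)
limits = dict(tau=0.1, v_max=np.array([1.0, 1.0]),
              dy_max=np.array([0.02, 0.02]), ddy_max=np.array([0.01, 0.01]))
y_cmd = np.array([0.015, 0.05])

# Entries are ordered [error j0, error j1, dy j0, dy j1, ddy j0, ddy j1].
print(h_violations(y_cmd, zeros, zeros, zeros, **limits))
print(h_violations(y_cmd, zeros, zeros, zeros, v_scale=0.1, **limits))
```

In the first call both joints respect the error bound of 0.1, joint 1 breaks the dy bound, and both break the ddy bound. In the second call, scaling v_max by 0.1 shrinks the error bound to 0.01, so both joints now violate it as well. The solver would never return such a command; the mask just makes visible which bound is the binding one.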

Code example 2: tightening the constraints dynamically in contact (client/executor.py)

def _set_h_bounds(self, contact_mode: bool):
  # In contact mode, shrink v_max to lower the risk of collision and overshoot.
  scale = float(self.cfg.contact_v_scale) if contact_mode else 1.0
  e_max = self.cfg.tau * (self.cfg.v_max * scale)
  lh = np.concatenate([-e_max, -self.cfg.dy_max, -self.cfg.ddy_max])
  uh = np.concatenate([+e_max, +self.cfg.dy_max, +self.cfg.ddy_max])
  for k in self._h_bound_stages:
    self._set_stage_h_bounds(k, lh, uh)

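10. MPC tracking and hardware constraints (online solving + fallback clamping)

Overall idea:

MPC is used here as an online receding-horizon solver: in each control cycle only the first control of the solution is executed, and the next re-planning round starts immediately. This lets the planner keep absorbing the latest observations and constraint changes. If the solver fails, execution switches at once to a clamping fallback so that the command still respects the hardware bounds.

Additional notes:

Normal path: solve over the prediction horizon and send u0.
Contact mode: the reference degenerates to the current state repeated over the horizon, which favors holding still.
Failure path: clamp position, dy, ddy, and the velocity-equivalent bound one after another before sending the command.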

Code locations:

client/executor.py (AcadosPlanner.solve, _rate_limit_fallback)

Code example 1: online MPC tracking (client/executor.py)

def solve(self, q_hat: np.ndarray, ai_future: np.ndarray, contact_mode: bool = False) -> PlannerOutput:
  if not self.initialized:
    self.reset(q_hat)

  if contact_mode:
    # Contact mode favors holding still: the reference becomes the current state repeated N times.
    ai_use = np.repeat(q_hat.reshape(1, -1), self.cfg.N, axis=0)
  else:
    ai_use = ai_future

  # Inject the reference and the tracking weight stage by stage over the horizon.
  for k in range(self.cfg.N):
    r_k = ai_use[k]
    p_k = np.concatenate([r_k, [self.cfg.stage_sqrt_w(k)]])
    self.solver.set(k, "p", p_k)

  status = int(self.solver.solve())
  if status != 0:
    # On solver failure, switch to the safe fallback strategy.
    y_cmd = self._rate_limit_fallback(q_hat=q_hat, y_des=q_hat, contact_mode=contact_mode)
    return PlannerOutput(y_cmd=y_cmd, alpha=0.0)

  # Receding horizon: only the current-step control u0 is executed.
  u0 = self.solver.get(0, "u")
  y_cmd = np.asarray(u0[0:self.nq]).copy()
  self.x_aug = self.solver.get(1, "x").copy()
  return PlannerOutput(y_cmd=y_cmd, alpha=(0.0 if contact_mode else 1.0))

Code example 2: hardware-constraint fallback (client/executor.py)

def _rate_limit_fallback(self, q_hat: np.ndarray, y_des: np.ndarray, contact_mode: bool) -> np.ndarray:
  cfg = self.cfg
  nq = self.nq
  y_prev = self.x_aug[nq : 2 * nq].copy()
  y_prev2 = self.x_aug[2 * nq : 3 * nq].copy()

  # First clamp to the position bounds.
  y = np.clip(y_des, cfg.q_min, cfg.q_max)

  # Then clamp the first difference dy.
  dy = np.clip(y - y_prev, -cfg.dy_max, cfg.dy_max)
  y = y_prev + dy

  # Then clamp the second difference ddy (per-step acceleration).
  d2 = np.clip(y - 2.0 * y_prev + y_prev2, -cfg.ddy_max, cfg.ddy_max)
  y = 2.0 * y_prev - y_prev2 + d2

  # Finally clamp once more by the velocity-equivalent bound e_max around q_hat.
  scale = float(cfg.contact_v_scale) if contact_mode else 1.0
  e_max = cfg.tau * (cfg.v_max * scale)
  e = np.clip(y - q_hat, -e_max, +e_max)
  return np.clip(q_hat + e, cfg.q_min, cfg.q_max)
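Running the four clamps on a single joint makes their interaction visible. The scalar function below is a sketch that restates the same sequence of operations for one dimension; the limits and states are invented numbers, and rate_limit_fallback_1d is not part of the repository.

```python
def clip(value, low, high):
    return max(low, min(high, value))


def rate_limit_fallback_1d(q_hat, y_des, y_prev, y_prev2,
                           q_min, q_max, dy_max, ddy_max, e_max):
    """Scalar restatement of the four clamps, applied in the same order as the excerpt."""
    y = clip(y_des, q_min, q_max)                    # position bounds
    y = y_prev + clip(y - y_prev, -dy_max, dy_max)   # first difference
    d2 = clip(y - 2.0 * y_prev + y_prev2, -ddy_max, ddy_max)
    y = 2.0 * y_prev - y_prev2 + d2                  # second difference
    e = clip(y - q_hat, -e_max, e_max)               # distance from the measured state
    return clip(q_hat + e, q_min, q_max)


limits = dict(q_min=-1.0, q_max=1.0, dy_max=0.1, ddy_max=0.05, e_max=0.2)

# Case A: at rest, asked to jump to 0.5.
print(rate_limit_fallback_1d(q_hat=0.0, y_des=0.5, y_prev=0.0, y_prev2=0.0, **limits))
# Case B: moving at +0.2 per step, asked to hold at the measured position 0.2.
print(rate_limit_fallback_1d(q_hat=0.2, y_des=0.2, y_prev=0.2, y_prev2=0.0, **limits))
```

In case A the requested jump of 0.5 is cut down to about 0.05, because the second-difference clamp is the tightest one when starting from rest. In case B the result is about 0.35 rather than 0.2: the second-difference clamp keeps the joint decelerating gradually instead of stopping dead, and in doing so it overrides the earlier dy clamp. The clamps are applied in sequence, so later ones take priority over earlier ones, which is worth keeping in mind when tuning dy_max and ddy_max together.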

That wraps up this write-up.
