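VLA Series: Reproducing the Realtime-VLA V2 Paper | Accelerated Inference | Code Walkthrough

This post documents my reproduction of Realtime-VLA V2, for reference.

Open-source repository: https://github.com/dexmal/realtime-vla-v2

Paper: Realtime-VLA V2: Learning to Run VLAs Fast, Smooth, and Accurate

1. Download the code

Run the following command to clone the repository:

bash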
git clone https://github.com/dexmal/realtime-vla-v2.git

Enter the repository directory:

bash
cd realtime-vla-v2

2. Set up the conda development environment

Run the following command to create the environment:

bash
conda create -n realtime-vla-v2 python=3.10 -y

Once the environment has been created, activate realtime-vla-v2:

bash
conda activate realtime-vla-v2

3. Install dependencies

Run the following command to upgrade pip:

bash
python -m pip install --upgrade pip

Next, edit requirements.txt and comment out acados_template; it is installed from source in a later step. The file then reads:

numpy
pyyaml
requests
opencv-python
pillow
fastapi
uvicorn
scipy
osqp
transformers
torch
triton
casadi
# acados_template
rerun-sdk
pyrealsense2

Run the following command to install the dependencies listed in requirements.txt (here through the Tencent Cloud PyPI mirror):

bash
pip install -r requirements.txt -i https://mirrors.cloud.tencent.com/pypi/simple/

4. Install acados_template

4.1. Clone the acados repository (it contains acados_template)

bash
git clone --recursive https://github.com/acados/acados.git
cd acados

4.2. Build and install the C core library

bash
mkdir -p build && cd build
cmake .. -DACADOS_WITH_QPOASES=ON -DACADOS_WITH_OSQP=ON
make -j$(nproc)
sudo make install

4.3. Install the Python interface (acados_template lives in the interfaces/acados_template directory)

bash
cd ../interfaces/acados_template
pip install .

Console output:

Downloading contourpy-1.3.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (325 kB)
Downloading cycler-0.12.1-py3-none-any.whl (8.3 kB)
Downloading fonttools-4.63.0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (4.9 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.9/4.9 MB 90.9 MB/s  0:00:00
Downloading kiwisolver-1.5.0-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.6 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 73.6 MB/s  0:00:00
Downloading pyparsing-3.3.2-py3-none-any.whl (122 kB)
Downloading python_dateutil-2.9.0.post0-py2.py3-none-any.whl (229 kB)
Downloading six-1.17.0-py2.py3-none-any.whl (11 kB)
Building wheels for collected packages: acados_template
  Building wheel for acados_template (pyproject.toml) ... done
  Created wheel for acados_template: filename=acados_template-0.5.1-py3-none-any.whl size=424204 sha256=4929e4fc66cb051321e4ac264c76d0b2850a3e08dcfa1e62a6372bc7de57a9eb
  Stored in directory: /tmp/pip-ephem-wheel-cache-nr33iee_/wheels/55/23/3d/aaa0df53ea7235a6723701327f8df34ad0587f16c154f7164b
Successfully built acados_template
Installing collected packages: wrapt, six, pyparsing, kiwisolver, fonttools, cython, cycler, contourpy, python-dateutil, Deprecated, matplotlib, acados_template
Successfully installed Deprecated-1.3.1 acados_template-0.5.1 contourpy-1.3.2 cycler-0.12.1 cython-3.2.5 fonttools-4.63.0 kiwisolver-1.5.0 matplotlib-3.10.9 pyparsing-3.3.2 python-dateutil-2.9.0.post0 six-1.17.0 wrapt-2.2.1

After installation, acados_template shows up in the output of pip list:

Package                Version
---------------------- ------------
acados_template        0.5.1
annotated-doc          0.0.4
annotated-types        0.7.0
anyio                  4.13.0
attrs                  26.1.0
casadi                 3.7.2
certifi                2026.5.20
charset-normalizer     3.4.7
.....
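
Besides reading pip list, the installation can be checked from Python. The sketch below is a minimal sanity check, not part of the repository: the helper name missing_modules and the module list are my own choices, and the import names (yaml for pyyaml, cv2 for opencv-python, PIL for pillow) are the usual ones for those packages. It also prints ACADOS_SOURCE_DIR, the environment variable the acados documentation uses to point at the acados checkout.

```python
import importlib.util
import os

# Import names for a subset of the requirements listed above (hypothetical selection).
REQUIRED_MODULES = ["numpy", "yaml", "cv2", "PIL", "scipy", "osqp", "casadi", "acados_template"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("missing modules:", missing if missing else "none")
# acados reads this variable to find its compiled libraries and headers.
print("ACADOS_SOURCE_DIR =", os.environ.get("ACADOS_SOURCE_DIR", "<not set>"))
```

If acados_template is reported missing, step 4.3 most likely ran in a different environment than the one currently activated.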

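5. Run the examples

Here are three commonly used Realtime-VLA V2 task examples: cloth folding, chip placement, and box placement.

Use the same conda environment in two separate terminal windows and start the server first, then the client.

To switch tasks, swap in the matching configuration file (config_*.yaml) for both the server and the client.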
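
The "server first, then client" order can also be enforced by a small script instead of by hand. The following is a sketch under my own assumptions: wait_for_port is a hypothetical helper, not something the repository provides, and it only checks that a TCP port accepts connections, not that the model has finished warming up.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 30.0, poll_s: float = 0.2) -> bool:
    """Return True once a TCP connection to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=poll_s):
                return True
        except OSError:
            time.sleep(poll_s)
    return False
```

A launcher could call wait_for_port with the host and port from the server configuration and start client/local_client.py only after it returns True.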

5.1 Cloth Folding

Server command:

bash
python server/infer_server.py --config server/config_cloth.yaml

Client command:

bash
python client/local_client.py --config client/config_cloth.yaml

5.2 Chip Placement

Server command:

bash
python server/infer_server.py --config server/config_chip.yaml

Client command:

bash
python client/local_client.py --config client/config_chip.yaml

5.3 Box Placement

Server command:

bash
python server/infer_server.py --config server/config_box.yaml

Client command:

bash
python client/local_client.py --config client/config_box.yaml

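6. Hands-on example

To have the robot pick up a bottle, create a new configuration file on the server side: server/config_bottle.yaml

Prerequisites:

Inference weights (already converted for acceleration in realtime-vla): pi05_droid_finetune_low_mem_converted.pkl
Base weights: tokenizer_path; for the pi0.5 family the default is paligemma-3b-pt-224

yaml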
server:
  host: "0.0.0.0"
  port: 8321
  endpoint: "/infer"

model:
  adapter: "openpi_rtc_triton"
  config_name: "pi05_droid"
  checkpoint: "/home/liguopu/lgp_dev/project/realtime-vla/pi05_droid_finetune_low_mem_converted.pkl"
  prompt: "Pick up the bottle."
  adarms_knob: 0
  valid_action_num: 15 # 15
  action_horizon: 15 # 15
  action_type: "joint"
  image_size: [640, 480]
  tokenizer_path: "/home/liguopu/lgp_dev/project/realtime-vla/paligemma-3b-pt-224"
  norm_stats_dir: "/home/liguopu/lgp_dev/project/openpi/checkpoints/pi05_droid_finetune_low_mem/my_experiment/1999/assets/droid"
  discrete_state_input: true
  state_dim: 14
  action_dim: 14
  noise_seed: null

inference:
  optimizer: "timeaxis_smooth"
  timeaxis_dt_ref_s: 0.01
  timeaxis_dt_min_s: 0.008
  timeaxis_dt_max_s: 0.016
  timeaxis_lambda_acc: 10.0
  timeaxis_lambda_time: 1.0
  timeaxis_stride: 15
  timeaxis_optdims: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
  timeaxis_v_max: null
  timeaxis_lambda_v: 10.0
  timeaxis_horizon: 15 # 15
  timeaxis_logging: false
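
Several values in this file appear to be coupled, so a quick consistency check before launching the server can save a restart. The sketch below works on a plain dict (for example the result of loading the YAML). The function name check_server_config is hypothetical, and the three rules it checks are my own assumptions read off the values above, not rules documented by the repository.

```python
def check_server_config(cfg: dict) -> list[str]:
    """Collect human-readable problems; an empty list means nothing was flagged."""
    problems = []
    model, inf = cfg["model"], cfg["inference"]

    # Assumption: the three horizon-like values are meant to agree (all 15 above).
    horizons = {
        "model.valid_action_num": model["valid_action_num"],
        "model.action_horizon": model["action_horizon"],
        "inference.timeaxis_horizon": inf["timeaxis_horizon"],
    }
    if len(set(horizons.values())) != 1:
        problems.append(f"horizon values differ: {horizons}")

    # Assumption: the reference time step lies inside [dt_min, dt_max].
    if not inf["timeaxis_dt_min_s"] <= inf["timeaxis_dt_ref_s"] <= inf["timeaxis_dt_max_s"]:
        problems.append("timeaxis_dt_ref_s is outside [timeaxis_dt_min_s, timeaxis_dt_max_s]")

    # Assumption: every optimized dimension is a valid action index.
    bad = [d for d in inf["timeaxis_optdims"] if not 0 <= d < model["action_dim"]]
    if bad:
        problems.append(f"timeaxis_optdims out of range: {bad}")
    return problems


bottle_cfg = {
    "model": {"valid_action_num": 15, "action_horizon": 15, "action_dim": 14},
    "inference": {
        "timeaxis_dt_ref_s": 0.01, "timeaxis_dt_min_s": 0.008, "timeaxis_dt_max_s": 0.016,
        "timeaxis_optdims": list(range(14)), "timeaxis_horizon": 15,
    },
}
print(check_server_config(bottle_cfg))  # prints []
```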

Server command:

bash
python server/infer_server.py --config server/config_bottle.yaml

Server output:

Warmup: compiling prefill lengths 1..15
Warmup complete in 0.59s
[infer_server] listening on 0.0.0.0:8321
INFO:     Started server process [3476300]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8321 (Press CTRL+C to quit)
/home/liguopu/miniconda3/envs/realtime-vla-v2/lib/python3.10/site-packages/osqp/interface.py:229: UserWarning: Converting sparse A to a CSC matrix. This may take a while...
  warnings.warn('Converting sparse A to a CSC matrix. This may take a while...')

Create a new configuration file on the client side: client/config_bottle.yaml

yaml
client:
  infer_url: "http://192.188.xxx.xxx:8321"
  endpoint: "/infer"
  timeout_s: 0.2
  run_duration_s: 72000

observer:
  name: "mock"
  image_size: [640, 480]
  fps: 30
  state_dim: 14
  airbot_host: ""
  airbot_left_port: 0
  airbot_right_port: 0
  top_camera_id: ""
  left_camera_id: ""
  right_camera_id: ""
  enable_cameras: false

executor:
  name: "raw_action"
  enable_init_action: true
  init_action: [0.0, 0.0, 0.0, 1.57, 0.0, -1.57, 0.0, 0.0, 0.0, 0.0, -1.57, -0.0, 1.57, 0.0]
  init_steps: 100
  init_sleep_s: 0.01
  control_dt_s: 0.01
  obs_image_delay_ms: 55.0
  state_delay_s: 0.05
  max_prefill_states: 15
  heartbeat_history_len: 200
  airbot_host: ""
  airbot_left_port: 0
  airbot_right_port: 0
  left_gripper_bias: 0.0
  right_gripper_bias: 0.0
  infer_fixed_dims: [7, 8, 9, 10, 11, 12]
  infer_fixed_values: [0.0, 0.0, 0.0, -1.57, 0.0, 1.57]
  command_fixed_dims: [7, 8, 9, 10, 11, 12, 13]
  command_fixed_values: [0.0, 0.0, 0.0, -1.57, 0.0, 1.57, 0.0]
  action_interval_ms: 10.0
  action_speed_limit_per_s: 10.0
  action_speed_limit_dims: [0, 1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12]
  enable_servo_interpolation: true
  servo_interval_ms: 10.0
  savgol_window_length: 1
  savgol_polyorder: 3
  forward_track_alpha: 1.0
  forward_track_delay_cnt: 5
  forward_track_lead_s: 0.15

visualization:
  output_dir: "./output_bottle_mock"
  enable_recording: True
  record_videos: True
  record_rerun: True
  max_pending_video_frames: 32
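
One plausible reading of infer_fixed_dims / infer_fixed_values (and their command_* counterparts) is that they pin selected dimensions of a 14-dimensional vector to constants; in this configuration the pinned values for dimensions 7 to 12 match the corresponding entries of init_action. The sketch below illustrates that reading only. apply_fixed_dims is a hypothetical helper, and how the executor actually consumes these fields is not shown in this post.

```python
def apply_fixed_dims(vector, dims, values):
    """Return a copy of `vector` where vector[d] is replaced by the paired value."""
    if len(dims) != len(values):
        raise ValueError("dims and values must have the same length")
    out = list(vector)
    for d, v in zip(dims, values):
        out[d] = v
    return out


raw_action = [0.1] * 14  # made-up model output, for illustration only
pinned = apply_fixed_dims(
    raw_action,
    dims=[7, 8, 9, 10, 11, 12],
    values=[0.0, 0.0, 0.0, -1.57, 0.0, 1.57],
)
print(pinned[6:])  # prints [0.1, 0.0, 0.0, 0.0, -1.57, 0.0, 1.57, 0.1]
```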

Client command:

bash
python client/local_client.py --config client/config_bottle.yaml

Client output:

Infer latency=0.050s queue_len=12 onlyinfer_s=0.048s
Infer latency=0.049s queue_len=12 onlyinfer_s=0.048s
Infer latency=0.049s queue_len=12 onlyinfer_s=0.048s
Infer latency=0.049s queue_len=12 onlyinfer_s=0.048s
Infer latency=0.050s queue_len=12 onlyinfer_s=0.048s
Infer latency=0.050s queue_len=12 onlyinfer_s=0.048s
Infer latency=0.049s queue_len=12 onlyinfer_s=0.048s
Infer latency=0.049s queue_len=12 onlyinfer_s=0.048s
Infer latency=0.050s queue_len=12 onlyinfer_s=0.048s
Infer latency=0.049s queue_len=12 onlyinfer_s=0.048s
Infer latency=0.049s queue_len=12 onlyinfer_s=0.048s
Infer latency=0.050s queue_len=12 onlyinfer_s=0.048s
Infer latency=0.050s queue_len=12 onlyinfer_s=0.048s
Infer latency=0.050s queue_len=10 onlyinfer_s=0.048s
Infer latency=0.049s queue_len=10 onlyinfer_s=0.048s
......
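
These lines are easy to summarize with a few lines of Python. The sketch below assumes the log format is exactly as printed above. The regular expression and the summarize helper are my own, and the default control period of 0.01 s comes from control_dt_s in the client configuration.

```python
import math
import re

LOG_LINE = re.compile(
    r"Infer latency=(?P<latency>[\d.]+)s queue_len=(?P<queue>\d+) onlyinfer_s=(?P<infer>[\d.]+)s"
)


def summarize(lines, control_dt_s=0.01):
    """Aggregate latency statistics from client log lines."""
    rows = [m.groupdict() for m in map(LOG_LINE.search, lines) if m]
    latencies = [float(r["latency"]) for r in rows]
    infer_times = [float(r["infer"]) for r in rows]
    mean_latency = sum(latencies) / len(latencies)
    return {
        "mean_latency_s": mean_latency,
        # Round-trip time not spent inside the model (network, serialization, ...).
        "mean_overhead_s": mean_latency - sum(infer_times) / len(infer_times),
        # Control ticks that elapse while one request is in flight.
        "steps_per_request": math.ceil(mean_latency / control_dt_s),
        "min_queue_len": min(int(r["queue"]) for r in rows),
    }


sample = [
    "Infer latency=0.050s queue_len=12 onlyinfer_s=0.048s",
    "Infer latency=0.049s queue_len=12 onlyinfer_s=0.048s",
    "Infer latency=0.049s queue_len=10 onlyinfer_s=0.048s",
]
print(summarize(sample))
```

With a latency of roughly 50 ms and a 10 ms control period, about five control ticks pass during each request, which is the gap the alignment logic in the next section has to bridge.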

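7. Speed-adaptive model (aligning the execution tick with latency)

Overall idea:

Execution speed is not governed by a standalone "speed governor". Instead, inference time, control period, and action-queue length are all mapped onto one shared timeline. The server first uses prefill to align the state history with the model input; the client then uses the measured latency to predict which step of the action sequence will have been consumed by the time the result arrives, so that actions do not lag half a beat behind.

Additional notes:

  • The server handles "history alignment": it crops the state history and feeds it into prefill.
  • The client handles "tick alignment": it predicts the number of executed steps from the latency and pads the tail of the queue when it runs short.
  • Combined, the two sides keep the action stream aligned with the actual execution time rather than the time the request was sent.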
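
As a worked example of the client-side idea, here is a simplified standalone sketch. The two function names are my own, and the first function only mirrors the no-history case: the number of ticks to skip is the latency divided by the control period, rounded up, and a queue that is too short is extended by repeating its last action.

```python
import math


def predict_steps(latency_s: float, control_dt_s: float) -> int:
    """Control ticks that elapse while one request is in flight (at least 1)."""
    control_dt_s = max(1e-6, control_dt_s)
    return max(1, math.ceil(max(0.0, latency_s) / control_dt_s))


def pad_queue(queue: list, steps: int) -> list:
    """Repeat the last queued action until index `steps` exists in the queue."""
    padded = list(queue)
    while padded and len(padded) <= steps:
        padded.append(padded[-1])
    return padded


steps = predict_steps(latency_s=0.048, control_dt_s=0.01)
queue = pad_queue(["a0", "a1", "a2"], steps)
print(steps, queue)  # prints 5 ['a0', 'a1', 'a2', 'a2', 'a2', 'a2']
```

Padding with the last action is a hold-position choice: it keeps the timeline gap-free without inventing motion the model never predicted.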

Code locations:

  • server/model.py (infer_actions)
  • client/executor.py (_predict_steps_with_history, prepare_infer_context)

Code example 1: server-side prefill alignment (server/model.py)

python
def infer_actions(self, state: dict) -> list:
  state_sequence = self.process_state_sequence_for_model(_extract_state_sequence(state))
  if state_sequence.ndim == 1:
    state_sequence = state_sequence[None, :]

  obs_state = state_sequence[0]
  prefill_len = state_sequence.shape[0]
  if self._action_horizon is not None:
    prefill_len = min(prefill_len, int(self._action_horizon))

  # Crop the state history to the horizon and feed it to the model as prefill actions.
  prefill_actions = self._pad_prefill_actions(state_sequence[:prefill_len])

  inputs = {
    **self.inp_images,
    "state": obs_state,
    "prompt": self.prompt,
    "adarms_knob": self._adarms_knob,
  }
  if prefill_len > 0:
    inputs["actions"] = prefill_actions
    inputs["action_prefill_len"] = np.int32(prefill_len)

  result = self._policy.infer(inputs)["actions"]

  # After inference, drop the prefix that was already supplied as prefill and keep only future actions.
  result = result[min(prefill_len, result.shape[0]):]
  return self.process_actions_for_robot(result).tolist()

Code example 2: client-side latency alignment (client/executor.py)

python
def _predict_steps_with_history(
  self,
  latency_s: float | None,
  request_time: float | None,
  history: list[dict] | None,
  future_queue: list[list[float]] | None,
  control_dt_s: float,
) -> int:
  if latency_s is None:
    latency_s = 0.0
  control_dt_s = max(1e-6, float(control_dt_s))

  # Without a request time or queued future actions, estimate the steps as latency / control period (at least 1).
  if request_time is None or not future_queue:
    return max(1, int(math.ceil(float(latency_s) / control_dt_s)))

  target_time = float(request_time) + max(0.0, float(latency_s))
  prev_time = float(history[-1]["timestamp"]) if history else float(request_time)

  future_times: list[float] = []
  for _action in future_queue:
    prev_time = prev_time + control_dt_s
    future_times.append(float(prev_time))

  predicted_idx = None
  for idx, cur_time in enumerate(future_times):
    if cur_time >= target_time:
      predicted_idx = idx
      break

  if predicted_idx is None:
    extra = int(math.ceil((target_time - future_times[-1]) / control_dt_s))
    predicted_idx = len(future_times) - 1 + max(0, extra)

  # Predict at least 1 step ahead so execution never stalls on the current step and lags behind.
  return int(max(1, predicted_idx))

The caller, prepare_infer_context:

python
def prepare_infer_context(self, latency_s: float, current_state: list[float] | None = None, image_timestamp: float | None = None):
  with self._action_queue_lock:
    queue_snapshot = [list(a) for a in self._future_actions]
    predicted_steps = self._predict_steps_with_history(
      latency_s=latency_s,
      request_time=time.time(),
      history=[{"timestamp": float(h["timestamp"]), "action": list(h["action"])} for h in self._heartbeat_action_history],
      future_queue=queue_snapshot,
      control_dt_s=self.get_control_dt_s(),
    )

    pad_action = list(queue_snapshot[-1]) if queue_snapshot else (list(current_state) if current_state is not None else None)

    # When the latency exceeds what the queue covers, pad with the last action to keep the timeline continuous.
    while len(self._future_actions) <= predicted_steps and pad_action is not None:
      self._future_actions.append(list(pad_action))

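8. Temporal optimization and acceleration smoothing

Overall idea:

This part is a two-stage smoother. The first stage penalizes the first and second differences directly in the optimizer's objective, making the trajectory more continuous along the time axis; the second stage applies a local filter just before commands are sent, further suppressing short-lived jitter.

Additional notes:

  • dy is the first difference (a per-step velocity); penalizing it discourages large jumps between consecutive commands. ddy is the second difference (a per-step acceleration); penalizing it discourages abrupt velocity changes.
  • MPC smoothing leans toward "global consistency"; SG (Savitzky-Golay) filtering leans toward "local executability".
  • Enabling both stages is usually more stable than enabling only one.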
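
To see what the two penalties measure, the sketch below evaluates them on two made-up scalar trajectories with the same endpoints. smoothness_cost is a hypothetical helper; it sums w_dy * dy^2 + w_ddy * ddy^2, which is what a least-squares cost over residuals of the form sqrt(w) * difference amounts to, assuming an identity weighting matrix.

```python
def smoothness_cost(traj, w_dy=1.0, w_ddy=1.0):
    """Weighted sum of squared first and second differences of a scalar trajectory."""
    dy = [b - a for a, b in zip(traj, traj[1:])]
    ddy = [c - 2.0 * b + a for a, b, c in zip(traj, traj[1:], traj[2:])]
    return w_dy * sum(d * d for d in dy) + w_ddy * sum(d * d for d in ddy)


ramp = [0.0, 1.0, 2.0, 3.0, 4.0]    # constant velocity, zero acceleration
jitter = [0.0, 2.0, 1.0, 4.0, 4.0]  # same endpoints, irregular steps
print(smoothness_cost(ramp), smoothness_cost(jitter))  # prints 4.0 48.0
```

The ramp pays only the unavoidable dy cost of moving from 0 to 4, while the jittery path is penalized on both terms.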

Code location:

  • client/executor.py (AcadosPlanner._build_solver, _savgol_smooth_action)

Code example 1: MPC temporal smoothing terms (client/executor.py)

python
e_track = sqrt_w_track * (x_q - r_ref)
e_cmd = ca.sqrt(cfg.w_cmd) * (y_cmd - r_ref)
e_yx = ca.sqrt(cfg.w_yx) * (y_cmd - x_q)

# Penalizing the first difference dy limits per-step jumps; penalizing the second difference ddy limits acceleration.
dy = y_cmd - y_prev
ddy = y_cmd - 2.0 * y_prev + y_prev2
e_dy = ca.sqrt(cfg.w_dy) * dy
e_ddy = ca.sqrt(cfg.w_ddy) * ddy

y_expr = ca.vertcat(e_track, e_cmd, e_yx, e_dy, e_ddy)

Code example 2: smoothing before sending (client/executor.py)

python
def _savgol_smooth_action(
  action: np.ndarray | None,
  history_actions: list[np.ndarray],
  future_actions: list[np.ndarray],
  weights: np.ndarray | None,
  window_length: int,
) -> np.ndarray | None:
  if action is None:
    return None
  action_arr = np.asarray(action, dtype=np.float32).reshape(-1)
  if weights is None or window_length <= 1:
    return action_arr

  # Take half a window on each side to form a local window, then apply the SG weights to get the smoothed action.
  half = window_length // 2
  past = [np.asarray(item, dtype=np.float32).reshape(-1) for item in history_actions][-half:]
  future = [np.asarray(item, dtype=np.float32).reshape(-1) for item in future_actions][:half]

  if len(past) < half:
    pad_val = past[0] if past else action_arr
    past = [pad_val.copy() for _ in range(half - len(past))] + past
  if len(future) < half:
    pad_val = future[-1] if future else action_arr
    future = future + [pad_val.copy() for _ in range(half - len(future))]

  window = np.vstack(past + [action_arr] + future)
  if window.shape[0] != window_length or window.shape[1] != action_arr.shape[0]:
    return action_arr
  return weights @ window
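
The function above receives precomputed weights. As an illustration of what such weights look like, the sketch below uses the classic Savitzky-Golay smoothing coefficients for a 5-sample window and a quadratic fit, [-3, 12, 17, 12, -3] / 35. Whether the repository uses these exact values is not shown here (scipy.signal.savgol_coeffs is one common way to generate them), and smooth_center is a hypothetical scalar counterpart of weights @ window.

```python
# Savitzky-Golay smoothing weights: window length 5, polynomial order 2.
SG_WEIGHTS_5_2 = [w / 35.0 for w in (-3.0, 12.0, 17.0, 12.0, -3.0)]


def smooth_center(window, weights=SG_WEIGHTS_5_2):
    """Weighted sum over the window; the result replaces the center sample."""
    if len(window) != len(weights):
        raise ValueError("window and weights must have the same length")
    return sum(w * x for w, x in zip(weights, window))


quadratic = [float(t * t) for t in range(5)]  # 0, 1, 4, 9, 16: the center value is 4
spike = [0.0, 0.0, 1.0, 0.0, 0.0]             # one-sample spike at the center
print(smooth_center(quadratic))  # close to 4.0: a quadratic passes through unchanged
print(smooth_center(spike))      # 17/35, the spike is attenuated
```

Note that the client configuration above sets savgol_window_length: 1, so the early return on window_length <= 1 bypasses this stage in that setup.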

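9. Spatial optimization (joint-space bounds and rate-of-change constraints)

Overall idea:

The core of spatial optimization is "define the feasible region first, then track". The system does not chase the reference trajectory unconstrained: it first confines position, velocity, one-step change, and two-step change to what the arm can execute, then picks the best command inside that region.

Additional notes:

  • q_min/q_max bound the joint command from below and above.
  • e_max is a velocity-equivalent bound (tau * v_max) that caps the gap between the command and the state x_q, that is, the displacement allowed over a short time.
  • dy_max and ddy_max keep command changes continuous and reduce mechanical shock.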

Code location:

  • client/executor.py (AcadosPlanner._build_solver, _set_h_bounds)

Code example 1: modeling the spatial constraints (client/executor.py)

python
# Lower and upper joint-command bounds, applied directly to the control u (i.e. y_cmd).
ocp.constraints.idxbu = np.arange(nu, dtype=np.int64)
ocp.constraints.lbu = cfg.q_min.copy()
ocp.constraints.ubu = cfg.q_max.copy()

# The h constraint jointly bounds the position error, the first difference, and the second difference.
h_expr = ca.vertcat((y_cmd - x_q), (y_cmd - y_prev), (y_cmd - 2.0 * y_prev + y_prev2))
ocp.model.con_h_expr = h_expr

e_max = cfg.tau * cfg.v_max
lh = np.concatenate([-e_max, -cfg.dy_max, -cfg.ddy_max])
uh = np.concatenate([+e_max, +cfg.dy_max, +cfg.ddy_max])
ocp.constraints.lh = lh
ocp.constraints.uh = uh
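
To make the feasible region concrete, the scalar sketch below checks a single joint command against the same four conditions. All numbers and both helper names are made up for illustration and are not taken from the repository's configuration.

```python
def h_bounds(tau, v_max, dy_max, ddy_max):
    """Lower/upper pairs for (y_cmd - x_q, first difference, second difference)."""
    e_max = tau * v_max
    return (-e_max, e_max), (-dy_max, dy_max), (-ddy_max, ddy_max)


def is_feasible(y_cmd, x_q, y_prev, y_prev2, q_min, q_max, bounds):
    """True if y_cmd respects the box bound and all three h constraints."""
    h = (y_cmd - x_q, y_cmd - y_prev, y_cmd - 2.0 * y_prev + y_prev2)
    in_box = q_min <= y_cmd <= q_max
    return in_box and all(lo <= val <= hi for val, (lo, hi) in zip(h, bounds))


bounds = h_bounds(tau=0.1, v_max=1.0, dy_max=0.02, ddy_max=0.01)
state = dict(x_q=0.50, y_prev=0.50, y_prev2=0.49, q_min=-1.0, q_max=1.0, bounds=bounds)
print(is_feasible(0.515, **state))  # prints True
print(is_feasible(0.530, **state))  # prints False: the step of 0.03 exceeds dy_max
```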

Code example 2: tightening constraints dynamically in contact (client/executor.py)

python
def _set_h_bounds(self, contact_mode: bool):
  # In contact mode, scale v_max down to lower the risk of collision and overshoot.
  scale = float(self.cfg.contact_v_scale) if contact_mode else 1.0
  e_max = self.cfg.tau * (self.cfg.v_max * scale)
  lh = np.concatenate([-e_max, -self.cfg.dy_max, -self.cfg.ddy_max])
  uh = np.concatenate([+e_max, +self.cfg.dy_max, +self.cfg.ddy_max])
  for k in self._h_bound_stages:
    self._set_stage_h_bounds(k, lh, uh)

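10. MPC tracking and hardware constraints (online solve + fallback clamping)

Overall idea:

MPC runs here as an online receding-horizon solver: each control cycle executes only the first control input and then immediately replans. This lets it continuously absorb the latest observations and constraint changes. If the solver fails, control switches at once to a clamping fallback so that commands still respect the hardware bounds.

Additional notes:

  • Normal path: solve over the prediction horizon and send u0.
  • Contact mode: the reference trajectory is conservative, prioritizing holding steady.
  • Failure path: the position, dy, ddy, and velocity bounds are applied one after another to keep the command safe.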

Code location:

  • client/executor.py (AcadosPlanner.solve, _rate_limit_fallback)

Code example 1: online MPC tracking (client/executor.py)

python
def solve(self, q_hat: np.ndarray, ai_future: np.ndarray, contact_mode: bool = False) -> PlannerOutput:
  if not self.initialized:
    self.reset(q_hat)

  if contact_mode:
    # Contact mode prioritizes holding steady: the reference degenerates to the current state repeated N times.
    ai_use = np.repeat(q_hat.reshape(1, -1), self.cfg.N, axis=0)
  else:
    ai_use = ai_future

  # Inject the reference and the tracking weight at each stage of the horizon.
  for k in range(self.cfg.N):
    r_k = ai_use[k]
    p_k = np.concatenate([r_k, [self.cfg.stage_sqrt_w(k)]])
    self.solver.set(k, "p", p_k)

  status = int(self.solver.solve())
  if status != 0:
    # On solver failure, switch to the safe fallback policy.
    y_cmd = self._rate_limit_fallback(q_hat=q_hat, y_des=q_hat, contact_mode=contact_mode)
    return PlannerOutput(y_cmd=y_cmd, alpha=0.0)

  # Receding horizon: execute only the current step u0.
  u0 = self.solver.get(0, "u")
  y_cmd = np.asarray(u0[0:self.nq]).copy()
  self.x_aug = self.solver.get(1, "x").copy()
  return PlannerOutput(y_cmd=y_cmd, alpha=(0.0 if contact_mode else 1.0))

Code example 2: hardware-constraint fallback (client/executor.py)

python
def _rate_limit_fallback(self, q_hat: np.ndarray, y_des: np.ndarray, contact_mode: bool) -> np.ndarray:
  cfg = self.cfg
  nq = self.nq
  y_prev = self.x_aug[nq : 2 * nq].copy()
  y_prev2 = self.x_aug[2 * nq : 3 * nq].copy()

  # First clamp to the position bounds.
  y = np.clip(y_des, cfg.q_min, cfg.q_max)

  # Then limit the first difference dy.
  dy = np.clip(y - y_prev, -cfg.dy_max, cfg.dy_max)
  y = y_prev + dy

  # Then limit the second difference ddy (discrete acceleration).
  d2 = np.clip(y - 2.0 * y_prev + y_prev2, -cfg.ddy_max, cfg.ddy_max)
  y = 2.0 * y_prev - y_prev2 + d2

  # Finally clamp once more with the velocity-equivalent bound e_max.
  scale = float(cfg.contact_v_scale) if contact_mode else 1.0
  e_max = cfg.tau * (cfg.v_max * scale)
  e = np.clip(y - q_hat, -e_max, +e_max)
  return np.clip(q_hat + e, cfg.q_min, cfg.q_max)
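
The layered clamping is easiest to follow on a single joint. The sketch below is a scalar re-implementation of the same four steps using only the standard library. The numbers are made up, and the function takes y_prev, y_prev2, and e_max as arguments instead of reading them from the planner state.

```python
def clip(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def rate_limit_fallback(q_hat, y_des, y_prev, y_prev2, q_min, q_max, dy_max, ddy_max, e_max):
    y = clip(y_des, q_min, q_max)                        # 1) position bounds
    y = y_prev + clip(y - y_prev, -dy_max, dy_max)       # 2) first difference
    d2 = clip(y - 2.0 * y_prev + y_prev2, -ddy_max, ddy_max)
    y = 2.0 * y_prev - y_prev2 + d2                      # 3) second difference
    e = clip(y - q_hat, -e_max, e_max)                   # 4) velocity-equivalent bound
    return clip(q_hat + e, q_min, q_max)


# A far-away target (2.0) requested from rest at 0.5.
y_cmd = rate_limit_fallback(
    q_hat=0.5, y_des=2.0, y_prev=0.5, y_prev2=0.5,
    q_min=-1.0, q_max=1.0, dy_max=0.05, ddy_max=0.02, e_max=0.1,
)
print(round(y_cmd, 6))  # prints 0.52
```

The command shrinks from 2.0 to 1.0 (position), then 0.55 (dy), then 0.52 (ddy); the e_max layer leaves it unchanged because 0.02 is already within 0.1. In this example the acceleration limit is the binding one.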

That's all for this write-up.
