Building a Fully Automated Pipeline for Chinese Tutorial Videos with Remotion + edge-tts

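This article presents a complete, automated path from script text to MP4. It rests on four ideas: audio first, measurement-driven timing, a single merged audio track, and parallel rendering. It suits tutorial videos, explainers, product demos, and other narrated formats.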

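1. Why This Approach?

Demand for video content has surged in the AI era, but the traditional workflow (write the script → record audio → edit → add subtitles → export) is slow, costly, and hard to scale.

Problems this approach addresses:

Pain point	Traditional approach	This approach
Voice-over	Recorded by a person; costly	Generated automatically with edge-tts at no cost
Timing alignment	Tracks aligned by hand; large errors	Measured audio durations; frame-level accuracy
Scene changes	Clip-by-clip editing; inefficient	Orchestrated in Remotion code with reusable components
Batch production	Hard to replicate	Edit the script text and regenerate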

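Good fit:

Programming and technical tutorial videos
Popular-science and explainer videos
Narrated product demos
Personalized videos produced in bulk (holiday greetings, announcements)

Poor fit:

Music-only videos or videos without narration
Projects that need the emotional range of a human voice actor
Live streams or interactive video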

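2. Core Architecture: Single-Track Audio

This is the most important design decision in the whole pipeline.

❌ Wrong: one audio clip per scene

// An independent Audio component inside each Sequence
<Series.Sequence durationInFrames={200}>
  <Audio src={staticFile("audio/ch1.mp3")} />  {/* Risky: stops when the Sequence ends */}
  <Ch1Visual />
</Series.Sequence>

Problems:

Audio is cut off at scene changes (an Audio component placed inside a Sequence stops at the Sequence boundary)
Every scene needs extra buffer frames, which makes the total duration hard to control precisely
Silent gaps or overlaps can appear between scenes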

✅ Right: a single top-level audio track

// HpcTutorial.tsx: top-level structure
<AbsoluteFill style={{ backgroundColor: "#1a1a2e" }}>
  {/* One audio track spans the whole video */}
  <Audio src={staticFile("audio/combined.mp3")} volume={0.85} />

  <Series>
    <Series.Sequence durationInFrames={SCENE_DURATIONS.title}>
      <TitleVisual />    {/* Visuals only, no Audio */}
    </Series.Sequence>
    <Series.Sequence durationInFrames={SCENE_DURATIONS.ch1_what}>
      <Ch1WhatVisual />
    </Series.Sequence>
    {/* ... more scenes ... */}
  </Series>
</AbsoluteFill>

Advantages:

Audio plays continuously and is never cut off by a scene change
Scene duration = measured audio seconds × FPS, rounded to whole frames
No buffer frames to estimate
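
As a rough illustration of the "measured seconds × FPS" rule, the sketch below converts audio lengths into frame counts. The durations and the `seconds_to_frames` helper are made up for this example; 30 fps matches the FPS value used in the merge script later in this article.

```python
FPS = 30  # assumed frame rate, same value the merge script uses later


def seconds_to_frames(seconds: float, fps: int = FPS) -> int:
    """Convert a measured audio duration to a whole number of frames."""
    return round(seconds * fps)


# Hypothetical measured durations in seconds, keyed by scene name
measured = {"title": 6.672, "ch1_what": 12.34}

# One frame count per scene, ready to feed into durationInFrames
scene_frames = {name: seconds_to_frames(sec) for name, sec in measured.items()}
print(scene_frames)
```

With these sample values the title scene comes out at 200 frames, the same length as the Sequence in the per-scene example above.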

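3. The Complete Six-Step Workflow

Step 1: Prepare the script

Split the script into chapters, one scene per chapter.

# Narration text stays in Chinese because it is the input for a zh-CN voice.
# Keys become the audio file names (without extension) in later steps.
NARRATIONS = {
    "00_title": "欢迎来到Remotion教学视频制作教程！今天我们将学习如何...",
    "01_ch1_intro": "第一章：什么是Remotion？Remotion是一个用React编写视频的框架...",
    "01_ch1_content": "Remotion的核心优势在于可以用代码精确控制每一帧...",
    # ... more chapters
}

Guidelines:

Keep each segment between 5 and 30 seconds (split longer ones, merge shorter ones)
A comfortable pace for Chinese narration is roughly 180 to 220 characters per minute
With edge-tts, the zh-CN-XiaoxiaoNeural voice gave the best results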
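
The length guidelines above can be checked before any audio is generated. The following is only a sketch: `estimate_seconds` and `check_segment` are hypothetical helpers, the 200 characters-per-minute default is simply the midpoint of the range quoted above, and only CJK characters are counted. The estimate is a planning aid; the measured duration in Step 3 remains the source of truth.

```python
def estimate_seconds(text: str, chars_per_minute: int = 200) -> float:
    """Rough speaking time, counting only CJK ideographs in the text."""
    cjk_count = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    return cjk_count / chars_per_minute * 60


def check_segment(text: str, min_s: float = 5.0, max_s: float = 30.0) -> str:
    """Return 'merge', 'split', or 'ok' following the 5 to 30 second guideline."""
    estimate = estimate_seconds(text)
    if estimate < min_s:
        return "merge"   # too short: combine with a neighbouring segment
    if estimate > max_s:
        return "split"   # too long: break into two scenes
    return "ok"


if __name__ == "__main__":
    sample = "Remotion的核心优势在于可以用代码精确控制每一帧"
    print(check_segment(sample), round(estimate_seconds(sample), 1))
```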

Step 2: Generate the TTS audio

Generate all clips in one batch with the edge-tts library:

import asyncio
import os
from edge_tts import Communicate

VOICE = "zh-CN-XiaoxiaoNeural"  # warm, professional female voice
OUTPUT_DIR = "public/audio"

async def generate(name, text):
    output_path = f"{OUTPUT_DIR}/{name}.mp3"
    communicate = Communicate(text, VOICE)
    await communicate.save(output_path)
    return output_path

async def main():
    os.makedirs(OUTPUT_DIR, exist_ok=True)  # make sure the output folder exists
    # NARRATIONS is the dict defined in Step 1
    for name, text in NARRATIONS.items():
        path = await generate(name, text)
        print(f"  ✅ {name}: saved to {path}")

asyncio.run(main())
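
The loop above synthesizes one clip at a time. If that becomes slow, a bounded-concurrency variant is one option. This is a minimal sketch, not something the article tested: `generate_all` is a hypothetical helper, the synthesis coroutine is passed in as a parameter, and the default limit of 4 is an arbitrary guess. Whether the TTS service tolerates parallel requests is an assumption to verify.

```python
import asyncio


async def generate_all(narrations, synth, limit=4):
    """Run synth(name, text) for every narration, at most `limit` at a time.

    `synth` is any coroutine function taking (name, text); results are
    returned in the same order as the input dict.
    """
    semaphore = asyncio.Semaphore(limit)

    async def worker(name, text):
        async with semaphore:  # wait here while `limit` tasks are running
            return await synth(name, text)

    tasks = [worker(name, text) for name, text in narrations.items()]
    return await asyncio.gather(*tasks)
```

Assuming the `generate` coroutine and `NARRATIONS` dict from the steps above, it would be called as `asyncio.run(generate_all(NARRATIONS, generate))`.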

Install the dependencies:

pip install edge-tts tinytag imageio_ffmpeg

Step 3: Measure the actual audio duration (critical!)

❌ Never compute it by hand

# Wrong! Don't do this.
duration = len(audio_data) / sample_rate  # only valid for raw PCM samples
duration = file_size / 2000               # crude and inaccurate

✅ Right: read the duration with a dedicated library

Option A: tinytag (recommended, pure Python)

from tinytag import TinyTag

def get_duration(filepath):
    tag = TinyTag.get(filepath)
    return tag.duration  # seconds, as a float

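Option B: ffmpeg (most accurate)

import re
import subprocess
from imageio_ffmpeg import get_ffmpeg_exe

FFMPEG = get_ffmpeg_exe()

def get_duration_ffmpeg(filepath):
    # With no output file ffmpeg exits with an error, but it still
    # prints the input file's info (including Duration) to stderr.
    result = subprocess.run(
        [FFMPEG, "-i", filepath],
        capture_output=True, text=True
    )
    # Parse a line such as "Duration: 00:00:12.34" from stderr
    match = re.search(r'Duration: (\d+):(\d+):(\d+\.\d+)', result.stderr)
    if match is None:
        raise ValueError(f"Could not read the duration of {filepath}")
    h, m, s = int(match.group(1)), int(match.group(2)), float(match.group(3))
    return h * 3600 + m * 60 + s

⚠️ Windows note: ffmpeg is usually not on the system PATH, so call imageio_ffmpeg.get_ffmpeg_exe() to locate the binary bundled with the package.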
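
Keeping the parsing step separate from the ffmpeg call makes it easy to test without any audio file. The sketch below assumes the same "Duration: HH:MM:SS.ss" format as Option B; `parse_ffmpeg_duration` is a hypothetical helper, and the sample line is hand-written to resemble ffmpeg's output rather than copied from a real run.

```python
import re
from typing import Optional

# Matches the "Duration: HH:MM:SS.ss" fragment; fractional seconds optional
_DURATION_RE = re.compile(r"Duration: (\d+):(\d+):(\d+(?:\.\d+)?)")


def parse_ffmpeg_duration(stderr_text: str) -> Optional[float]:
    """Return the duration in seconds, or None if no duration is present."""
    match = _DURATION_RE.search(stderr_text)
    if match is None:
        return None  # e.g. "Duration: N/A" or an unreadable file
    hours, minutes = int(match.group(1)), int(match.group(2))
    seconds = float(match.group(3))
    return hours * 3600 + minutes * 60 + seconds


if __name__ == "__main__":
    sample = "  Duration: 00:01:12.34, start: 0.000000, bitrate: 48 kb/s"
    print(parse_ffmpeg_duration(sample))
```

Returning None instead of raising lets the caller decide whether to fall back to tinytag or stop the pipeline.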

Step 4: Merge the audio into a single track and generate the timeline


import json, os, subprocess
from tinytag import TinyTag
from imageio_ffmpeg import get_ffmpeg_exe

AUDIO_DIR = "public/audio"
OUTPUT_COMBINED = os.path.join(AUDIO_DIR, "combined.mp3")
OUTPUT_TIMELINE = os.path.join(AUDIO_DIR, "timeline.json")
FPS = 30
FFMPEG = get_ffmpeg_exe()

# Scene order (must match the order of the Remotion Series.Sequence elements!)
SCENE_ORDER = [
    ("title",       "00_title"),
    ("ch1_intro",   "01_ch1_intro"),
    ("ch1_what",    "01_ch1_what"),
    # ... full list
    ("ending",      "99_ending"),
]

def main():
    segments = []
    current_start = 0.0
    concat_list_path = os.path.join(AUDIO_DIR, "concat_list.txt")

    with open(concat_list_path, "w", encoding="utf-8") as f:
        for scene_name, audio_file in SCENE_ORDER:
            audio_path = os.path.join(AUDIO_DIR, f"{audio_file}.mp3")
            duration = TinyTag.get(audio_path).duration
            # Derive frames from cumulative timestamps so the per-scene
            # counts always add up to totalFrames (no rounding drift).
            start_frame = round(current_start * FPS)
            end_frame = round((current_start + duration) * FPS)
            duration_in_frames = end_frame - start_frame

            segments.append({
                "sceneName": scene_name,
                "audioFile": audio_file,
                "startSeconds": round(current_start, 3),
                "endSeconds": round(current_start + duration, 3),
                "durationInSeconds": round(duration, 3),
                "durationInFrames": duration_in_frames,
            })

            f.write(f"file '{os.path.abspath(audio_path)}'\n")
            current_start += duration

    # Merge the clips with the ffmpeg concat demuxer (no re-encoding)
    subprocess.run([
        FFMPEG, "-y", "-f", "concat", "-safe", "0",
        "-i", concat_list_path,
        "-c", "copy", OUTPUT_COMBINED
    ], check=True)

    timeline = {
        "fps": FPS,
        "totalFrames": round(current_start * FPS),
        "totalDurationSeconds": round(current_start, 3),
        "segments": segments,
    }

    with open(OUTPUT_TIMELINE, "w", encoding="utf-8") as f:
        json.dump(timeline, f, ensure_ascii=False, indent=2)

    print(f"✅ combined.mp3: {current_start:.1f}s")
    print(f"✅ timeline.json: {timeline['totalFrames']} frames @ {FPS}fps")

if __name__ == "__main__":
    main()
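
One detail worth checking in any timeline script is rounding. Rounding each clip's duration on its own can leave the sum of scene frames short of (or beyond) the rounded total, so visuals slowly slide against the single audio track. The sketch below contrasts the two ways of counting; both function names and the ten 1.01-second clips are hypothetical.

```python
FPS = 30


def frames_independent(durations, fps=FPS):
    """Round every clip on its own; errors can accumulate across clips."""
    return [round(d * fps) for d in durations]


def frames_cumulative(durations, fps=FPS):
    """Round cumulative end times, then take differences between them."""
    frames, elapsed, previous_end = [], 0.0, 0
    for d in durations:
        elapsed += d
        end = round(elapsed * fps)        # frame index where this clip ends
        frames.append(end - previous_end)
        previous_end = end
    return frames


durations = [1.01] * 10  # hypothetical clips, 10.1 s of audio in total
print(sum(frames_independent(durations)), sum(frames_cumulative(durations)))
```

For these sample clips the independent sum is 300 frames while the audio spans about 303, a gap of roughly a tenth of a second after only ten scenes. The cumulative version always ends on the rounded total.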

Step 5: Generate durations.ts from the timeline


// generate-durations-from-timeline.cjs
const fs = require("fs");
const path = require("path");

const timeline = JSON.parse(
  fs.readFileSync(path.join(__dirname, "public/audio/timeline.json"), "utf-8")
);

const lines = timeline.segments.map(seg =>
  `  ${seg.sceneName}: ${seg.durationInFrames}, // "${seg.audioFile}" ${seg.durationInSeconds}s`
);
lines.unshift("export const SCENE_DURATIONS = {");
lines.push("} as const;");
lines.push("");
lines.push(`export const TOTAL_FRAMES = ${timeline.totalFrames};`);
lines.push("");
lines.push(`export const SCENE_ORDER = [${timeline.segments.map(s => `"${s.sceneName}"`).join(", ")}] as const;`);

fs.writeFileSync(path.join(__dirname, "src/durations.ts"), lines.join("\n"));
console.log(`✅ durations.ts: ${timeline.totalFrames} frames`);

Run it:


node generate-durations-from-timeline.cjs
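
Because the generator writes each sceneName directly as an object key, a name that is not a valid identifier (or that appears twice) would produce a broken or misleading durations.ts. A small pre-check on the Python side can catch this before Step 5 runs. This is a sketch under that assumption: `invalid_scene_names` is a hypothetical helper, and the pattern covers only plain ASCII identifiers.

```python
import re

# Plain ASCII identifier: a letter, underscore, or $ followed by word characters
_IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def invalid_scene_names(names):
    """Return names that are duplicated or unusable as unquoted object keys."""
    seen, problems = set(), []
    for name in names:
        if _IDENTIFIER.fullmatch(name) is None or name in seen:
            problems.append(name)
        seen.add(name)
    return problems


if __name__ == "__main__":
    # "01_intro" starts with a digit; the second "title" is a duplicate
    print(invalid_scene_names(["title", "ch1_what", "01_intro", "title"]))
```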

Step 6: High-performance rendering


// render-fast.cjs
const { bundle } = require("@remotion/bundler");
const { renderMedia, selectComposition } = require("@remotion/renderer");
const path = require("path");
const os = require("os");
const { execSync } = require("child_process");

const CHROME_PATH = "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe";
const COMPOSITION_ID = "YourCompositionId";
const ENTRY_POINT = path.resolve(__dirname, "src/index.ts");
const OUTPUT_PATH = path.resolve(__dirname, "output/video.mp4");

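// For a 6-core CPU, a concurrency of 3 to 4 is suggested
const CONCURRENCY = process.env.REMOTION_CONCURRENCY
  ? parseInt(process.env.REMOTION_CONCURRENCY, 10) : 3;

// Detect the GPU automatically (Windows only; other platforms report no GPU)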
function checkGPU() {
  try {
    if (os.platform() !== "win32") return { hasGPU: false, isAMD: false };
    const info = execSync("wmic path win32_videocontroller get name /value", { encoding: "utf8" });
    const hasGPU = /(AMD|Radeon|NVIDIA|Intel)/i.test(info);
    const isAMD = /(R7 200|AMD|Radeon)/i.test(info);
    return { hasGPU, isAMD };
  } catch (e) {
    return { hasGPU: false, isAMD: false };
  }
}

async function main() {
  const gpuInfo = checkGPU();

  // 1. Bundle
  const bundled = await bundle({ entryPoint: ENTRY_POINT });

  // 2. Select the Composition
  const composition = await selectComposition({
    serveUrl: bundled,
    id: COMPOSITION_ID,
    browserExecutable: CHROME_PATH,
    chromiumOptions: {
      args: gpuInfo.hasGPU 
        ? ["--no-sandbox", "--disable-dev-shm-usage"] 
        : ["--no-sandbox", "--disable-gpu"],
    }
  });

  // 3. Render
  let lastPercent = -1;
  await renderMedia({
    composition,
    serveUrl: bundled,
    codec: "h264",
    outputLocation: OUTPUT_PATH,
    concurrency: CONCURRENCY,
    browserExecutable: CHROME_PATH,
    chromiumOptions: {
      args: gpuInfo.isAMD 
        ? ["--no-sandbox", "--disable-dev-shm-usage", "--ignore-gpu-blocklist", "--enable-gpu-rasterization", "--enable-webgl", "--disable-software-rasterizer"]
        : gpuInfo.hasGPU 
          ? ["--no-sandbox", "--disable-dev-shm-usage", "--ignore-gpu-blocklist", "--enable-gpu-rasterization"]
          : ["--no-sandbox", "--disable-gpu"],
    },
    onProgress: ({ progress }) => {
      const currentPercent = Math.floor(progress * 100);
      if (currentPercent > lastPercent && currentPercent % 5 === 0) {
        console.log(`  Render: ${currentPercent}%`);
        lastPercent = currentPercent;
      }
    },
  });
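  console.log(`✅ Render complete: ${OUTPUT_PATH}`);
}

main().catch(err => {
  console.error("❌ Render failed:", err);
  process.exitCode = 1; // report the failure to the calling shell or CI job
});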

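Choosing rendering performance parameters:

Hardware	concurrency	Node memory	GPU mode	Expected render time (4-minute video)
4 cores / 8 GB RAM	2	2 GB	Auto-detect	~15 min
6 cores / 16 GB RAM	3~4	2 GB	Auto-detect	10~15 min
8+ cores / 32 GB RAM	5~8	2 GB	Auto-detect	8~12 min
Docker / low memory	1~2	1 GB	Disabled (to avoid crashes)	25~40 min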
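
The table can be turned into a starting value for REMOTION_CONCURRENCY. The function below is only a sketch: `suggest_concurrency` is a hypothetical helper, it takes the lower end of each range in the table, and the `low_memory` flag stands in for the "Docker / low memory" row. Treat the result as a first guess to tune, not a measured optimum.

```python
import os


def suggest_concurrency(cores: int, low_memory: bool = False) -> int:
    """Pick a conservative concurrency from the CPU core count."""
    if low_memory or cores < 4:
        return 1   # Docker / low-memory row, or fewer cores than the table covers
    if cores < 6:
        return 2   # 4-core row
    if cores < 8:
        return 3   # 6-core row, lower end of 3~4
    return 5       # 8+ core row, lower end of 5~8


# os.cpu_count() can return None, so fall back to a single core
print(suggest_concurrency(os.cpu_count() or 1))
```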

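Launch commands:

# Default configuration
node --max-old-space-size=2048 render-fast.cjs

# Custom concurrency (POSIX shell syntax)
REMOTION_CONCURRENCY=6 node --max-old-space-size=2048 render-fast.cjs

⚠️ Don't give Node.js too much memory (no more than 4 GB). Node only acts as the scheduler; the parallel Chrome instances are what actually consume memory.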

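4. Common Pitfalls and Fixes

Pitfall 1: The video is much shorter than expected

Symptom: the code specifies 260 seconds, but the rendered video is only 60 to 120 seconds long.

Checklist:

Is an old Composition ID being used?
Was the render interrupted by a timeout? (Foreground execution has a default 10-minute limit)
Is durations.ts in sync with the current audio files?
Is a leftover video file from an earlier run being mistaken for the new output?

Fix:

rm -rf build .remotion
node --max-old-space-size=2048 render-fast.cjs > log.txt 2>&1 &   # run in the background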
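
A quick automated comparison between the timeline and the rendered file catches this pitfall early. The sketch below assumes a timeline dict shaped like the timeline.json from Step 4 (with fps and totalFrames keys) and an actual duration measured separately, for example with one of the Step 3 functions. Both helper names and the two-frame tolerance are arbitrary choices for illustration.

```python
def expected_seconds(timeline: dict) -> float:
    """Total video length implied by the timeline's frame count."""
    return timeline["totalFrames"] / timeline["fps"]


def duration_matches(expected_s: float, actual_s: float,
                     fps: int = 30, tolerance_frames: int = 2) -> bool:
    """True if the rendered length is within a few frames of the timeline."""
    return abs(expected_s - actual_s) <= tolerance_frames / fps


if __name__ == "__main__":
    timeline = {"fps": 30, "totalFrames": 7800}  # hypothetical 260 s video
    target = expected_seconds(timeline)
    print(duration_matches(target, 260.03))  # small container rounding
    print(duration_matches(target, 95.0))    # truncated render
```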

Pitfall 2: Audio is cut off at scene changes

Root cause: per-scene <Audio> components.

Fix: switch to single-track audio (see Section 2).

Pitfall 3: ffmpeg cannot be found (Windows)

Fix:

from imageio_ffmpeg import get_ffmpeg_exe
FFMPEG = get_ffmpeg_exe()
# Returns something like: C:\Python\Python312\Lib\site-packages\imageio_ffmpeg\binaries\ffmpeg-win-x86_64-v7.1.exe

Pitfall 4: The Chrome Headless download fails (networks in mainland China)

Fix: point Remotion at a locally installed Chrome:

const CHROME_PATH = "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe";
// Pass { browserExecutable: CHROME_PATH } to both selectComposition() and renderMedia()

Pitfall 5: CPU pegged at 100% while an older GPU sits idle

Root cause: Chrome maintains a GPU blocklist, and older AMD cards are blocked by default.

Fix: add the following to chromiumOptions.args:

["--ignore-gpu-blocklist", "--enable-gpu-rasterization", 
 "--enable-webgl", "--disable-software-rasterizer"]

Pitfall 6: Using the wrong Remotion API

Root cause: chromeLaunchOptions (the Puppeteer spelling) was used; Remotion 4.x does not recognize this field.

Fix:

// ❌ Wrong
chromeLaunchOptions: { args: [...] }
// ✅ Right
chromiumOptions: { args: [...] }


Note: the chromiumOptions fields documented for Remotion 4.x are named options such as gl and disableWebSecurity. Before relying on custom flags, confirm in the documentation for your Remotion version that args is actually forwarded to Chrome.
5. File Layout

project-root/
├── public/
│   └── audio/
│       ├── 00_title.mp3          # per-scene audio clips
│       ├── 01_ch1_intro.mp3
│       ├── ...
│       ├── combined.mp3          # merged single-track audio
│       └── timeline.json         # timeline data
├── src/
│   ├── Root.tsx                  # Composition registration
│   ├── durations.ts              # scene duration constants (generated)
│   ├── HpcTutorial.tsx           # main component (single Audio track + Series)
│   └── scenes/
│       ├── TitleScene.tsx        # visual component for each scene
│       ├── Ch1IntroScene.tsx
│       └── ...
├── generate-audio.py             # Step 2: TTS audio generation
├── merge-audio.py                # Step 4: audio merge + timeline
├── generate-durations-from-timeline.cjs  # Step 5: generate durations.ts
└── render-fast.cjs               # Step 6: high-performance rendering

6. Re-rendering in One Pass

After changing a visual component or the script text:

# 1. If the script text changed: regenerate the audio
python generate-audio.py

# 2. Update the timeline and durations
python merge-audio.py
node generate-durations-from-timeline.cjs

# 3. Clear caches
rm -rf build .remotion

# 4. Render (in the background, logging to a file)
node --max-old-space-size=2048 render-fast.cjs > /tmp/render-log.txt 2>&1 &

# 5. Watch progress
tail -f /tmp/render-log.txt

# 6. Verify the result
ffmpeg -i output/video.mp4 2>&1 | grep Duration
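
The same sequence can be driven from a small Python script so that a failing step stops the run. This is a sketch, assuming the script names from the file layout above and that `python` and `node` are on the PATH. `pipeline_steps` and `run_pipeline` are hypothetical helpers, and unlike the shell version the render runs in the foreground.

```python
import shutil
import subprocess
from typing import List


def pipeline_steps(text_changed: bool) -> List[List[str]]:
    """Build the ordered command list; audio is regenerated only when needed."""
    steps = []
    if text_changed:
        steps.append(["python", "generate-audio.py"])
    steps.append(["python", "merge-audio.py"])
    steps.append(["node", "generate-durations-from-timeline.cjs"])
    steps.append(["node", "--max-old-space-size=2048", "render-fast.cjs"])
    return steps


def run_pipeline(text_changed: bool) -> None:
    """Clear caches, then run each step and stop at the first failure."""
    for cache_dir in ("build", ".remotion"):
        shutil.rmtree(cache_dir, ignore_errors=True)
    for command in pipeline_steps(text_changed):
        subprocess.run(command, check=True)  # raises CalledProcessError on failure
```

Calling `run_pipeline(text_changed=True)` from the project root would then replace steps 1 through 4 of the listing above.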

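7. Summary

This article walked through a complete Remotion + edge-tts pipeline for producing tutorial videos. The key points:

Single-track audio: one Audio component at the top level; scenes handle visuals only
Measurement-driven timing: read real audio durations with tinytag or ffmpeg instead of estimating
Automated pipeline: script text → TTS → merge → timeline → render, scripted end to end
GPU acceleration: configure chromiumOptions correctly so that older graphics cards can take part in rendering

The approach has been used in several tutorial video projects, where going from script text to finished video took as little as 30 minutes. I hope you find it useful!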
