FRSM V6 Fast — Content-Gated Multi-Scale State Machine

A next-generation sequence modeling architecture that goes beyond traditional RNNs and Transformers

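FRSM (Fast Recurrent State Machine) V6 Fast is a content-gated multi-scale state machine. It keeps the linear inference complexity of an RNN and adds two mechanisms: multi-scale computation parallelized with einsum, and content-adaptive gating. At 60M parameters, it reportedly reaches generation quality comparable to models with around ten billion parameters.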


Architecture Highlights

                    ┌──────────────────────────────┐
                    │       Output Projection       │
                    └──────────────┬───────────────┘
                    ┌──────────────┴───────────────┐
                    │    Multi-Scale Fusion + LN    │
                    └──────────────┬───────────────┘
                    ┌──────────────┴───────────────┐
                    │   ┌───┐ ┌───┐ ┌───┐ ┌───┐   │
                    │   │S1 │ │S2 │ │S3 │ │S4 │   │
                    │   └───┘ └───┘ └───┘ └───┘   │
                    │   Content-Gated State Update │
                    └──────────────┬───────────────┘
                    ┌──────────────┴───────────────┐
                    │       Input Projection        │
                    └──────────────┬───────────────┘
                    ┌──────────────┴───────────────┐
                    │         Embedding             │
                    └──────────────────────────────┘

Key Innovations

Feature                    Description
Multi-Scale State Machine  4 parallel scales with different temporal dynamics, fused via learned attention
Content-Gated Update       Each scale independently decides when to write via a 2-layer gating network
Einsum Parallelization     All scales computed in a single einsum operation, with no Python loop overhead
O(1) Inference             Constant-time per-token generation, independent of context length
Linear Memory              No KV cache that grows with context length; state memory scales as O(d_model × num_scales)
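
The rows above can be illustrated with a minimal NumPy sketch of one recurrent step. It is not the FRSM implementation: the parameter names (`W_in`, `W_g1`, `W_g2`, `decay`, `w_fuse`), the update rule, and the per-scale decay values are all assumptions chosen only to show the ideas of per-scale gated writes, einsum over a stacked scale axis (no Python loop over scales), attention-style fusion, and a state whose size does not depend on context length.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, num_scales, hidden = 8, 4, 16

# Hypothetical parameters: one set per scale, stacked on a leading scale axis.
W_in = rng.normal(0, 0.1, (num_scales, d_model, d_model))  # candidate projection
W_g1 = rng.normal(0, 0.1, (num_scales, d_model, hidden))   # gating network, layer 1
W_g2 = rng.normal(0, 0.1, (num_scales, hidden, d_model))   # gating network, layer 2
decay = np.array([0.5, 0.8, 0.95, 0.99])[:, None]          # assumed per-scale retention
w_fuse = rng.normal(0, 0.1, (num_scales, d_model))         # fusion scoring vectors


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def step(x, state):
    """Process one token. x: (d_model,), state: (num_scales, d_model)."""
    # Candidate content for every scale at once.
    cand = np.tanh(np.einsum("d,sde->se", x, W_in))
    # Two-layer gate: each scale decides from the input how much to write.
    h = np.tanh(np.einsum("d,sdh->sh", x, W_g1))
    gate = sigmoid(np.einsum("sh,shd->sd", h, W_g2))
    # Scales with higher retention write less per step (slower dynamics).
    write = gate * (1.0 - decay)
    new_state = (1.0 - write) * state + write * cand
    # Softmax over scales, then a weighted sum of the scale states.
    scores = np.einsum("sd,sd->s", new_state, w_fuse)
    attn = np.exp(scores - scores.max())
    attn /= attn.sum()
    fused = np.einsum("s,sd->d", attn, new_state)
    return fused, new_state


state = np.zeros((num_scales, d_model))
for _ in range(100):
    y, state = step(rng.normal(size=d_model), state)
print(y.shape, state.shape)  # (8,) (4, 8)
```

After 100 tokens the state is still a single `(num_scales, d_model)` array, which is the property behind the O(1) per-token cost and the O(d_model × num_scales) memory figure.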

Why FRSM > Transformer

Aspect                 FRSM V6 Fast             Transformer
Inference complexity   O(1) per token           O(L) per token
Memory (long context)  O(d) constant            O(L × d) KV cache
Training memory        O(T × d) linear          O(T²) quadratic
Long-range dependency  Built-in via recurrence  Needs positional encoding
Architecture           Simple RNN cell          Complex attention stack
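
To make the long-context memory row concrete, here is a rough back-of-the-envelope calculation. The layer count, `d_model`, and 4-byte values are hypothetical and are not the configuration of any released FRSM checkpoint; the KV-cache formula is the usual one for full multi-head attention (keys plus values for every past token in every layer).

```python
def kv_cache_bytes(context_len, d_model, n_layers, bytes_per_value=4):
    """Keys and values stored for every past token in every layer."""
    return 2 * n_layers * context_len * d_model * bytes_per_value


def recurrent_state_bytes(d_model, num_scales, n_layers, bytes_per_value=4):
    """One fixed-size state per scale per layer, whatever the context length."""
    return n_layers * num_scales * d_model * bytes_per_value


# Hypothetical configuration, for illustration only.
state = recurrent_state_bytes(d_model=1024, num_scales=4, n_layers=12)
for context_len in (512, 4096, 32768):
    kv = kv_cache_bytes(context_len, d_model=1024, n_layers=12)
    print(f"L={context_len:>6}: KV cache {kv / 2**20:8.1f} MiB, "
          f"recurrent state {state / 2**20:.3f} MiB")
```

The KV cache grows in proportion to L, while the recurrent state stays the same size at every context length.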

Quick Start

Installation

git clone https://www.modelscope.cn/dfytensor/FRSM.git
cd FRSM
pip install torch jieba tqdm

Inference (Interactive Chat)

# Interactive dialogue mode (SFT model)
python inference.py --model sft/frsm_v6_fast_60m_sft.pt --mode chat

# Single prompt (the prompt means "Introduce the core ideas of deep learning")
python inference.py --prompt "介绍一下深度学习的核心思想"

# Text completion mode (pretrain model)
python inference.py --model pretrained/frsm_v6_fast_60m_pretrain.pt --mode pretrain

# CPU inference (the prompt means "Hello")
python inference.py --cpu --prompt "你好"

Inference Example

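User: 用Python写一个斐波那契数列函数
      [Write a Fibonacci sequence function in Python]
Agent: 斐波那契数列是一个经典的递归问题,以下是Python实现:
       [The Fibonacci sequence is a classic recursion problem; here is a Python implementation:]
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

User: 解释什么是机器学习
      [Explain what machine learning is]
Agent: 机器学习是人工智能的一个分支,它使用算法和数据让计算机从经验中学习,
无需显式编程。通过学习数据中的模式,模型能够对新的数据做出预测或决策。
       [Machine learning is a branch of artificial intelligence that uses algorithms and data to let computers learn from experience without explicit programming. By learning patterns in data, a model can make predictions or decisions about new data.]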

Results

60M Model Performance

Metric           Value
Parameters       59,578,825
Training Data    1.27M pretraining samples + 905K SFT conversations
Pretrain Loss    2.35
SFT Loss         2.18
Inference Speed  ~6,000 tok/s on RTX 4090
GPU Memory       ~19 GB (batch_size=88, seq_len=512)

Loss Curves

Pretrain: 10.05 → 2.35 (3 epochs, 33K steps)
SFT:       3.02 → 2.18 (1 epoch, 10K steps)
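
As a rough way to read these numbers, the short calculation below converts losses to perplexities and compares the starting loss with a uniform guess over the vocabulary. It assumes the losses are mean cross-entropy in nats per token and that the 60M model uses the 23,005-token vocabulary that appears in the Custom Model Scale examples; neither is stated explicitly in this README.

```python
import math

vocab_size = 23005  # assumption: taken from the Custom Model Scale examples

# A model that assigns equal probability to every token has loss ln(vocab_size).
uniform_loss = math.log(vocab_size)
print(f"uniform-guess loss: {uniform_loss:.2f}")  # 10.04

# Perplexity is exp(loss) when the loss is measured in nats.
for name, loss in [("pretrain", 2.35), ("sft", 2.18)]:
    print(f"{name}: perplexity {math.exp(loss):.2f}")
```

Under these assumptions the initial pretraining loss of 10.05 is close to the uniform-guess value, and the final losses correspond to perplexities of roughly 10.5 (pretrain) and 8.8 (SFT).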

Training from Scratch

Data Preparation

Place your pretrain data (JSONL format, each line: {"text": "..."}) and SFT data (JSONL format, each line: {"conversations": [{"role": "user", "content": "..."}, ...]}) in the data directory.
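
A minimal sketch of producing files in the two formats described above, using only the standard library. The file names `pretrain_example.jsonl` and `sft_example.jsonl` and the `"assistant"` role name are assumptions for illustration; check the training script for the names and roles it actually expects.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(path, records):
    """Write one JSON object per line, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


pretrain_records = [{"text": "Deep learning learns layered representations from data."}]
sft_records = [{
    "conversations": [
        {"role": "user", "content": "What is machine learning?"},
        # The "assistant" role name is assumed; the README only shows "user".
        {"role": "assistant", "content": "A way for programs to learn patterns from data."},
    ]
}]

# A temporary directory keeps the sketch side-effect free;
# use Path("./data") instead when preparing real training data.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    write_jsonl(data_dir / "pretrain_example.jsonl", pretrain_records)
    write_jsonl(data_dir / "sft_example.jsonl", sft_records)
    loaded_pretrain = read_jsonl(data_dir / "pretrain_example.jsonl")
    loaded_sft = read_jsonl(data_dir / "sft_example.jsonl")

print(loaded_pretrain[0]["text"])
print([turn["role"] for turn in loaded_sft[0]["conversations"]])
```

Passing `ensure_ascii=False` keeps Chinese text human-readable in the files instead of escaping it as `\uXXXX` sequences.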

Pretraining

python train.py --mode pretrain \
    --data_dir ./data \
    --batch_size 116 \
    --max_seq_len 384 \
    --max_steps 32847 \
    --lr 5e-4 \
    --max_lines 1000000 \
    --output_dir ./checkpoints/pretrain

Supervised Fine-Tuning

python train.py --mode sft \
    --data_dir ./data \
    --batch_size 88 \
    --max_seq_len 512 \
    --max_steps 12000 \
    --lr 3e-5 \
    --max_lines 900000 \
    --output_dir ./checkpoints/sft \
    --pretrain_ckpt ./checkpoints/pretrain/frsm_v6_fast_pretrain_final.pt
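
The `--max_steps` values in the two commands can be sanity-checked against the dataset sizes in the Results table. The sketch below treats the rounded figures 1.27M and 905K as exact sample counts and assumes one batch per optimizer step with no gradient accumulation; both are assumptions, and `steps_for` is a hypothetical helper, not part of `train.py`.

```python
import math


def steps_for(num_samples, batch_size, epochs=1):
    """Optimizer steps needed to pass over the data `epochs` times."""
    return math.ceil(num_samples / batch_size) * epochs


print(steps_for(1_270_000, 116, epochs=3))  # 32847
print(steps_for(905_000, 88))               # 10285
```

Under these assumptions, three pretraining epochs at batch size 116 come to 32,847 steps, matching the pretraining `--max_steps`. One SFT epoch at batch size 88 comes to about 10.3K steps, in line with the 10K steps in the loss curve, so `--max_steps 12000` acts as an upper bound above one epoch.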

Custom Model Scale

The architecture supports flexible scaling:

from frsm_v6a_fast import FRSM_V6_Fast

# 200M model
model = FRSM_V6_Fast(vocab_size=23005, d_model=2048, num_scales=4)

# 417M model
model = FRSM_V6_Fast(vocab_size=23005, d_model=3000, num_scales=4)

# Your custom scale (set vocab_size to match your tokenizer)
vocab_size = 23005
model = FRSM_V6_Fast(vocab_size=vocab_size, d_model=1024, num_scales=6)

Model Checkpoints

File                                     Description                            Size
pretrained/frsm_v6_fast_60m_pretrain.pt  Pretrained base model (loss 2.35)      682 MB
sft/frsm_v6_fast_60m_sft.pt              SFT fine-tuned chat model (loss 2.18)  682 MB

Requirements

  • Python 3.10+
  • PyTorch 2.0+ (CUDA recommended)
  • jieba
  • tqdm

Citation

@software{frsm_v6_fast,
  title = {FRSM V6 Fast: Content-Gated Multi-Scale State Machine},
  author = {FRSM Contributors},
  year = {2026},
  url = {https://www.modelscope.cn/dfytensor/FRSM}
}

License

MIT License