A New-Generation Sequence Modeling Architecture Beyond Traditional RNNs and Transformers
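FRSM (Fast Recurrent State Machine) V6 Fast is a content-gated, multi-scale state machine. It keeps the linear inference complexity of an RNN (constant cost per generated token) while parallelizing the multi-scale computation with einsum and adapting its gates to the input content. At 60M parameters, it reaches generation quality comparable to models with around 10B parameters.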
Architecture Highlights
```
┌──────────────────────────────┐
│      Output Projection       │
└──────────────┬───────────────┘
┌──────────────┴───────────────┐
│   Multi-Scale Fusion + LN    │
└──────────────┬───────────────┘
┌──────────────┴───────────────┐
│  ┌───┐  ┌───┐  ┌───┐  ┌───┐  │
│  │S1 │  │S2 │  │S3 │  │S4 │  │
│  └───┘  └───┘  └───┘  └───┘  │
│  Content-Gated State Update  │
└──────────────┬───────────────┘
┌──────────────┴───────────────┐
│       Input Projection       │
└──────────────┬───────────────┘
┌──────────────┴───────────────┐
│          Embedding           │
└──────────────────────────────┘
```
Key Innovations
| Feature | Description |
|---|---|
| Multi-Scale State Machine | 4 parallel scales with different temporal dynamics, fused via learned attention |
| Content-Gated Update | Each scale independently decides when to write via a 2-layer gating network |
| Einsum Parallelization | All scales are computed in a single einsum operation, with no Python loop over scales |
| O(1) Inference | Constant-time per-token generation, independent of context length |
| Linear Memory | No KV cache that grows with context; the recurrent state is O(d_model × num_scales) |
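The table above names the mechanisms but not the update equations. The NumPy sketch below shows one way the pieces could fit together for a single token step: a per-scale candidate write, a 2-layer content gate, a fixed per-scale decay standing in for "different temporal dynamics", and softmax-weighted fusion followed by LayerNorm, with every scale handled inside one einsum instead of a loop. The update rule, the decay values, and all names (`init_params`, `multiscale_step`, `W_in`, `G1`, `G2`, `decay`, `q`) are assumptions for illustration only, not the FRSM V6 Fast implementation; the input and output projections are omitted.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(d_model, num_scales, d_gate, seed=0):
    """Random parameters for the illustrative cell (hypothetical layout)."""
    rng = np.random.default_rng(seed)
    s, d, h = num_scales, d_model, d_gate
    return {
        "W_in": rng.normal(0, d ** -0.5, (s, d, d)),  # per-scale candidate projection
        "G1": rng.normal(0, d ** -0.5, (s, d, h)),    # gate network, layer 1
        "G2": rng.normal(0, h ** -0.5, (s, h, d)),    # gate network, layer 2
        "g_bias": np.zeros((s, d)),
        # Assumed fixed per-scale decay: later scales keep their state longer.
        "decay": np.linspace(0.5, 0.99, s),
        "q": rng.normal(0, d ** -0.5, d),             # fusion query vector
    }


def multiscale_step(x, state, p, eps=1e-5):
    """One recurrent step for all scales at once.

    x: (B, D) current-token features; state: (B, S, D) per-scale state.
    Returns (new_state, fused) with shapes (B, S, D) and (B, D).
    """
    # Candidate write for every scale in a single einsum: no loop over scales.
    cand = np.tanh(np.einsum("bd,sde->bse", x, p["W_in"]))
    # Two-layer content gate, also batched over scales.
    hidden = np.tanh(np.einsum("bd,sdh->bsh", x, p["G1"]))
    gate = sigmoid(np.einsum("bsh,shd->bsd", hidden, p["G2"]) + p["g_bias"])
    # Each scale decides, per feature, how much to overwrite its decayed state.
    decay = p["decay"][None, :, None]
    new_state = (1.0 - gate) * decay * state + gate * cand
    # Fuse scales with softmax attention weights, then LayerNorm (no affine).
    weights = softmax(np.einsum("bsd,d->bs", new_state, p["q"]), axis=1)
    fused = np.einsum("bs,bsd->bd", weights, new_state)
    mu = fused.mean(axis=-1, keepdims=True)
    var = fused.var(axis=-1, keepdims=True)
    return new_state, (fused - mu) / np.sqrt(var + eps)


B, S, D = 2, 4, 8
params = init_params(D, S, d_gate=16)
state = np.zeros((B, S, D))
for x in np.random.default_rng(1).normal(size=(3, B, D)):  # feed 3 tokens
    state, out = multiscale_step(x, state, params)
print(state.shape, out.shape)  # (2, 4, 8) (2, 8)
```

The state keeps shape `(B, S, D)` no matter how many tokens have been processed, which is what makes per-token cost and memory independent of context length in a recurrent design.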
Why FRSM > Transformer
| Aspect | FRSM V6 Fast | Transformer |
|---|---|---|
| Inference complexity | O(1) per token | O(L) per token |
| Memory (long context) | O(d) constant | O(L × d) KV cache |
| Training memory | O(T × d), linear | O(T²) attention matrix, quadratic (without memory-efficient attention) |
| Token order | Implicit in the recurrence | Requires positional encoding |
| Architecture | Simple RNN cell | Complex attention stack |
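To make the memory rows concrete, the sketch below compares a Transformer's KV cache (one key and one value vector per layer per cached token) with a fixed-size recurrent state of `num_scales` vectors per layer. The 12-layer, `d_model=768`, 4-scale, fp16 configuration is hypothetical and chosen only for illustration; it is not the FRSM 60M configuration, and batch size is ignored.

```python
def kv_cache_bytes(num_layers: int, context_len: int, d_model: int,
                   bytes_per_value: int = 2) -> int:
    """Transformer KV cache: one key and one value vector per layer per cached token."""
    return 2 * num_layers * context_len * d_model * bytes_per_value


def recurrent_state_bytes(num_layers: int, d_model: int, num_scales: int,
                          bytes_per_value: int = 2) -> int:
    """Fixed recurrent state: num_scales vectors of width d_model per layer."""
    return num_layers * num_scales * d_model * bytes_per_value


# Hypothetical configuration: 12 layers, d_model=768, 4 scales, fp16 (2 bytes/value).
for context_len in (2048, 8192, 32768):
    kv_mib = kv_cache_bytes(12, context_len, 768) / 2**20
    state_kib = recurrent_state_bytes(12, 768, 4) / 2**10
    print(f"context {context_len:>6}: KV cache {kv_mib:8.1f} MiB | "
          f"recurrent state {state_kib:.1f} KiB")
```

With these numbers the KV cache grows from 72 MiB at 2,048 tokens to 1,152 MiB at 32,768 tokens, while the recurrent state stays at 72 KiB.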
Quick Start
Installation
```bash
git clone https://www.modelscope.cn/dfytensor/FRSM.git
cd FRSM
pip install torch jieba tqdm
```
Inference (Interactive Chat)
```bash
# Interactive dialogue mode (SFT model)
python inference.py --model sft/frsm_v6_fast_60m_sft.pt --mode chat
# Single prompt ("Introduce the core ideas of deep learning")
python inference.py --prompt "介绍一下深度学习的核心思想"
# Text completion mode (pretrain model)
python inference.py --model pretrained/frsm_v6_fast_60m_pretrain.pt --mode pretrain
# CPU inference ("Hello")
python inference.py --cpu --prompt "你好"
```
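Because each step consumes one token and a fixed-size state, generation never revisits earlier tokens. The pure-Python sketch below shows the shape of such a loop with greedy decoding. The `step(state, token) -> (state, logits)` interface, `greedy_generate`, and the toy model are hypothetical and do not describe how `inference.py` is written.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[int, ...]
Step = Callable[[State, int], Tuple[State, List[float]]]


def greedy_generate(step: Step, state: State, prompt: Sequence[int],
                    max_new_tokens: int, eos_id: int) -> List[int]:
    """Greedy decoding where each call sees only one token plus the carried state."""
    if not prompt:
        raise ValueError("prompt must contain at least one token")
    logits: List[float] = []
    for tok in prompt:  # prefill, one token at a time
        state, logits = step(state, tok)
    out: List[int] = []
    for _ in range(max_new_tokens):
        next_tok = max(range(len(logits)), key=logits.__getitem__)  # argmax
        if next_tok == eos_id:
            break
        out.append(next_tok)
        state, logits = step(state, next_tok)  # same cost at every position
    return out


# Toy "model" over a 5-token vocabulary: always predicts (token + 1) % 5.
# Its state is a 1-tuple step counter, so it never grows with the sequence.
VOCAB = 5


def toy_step(state: State, tok: int) -> Tuple[State, List[float]]:
    logits = [1.0 if i == (tok + 1) % VOCAB else 0.0 for i in range(VOCAB)]
    return (state[0] + 1,), logits


print(greedy_generate(toy_step, (0,), prompt=[1], max_new_tokens=10, eos_id=4))  # [2, 3]
```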
Inference Example
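User: Write a Fibonacci sequence function in Python
Agent: The Fibonacci sequence is a classic recursion problem. Here is a Python implementation:
```python
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)
```
User: Explain what machine learning is
Agent: Machine learning is a branch of artificial intelligence that uses algorithms and data to let computers learn from experience without being explicitly programmed. By learning patterns in the data, a model can make predictions or decisions about new data.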
Results
60M Model Performance
| Metric | Value |
|---|---|
| Parameters | 59,578,825 |
| Training Data | 1.27M pretraining samples + 905K SFT conversations |
| Pretrain Loss | 2.35 |
| SFT Loss | 2.18 |
| Inference Speed | ~6,000 tok/s on RTX 4090 |
| GPU Memory (training) | ~19 GB (batch_size=88, seq_len=512) |
Loss Curves
Pretrain: 10.05 → 2.35 (3 epochs, 33K steps)
SFT: 3.02 → 2.18 (1 epoch, 10K steps)
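Assuming the reported losses are mean per-token cross-entropy in nats (the default for PyTorch's `cross_entropy`) and that the 60M model uses the same 23,005-token vocabulary as the Custom Model Scale example, the numbers can be sanity-checked: a model that guesses uniformly over the vocabulary has loss ln(23005) ≈ 10.04, close to the starting pretrain loss of 10.05, and exp(loss) gives the corresponding perplexity.

```python
import math

VOCAB_SIZE = 23005  # assumed: same tokenizer as the Custom Model Scale example


def perplexity(loss_nats: float) -> float:
    """Perplexity for a mean per-token cross-entropy measured in nats."""
    return math.exp(loss_nats)


uniform_loss = math.log(VOCAB_SIZE)  # loss of a uniform guess over the vocabulary
print(f"uniform-guess loss: {uniform_loss:.2f}")       # 10.04
print(f"pretrain perplexity: {perplexity(2.35):.1f}")  # 10.5
print(f"SFT perplexity: {perplexity(2.18):.2f}")       # 8.85
```

Under these assumptions, a final SFT loss of 2.18 means the model is, on average, about as uncertain as a uniform choice among roughly 9 tokens.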
Training from Scratch
Data Preparation
Place your pretrain data (JSONL format, each line: {"text": "..."}) and SFT data (JSONL format, each line: {"conversations": [{"role": "user", "content": "..."}, ...]}) in the data directory.
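A small standard-library helper for producing and checking files in the two formats described above. The file names `pretrain.jsonl` and `sft.jsonl` are hypothetical (the names `train.py` expects inside `--data_dir` are not specified here), and the `"assistant"` role name is an assumption, since only `"user"` appears in the format description.

```python
import json
import tempfile
from pathlib import Path

ALLOWED_ROLES = {"user", "assistant"}  # assumption: only "user" is shown above


def write_jsonl(path: Path, records: list) -> None:
    """Write one JSON object per line, keeping Chinese text human-readable."""
    with path.open("w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")


def is_valid(record: dict, mode: str) -> bool:
    if mode == "pretrain":
        return isinstance(record.get("text"), str)
    if mode == "sft":
        conv = record.get("conversations")
        return (isinstance(conv, list) and len(conv) > 0 and all(
            isinstance(turn, dict)
            and turn.get("role") in ALLOWED_ROLES
            and isinstance(turn.get("content"), str)
            for turn in conv))
    raise ValueError(f"unknown mode: {mode}")


def validate_file(path: Path, mode: str) -> list:
    """Return the 1-based line numbers that are not valid records."""
    bad = []
    with path.open(encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue
            try:
                record = json.loads(line)
                ok = isinstance(record, dict) and is_valid(record, mode)
            except json.JSONDecodeError:
                ok = False
            if not ok:
                bad.append(lineno)
    return bad


data_dir = Path(tempfile.mkdtemp())  # stand-in for ./data
write_jsonl(data_dir / "pretrain.jsonl", [{"text": "深度学习是机器学习的一个分支。"}])
write_jsonl(data_dir / "sft.jsonl", [{"conversations": [
    {"role": "user", "content": "你好"},
    {"role": "assistant", "content": "你好!有什么可以帮你?"},
]}])
print(validate_file(data_dir / "pretrain.jsonl", "pretrain"))  # []
print(validate_file(data_dir / "sft.jsonl", "sft"))            # []
```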
Pretraining
```bash
python train.py --mode pretrain \
  --data_dir ./data \
  --batch_size 116 \
  --max_seq_len 384 \
  --max_steps 32847 \
  --lr 5e-4 \
  --max_lines 1000000 \
  --output_dir ./checkpoints/pretrain
```
Supervised Fine-Tuning
```bash
python train.py --mode sft \
  --data_dir ./data \
  --batch_size 88 \
  --max_seq_len 512 \
  --max_steps 12000 \
  --lr 3e-5 \
  --max_lines 900000 \
  --output_dir ./checkpoints/sft \
  --pretrain_ckpt ./checkpoints/pretrain/frsm_v6_fast_pretrain_final.pt
```
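The step counts in these commands can be related to the dataset sizes in the Results table. The helper below (hypothetical, for arithmetic only) computes steps per epoch, the epochs covered by `--max_steps`, and an upper bound on tokens processed, assuming one example per batch row and every sequence filled to `--max_seq_len`. With 1.27M pretraining examples and batch size 116, one epoch is 10,949 steps, so `--max_steps 32847` is exactly 3 epochs.

```python
import math


def training_budget(num_examples: int, batch_size: int,
                    max_seq_len: int, max_steps: int) -> dict:
    """Relate --max_steps to epochs and a token upper bound (illustrative helper)."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return {
        "steps_per_epoch": steps_per_epoch,
        "epochs": max_steps / steps_per_epoch,
        # Upper bound: assumes every row is a full max_seq_len sequence.
        "max_tokens": batch_size * max_seq_len * max_steps,
    }


pretrain = training_budget(1_270_000, batch_size=116, max_seq_len=384, max_steps=32_847)
sft = training_budget(905_000, batch_size=88, max_seq_len=512, max_steps=12_000)
print(pretrain)  # {'steps_per_epoch': 10949, 'epochs': 3.0, 'max_tokens': 1463136768}
print(f"SFT: {sft['epochs']:.2f} epochs, <= {sft['max_tokens']:,} tokens")
```

For SFT, one epoch over 905K conversations is 10,285 steps, so the 12,000-step cap allows about 1.17 epochs and at most about 0.54B tokens.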
Custom Model Scale
The architecture supports flexible scaling:
```python
from frsm_v6a_fast import FRSM_V6_Fast

vocab_size = 23005  # tokenizer vocabulary size used by the provided configs

# 200M model
model = FRSM_V6_Fast(vocab_size=vocab_size, d_model=2048, num_scales=4)
# 417M model
model = FRSM_V6_Fast(vocab_size=vocab_size, d_model=3000, num_scales=4)
# Your custom scale
model = FRSM_V6_Fast(vocab_size=vocab_size, d_model=1024, num_scales=6)
```
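For a sense of where parameters go at these widths, a token embedding table alone has `vocab_size × d_model` weights, doubled if the output projection is a separate (untied) matrix. Whether FRSM V6 Fast ties these weights is not stated here, so both cases are shown; the helper name `vocab_params` is hypothetical.

```python
def vocab_params(vocab_size: int, d_model: int, tied: bool) -> int:
    """Weights in the embedding table, plus the output head if it is untied (biases ignored)."""
    per_table = vocab_size * d_model
    return per_table if tied else 2 * per_table


for d_model in (1024, 2048, 3000):
    tied = vocab_params(23005, d_model, tied=True)
    untied = vocab_params(23005, d_model, tied=False)
    print(f"d_model={d_model}: {tied / 1e6:.1f}M (tied) / {untied / 1e6:.1f}M (untied)")
```

At `d_model=2048` this comes to roughly 47M (tied) or 94M (untied) of the quoted 200M, so the vocabulary size is a significant share of the parameter budget at these scales.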
Model Checkpoints
| File | Description | Size |
|---|---|---|
| pretrained/frsm_v6_fast_60m_pretrain.pt | Pretrained base model (loss 2.35) | 682 MB |
| sft/frsm_v6_fast_60m_sft.pt | SFT fine-tuned chat model (loss 2.18) | 682 MB |
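If 682 MB is read as MiB, it is almost exactly three fp32 copies of the 59,578,825 parameters, which would be consistent with each checkpoint also storing optimizer state (for example, Adam's two moment buffers). This is an inference from the numbers, not something stated about the files; the weights alone would be about 227 MiB in fp32, or about 114 MiB in fp16.

```python
PARAMS = 59_578_825  # parameter count from the Results table
MIB = 2 ** 20

fp32_bytes = PARAMS * 4  # 4 bytes per float32 weight
print(f"fp32 weights only:        {fp32_bytes / MIB:.1f} MiB")      # 227.3
print(f"fp16 weights only:        {PARAMS * 2 / MIB:.1f} MiB")      # 113.6
print(f"weights + 2 Adam moments: {3 * fp32_bytes / MIB:.1f} MiB")  # 681.8
```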
Requirements
- Python 3.10+
- PyTorch 2.0+ (CUDA recommended)
- jieba
- tqdm
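A quick pre-flight check for the requirements listed above, using only the standard library (`importlib.util.find_spec` and `importlib.metadata.version`). It verifies that the packages are importable and that the installed PyTorch major version is at least 2; it does not check for CUDA, which would require importing torch. The function name `check_environment` is hypothetical.

```python
import importlib.util
import sys
from importlib import metadata

REQUIRED = ("torch", "jieba", "tqdm")


def check_environment(min_python=(3, 10), packages=REQUIRED, min_torch_major=2):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] < tuple(min_python):
        found = ".".join(map(str, sys.version_info[:3]))
        want = ".".join(map(str, min_python))
        problems.append(f"Python {want}+ required, found {found}")
    for name in packages:
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing package: {name}")
        elif name == "torch":
            try:
                version = metadata.version("torch")
                major = int(version.split(".")[0])
            except (metadata.PackageNotFoundError, ValueError):
                continue  # importable but version unreadable: skip the version check
            if major < min_torch_major:
                problems.append(f"PyTorch {min_torch_major}.0+ required, found {version}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "environment looks OK")
```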
Citation
```bibtex
@software{frsm_v6_fast,
  title = {FRSM V6 Fast: Content-Gated Multi-Scale State Machine},
  author = {FRSM Contributors},
  year = {2026},
  url = {https://www.modelscope.cn/dfytensor/FRSM}
}
```
License
MIT License