FRSM V6 Fast — Content-Gated Multi-Scale State Machine

A next-generation sequence modeling architecture that aims to go beyond traditional RNNs and Transformers

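FRSM (Fast Recurrent State Machine) V6 Fast is a content-gated multi-scale state machine. It keeps the linear inference complexity of an RNN while using einsum to parallelize the multi-scale computation and a content-adaptive gating mechanism. At the 60M-parameter scale, the authors report generation quality comparable to ten-billion-parameter models.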

Architecture Highlights


                    ┌──────────────────────────────┐
                    │       Output Projection       │
                    └──────────────┬───────────────┘
                    ┌──────────────┴───────────────┐
                    │    Multi-Scale Fusion + LN    │
                    └──────────────┬───────────────┘
                    ┌──────────────┴───────────────┐
                    │   ┌───┐ ┌───┐ ┌───┐ ┌───┐   │
                    │   │S1 │ │S2 │ │S3 │ │S4 │   │
                    │   └───┘ └───┘ └───┘ └───┘   │
                    │   Content-Gated State Update │
                    └──────────────┬───────────────┘
                    ┌──────────────┴───────────────┐
                    │       Input Projection        │
                    └──────────────┬───────────────┘
                    ┌──────────────┴───────────────┐
                    │         Embedding             │
                    └──────────────────────────────┘

Key Innovations

Feature	Description
Multi-Scale State Machine	4 parallel scales with different temporal dynamics, fused via learned attention
Content-Gated Update	Each scale independently decides when to write via a 2-layer gating network
Einsum Parallelization	All scales computed in a single einsum operation --- no Python loop overhead
O(1) Inference	Constant-time per-token generation, independent of context length
Constant-Size State	No KV cache that grows with context length --- inference state scales as O(d_model × num_scales)
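The first three rows of the table can be illustrated with a minimal NumPy sketch. It is not the repository's implementation: the per-scale decay rates, the shape of the two-layer gate, the tanh candidate, and the softmax fusion weights are all assumptions, chosen only to show how a single einsum call can update every scale at once while the state stays a fixed size.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, num_scales, d_gate = 8, 4, 16  # toy sizes, not the real config

# Hypothetical parameters; names and shapes are illustrative only.
W_in = rng.normal(0, 0.1, (num_scales, d_model, d_model))  # per-scale input map
W_g1 = rng.normal(0, 0.1, (num_scales, d_model, d_gate))   # gate layer 1
W_g2 = rng.normal(0, 0.1, (num_scales, d_gate, d_model))   # gate layer 2
decay = np.array([0.5, 0.9, 0.99, 0.999])[:, None]         # assumed time scales
fuse_logits = np.zeros(num_scales)                         # learned in practice


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def step(state, x):
    """One token update. state: (num_scales, d_model), x: (d_model,)."""
    # One einsum computes the candidate for all scales: no Python loop.
    cand = np.tanh(np.einsum("sij,i->sj", W_in, x))
    # Two-layer gate per scale decides how much new content to write.
    hidden = np.tanh(np.einsum("sij,i->sj", W_g1, x))
    gate = sigmoid(np.einsum("sgj,sg->sj", W_g2, hidden))
    # Slow scales (decay near 1) hold information longer than fast ones.
    new_state = decay * state + (1.0 - decay) * gate * cand
    # Fuse the scales with softmax weights into one d_model output vector.
    w = np.exp(fuse_logits) / np.exp(fuse_logits).sum()
    return new_state, np.einsum("s,sj->j", w, new_state)


state = np.zeros((num_scales, d_model))
for _ in range(100):  # 100 tokens of random input
    state, y = step(state, rng.normal(size=d_model))
print(state.shape, y.shape)  # the state size does not grow with token count
```

Because `step` only reads the previous state and the current input, the cost of generating one token does not depend on how many tokens came before it.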

Why FRSM > Transformer

Aspect	FRSM V6 Fast	Transformer
Inference complexity	O(1) per token	O(L) per token
Memory (long context)	O(d) constant	O(L × d) KV cache
Training memory	O(T × d) linear	O(T²) quadratic (standard attention)
Long-range dependency	Carried in the recurrent state	Attention over cached tokens, plus positional encoding
Architecture	Simple RNN cell	Complex attention stack
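To make the long-context memory row concrete, here is a back-of-the-envelope comparison. The shapes (12 layers, width 768, 2-byte values) are hypothetical and do not describe any model in this repository, and the recurrent state is assumed to hold one d_model vector per scale per layer, following the O(d_model × num_scales) figure in the feature table.

```python
def kv_cache_bytes(context_len, n_layers, d_model, bytes_per_value=2):
    """Keys and values cached for every past token in every layer."""
    return 2 * context_len * n_layers * d_model * bytes_per_value


def recurrent_state_bytes(n_layers, d_model, num_scales, bytes_per_value=2):
    """Fixed-size state: one d_model vector per scale per layer (assumed)."""
    return n_layers * d_model * num_scales * bytes_per_value


# Hypothetical shapes chosen only for illustration.
n_layers, d_model, num_scales = 12, 768, 4

for context_len in (1_024, 32_768, 1_048_576):
    kv = kv_cache_bytes(context_len, n_layers, d_model)
    st = recurrent_state_bytes(n_layers, d_model, num_scales)
    print(f"L={context_len:>9,}  KV cache={kv / 2**20:10.1f} MiB  "
          f"recurrent state={st / 2**10:6.1f} KiB")
```

The cache grows linearly with context length while the recurrent state stays fixed. This arithmetic covers memory only and says nothing about output quality at long context.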

Quick Start

Installation


git clone https://www.modelscope.cn/dfytensor/FRSM.git
cd FRSM
pip install torch jieba tqdm

Inference (Interactive Chat)


# Interactive dialogue mode (SFT model)
python inference.py --model sft/frsm_v6_fast_60m_sft.pt --mode chat

# Single prompt (Chinese for "Introduce the core ideas of deep learning")
python inference.py --prompt "介绍一下深度学习的核心思想"

# Text completion mode (pretrain model)
python inference.py --model pretrained/frsm_v6_fast_60m_pretrain.pt --mode pretrain

# CPU inference (the prompt is Chinese for "Hello")
python inference.py --cpu --prompt "你好"

Inference Example


User: Write a Fibonacci sequence function in Python
Agent: The Fibonacci sequence is a classic recursion problem. Here is a Python implementation:
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

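User: Explain what machine learning is
Agent: Machine learning is a branch of artificial intelligence that uses algorithms and data to let computers learn from experience
without explicit programming. By learning patterns in data, a model can make predictions or decisions about new data.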

Results

60M Model Performance

Metric	Value
Parameters	59,578,825
Training Data	1.27M pretrain + 905K SFT conversations
Pretrain Loss	2.35
SFT Loss	2.18
Inference Speed	~6,000 tok/s on RTX 4090
GPU Memory	~19 GB (batch_size=88, seq_len=512, matching the SFT command)

Loss Curves


Pretrain: 10.05 → 2.35 (3 epochs, 33K steps)
SFT:       3.02 → 2.18 (1 epoch, 10K steps)
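As a sanity check, these step counts agree with the dataset sizes in the results table and the batch sizes used in the training commands in this README (116 for pretraining, 88 for SFT). The calculation below assumes one optimizer step per batch with no gradient accumulation, which the README does not state.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Optimizer steps for one pass, assuming no gradient accumulation."""
    return math.ceil(num_samples / batch_size)


pretrain_steps = 3 * steps_per_epoch(1_270_000, 116)  # 3 epochs
sft_steps = 1 * steps_per_epoch(905_000, 88)          # 1 epoch

print(pretrain_steps)  # 32847, the same value as --max_steps for pretraining
print(sft_steps)       # 10285, in line with the reported 10K steps
```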

Training from Scratch

Data Preparation

Place your pretrain data (JSONL format, each line: {"text": "..."}) and SFT data (JSONL format, each line: {"conversations": [{"role": "user", "content": "..."}, ...]}) in the data directory.
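A small script along these lines writes both files in the formats described above. The file names `pretrain.jsonl` and `sft.jsonl` and the `assistant` role are assumptions; check `train.py` for the names and roles it actually expects.

```python
import json
from pathlib import Path

data_dir = Path("./data")
data_dir.mkdir(parents=True, exist_ok=True)

pretrain_rows = [{"text": "Deep learning stacks simple layers to learn features."}]
sft_rows = [{
    "conversations": [
        {"role": "user", "content": "What is machine learning?"},
        # The "assistant" role name is assumed; the README only shows "user".
        {"role": "assistant", "content": "A way for programs to learn from data."},
    ]
}]


def write_jsonl(path, rows):
    """Write one JSON object per line, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


# Hypothetical file names; the README does not specify them.
write_jsonl(data_dir / "pretrain.jsonl", pretrain_rows)
write_jsonl(data_dir / "sft.jsonl", sft_rows)
```

Passing `ensure_ascii=False` keeps Chinese text human-readable in the file instead of escaping it as `\uXXXX` sequences.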

Pretraining


python train.py --mode pretrain \
    --data_dir ./data \
    --batch_size 116 \
    --max_seq_len 384 \
    --max_steps 32847 \
    --lr 5e-4 \
    --max_lines 1000000 \
    --output_dir ./checkpoints/pretrain

Supervised Fine-Tuning


python train.py --mode sft \
    --data_dir ./data \
    --batch_size 88 \
    --max_seq_len 512 \
    --max_steps 12000 \
    --lr 3e-5 \
    --max_lines 900000 \
    --output_dir ./checkpoints/sft \
    --pretrain_ckpt ./checkpoints/pretrain/frsm_v6_fast_pretrain_final.pt

Custom Model Scale

The architecture supports flexible scaling:


from frsm_v6a_fast import FRSM_V6_Fast

# 200M model
model = FRSM_V6_Fast(vocab_size=23005, d_model=2048, num_scales=4)

# 417M model
model = FRSM_V6_Fast(vocab_size=23005, d_model=3000, num_scales=4)

# Your custom scale
model = FRSM_V6_Fast(vocab_size=vocab_size, d_model=1024, num_scales=6)

Model Checkpoints

File	Description	Size
`pretrained/frsm_v6_fast_60m_pretrain.pt`	Pretrained base model (loss 2.35)	682 MB
`sft/frsm_v6_fast_60m_sft.pt`	SFT fine-tuned chat model (loss 2.18)	682 MB
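The 682 MB figure is larger than 59.6M fp32 weights alone would need. One possible reading, offered as a guess and not a confirmed fact, is that each checkpoint also stores two Adam-style moment buffers per parameter. The arithmetic below shows that this would land on the listed size if "MB" is read as MiB.

```python
num_params = 59_578_825  # from the results table
bytes_per_value = 4      # fp32, assumed

weights_only = num_params * bytes_per_value
# Assumption: weights plus two optimizer moment buffers of the same shape.
with_optimizer = 3 * weights_only

print(f"weights only:      {weights_only / 2**20:.1f} MiB")
print(f"weights + moments: {with_optimizer / 2**20:.1f} MiB")
```

If that reading is correct, a weights-only export for inference would be roughly a third of the listed size.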

Requirements

Python 3.10+
PyTorch 2.0+ (CUDA recommended)
jieba
tqdm

Citation


@software{frsm_v6_fast,
  title = {FRSM V6 Fast: Content-Gated Multi-Scale State Machine},
  author = {FRSM Contributors},
  year = {2026},
  url = {https://www.modelscope.cn/dfytensor/FRSM}
}

License

MIT License