LLMs / Pretraining: Translation and Commentary on "Training Language Models via Neural Cellular Automata"

Overview: This paper shows that synthetic structure alone can teach models general-purpose abilities. The authors use NCA to generate controllable, non-linguistic synthetic data, letting the model first learn purer rules and dependencies before natural-language pre-training. The result is not only more efficient language modeling, but also transfer to math, code, and reasoning tasks. The most important takeaway: the value of synthetic data lies not in replacing semantics, but in supplying the model with higher-quality, customizable computational structure.

>> Background and pain points:

● High-quality natural-language data is approaching a ceiling: the paper notes that LLM pre-training depends heavily on natural language, yet high-quality text is a finite resource that is costly to collect and clean.

● Natural language carries built-in biases and entangles knowledge with reasoning: the authors argue that natural-language data not only contains human biases but also mixes factual knowledge, reasoning processes, and surface form, making it hard to isolate which capabilities the model has actually learned.

● Prior synthetic data rarely resembles the real world closely enough: algorithmically synthesized data is uncommon in the language domain, and many existing methods produce distributions that are too narrow and homogeneous to beat natural-language training under a matched token budget.

● Simply scaling up data is not necessarily optimal: the paper stresses that the effectiveness of synthetic data is not about "more," but about whether the structural properties and complexity of the data generator match the target domain.

>> Proposed solution:

● Use neural cellular automata (NCA) to generate non-linguistic synthetic data: the authors propose NCA, a parameterizable, controllable dynamical system that can be generated cheaply at scale, as the data source for LLM "pre-pre-training."

● Adopt a two-stage "synthetic first, natural language second" framework: pre-pre-train on NCA dynamics, then run standard natural-language pre-training, so the model first acquires general computational primitives and only then absorbs semantics.

● Explicitly control NCA complexity for task-targeted distribution design: the paper proposes tuning NCA complexity via gzip compressibility, state-space size, and similar knobs, so the distribution can be matched to downstream domains such as web text, math, and code.

● Treat NCA as a substrate for learning computational regularities: the core hypothesis is that the key capabilities of LLMs come from structure rather than semantics, so predicting NCA trajectories forces the model to learn long-range dependencies, local rules, and latent computational processes.

>> Core approach, step by step:

● Construct NCA trajectories with rich spatiotemporal structure: NCA parameterize the update rule with a neural network, generating long-range spatiotemporal patterns whose heavy-tailed, Zipf-like statistics resemble those of natural language.
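As a concrete (and purely illustrative) picture of such trajectories, here is a minimal 1-D neural cellular automaton: a tiny randomly initialized MLP serves as the local update rule over each cell's 3-cell neighborhood, and the rollout is flattened into a token sequence. The alphabet size, hidden width, and discretization below are assumptions for the sketch, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

ALPHABET = 16          # number of discrete cell states (assumed)
HIDDEN = 8             # hidden width of the toy update MLP (assumed)
W1 = rng.normal(size=(3, HIDDEN))
W2 = rng.normal(size=(HIDDEN, 1))

def nca_step(state):
    """One synchronous update of all cells with circular boundary."""
    # Gather each cell's (left, self, right) neighborhood.
    neigh = np.stack([np.roll(state, 1), state, np.roll(state, -1)], axis=1)
    h = np.tanh((neigh / ALPHABET) @ W1)   # tiny MLP as the local rule
    out = h @ W2
    # Discretize back into the finite alphabet.
    return (np.abs(out[:, 0]) * ALPHABET).astype(int) % ALPHABET

def rollout(width=32, steps=4):
    """Unroll an NCA trajectory and flatten it into a token sequence."""
    state = rng.integers(0, ALPHABET, size=width)
    traj = [state]
    for _ in range(steps):
        state = nca_step(state)
        traj.append(state)
    return np.concatenate(traj)   # sequence fed to next-token prediction

tokens = rollout()
```

Because the rule is a parameterized network rather than a fixed table, sampling new weights yields a new dynamical system, which is what makes the data distribution diverse and tunable.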

● Sample training data by complexity band: rather than sampling trajectories at random, the authors filter NCA trajectories by complexity band and compare how different compression ratios / complexity levels affect transfer.

● Run NCA pre-pre-training first, then natural-language pre-training: the model first learns NCA sequences via next-token prediction, then proceeds to standard pre-training on natural-language corpora such as OpenWebText, OpenWebMath, and CodeParrot.
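The two-stage schedule itself is simple. In sketch form, with `train_step` standing in for whatever optimizer step your training stack uses (a placeholder, not the paper's code), both stages share the same next-token prediction objective:

```python
def pre_pre_then_pre_train(model, nca_batches, language_batches, train_step):
    """Two-stage curriculum with a single objective (next-token prediction):
    stage 1 on synthetic NCA trajectories, stage 2 on natural language."""
    for batch in nca_batches:       # pre-pre-training: learn structure
        train_step(model, batch)
    for batch in language_batches:  # pre-training: absorb semantics
        train_step(model, batch)
    return model
```

The only moving part is the data order: synthetic batches are exhausted before any natural-language batch is seen, so whatever the model retains into stage 2 must have come from structure alone.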

● Evaluate transfer via perplexity, convergence speed, and downstream pass rates: the paper mainly measures validation perplexity, the number of tokens needed to reach the final perplexity, and pass accuracy on GSM8K, HumanEval, and BigBench-Lite.

● Analyze what drives transfer: the authors reset subsets of weights to determine which modules carry the transfer signal, and further study how the complexity of the data aligns with downstream tasks.
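The weight-reset probe can be sketched as follows: after pre-pre-training, re-initialize one parameter group at a time and measure how much downstream quality degrades; the group whose reset hurts most carried the transfer signal. The group names and the `evaluate` callback below are placeholders for a real training stack.

```python
import copy

def reset_probe(trained, fresh_init, evaluate):
    """Map each parameter group (e.g. 'attn', 'mlp') to the loss increase
    caused by resetting just that group back to a fresh initialization."""
    base = evaluate(trained)
    damage = {}
    for group in trained:
        probe = copy.deepcopy(trained)
        probe[group] = copy.deepcopy(fresh_init[group])
        damage[group] = evaluate(probe) - base
    return damage
```

On this probe, the paper's finding reads as: `damage['attn']` dominates, while `damage['mlp']` depends on how well the synthetic and target domains align.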

>> Advantages:

● High sample efficiency: with only 164M NCA tokens, the paper reports downstream language-modeling gains of up to about 6% and convergence speedups of up to 1.6x.

● It can even beat pre-pre-training on far more natural-language data: surprisingly, in some settings NCA pre-pre-training outperforms pre-pre-training on 1.6B tokens of natural language from C4, even though the latter used more compute.

● Gains transfer to reasoning tasks: the paper reports that the improvements show up not only in perplexity but also carry over to reasoning/code benchmarks such as GSM8K, HumanEval, and BigBench-Lite.

● Faster, more stable training: across multiple model scales and corpora, NCA pre-pre-training consistently beats the from-scratch, Dyck, and C4 baselines and converges faster.

● Enables domain-targeted design: because NCA complexity is tunable, the authors argue it adds a new knob to the training distribution, letting synthetic data be customized to the computational character of the target domain.

>> Conclusions and takeaways (lessons and recommendations):

● What makes transfer work is not "resembling language" but "resembling learnable structure": the authors argue that models transfer from NCA to language because NCA provide a purer rule-induction signal, not because of any semantic content.

● Attention layers are the most transferable component: experiments show that resetting attention weights causes the largest degradation, suggesting attention acts as a universal carrier of dependency tracking and implicit rule inference.

● MLPs lean toward storing domain-specific statistics: the paper finds that transfer through MLP and LayerNorm parameters depends on alignment between source and target domains; when the two differ substantially, keeping these parameters can even interfere with learning.

● The optimal complexity is task-dependent, not one-size-fits-all: OpenWebText favors more complex, less compressible NCA rules, whereas CodeParrot suits moderate complexity; synthetic data therefore needs per-domain tuning rather than a single shared distribution.

● More data is not always better; a better match is: the paper explicitly notes that transfer gains do not grow monotonically with the amount of NCA data, and overly complex or mismatched distributions can make the gains plateau.

● NCA is a pre-pre-training substrate, not the endpoint of semantic learning: the authors frame this framework as an early prototype on the road to "fully synthetic pre-training," while acknowledging that final semantic acquisition may still require a limited, curated natural-language corpus.

● Boundaries and open problems remain: the paper notes that NCA currently fits best as a pre-pre-training signal; becoming a full replacement for natural-language pre-training would require addressing larger alphabets and performance plateaus across complexity regimes.

Table of Contents

Translation and Commentary on "Training Language Models via Neural Cellular Automata"

Abstract

Figure 1: Overview of NCA Pre-pre-training to Language Pre-training

1. Introduction

6 Discussion

Why should we expect transfer?

Why is 160M tokens of automata better than 1.6B tokens of text?

Limitations and open problems


Translation and Commentary on "Training Language Models via Neural Cellular Automata"

|---------|------------------------------------------------------|
| Link    | Paper: https://arxiv.org/abs/2603.10055              |
| Date    | March 9, 2026                                        |
| Authors | MIT, Improbable AI Lab                               |

Abstract

Pre-training is crucial for large language models (LLMs), as it is when most representations and capabilities are acquired. However, natural language pre-training has problems: high-quality text is finite, it contains human biases, and it entangles knowledge with reasoning. This raises a fundamental question: is natural language the only path to intelligence? We propose using neural cellular automata (NCA) to generate synthetic, non-linguistic data for pre-pre-training LLMs--training on synthetic-then-natural language. NCA data exhibits rich spatiotemporal structure and statistics resembling natural language while being controllable and cheap to generate at scale. We find that pre-pre-training on only 164M NCA tokens improves downstream language modeling by up to 6% and accelerates convergence by up to 1.6x. Surprisingly, this even outperforms pre-pre-training on 1.6B tokens of natural language from Common Crawl with more compute. These gains also transfer to reasoning benchmarks, including GSM8K, HumanEval, and BigBench-Lite. Investigating what drives transfer, we find that attention layers are the most transferable, and that optimal NCA complexity varies by domain: code benefits from simpler dynamics, while math and web text favor more complex ones. These results enable systematic tuning of the synthetic distribution to target domains. More broadly, our work opens a path toward more efficient models with fully synthetic pre-training.

Website: https://hanseungwook.github.io/blog/nca-pre-pre-training/
Code: https://github.com/danihyunlee/nca-pre-pretraining

Figure 1: Overview of NCA Pre-pre-training to Language Pre-training. We pre-pre-train a transformer with next-token prediction on the dynamics of neural cellular automata (NCA) sampled from selected complexity regions. We then conduct standard pre-training on natural language corpora. NCA pre-pre-training improves both validation perplexity and convergence speed on language pre-training. Interestingly, the optimal NCA distribution varies by downstream domain.

1. Introduction

Scale has transformed neural networks, enabling emergent abilities like reasoning (Jaech et al., 2024; Jiang, 2023; Austin et al., 2021) and in-context learning (Brown et al., 2020; Wei et al., 2022; Zhao et al., 2024) in large language models (LLMs). However, neural scaling laws predict that continued improvements require exponentially more data (Kaplan et al., 2020), which is nearing exhaustion by 2028 (Villalobos et al., 2022). Furthermore, natural language inherits many undesirable human biases and needs tedious data curation and cleaning before it is used for training foundation models (Han et al., 2025a; An et al., 2024). This raises a fundamental question: Is natural language the only path to learning useful representations? In this paper, we explore an alternative path using synthetic data from cellular automata. Our core hypothesis is that the emergence of reasoning and other abilities in LLMs relies on the underlying structure of natural language, rather than its semantics. Text is a lossy record of human cognition and the world it describes, containing diverse kinds of structure, from reasoning traces to procedural instructions (Ribeiro et al., 2023; Ruis et al., 2024; Cheng et al., 2025; Delétang et al., 2024). Next-token prediction on such data pressures models to internalize the latent computational processes that support coherent continuations, fostering key capabilities of intelligence (Delétang et al., 2023; Jiang, 2023).

If the key ingredient is exposure to various structures rather than language semantics, then richly structured non-linguistic data could also be effective for teaching models to reason. To investigate this hypothesis, we employ algorithmically generated synthetic data from neural cellular automata (NCA) (Mordvintsev et al., 2020) as a synthetic training substrate. NCA generalize systems like Conway's Game of Life (Gardner, 1970) by replacing fixed dynamics rules with neural networks and can be used to generate diverse data distributions with spatially local rules. This produces long-range spatio-temporal patterns (see Figure 1) of arbitrary sizes that exhibit heavy-tailed, Zipfian token distributions (see Figure 8 in Appendix A) reminiscent of natural data. Crucially, we propose a method to explicitly control the complexity of NCA, enabling systematic tuning of the synthetic data distribution for optimal transfer to downstream domains. Prior work on synthetic pre-training has explored approaches like generating random strings with a recurrent network (Bloem, 2025) and simple algorithmic tasks (Wu et al., 2022; Shinnick et al., 2025a), but they have yet to match or outperform language training under matched token budgets. We hypothesize this is because such synthetic distributions are narrow and homogeneous, lacking certain key properties that characterize natural language. NCAs address this gap: their parametric structure yields diverse dynamics and allows systematic control over complexity. This enables us to ask not only whether synthetic data can transfer, but what structural properties make it effective.

We adopt a pre-pre-training framework: an initial phase of training on NCA dynamics that precedes standard pre-training on natural language (Hu et al., 2025b). Our ultimate vision is to pre-train entirely on clean synthetic data, followed by fine-tuning on a limited and curated corpus of natural language to acquire semantics (Han et al., 2025a). The pre-pre-training framework serves as an early prototype of this paradigm, allowing us to measure how computational primitives learned from synthetic NCA transfer to language tasks. Our contributions are as follows:

1. A synthetic pre-pre-training substrate that transfers to language and reasoning. We propose neural cellular automata (NCA) as a fully algorithmic, non-linguistic data source for pre-pre-training. NCA pre-pre-training improves downstream language modeling by up to 6% and converges up to 1.6x faster across web text, math, and code. These perplexity gains transfer to reasoning across benchmarks including GSM8K, HumanEval, and BigBench-Lite. Surprisingly, it outperforms pre-pre-training on natural language (C4), even with more data and compute.

2. Synthetic pre-training enables domain-targeted data design. We find that the optimal NCA complexity regime varies by downstream task: code benefits from lower-complexity rules while math and web text benefit from higher-complexity ones. NCAs' parametric structure offers a new lever for efficient training: tuning the complexity of training distributions to match the computational character of target domains.

3. Attention captures the most transferable priors. The attention layers capture the most useful computational primitives, accounting for the majority of the transfer gains. Attention appears to be a universal carrier of transferable capabilities such as long-range dependency tracking and in-context learning, whereas MLPs encode more domain-specific knowledge--making MLP transfer conditional on alignment between the synthetic and target domains.

6 Discussion

Why should we expect transfer?

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| NCA data are substantially different from natural language and generated by deterministic processes, prompting the question of why one should expect transfer at all? We argue that NCAs may provide a purer training signal for in-context rule inference. In natural language, models may rely on semantic "shortcuts" or co-occurrence priors (Abbas et al., [2023](#NCA data are substantially different from natural language and generated by deterministic processes, prompting the question of why one should expect transfer at all? We argue that NCAs may provide a purer training signal for in-context rule inference. In natural language, models may rely on semantic “shortcuts” or co-occurrence priors (Abbas et al., 2023; Geirhos et al., 2020). In contrast, every NCA sequence is generated by a hidden transition rule – parameterized by a random neural network. With no semantic knowledge to fall back on, every NCA token guides the model to in-context rule inference (Kirsch et al., 2022). This mirrors a core capability required for language modeling (Brown et al., 2020; Wei et al., 2022; Dong et al., 2024). Xie et al. (2022) show that training on natural text teaches models to perform implicit Bayesian inference over latent concepts: each sequence draws from a latent concept, and predicting the next token means conditioning on the inferred concept. The same mechanism appears in math and code as well (Garg et al., 2023; Cook et al., 2025). Prior work on formal languages and algorithmic tasks such as Dyck and string copying (Hu et al., 2025b; Wu et al., 2022; Shinnick et al., 2025b) also train for this kind of in-context inference. Unlike these tasks, NCAs encompass a broad, universal class of computable functions (Copeland, 2012), some of which realize Turing-complete systems (Rendell, 2002; Wolfram & Gad-el Hak, 2003). 
The breadth and scale of this distribution makes memorization infeasible, forcing models to learn a general mechanism for rule inference (Li et al., 2024) that applies across the function class. NCA 数据与自然语言有显著差异,并且由确定性过程生成,这引发了为何应期待迁移的问题?我们认为,NCA 可能为上下文中的规则推理提供更纯粹的训练信号。在自然语言中,模型可能会依赖语义“捷径”或共现先验知识(Abbas 等人,2023;Geirhos 等人,2020)。相比之下,每个 NCA 序列都是由一个隐藏的转换规则生成的——该规则由一个随机神经网络参数化。由于没有语义知识可依赖,每个 NCA 标记都引导模型进行上下文中的规则推理(Kirsch 等人,2022)。 这反映了语言建模所需的核心能力(Brown 等人,2020;Wei 等人,2022;Dong 等人,2024)。Xie 等人(2022)表明,在自然文本上进行训练会使模型学会对潜在概念进行隐式贝叶斯推理:每个序列都从一个潜在概念中抽取,预测下一个标记意味着基于推断出的概念进行条件化。同样的机制在数学和代码中也存在(Garg 等人,2023;Cook 等人,2025)。先前关于形式语言和算法任务(如迪克语言和字符串复制)(胡等人,2025b;吴等人,2022;希尼克等人,2025b)的研究也针对这种上下文推理进行了训练。与这些任务不同,NCA 包含了一个广泛通用的可计算函数类别(科佩兰,2012),其中一些实现了图灵完备系统(伦德尔,2002;沃尔夫勒姆和加德-埃尔哈克,2003)。这种分布的广度和规模使得记忆变得不可行,迫使模型学习一种适用于整个函数类别的通用规则推理机制(李等人,2024)。 This framing is supported by our mechanistic finding from Section 5.3.1: attention layers, not the MLPs or LayerNorms carry the most transferable structure. (Olsson et al., 2022) showed that ICL ability emerges with the formation of induction heads – attention circuits that help copy information from previous tokens to future ones. Because NCA pre-pre-training exclusively rewards this behavior, it may induce earlier and more robust formation than language-only pre-training. The transferred attention weights are, in effect, the in-context learning circuits, which are later adapted for downstream tasks and domains. A secondary motivation for transfer is epiplexity (Finzi et al., 2026). Classical information theory suggests deterministic transformations cannot increase information content (Polyanskiy & Wu, 2025), thus questioning whether LLMs can learn meaningful structure from NCAs. However, this view assumes a computationally unbounded observer. For computationally bounded observers, Finzi et al. 
This framing is supported by our mechanistic finding from Section 5.3.1: attention layers, not the MLPs or LayerNorms, carry the most transferable structure. Olsson et al. (2022) showed that ICL ability emerges with the formation of induction heads, attention circuits that help copy information from previous tokens to future ones. Because NCA pre-pre-training exclusively rewards this behavior, it may induce earlier and more robust formation than language-only pre-training. The transferred attention weights are, in effect, the in-context learning circuits, which are later adapted for downstream tasks and domains.
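The induction-head behavior Olsson et al. describe can be sketched by hand: position `i` attends to the token that followed an earlier occurrence of the same token and copies it forward. This toy pattern is a hypothetical illustration of what the learned circuit computes, not the circuit itself:

```python
import numpy as np

def induction_attention(tokens):
    """Hand-coded induction pattern: position i attends to j+1 where
    tokens[j] == tokens[i] for some earlier j (the 'A B ... A -> B' rule)."""
    n = len(tokens)
    attn = np.zeros((n, n))
    for i in range(1, n):
        matches = [j + 1 for j in range(i - 1) if tokens[j] == tokens[i]]
        if matches:
            attn[i, matches] = 1.0 / len(matches)  # split attention over prior matches
        else:
            attn[i, 0] = 1.0  # fallback: attend to the first position
    return attn

def induction_predict(tokens):
    """Copy the most-attended token as the next-token prediction at each position."""
    attn = induction_attention(tokens)
    toks = np.array(tokens)
    return [toks[row.argmax()] for row in attn]

# At the second 'A' in 'A B C A', the head attends to 'B' and predicts it.
preds = induction_predict(list("ABCA"))
print(preds[-1])  # 'B'
```

On NCA data, where each row of the grid is a rule-determined transform of the previous row, rewarding exactly this match-and-copy behavior is plausible, which is why attention weights would carry the transferable structure.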
A secondary motivation for transfer is epiplexity (Finzi et al., 2026). Classical information theory suggests deterministic transformations cannot increase information content (Polyanskiy & Wu, 2025), calling into question whether LLMs can learn meaningful structure from NCAs at all. However, this view assumes a computationally unbounded observer. For computationally bounded observers, Finzi et al. (2026) show that deterministic processes can generate useful structural information, coined epiplexity, that models must internalize to learn useful representations of the data. Their key insight is that simple local rules, like those of cellular automata, can produce emergent structures (e.g., gliders, collisions) that a finite-capacity model cannot brute-force simulate. Instead, the model must learn a representation that allows it to predict the simulation at a coarser-grained level of abstraction. Learning such representations over a diverse and universal function class like NCAs may help with learning representations of natural language as well.
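The claim that simple local rules span a range of emergent complexity can be illustrated with classical elementary cellular automata (a simpler stand-in for the paper's neural CAs) and the gzip-compressibility proxy mentioned earlier as a complexity knob. The rule choices and thresholds below are illustrative assumptions:

```python
import gzip
import numpy as np

def eca_run(rule_number, width=256, steps=256, seed=0):
    """Simulate an elementary cellular automaton; return its space-time grid."""
    # Bit i of rule_number gives the next state for neighborhood pattern i (0..7).
    rule = np.array([(rule_number >> i) & 1 for i in range(8)], dtype=np.uint8)
    rng = np.random.default_rng(seed)
    row = rng.integers(0, 2, width).astype(np.uint8)
    grid = [row]
    for _ in range(steps - 1):
        left, right = np.roll(row, 1), np.roll(row, -1)
        row = rule[4 * left + 2 * row + right]  # index = left,center,right as 3 bits
        grid.append(row)
    return np.stack(grid)

def gzip_ratio(grid):
    """Compressed size / raw size: a cheap proxy for trajectory complexity."""
    raw = grid.astype(np.uint8).tobytes()
    return len(gzip.compress(raw)) / len(raw)

# Rule 0 dies out (highly compressible); Rule 110 produces gliders and is
# Turing-complete; Rule 30 is chaotic. Selecting rules by such a complexity
# band is the kind of knob the paper turns for its NCAs.
for r in (0, 110, 30):
    print(r, round(gzip_ratio(eca_run(r)), 3))
```

A bounded model cannot brute-force simulate the richer rules over long horizons, so predicting their trajectories pushes it toward coarser-grained representations of the emergent structures, which is the epiplexity argument in miniature.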
Classical information theory suggests deterministic transformations cannot increase information content (Polyanskiy & Wu, 2025), thus questioning whether LLMs can learn meaningful structure from NCAs. However, this view assumes a computationally unbounded observer. For computationally bounded observers, Finzi et al. (2026) show that deterministic processes can generate useful structural information–coined epiplexity–that models must internalize to learn useful representations of the data. Their key insight is that simple local rules, like CA, can produce emergent structures (e.g., gliders, collisions) that a finite-capacity model cannot brute-force simulate. Instead, the model must learn a representation that allows it to predict the simulation at a coarser-grained abstraction. Learning these representations over a diverse and universal class of functions like NCA may help with learning representations of natural language as well. 这种框架得到了我们在第 5.3.1 节中的机制性发现的支持:注意力层,而非多层感知机(MLP)或层归一化(LayerNorm),承载着最具可迁移性的结构。(奥尔森等人,2022)表明,上下文学习(ICL)能力随着“归纳头”的形成而出现——这些注意力电路有助于将信息从前一个标记复制到后续标记。由于 NCA 预训练专门奖励这种行为,它可能比仅语言预训练更早且更稳健地形成这种结构。实际上,转移的注意力权重就是上下文学习电路,这些电路随后会针对下游任务和领域进行调整。 迁移的另一个动机是“表观复杂性”(Finzi 等人,2026 年)。经典信息理论表明,确定性变换无法增加信息量(Polyanskiy 和 Wu,2025 年),这让人质疑大型语言模型能否从非确定性细胞自动机(NCA)中学习到有意义的结构。然而,这种观点假定观察者具有无限的计算能力。对于计算能力有限的观察者而言,Finzi 等人(2026 年)表明,确定性过程可以生成有用的结构信息——称为“表观复杂性”——模型必须将其内化,才能学习到数据的有用表示。他们的关键见解在于,像细胞自动机这样的简单局部规则可以产生涌现结构(例如滑翔机、碰撞),有限容量的模型无法通过暴力模拟来实现。相反,模型必须学习一种表示,使其能够在更粗粒度的抽象层面上预测模拟结果。在像 NCA 这样多样且通用的函数类上学习这些表示,可能有助于学习自然语言的表示。)). Prior work on formal languages and algorithmic tasks such as Dyck and string copying (Hu et al., [2025b](#NCA data are substantially different from natural language and generated by deterministic processes, prompting the question of why one should expect transfer at all? We argue that NCAs may provide a purer training signal for in-context rule inference. 
In natural language, models may rely on semantic “shortcuts” or co-occurrence priors (Abbas et al., 2023; Geirhos et al., 2020). In contrast, every NCA sequence is generated by a hidden transition rule – parameterized by a random neural network. With no semantic knowledge to fall back on, every NCA token guides the model to in-context rule inference (Kirsch et al., 2022). This mirrors a core capability required for language modeling (Brown et al., 2020; Wei et al., 2022; Dong et al., 2024). Xie et al. (2022) show that training on natural text teaches models to perform implicit Bayesian inference over latent concepts: each sequence draws from a latent concept, and predicting the next token means conditioning on the inferred concept. The same mechanism appears in math and code as well (Garg et al., 2023; Cook et al., 2025). Prior work on formal languages and algorithmic tasks such as Dyck and string copying (Hu et al., 2025b; Wu et al., 2022; Shinnick et al., 2025b) also train for this kind of in-context inference. Unlike these tasks, NCAs encompass a broad, universal class of computable functions (Copeland, 2012), some of which realize Turing-complete systems (Rendell, 2002; Wolfram & Gad-el Hak, 2003). The breadth and scale of this distribution makes memorization infeasible, forcing models to learn a general mechanism for rule inference (Li et al., 2024) that applies across the function class. 
NCA 数据与自然语言有显著差异,并且由确定性过程生成,这引发了为何应期待迁移的问题?我们认为,NCA 可能为上下文中的规则推理提供更纯粹的训练信号。在自然语言中,模型可能会依赖语义“捷径”或共现先验知识(Abbas 等人,2023;Geirhos 等人,2020)。相比之下,每个 NCA 序列都是由一个隐藏的转换规则生成的——该规则由一个随机神经网络参数化。由于没有语义知识可依赖,每个 NCA 标记都引导模型进行上下文中的规则推理(Kirsch 等人,2022)。 这反映了语言建模所需的核心能力(Brown 等人,2020;Wei 等人,2022;Dong 等人,2024)。Xie 等人(2022)表明,在自然文本上进行训练会使模型学会对潜在概念进行隐式贝叶斯推理:每个序列都从一个潜在概念中抽取,预测下一个标记意味着基于推断出的概念进行条件化。同样的机制在数学和代码中也存在(Garg 等人,2023;Cook 等人,2025)。先前关于形式语言和算法任务(如迪克语言和字符串复制)(胡等人,2025b;吴等人,2022;希尼克等人,2025b)的研究也针对这种上下文推理进行了训练。与这些任务不同,NCA 包含了一个广泛通用的可计算函数类别(科佩兰,2012),其中一些实现了图灵完备系统(伦德尔,2002;沃尔夫勒姆和加德-埃尔哈克,2003)。这种分布的广度和规模使得记忆变得不可行,迫使模型学习一种适用于整个函数类别的通用规则推理机制(李等人,2024)。 This framing is supported by our mechanistic finding from Section 5.3.1: attention layers, not the MLPs or LayerNorms carry the most transferable structure. (Olsson et al., 2022) showed that ICL ability emerges with the formation of induction heads – attention circuits that help copy information from previous tokens to future ones. Because NCA pre-pre-training exclusively rewards this behavior, it may induce earlier and more robust formation than language-only pre-training. The transferred attention weights are, in effect, the in-context learning circuits, which are later adapted for downstream tasks and domains. A secondary motivation for transfer is epiplexity (Finzi et al., 2026). Classical information theory suggests deterministic transformations cannot increase information content (Polyanskiy & Wu, 2025), thus questioning whether LLMs can learn meaningful structure from NCAs. However, this view assumes a computationally unbounded observer. For computationally bounded observers, Finzi et al. (2026) show that deterministic processes can generate useful structural information–coined epiplexity–that models must internalize to learn useful representations of the data. Their key insight is that simple local rules, like CA, can produce emergent structures (e.g., gliders, collisions) that a finite-capacity model cannot brute-force simulate. 
Instead, the model must learn a representation that allows it to predict the simulation at a coarser-grained abstraction. Learning these representations over a diverse and universal class of functions like NCA may help with learning representations of natural language as well. 这种框架得到了我们在第 5.3.1 节中的机制性发现的支持:注意力层,而非多层感知机(MLP)或层归一化(LayerNorm),承载着最具可迁移性的结构。(奥尔森等人,2022)表明,上下文学习(ICL)能力随着“归纳头”的形成而出现——这些注意力电路有助于将信息从前一个标记复制到后续标记。由于 NCA 预训练专门奖励这种行为,它可能比仅语言预训练更早且更稳健地形成这种结构。实际上,转移的注意力权重就是上下文学习电路,这些电路随后会针对下游任务和领域进行调整。 迁移的另一个动机是“表观复杂性”(Finzi 等人,2026 年)。经典信息理论表明,确定性变换无法增加信息量(Polyanskiy 和 Wu,2025 年),这让人质疑大型语言模型能否从非确定性细胞自动机(NCA)中学习到有意义的结构。然而,这种观点假定观察者具有无限的计算能力。对于计算能力有限的观察者而言,Finzi 等人(2026 年)表明,确定性过程可以生成有用的结构信息——称为“表观复杂性”——模型必须将其内化,才能学习到数据的有用表示。他们的关键见解在于,像细胞自动机这样的简单局部规则可以产生涌现结构(例如滑翔机、碰撞),有限容量的模型无法通过暴力模拟来实现。相反,模型必须学习一种表示,使其能够在更粗粒度的抽象层面上预测模拟结果。在像 NCA 这样多样且通用的函数类上学习这些表示,可能有助于学习自然语言的表示。); Wu et al., [2022](#NCA data are substantially different from natural language and generated by deterministic processes, prompting the question of why one should expect transfer at all? We argue that NCAs may provide a purer training signal for in-context rule inference. In natural language, models may rely on semantic “shortcuts” or co-occurrence priors (Abbas et al., 2023; Geirhos et al., 2020). In contrast, every NCA sequence is generated by a hidden transition rule – parameterized by a random neural network. With no semantic knowledge to fall back on, every NCA token guides the model to in-context rule inference (Kirsch et al., 2022). This mirrors a core capability required for language modeling (Brown et al., 2020; Wei et al., 2022; Dong et al., 2024). Xie et al. (2022) show that training on natural text teaches models to perform implicit Bayesian inference over latent concepts: each sequence draws from a latent concept, and predicting the next token means conditioning on the inferred concept. The same mechanism appears in math and code as well (Garg et al., 2023; Cook et al., 2025). 
Prior work on formal languages and algorithmic tasks such as Dyck and string copying (Hu et al., 2025b; Wu et al., 2022; Shinnick et al., 2025b) also train for this kind of in-context inference. Unlike these tasks, NCAs encompass a broad, universal class of computable functions (Copeland, 2012), some of which realize Turing-complete systems (Rendell, 2002; Wolfram & Gad-el Hak, 2003). The breadth and scale of this distribution makes memorization infeasible, forcing models to learn a general mechanism for rule inference (Li et al., 2024) that applies across the function class. NCA 数据与自然语言有显著差异,并且由确定性过程生成,这引发了为何应期待迁移的问题?我们认为,NCA 可能为上下文中的规则推理提供更纯粹的训练信号。在自然语言中,模型可能会依赖语义“捷径”或共现先验知识(Abbas 等人,2023;Geirhos 等人,2020)。相比之下,每个 NCA 序列都是由一个隐藏的转换规则生成的——该规则由一个随机神经网络参数化。由于没有语义知识可依赖,每个 NCA 标记都引导模型进行上下文中的规则推理(Kirsch 等人,2022)。 这反映了语言建模所需的核心能力(Brown 等人,2020;Wei 等人,2022;Dong 等人,2024)。Xie 等人(2022)表明,在自然文本上进行训练会使模型学会对潜在概念进行隐式贝叶斯推理:每个序列都从一个潜在概念中抽取,预测下一个标记意味着基于推断出的概念进行条件化。同样的机制在数学和代码中也存在(Garg 等人,2023;Cook 等人,2025)。先前关于形式语言和算法任务(如迪克语言和字符串复制)(胡等人,2025b;吴等人,2022;希尼克等人,2025b)的研究也针对这种上下文推理进行了训练。与这些任务不同,NCA 包含了一个广泛通用的可计算函数类别(科佩兰,2012),其中一些实现了图灵完备系统(伦德尔,2002;沃尔夫勒姆和加德-埃尔哈克,2003)。这种分布的广度和规模使得记忆变得不可行,迫使模型学习一种适用于整个函数类别的通用规则推理机制(李等人,2024)。 This framing is supported by our mechanistic finding from Section 5.3.1: attention layers, not the MLPs or LayerNorms carry the most transferable structure. (Olsson et al., 2022) showed that ICL ability emerges with the formation of induction heads – attention circuits that help copy information from previous tokens to future ones. Because NCA pre-pre-training exclusively rewards this behavior, it may induce earlier and more robust formation than language-only pre-training. The transferred attention weights are, in effect, the in-context learning circuits, which are later adapted for downstream tasks and domains. A secondary motivation for transfer is epiplexity (Finzi et al., 2026). 
Classical information theory suggests deterministic transformations cannot increase information content (Polyanskiy & Wu, 2025), thus questioning whether LLMs can learn meaningful structure from NCAs. However, this view assumes a computationally unbounded observer. For computationally bounded observers, Finzi et al. (2026) show that deterministic processes can generate useful structural information–coined epiplexity–that models must internalize to learn useful representations of the data. Their key insight is that simple local rules, like CA, can produce emergent structures (e.g., gliders, collisions) that a finite-capacity model cannot brute-force simulate. Instead, the model must learn a representation that allows it to predict the simulation at a coarser-grained abstraction. Learning these representations over a diverse and universal class of functions like NCA may help with learning representations of natural language as well. 这种框架得到了我们在第 5.3.1 节中的机制性发现的支持:注意力层,而非多层感知机(MLP)或层归一化(LayerNorm),承载着最具可迁移性的结构。(奥尔森等人,2022)表明,上下文学习(ICL)能力随着“归纳头”的形成而出现——这些注意力电路有助于将信息从前一个标记复制到后续标记。由于 NCA 预训练专门奖励这种行为,它可能比仅语言预训练更早且更稳健地形成这种结构。实际上,转移的注意力权重就是上下文学习电路,这些电路随后会针对下游任务和领域进行调整。 迁移的另一个动机是“表观复杂性”(Finzi 等人,2026 年)。经典信息理论表明,确定性变换无法增加信息量(Polyanskiy 和 Wu,2025 年),这让人质疑大型语言模型能否从非确定性细胞自动机(NCA)中学习到有意义的结构。然而,这种观点假定观察者具有无限的计算能力。对于计算能力有限的观察者而言,Finzi 等人(2026 年)表明,确定性过程可以生成有用的结构信息——称为“表观复杂性”——模型必须将其内化,才能学习到数据的有用表示。他们的关键见解在于,像细胞自动机这样的简单局部规则可以产生涌现结构(例如滑翔机、碰撞),有限容量的模型无法通过暴力模拟来实现。相反,模型必须学习一种表示,使其能够在更粗粒度的抽象层面上预测模拟结果。在像 NCA 这样多样且通用的函数类上学习这些表示,可能有助于学习自然语言的表示。); Shinnick et al., [2025b](#NCA data are substantially different from natural language and generated by deterministic processes, prompting the question of why one should expect transfer at all? We argue that NCAs may provide a purer training signal for in-context rule inference. In natural language, models may rely on semantic “shortcuts” or co-occurrence priors (Abbas et al., 2023; Geirhos et al., 2020). 
In contrast, every NCA sequence is generated by a hidden transition rule – parameterized by a random neural network. With no semantic knowledge to fall back on, every NCA token guides the model to in-context rule inference (Kirsch et al., 2022). This mirrors a core capability required for language modeling (Brown et al., 2020; Wei et al., 2022; Dong et al., 2024). Xie et al. (2022) show that training on natural text teaches models to perform implicit Bayesian inference over latent concepts: each sequence draws from a latent concept, and predicting the next token means conditioning on the inferred concept. The same mechanism appears in math and code as well (Garg et al., 2023; Cook et al., 2025). Prior work on formal languages and algorithmic tasks such as Dyck and string copying (Hu et al., 2025b; Wu et al., 2022; Shinnick et al., 2025b) also train for this kind of in-context inference. Unlike these tasks, NCAs encompass a broad, universal class of computable functions (Copeland, 2012), some of which realize Turing-complete systems (Rendell, 2002; Wolfram & Gad-el Hak, 2003). The breadth and scale of this distribution makes memorization infeasible, forcing models to learn a general mechanism for rule inference (Li et al., 2024) that applies across the function class. 
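As a concrete illustration of this setup, the sketch below samples a hidden transition rule as a small random MLP and rolls out a 1D neural cellular automaton, flattening the trajectory into a token sequence for next-token prediction. This is an illustrative reconstruction, not the paper's actual data pipeline; all function names and hyperparameters are assumptions.

```python
import numpy as np

def make_nca_rule(n_states, neighborhood=3, hidden=16, seed=0):
    """Sample a hidden transition rule: a small random MLP mapping a
    one-hot encoded neighborhood to the next state of the center cell."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(size=(neighborhood * n_states, hidden))
    w2 = rng.normal(size=(hidden, n_states))
    def rule(neigh_onehot):          # (width, neighborhood * n_states)
        h = np.tanh(neigh_onehot @ w1)
        return (h @ w2).argmax(-1)   # discretize back to token states
    return rule

def rollout(rule, n_states, width=32, steps=8, seed=0):
    """Roll out the NCA and flatten the trajectory row-major into one
    token sequence, ready for next-token-prediction training."""
    rng = np.random.default_rng(seed)
    state = rng.integers(n_states, size=width)
    rows = [state]
    for _ in range(steps - 1):
        # circular 3-cell neighborhoods, one-hot encoded
        neigh = np.stack([np.roll(state, s) for s in (1, 0, -1)], axis=1)
        onehot = np.eye(n_states)[neigh].reshape(width, -1)
        state = rule(onehot)
        rows.append(state)
    return np.concatenate(rows)      # shape: (width * steps,)

rule = make_nca_rule(n_states=5)
tokens = rollout(rule, n_states=5)
print(tokens.shape, tokens[:12])
```

Because the rollout is deterministic given the rule's weights, the only way for a model to predict later tokens is to infer the hidden rule from the earlier part of the sequence, which is exactly the in-context inference pressure described above.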
Classical information theory suggests deterministic transformations cannot increase information content (Polyanskiy & Wu, 2025), thus questioning whether LLMs can learn meaningful structure from NCAs. However, this view assumes a computationally unbounded observer. For computationally bounded observers, Finzi et al. (2026) show that deterministic processes can generate useful structural information–coined epiplexity–that models must internalize to learn useful representations of the data. Their key insight is that simple local rules, like CA, can produce emergent structures (e.g., gliders, collisions) that a finite-capacity model cannot brute-force simulate. Instead, the model must learn a representation that allows it to predict the simulation at a coarser-grained abstraction. Learning these representations over a diverse and universal class of functions like NCA may help with learning representations of natural language as well. 这种框架得到了我们在第 5.3.1 节中的机制性发现的支持:注意力层,而非多层感知机(MLP)或层归一化(LayerNorm),承载着最具可迁移性的结构。(奥尔森等人,2022)表明,上下文学习(ICL)能力随着“归纳头”的形成而出现——这些注意力电路有助于将信息从前一个标记复制到后续标记。由于 NCA 预训练专门奖励这种行为,它可能比仅语言预训练更早且更稳健地形成这种结构。实际上,转移的注意力权重就是上下文学习电路,这些电路随后会针对下游任务和领域进行调整。 迁移的另一个动机是“表观复杂性”(Finzi 等人,2026 年)。经典信息理论表明,确定性变换无法增加信息量(Polyanskiy 和 Wu,2025 年),这让人质疑大型语言模型能否从非确定性细胞自动机(NCA)中学习到有意义的结构。然而,这种观点假定观察者具有无限的计算能力。对于计算能力有限的观察者而言,Finzi 等人(2026 年)表明,确定性过程可以生成有用的结构信息——称为“表观复杂性”——模型必须将其内化,才能学习到数据的有用表示。他们的关键见解在于,像细胞自动机这样的简单局部规则可以产生涌现结构(例如滑翔机、碰撞),有限容量的模型无法通过暴力模拟来实现。相反,模型必须学习一种表示,使其能够在更粗粒度的抽象层面上预测模拟结果。在像 NCA 这样多样且通用的函数类上学习这些表示,可能有助于学习自然语言的表示。)) that applies across the function class. 
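The data-generation scheme described above can be sketched concretely. In the sketch below, the grid width, number of cell states, and MLP shape are illustrative assumptions rather than the paper's actual configuration; the point is only that each trajectory is fully determined by a freshly sampled random network, so a model trained with next-token prediction can only do well by inferring the hidden rule in context.

```python
import numpy as np

def random_nca_rule(rng, n_states=8, hidden=16):
    """Sample a random MLP mapping each cell's 3-neighborhood to its next state.

    n_states and hidden are illustrative choices, not the paper's settings.
    """
    W1 = rng.normal(0.0, 1.0, size=(3, hidden))
    W2 = rng.normal(0.0, 1.0, size=(hidden, n_states))

    def step(state):
        # Stack each cell's left/self/right neighbors (circular boundary).
        neigh = np.stack([np.roll(state, 1), state, np.roll(state, -1)], axis=1)
        h = np.tanh((neigh / n_states) @ W1)   # squash states into [0, 1)
        logits = h @ W2
        return logits.argmax(axis=1)           # deterministic update rule

    return step

def nca_trajectory_tokens(rng, width=32, steps=8, n_states=8):
    """Roll out one NCA and flatten the space-time grid into a token sequence."""
    step = random_nca_rule(rng, n_states)
    state = rng.integers(0, n_states, size=width)
    rows = [state]
    for _ in range(steps - 1):
        state = step(state)
        rows.append(state)
    return np.concatenate(rows)  # tokens read row by row, like a line of text

rng = np.random.default_rng(0)
tokens = nca_trajectory_tokens(rng)
```

Because the rule network is discarded after sampling, memorizing any single trajectory is useless; only a general mechanism for inferring the transition rule from the visible prefix transfers across sequences.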
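The epiplexity argument, that a simple deterministic local rule produces structure which is learnable yet not free to extract, can be made tangible with an elementary cellular automaton. The sketch below uses rule 110 (one of the Turing-complete rules cited above) and compares the gzip compressibility of its trajectory against IID noise; grid sizes and the gzip-ratio metric are our own toy choices for illustration, not the paper's measurement pipeline.

```python
import gzip
import numpy as np

def rule110_step(row):
    """One update of elementary CA rule 110 with circular boundaries."""
    l, c, r = np.roll(row, 1), row, np.roll(row, -1)
    idx = (l << 2) | (c << 1) | r  # 3-bit neighborhood code, 0..7
    # Rule number 110 = 0b01101110, read out per neighborhood code.
    table = np.array([0, 1, 1, 1, 0, 1, 1, 0], dtype=np.uint8)
    return table[idx]

def gzip_ratio(bits):
    """Compressed size over raw size of a packed bit array."""
    raw = np.packbits(bits).tobytes()
    return len(gzip.compress(raw)) / max(len(raw), 1)

rng = np.random.default_rng(0)
width, steps = 256, 256
row = rng.integers(0, 2, size=width).astype(np.uint8)
rows = [row]
for _ in range(steps - 1):
    row = rule110_step(row)
    rows.append(row)
trajectory = np.concatenate(rows)
noise = rng.integers(0, 2, size=trajectory.size).astype(np.uint8)

# The deterministic trajectory compresses better than IID noise, yet far
# from trivially: structure exists, but extracting it takes computation.
r_traj, r_noise = gzip_ratio(trajectory), gzip_ratio(noise)
```

A computationally unbounded observer would call the trajectory zero-information given the rule and seed; a bounded observer (like gzip, or a finite-capacity model) still has nontrivial structure to learn, which is exactly the gap epiplexity names.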

Why is 160M tokens of automata better than 1.6B tokens of text?

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
Surprisingly, with a significantly lower token budget, pre-pre-training on NCA data improves language modeling more than pre-pre-training on natural language (C4), as shown in Figure 2. How can abstract dynamical systems' data transfer better to language than language itself? Even at 1.6B tokens, natural language pre-pre-training remains in an early training regime. Compute-optimal scaling laws (Hoffmann et al., 2022) suggest that a 1.6B-parameter model requires roughly 32B tokens. At this early stage, language models primarily acquire shallow, local patterns and only learn more complex structures later on (Evanson et al., 2023; Chen et al., 2023). With limited tokens, C4 pre-pre-training likely spends most of its capacity on these surface-level regularities rather than on the long-range dependencies and in-context learning that transfer broadly.

In contrast, we hypothesize that NCA sequences provide a purer training signal for in-context learning. Each sequence is generated by a single latent rule that the model must infer from context and then apply consistently; once the rule is identified, next-token prediction becomes nearly deterministic. Moreover, NCA pre-pre-training introduces a form of diversity orthogonal to what additional language tokens would provide. Despite their scale, many natural language datasets exhibit substantial redundancy in linguistic patterns and topic coverage (Abbas et al., 2023). Since each of our NCA sequences represents a unique function to model, this diversity may be more efficient per token at building general-purpose representations.
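The latent-rule structure described here can be made concrete with a toy generator. The sketch below is illustrative only: `make_nca_rule`, `nca_trajectory`, and all sizes are hypothetical choices, not the paper's actual architecture. It rolls out a 1-D neural cellular automaton and flattens its states into one token sequence; because a single fixed rule drives every step, a sequence model that infers the rule from early steps can predict later tokens almost deterministically.

```python
import numpy as np

def make_nca_rule(n_states=8, hidden=16, seed=0):
    """Sample a random 1-D neural CA rule: a tiny MLP mapping each cell's
    3-cell neighborhood (one-hot encoded) to the cell's next state."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(size=(3 * n_states, hidden))
    w2 = rng.normal(size=(hidden, n_states))
    def step(state):
        onehot = np.eye(n_states)[state]                  # (width, n_states)
        left = np.roll(onehot, 1, axis=0)
        right = np.roll(onehot, -1, axis=0)
        feats = np.concatenate([left, onehot, right], axis=1)
        logits = np.tanh(feats @ w1) @ w2
        return logits.argmax(axis=1)                      # deterministic update
    return step

def nca_trajectory(step, width=32, n_steps=16, n_states=8, seed=0):
    """Roll out the CA from a random initial state and flatten successive
    states into one token sequence. The rule is fixed per sequence, so
    next-token prediction is near-deterministic once the rule is inferred."""
    rng = np.random.default_rng(seed)
    state = rng.integers(0, n_states, size=width)
    states = [state]
    for _ in range(n_steps - 1):
        state = step(state)
        states.append(state)
    return np.concatenate(states)                         # (width * n_steps,)

rule = make_nca_rule(seed=1)
tokens = nca_trajectory(rule, seed=2)
```

Each distinct seed for `make_nca_rule` yields a different latent rule, so a corpus of such trajectories presents the model with a family of unique functions rather than redundant samples of one distribution.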
Beyond one-size-fits-all pre-training. Our complexity ablations reveal a nuanced picture: the optimal training distribution varies by downstream domain. In Figure 6, we observed that code benefits from lower-complexity NCA rules, while web text and math benefit from higher-complexity ones, suggesting these domains encode computations of measurably different character. This opens a new axis of control: rather than treating training data as fixed, we can tune the structure of synthetic data to match the target domain. Unlike grammar-based synthetic tasks, where each formal grammar defines a task with fixed structural complexity, NCAs provide a continuous, tunable spectrum of complexity within a single generator family. If researchers can craft distributions that embody the primitives a domain requires (e.g., rigid state-tracking for code (Li et al., 2025), richer long-range dependencies for genomic sequences (Wu et al., 2025)), they can instill these capabilities directly, without scaling to trillions of general-purpose tokens. The result could aid the development of specialized, small language models (Belcak et al., 2025) that are more efficient to train and deploy, trained not on more data but on better-matched data.
| In contrast, we hypothesize that NCA sequences provide a purer training signal for in-context learning. Each sequence is generated by a single latent rule that the model must infer from context and then apply consistently. Once identified, next-token prediction becomes nearly deterministic. Moreover, NCA pre-pre-training introduces a form of diversity orthogonal to what additional language tokens would provide. Despite their scale, many natural language datasets exhibit substantial redundancy (Abbas et al., [2023](#Surprisingly, with a significantly lower token budget, pre-pre-training on NCA data improves language modeling more than pre-pre-training on natural language (C4), as shown in Figure 2. How can abstract dynamical systems’ data transfer better to language than language itself? Even at 1.6B tokens, natural language pre-pre-training remains in an early training regime. Compute-optimal scaling laws suggest (Hoffmann et al., 2022) that a 1.6B parameter model requires roughly 32B tokens. At this early stage, language models primarily acquire shallow, local patterns and only learn more complex structures later on (Evanson et al., 2023; Chen et al., 2023). With limited tokens, C4 pre-pre-training likely spends most of its capacity on these surface-level regularities rather than the long-range dependencies and in-context learning that transfer broadly. 令人惊讶的是,尽管标记预算显著降低,但先在 NCA 数据上进行预训练比在自然语言(C4)上进行预训练更能提升语言模型,如图 2 所示。为什么抽象动态系统的数据能比语言本身更好地迁移到语言模型中? 即使在 160 亿个标记的情况下,自然语言的预训练仍处于早期训练阶段。计算最优缩放定律表明(Hoffmann 等人,2022 年),一个 160 亿参数的模型大约需要 3200 亿个标记。在这个早期阶段,语言模型主要获取浅层、局部模式,而更复杂的结构则在后期才学习(Evanson 等人,2023 年;Chen 等人,2023 年)。 由于标记有限,C4 预训练可能将大部分容量用于这些表面规律,而非能广泛迁移的长程依赖和上下文学习。 In contrast, we hypothesize that NCA sequences provide a purer training signal for in-context learning. Each sequence is generated by a single latent rule that the model must infer from context and then apply consistently. Once identified, next-token prediction becomes nearly deterministic. 
Moreover, NCA pre-pre-training introduces a form of diversity orthogonal to what additional language tokens would provide. Despite their scale, many natural language datasets exhibit substantial redundancy (Abbas et al., 2023) in linguistic patterns and topic coverage. Since each of our NCA sequences represents a unique function to model, this diversity may be more efficient per token at building general-purpose representations. Beyond one-size-fits-all pre-training Our complexity ablations reveal a nuanced picture that the optimal distribution for training varies by downstream domain. In Figure 6, we observed that code benefits from lower-complexity NCA rules, while web text and math benefit from higher-complexity ones, suggesting these domains encode computations of measurably different character. This opens a new axis of control. Rather than treating training data as fixed, we can tune the structures of synthetic data to match the target domain. Unlike grammar-based synthetic tasks, where each formal grammar defines a task with fixed structural complexity, NCAs provide a continuous and tunable spectrum of complexity within a single generator family. If researchers can craft distributions that embody the primitives a domain requires (e.g., rigid state-tracking for code (Li et al., 2025), richer long-range dependencies for genomic sequences (Wu et al., 2025)), they can instill these capabilities directly, without scaling to trillions of general-purpose tokens. The result could aid the development of specialized, small language models (Belcak et al., 2025) that are more efficient to train and deploy—trained not on more data, but on better-matched data. 
相比之下,我们假设 NCA 序列能为上下文学习提供更纯粹的训练信号。每个序列都是由单个潜在规则生成的,模型必须从上下文中推断出该规则并始终如一地应用。一旦确定下来,下一个标记的预测就几乎成为确定性的了。 此外,NCA 预训练引入了一种与额外语言标记所提供的多样性相垂直的形式。尽管许多自然语言数据集规模庞大,但它们在语言模式和主题覆盖方面存在大量冗余(Abbas 等人,2023)。由于我们的每个 NCA 序列都代表了一个独特的函数来建模,这种多样性可能在每个标记上更有效地构建通用表示。 超越一刀切的预训练 我们的复杂度消融实验揭示了一个微妙的情况,即训练的最佳分布因下游领域而异。在图 6 中,我们观察到代码受益于低复杂度的 NCA 规则,而网络文本和数学则受益于高复杂度的规则,这表明这些领域编码了具有明显不同特征的计算。这开辟了一个新的控制维度。我们不再将训练数据视为固定不变的,而是可以调整合成数据的结构以匹配目标领域。与基于语法的合成任务不同,在基于语法的合成任务中,每个形式语法都定义了一个具有固定结构复杂度的任务,而神经计算架构(NCAs)在单个生成器家族内提供了连续且可调的复杂度范围。如果研究人员能够设计出体现特定领域所需基本要素的分布(例如,代码所需的刚性状态跟踪(Li 等人,2025 年),基因组序列所需的更丰富的长程依赖关系(Wu 等人,2025 年)),他们就能够直接注入这些能力,而无需扩展到数万亿个通用标记。其结果可能有助于开发专门的小型语言模型(Belcak 等人,2025 年),这些模型在训练和部署方面更高效——不是通过更多的数据,而是通过更匹配的数据进行训练。)) in linguistic patterns and topic coverage. Since each of our NCA sequences represents a unique function to model, this diversity may be more efficient per token at building general-purpose representations. Beyond one-size-fits-all pre-training Our complexity ablations reveal a nuanced picture that the optimal distribution for training varies by downstream domain. In [Figure 6](#Surprisingly, with a significantly lower token budget, pre-pre-training on NCA data improves language modeling more than pre-pre-training on natural language (C4), as shown in Figure 2. How can abstract dynamical systems’ data transfer better to language than language itself? Even at 1.6B tokens, natural language pre-pre-training remains in an early training regime. Compute-optimal scaling laws suggest (Hoffmann et al., 2022) that a 1.6B parameter model requires roughly 32B tokens. At this early stage, language models primarily acquire shallow, local patterns and only learn more complex structures later on (Evanson et al., 2023; Chen et al., 2023). With limited tokens, C4 pre-pre-training likely spends most of its capacity on these surface-level regularities rather than the long-range dependencies and in-context learning that transfer broadly. 
令人惊讶的是,尽管标记预算显著降低,但先在 NCA 数据上进行预训练比在自然语言(C4)上进行预训练更能提升语言模型,如图 2 所示。为什么抽象动态系统的数据能比语言本身更好地迁移到语言模型中? 即使在 160 亿个标记的情况下,自然语言的预训练仍处于早期训练阶段。计算最优缩放定律表明(Hoffmann 等人,2022 年),一个 160 亿参数的模型大约需要 3200 亿个标记。在这个早期阶段,语言模型主要获取浅层、局部模式,而更复杂的结构则在后期才学习(Evanson 等人,2023 年;Chen 等人,2023 年)。 由于标记有限,C4 预训练可能将大部分容量用于这些表面规律,而非能广泛迁移的长程依赖和上下文学习。 In contrast, we hypothesize that NCA sequences provide a purer training signal for in-context learning. Each sequence is generated by a single latent rule that the model must infer from context and then apply consistently. Once identified, next-token prediction becomes nearly deterministic. Moreover, NCA pre-pre-training introduces a form of diversity orthogonal to what additional language tokens would provide. Despite their scale, many natural language datasets exhibit substantial redundancy (Abbas et al., 2023) in linguistic patterns and topic coverage. Since each of our NCA sequences represents a unique function to model, this diversity may be more efficient per token at building general-purpose representations. Beyond one-size-fits-all pre-training Our complexity ablations reveal a nuanced picture that the optimal distribution for training varies by downstream domain. In Figure 6, we observed that code benefits from lower-complexity NCA rules, while web text and math benefit from higher-complexity ones, suggesting these domains encode computations of measurably different character. This opens a new axis of control. Rather than treating training data as fixed, we can tune the structures of synthetic data to match the target domain. Unlike grammar-based synthetic tasks, where each formal grammar defines a task with fixed structural complexity, NCAs provide a continuous and tunable spectrum of complexity within a single generator family. 
If researchers can craft distributions that embody the primitives a domain requires (e.g., rigid state-tracking for code (Li et al., 2025), richer long-range dependencies for genomic sequences (Wu et al., 2025)), they can instill these capabilities directly, without scaling to trillions of general-purpose tokens. The result could aid the development of specialized, small language models (Belcak et al., 2025) that are more efficient to train and deploy—trained not on more data, but on better-matched data. 相比之下,我们假设 NCA 序列能为上下文学习提供更纯粹的训练信号。每个序列都是由单个潜在规则生成的,模型必须从上下文中推断出该规则并始终如一地应用。一旦确定下来,下一个标记的预测就几乎成为确定性的了。 此外,NCA 预训练引入了一种与额外语言标记所提供的多样性相垂直的形式。尽管许多自然语言数据集规模庞大,但它们在语言模式和主题覆盖方面存在大量冗余(Abbas 等人,2023)。由于我们的每个 NCA 序列都代表了一个独特的函数来建模,这种多样性可能在每个标记上更有效地构建通用表示。 超越一刀切的预训练 我们的复杂度消融实验揭示了一个微妙的情况,即训练的最佳分布因下游领域而异。在图 6 中,我们观察到代码受益于低复杂度的 NCA 规则,而网络文本和数学则受益于高复杂度的规则,这表明这些领域编码了具有明显不同特征的计算。这开辟了一个新的控制维度。我们不再将训练数据视为固定不变的,而是可以调整合成数据的结构以匹配目标领域。与基于语法的合成任务不同,在基于语法的合成任务中,每个形式语法都定义了一个具有固定结构复杂度的任务,而神经计算架构(NCAs)在单个生成器家族内提供了连续且可调的复杂度范围。如果研究人员能够设计出体现特定领域所需基本要素的分布(例如,代码所需的刚性状态跟踪(Li 等人,2025 年),基因组序列所需的更丰富的长程依赖关系(Wu 等人,2025 年)),他们就能够直接注入这些能力,而无需扩展到数万亿个通用标记。其结果可能有助于开发专门的小型语言模型(Belcak 等人,2025 年),这些模型在训练和部署方面更高效——不是通过更多的数据,而是通过更匹配的数据进行训练。), we observed that code benefits from lower-complexity NCA rules, while web text and math benefit from higher-complexity ones, suggesting these domains encode computations of measurably different character. This opens a new axis of control. Rather than treating training data as fixed, we can tune the structures of synthetic data to match the target domain. Unlike grammar-based synthetic tasks, where each formal grammar defines a task with fixed structural complexity, NCAs provide a continuous and tunable spectrum of complexity within a single generator family. 
If researchers can craft distributions that embody the primitives a domain requires (e.g., rigid state-tracking for code (Li et al., [2025](#Surprisingly, with a significantly lower token budget, pre-pre-training on NCA data improves language modeling more than pre-pre-training on natural language (C4), as shown in Figure 2. How can abstract dynamical systems’ data transfer better to language than language itself? Even at 1.6B tokens, natural language pre-pre-training remains in an early training regime. Compute-optimal scaling laws suggest (Hoffmann et al., 2022) that a 1.6B parameter model requires roughly 32B tokens. At this early stage, language models primarily acquire shallow, local patterns and only learn more complex structures later on (Evanson et al., 2023; Chen et al., 2023). With limited tokens, C4 pre-pre-training likely spends most of its capacity on these surface-level regularities rather than the long-range dependencies and in-context learning that transfer broadly. 令人惊讶的是,尽管标记预算显著降低,但先在 NCA 数据上进行预训练比在自然语言(C4)上进行预训练更能提升语言模型,如图 2 所示。为什么抽象动态系统的数据能比语言本身更好地迁移到语言模型中? 即使在 160 亿个标记的情况下,自然语言的预训练仍处于早期训练阶段。计算最优缩放定律表明(Hoffmann 等人,2022 年),一个 160 亿参数的模型大约需要 3200 亿个标记。在这个早期阶段,语言模型主要获取浅层、局部模式,而更复杂的结构则在后期才学习(Evanson 等人,2023 年;Chen 等人,2023 年)。 由于标记有限,C4 预训练可能将大部分容量用于这些表面规律,而非能广泛迁移的长程依赖和上下文学习。 In contrast, we hypothesize that NCA sequences provide a purer training signal for in-context learning. Each sequence is generated by a single latent rule that the model must infer from context and then apply consistently. Once identified, next-token prediction becomes nearly deterministic. Moreover, NCA pre-pre-training introduces a form of diversity orthogonal to what additional language tokens would provide. Despite their scale, many natural language datasets exhibit substantial redundancy (Abbas et al., 2023) in linguistic patterns and topic coverage. 
Since each of our NCA sequences represents a unique function to model, this diversity may be more efficient per token at building general-purpose representations. Beyond one-size-fits-all pre-training Our complexity ablations reveal a nuanced picture that the optimal distribution for training varies by downstream domain. In Figure 6, we observed that code benefits from lower-complexity NCA rules, while web text and math benefit from higher-complexity ones, suggesting these domains encode computations of measurably different character. This opens a new axis of control. Rather than treating training data as fixed, we can tune the structures of synthetic data to match the target domain. Unlike grammar-based synthetic tasks, where each formal grammar defines a task with fixed structural complexity, NCAs provide a continuous and tunable spectrum of complexity within a single generator family. If researchers can craft distributions that embody the primitives a domain requires (e.g., rigid state-tracking for code (Li et al., 2025), richer long-range dependencies for genomic sequences (Wu et al., 2025)), they can instill these capabilities directly, without scaling to trillions of general-purpose tokens. The result could aid the development of specialized, small language models (Belcak et al., 2025) that are more efficient to train and deploy—trained not on more data, but on better-matched data. 
相比之下,我们假设 NCA 序列能为上下文学习提供更纯粹的训练信号。每个序列都是由单个潜在规则生成的,模型必须从上下文中推断出该规则并始终如一地应用。一旦确定下来,下一个标记的预测就几乎成为确定性的了。 此外,NCA 预训练引入了一种与额外语言标记所提供的多样性相垂直的形式。尽管许多自然语言数据集规模庞大,但它们在语言模式和主题覆盖方面存在大量冗余(Abbas 等人,2023)。由于我们的每个 NCA 序列都代表了一个独特的函数来建模,这种多样性可能在每个标记上更有效地构建通用表示。 超越一刀切的预训练 我们的复杂度消融实验揭示了一个微妙的情况,即训练的最佳分布因下游领域而异。在图 6 中,我们观察到代码受益于低复杂度的 NCA 规则,而网络文本和数学则受益于高复杂度的规则,这表明这些领域编码了具有明显不同特征的计算。这开辟了一个新的控制维度。我们不再将训练数据视为固定不变的,而是可以调整合成数据的结构以匹配目标领域。与基于语法的合成任务不同,在基于语法的合成任务中,每个形式语法都定义了一个具有固定结构复杂度的任务,而神经计算架构(NCAs)在单个生成器家族内提供了连续且可调的复杂度范围。如果研究人员能够设计出体现特定领域所需基本要素的分布(例如,代码所需的刚性状态跟踪(Li 等人,2025 年),基因组序列所需的更丰富的长程依赖关系(Wu 等人,2025 年)),他们就能够直接注入这些能力,而无需扩展到数万亿个通用标记。其结果可能有助于开发专门的小型语言模型(Belcak 等人,2025 年),这些模型在训练和部署方面更高效——不是通过更多的数据,而是通过更匹配的数据进行训练。)), richer long-range dependencies for genomic sequences (Wu et al., [2025](#Surprisingly, with a significantly lower token budget, pre-pre-training on NCA data improves language modeling more than pre-pre-training on natural language (C4), as shown in Figure 2. How can abstract dynamical systems’ data transfer better to language than language itself? Even at 1.6B tokens, natural language pre-pre-training remains in an early training regime. Compute-optimal scaling laws suggest (Hoffmann et al., 2022) that a 1.6B parameter model requires roughly 32B tokens. At this early stage, language models primarily acquire shallow, local patterns and only learn more complex structures later on (Evanson et al., 2023; Chen et al., 2023). With limited tokens, C4 pre-pre-training likely spends most of its capacity on these surface-level regularities rather than the long-range dependencies and in-context learning that transfer broadly. 令人惊讶的是,尽管标记预算显著降低,但先在 NCA 数据上进行预训练比在自然语言(C4)上进行预训练更能提升语言模型,如图 2 所示。为什么抽象动态系统的数据能比语言本身更好地迁移到语言模型中? 
即使在 160 亿个标记的情况下,自然语言的预训练仍处于早期训练阶段。计算最优缩放定律表明(Hoffmann 等人,2022 年),一个 160 亿参数的模型大约需要 3200 亿个标记。在这个早期阶段,语言模型主要获取浅层、局部模式,而更复杂的结构则在后期才学习(Evanson 等人,2023 年;Chen 等人,2023 年)。 由于标记有限,C4 预训练可能将大部分容量用于这些表面规律,而非能广泛迁移的长程依赖和上下文学习。 In contrast, we hypothesize that NCA sequences provide a purer training signal for in-context learning. Each sequence is generated by a single latent rule that the model must infer from context and then apply consistently. Once identified, next-token prediction becomes nearly deterministic. Moreover, NCA pre-pre-training introduces a form of diversity orthogonal to what additional language tokens would provide. Despite their scale, many natural language datasets exhibit substantial redundancy (Abbas et al., 2023) in linguistic patterns and topic coverage. Since each of our NCA sequences represents a unique function to model, this diversity may be more efficient per token at building general-purpose representations. Beyond one-size-fits-all pre-training Our complexity ablations reveal a nuanced picture that the optimal distribution for training varies by downstream domain. In Figure 6, we observed that code benefits from lower-complexity NCA rules, while web text and math benefit from higher-complexity ones, suggesting these domains encode computations of measurably different character. This opens a new axis of control. Rather than treating training data as fixed, we can tune the structures of synthetic data to match the target domain. Unlike grammar-based synthetic tasks, where each formal grammar defines a task with fixed structural complexity, NCAs provide a continuous and tunable spectrum of complexity within a single generator family. If researchers can craft distributions that embody the primitives a domain requires (e.g., rigid state-tracking for code (Li et al., 2025), richer long-range dependencies for genomic sequences (Wu et al., 2025)), they can instill these capabilities directly, without scaling to trillions of general-purpose tokens. 
The result could aid the development of specialized, small language models (Belcak et al., 2025) that are more efficient to train and deploy—trained not on more data, but on better-matched data. 相比之下,我们假设 NCA 序列能为上下文学习提供更纯粹的训练信号。每个序列都是由单个潜在规则生成的,模型必须从上下文中推断出该规则并始终如一地应用。一旦确定下来,下一个标记的预测就几乎成为确定性的了。 此外,NCA 预训练引入了一种与额外语言标记所提供的多样性相垂直的形式。尽管许多自然语言数据集规模庞大,但它们在语言模式和主题覆盖方面存在大量冗余(Abbas 等人,2023)。由于我们的每个 NCA 序列都代表了一个独特的函数来建模,这种多样性可能在每个标记上更有效地构建通用表示。 超越一刀切的预训练 我们的复杂度消融实验揭示了一个微妙的情况,即训练的最佳分布因下游领域而异。在图 6 中,我们观察到代码受益于低复杂度的 NCA 规则,而网络文本和数学则受益于高复杂度的规则,这表明这些领域编码了具有明显不同特征的计算。这开辟了一个新的控制维度。我们不再将训练数据视为固定不变的,而是可以调整合成数据的结构以匹配目标领域。与基于语法的合成任务不同,在基于语法的合成任务中,每个形式语法都定义了一个具有固定结构复杂度的任务,而神经计算架构(NCAs)在单个生成器家族内提供了连续且可调的复杂度范围。如果研究人员能够设计出体现特定领域所需基本要素的分布(例如,代码所需的刚性状态跟踪(Li 等人,2025 年),基因组序列所需的更丰富的长程依赖关系(Wu 等人,2025 年)),他们就能够直接注入这些能力,而无需扩展到数万亿个通用标记。其结果可能有助于开发专门的小型语言模型(Belcak 等人,2025 年),这些模型在训练和部署方面更高效——不是通过更多的数据,而是通过更匹配的数据进行训练。))), they can instill these capabilities directly, without scaling to trillions of general-purpose tokens. The result could aid the development of specialized, small language models (Belcak et al., [2025](#Surprisingly, with a significantly lower token budget, pre-pre-training on NCA data improves language modeling more than pre-pre-training on natural language (C4), as shown in Figure 2. How can abstract dynamical systems’ data transfer better to language than language itself? Even at 1.6B tokens, natural language pre-pre-training remains in an early training regime. Compute-optimal scaling laws suggest (Hoffmann et al., 2022) that a 1.6B parameter model requires roughly 32B tokens. At this early stage, language models primarily acquire shallow, local patterns and only learn more complex structures later on (Evanson et al., 2023; Chen et al., 2023). With limited tokens, C4 pre-pre-training likely spends most of its capacity on these surface-level regularities rather than the long-range dependencies and in-context learning that transfer broadly. 
令人惊讶的是,尽管标记预算显著降低,但先在 NCA 数据上进行预训练比在自然语言(C4)上进行预训练更能提升语言模型,如图 2 所示。为什么抽象动态系统的数据能比语言本身更好地迁移到语言模型中? 即使在 160 亿个标记的情况下,自然语言的预训练仍处于早期训练阶段。计算最优缩放定律表明(Hoffmann 等人,2022 年),一个 160 亿参数的模型大约需要 3200 亿个标记。在这个早期阶段,语言模型主要获取浅层、局部模式,而更复杂的结构则在后期才学习(Evanson 等人,2023 年;Chen 等人,2023 年)。 由于标记有限,C4 预训练可能将大部分容量用于这些表面规律,而非能广泛迁移的长程依赖和上下文学习。 In contrast, we hypothesize that NCA sequences provide a purer training signal for in-context learning. Each sequence is generated by a single latent rule that the model must infer from context and then apply consistently. Once identified, next-token prediction becomes nearly deterministic. Moreover, NCA pre-pre-training introduces a form of diversity orthogonal to what additional language tokens would provide. Despite their scale, many natural language datasets exhibit substantial redundancy (Abbas et al., 2023) in linguistic patterns and topic coverage. Since each of our NCA sequences represents a unique function to model, this diversity may be more efficient per token at building general-purpose representations. Beyond one-size-fits-all pre-training Our complexity ablations reveal a nuanced picture that the optimal distribution for training varies by downstream domain. In Figure 6, we observed that code benefits from lower-complexity NCA rules, while web text and math benefit from higher-complexity ones, suggesting these domains encode computations of measurably different character. This opens a new axis of control. Rather than treating training data as fixed, we can tune the structures of synthetic data to match the target domain. Unlike grammar-based synthetic tasks, where each formal grammar defines a task with fixed structural complexity, NCAs provide a continuous and tunable spectrum of complexity within a single generator family. 
If researchers can craft distributions that embody the primitives a domain requires (e.g., rigid state-tracking for code (Li et al., 2025), richer long-range dependencies for genomic sequences (Wu et al., 2025)), they can instill these capabilities directly, without scaling to trillions of general-purpose tokens. The result could aid the development of specialized, small language models (Belcak et al., 2025) that are more efficient to train and deploy—trained not on more data, but on better-matched data. 相比之下,我们假设 NCA 序列能为上下文学习提供更纯粹的训练信号。每个序列都是由单个潜在规则生成的,模型必须从上下文中推断出该规则并始终如一地应用。一旦确定下来,下一个标记的预测就几乎成为确定性的了。 此外,NCA 预训练引入了一种与额外语言标记所提供的多样性相垂直的形式。尽管许多自然语言数据集规模庞大,但它们在语言模式和主题覆盖方面存在大量冗余(Abbas 等人,2023)。由于我们的每个 NCA 序列都代表了一个独特的函数来建模,这种多样性可能在每个标记上更有效地构建通用表示。 超越一刀切的预训练 我们的复杂度消融实验揭示了一个微妙的情况,即训练的最佳分布因下游领域而异。在图 6 中,我们观察到代码受益于低复杂度的 NCA 规则,而网络文本和数学则受益于高复杂度的规则,这表明这些领域编码了具有明显不同特征的计算。这开辟了一个新的控制维度。我们不再将训练数据视为固定不变的,而是可以调整合成数据的结构以匹配目标领域。与基于语法的合成任务不同,在基于语法的合成任务中,每个形式语法都定义了一个具有固定结构复杂度的任务,而神经计算架构(NCAs)在单个生成器家族内提供了连续且可调的复杂度范围。如果研究人员能够设计出体现特定领域所需基本要素的分布(例如,代码所需的刚性状态跟踪(Li 等人,2025 年),基因组序列所需的更丰富的长程依赖关系(Wu 等人,2025 年)),他们就能够直接注入这些能力,而无需扩展到数万亿个通用标记。其结果可能有助于开发专门的小型语言模型(Belcak 等人,2025 年),这些模型在训练和部署方面更高效——不是通过更多的数据,而是通过更匹配的数据进行训练。)) that are more efficient to train and deploy---trained not on more data, but on better-matched data. 
| 相比之下,我们假设 NCA 序列能为上下文学习提供更纯粹的训练信号。每个序列都是由单个潜在规则生成的,模型必须从上下文中推断出该规则并始终如一地应用。一旦确定下来,下一个标记的预测就几乎成为确定性的了。 此外,NCA 预训练引入了一种与额外语言标记所提供的多样性相垂直的形式。尽管许多自然语言数据集规模庞大,但它们在语言模式和主题覆盖方面存在大量冗余(Abbas 等人,2023)。由于我们的每个 NCA 序列都代表了一个独特的函数来建模,这种多样性可能在每个标记上更有效地构建通用表示。 超越一刀切的预训练 我们的复杂度消融实验揭示了一个微妙的情况,即训练的最佳分布因下游领域而异。在图 6 中,我们观察到代码受益于低复杂度的 NCA 规则,而网络文本和数学则受益于高复杂度的规则,这表明这些领域编码了具有明显不同特征的计算。这开辟了一个新的控制维度。我们不再将训练数据视为固定不变的,而是可以调整合成数据的结构以匹配目标领域。与基于语法的合成任务不同,在基于语法的合成任务中,每个形式语法都定义了一个具有固定结构复杂度的任务,而神经计算架构(NCAs)在单个生成器家族内提供了连续且可调的复杂度范围。如果研究人员能够设计出体现特定领域所需基本要素的分布(例如,代码所需的刚性状态跟踪(Li 等人,2025 年),基因组序列所需的更丰富的长程依赖关系(Wu 等人,2025 年)),他们就能够直接注入这些能力,而无需扩展到数万亿个通用标记。其结果可能有助于开发专门的小型语言模型(Belcak 等人,2025 年),这些模型在训练和部署方面更高效------不是通过更多的数据,而是通过更匹配的数据进行训练。 |
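The "single latent rule" idea above can be made concrete with a minimal sketch. Everything here is an illustrative assumption rather than the paper's exact setup: the class `TinyNCA`, the grid width, the 5-symbol alphabet, and the row-by-row serialization are all invented for demonstration. The point it shows is that one randomly parameterized local rule yields a fully deterministic trajectory, which can be flattened into a token sequence for next-token prediction.

```python
import numpy as np

class TinyNCA:
    """A 1-D neural cellular automaton: each cell's next state is computed
    from its one-hot-encoded local neighborhood by a small random MLP
    (the latent 'rule' a language model would have to infer in context)."""
    def __init__(self, n_states=5, radius=1, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = (2 * radius + 1) * n_states       # one-hot neighborhood
        self.W1 = rng.normal(0.0, 1.0, (in_dim, hidden))
        self.W2 = rng.normal(0.0, 1.0, (hidden, n_states))
        self.n_states, self.radius = n_states, radius

    def step(self, grid):
        # Circular boundary: pad each end with the opposite end of the grid.
        pad = np.concatenate([grid[-self.radius:], grid, grid[:self.radius]])
        windows = np.stack([pad[i:i + 2 * self.radius + 1]
                            for i in range(len(grid))])
        onehot = np.eye(self.n_states)[windows].reshape(len(grid), -1)
        logits = np.tanh(onehot @ self.W1) @ self.W2
        return logits.argmax(axis=1)               # deterministic update

def trajectory_tokens(nca, width=16, steps=8, seed=0):
    """Roll out the NCA from a random start and flatten the state rows
    into one token sequence (the training example for the LM)."""
    rng = np.random.default_rng(seed)
    grid = rng.integers(0, nca.n_states, width)
    rows = [grid]
    for _ in range(steps - 1):
        grid = nca.step(grid)
        rows.append(grid)
    return np.concatenate(rows)                    # shape: (width * steps,)

tokens = trajectory_tokens(TinyNCA(), width=16, steps=8)
print(tokens.shape)
```

After the first grid row, every later token is a deterministic function of the preceding row, which is exactly the "once the rule is identified, prediction becomes nearly deterministic" property the passage describes.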

Limitations and open problems 局限性和开放性问题

|------|------|
| A key question is whether NCA data can serve not only as a pre-pre-training signal, but as a scalable substitute for natural language pre-training. For larger alphabet sizes (n=10, 15), we observe a reverse U-shaped trend: downstream improvement is optimal up to an intermediate token budget but plateaus beyond it. This behavior nonetheless reinforces our central thesis: effective synthetic pre-training depends critically on structural choices in the data generator, not merely on scale. This points to a key open problem for future work: developing principled methods to guide synthetic generators to sample structures that match those of target domains. Our complexity results demonstrate that such matching matters, but gzip compressibility and alphabet size are only two lenses on complexity. Complexity is multifaceted: a sequence can be compressible yet rich in long-range dependencies, or vice versa. Characterizing which axes of complexity (e.g., size of the NCA network, grid size, or epiplexity) matter for which domains, and learning to sample synthetic data accordingly, could unlock fully synthetic pre-training at scale. NCA represents one point in the vast space of possible synthetic data generators. The key insight from our work is not that NCA specifically is optimal, but that structured synthetic data with appropriate complexity characteristics can provide meaningful pre-training signal even without any linguistic content. The question is no longer whether synthetic pre-training can work, but how to design synthetic data distributions that maximize what models learn. | 一个关键问题是,NCA 数据能否不仅作为预预训练的信号,还能作为自然语言预训练的可扩展替代品。对于更大的字母表大小(n=10、15),我们观察到一种倒 U 形趋势:下游提升在达到某个中间标记预算前持续改善,但超过该预算后趋于平稳。尽管如此,这种行为仍然强化了我们的核心论点:有效的合成预训练关键取决于数据生成器的结构选择,而不仅仅是规模。这指向了未来研究的一个关键开放问题:开发有原则的方法,引导合成生成器采样与目标领域相匹配的结构。我们的复杂度结果表明这种匹配很重要,但 gzip 可压缩性和字母表大小只是复杂度的两个视角。复杂度是多方面的:一个序列可以是可压缩的,同时又具有丰富的长程依赖,反之亦然。刻画哪些复杂度维度(例如 NCA 网络的大小、网格大小或 epiplexity)对哪些领域重要,并据此学习采样合成数据,可能会解锁大规模的完全合成预训练。NCA 只是可能的合成数据生成器的广阔空间中的一个点。我们工作的关键见解不是 NCA 本身是最优的,而是具有适当复杂度特征的结构化合成数据,即使没有任何语言内容,也能提供有意义的预训练信号。问题不再是合成预训练能否奏效,而是如何设计能最大化模型所学的合成数据分布。 |
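The gzip-compressibility lens mentioned above can be sketched as a simple filtering knob. The helper names (`gzip_ratio`, `complexity_band`) and the band edges are illustrative assumptions, not the paper's procedure; the sketch only shows how candidate token sequences could be binned by compression ratio so that training data is drawn from a chosen complexity band.

```python
import gzip
import numpy as np

def gzip_ratio(tokens):
    """Compressed size over raw size: lower means more regular/compressible."""
    raw = np.asarray(tokens, dtype=np.uint8).tobytes()
    return len(gzip.compress(raw)) / len(raw)

def complexity_band(tokens, edges=(0.3, 0.6)):
    """Bin a sequence into a complexity band (band edges are illustrative)."""
    r = gzip_ratio(tokens)
    if r < edges[0]:
        return "low"
    if r < edges[1]:
        return "medium"
    return "high"

rng = np.random.default_rng(0)
constant = np.zeros(1024, dtype=int)     # maximally regular sequence
noise = rng.integers(0, 5, 1024)         # near-incompressible sequence
print(gzip_ratio(constant), gzip_ratio(noise))
print(complexity_band(constant), complexity_band(noise))
```

A data pipeline could then, for example, keep only trajectories whose band matches the target domain (lower bands for code, higher bands for web text and math, per the Figure 6 observation).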
