Compilers | From Industrial Implementations to Minimalist Designs: The Many Forms and Practice of Compilers

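Note: the English text is quoted from the original talk (the source post paired it with an unproofread machine translation). If anything looks off, refer to the original.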


21 Compilers and 3 Orders of Magnitude in 60 Minutes

A wander through a weird landscape to the heart of compilation. Spring 2019.

Hello! I am someone who has worked (for pay!) on some compilers: rustc, swiftc, gcc, clang, llvm, tracemonkey, etc. Ron asked if I could talk about compiler stuff I know and give a bit of perspective on the field. This is for students who already know roughly how to write compilers, so I am not going to cover that!

Photo: the speaker as a kid, in 1979.

I like compilers! Relationship akin to "child with many toy dinosaurs". Some are bigger and scarier. We will look at them first. Some are weird and wonderful. We will visit them along the way. Some are really tiny!

Image: a Borrowsaur (MIR) fighting a Thunkasaur (SIL)

Goal for talk

I expect the gap between class projects and industrial compilers is overwhelming. I want to explore the space between, demystify it, and make more design choices clear. Reduce terror, spark curiosity, encourage trying it as a career! If I can compiler, so can you!

How to draw an owl

  1. Draw some circles
  2. Draw the rest of the fucking owl

Plan of talk

Describe a few of the giants. Talk a bit about what makes them so huge and complex. Wander through the wilderness (including history) looking for ways compilers can vary, and examining specimens. Also just point out stuff I think is cool or underappreciated.

Caveats

I'm not a teacher or very good at giving talks. Lots of material, so it is not ideal to stop for questions unless you're absolutely lost. Gotta keep pace! But: there is time at the end for questions and/or email followup. Happy to return to things you're curious about. Slides are numbered! Jot down any you want to ask about.

Apologies: not as much industry-talk as I promised. Will try for some. But too many dinosaurs for show and tell!

Part 1: Some Giants

Specimen #1 Clang

~2m lines of C++: 800k lines clang plus 1.2m LLVM. Self-hosting, bootstrapped from GCC. C-language family (C, C++, ObjC), multi-target (23). Single AST + LLVM IR. 2007-now, large multi-org team. Good diagnostics, fast code. Originally Apple, more permissively licensed than GCC.

c++
LValue CodeGenFunction::EmitLValue(const Expr *E) { 
    ApplyDebugLocation DL(*this, E); 
    switch (E->getStmtClass()) { 
        default: return EmitUnsupportedLValue(E, "l-value expression"); 
        case Expr::ObjCPropertyRefExprClass: llvm_unreachable("cannot emit a property reference directly"); 
        case Expr::ObjCSelectorExprClass: return EmitObjCSelectorLValue(cast<ObjCSelectorExpr>(E));
        case Expr::ObjCIsaExprClass: return EmitObjCIsaExpr(cast<ObjCIsaExpr>(E)); 
        case Expr::BinaryOperatorClass: return EmitBinaryOperatorLValue(cast<BinaryOperator>(E)); 
        case Expr::CompoundAssignOperatorClass: { 
            QualType Ty = E->getType(); 
            if (const AtomicType *AT = Ty->getAs<AtomicType>()) 
                Ty = AT->getValueType(); 
            if (!Ty->isAnyComplexType()) 
                return EmitCompoundAssignmentLValue(cast<CompoundAssignOperator>(E)); 
            return EmitComplexCompoundAssignmentLValue(cast<CompoundAssignOperator>(E)); 
        }
        case Expr::CallExprClass: 
        case Expr::CXXMemberCallExprClass: 
        case Expr::CXXOperatorCallExprClass: 
        case Expr::UserDefinedLiteralClass: 
            return EmitCallExprLValue(cast<CallExpr>(E));
    }
}

Specimen #2 Swiftc

~530k lines of C++ plus 2m lines clang and LLVM. Many of the same authors. Not self-hosting. Newer app-dev language. Tightly integrated with clang, interop with C/ObjC libraries. Extra SIL IR for optimizations. Multi-target, via LLVM. 2014-now, mostly Apple.

c++
RValue RValueEmitter::visitIfExpr(IfExpr *E, SGFContext C) { 
    auto &lowering = SGF.getTypeLowering(E->getType()); 
    if (lowering.isLoadable() || !SGF.silConv.useLoweredAddresses()) { 
        // If the result is loadable, emit each branch and forward its result into the destination block argument. 
        Condition cond = SGF.emitCondition(E->getCondExpr(), /*invertCondition*/ false, SGF.getLoweredType(E->getType()), NumTrueTaken, NumFalseTaken); 
        cond.enterTrue(SGF); 
        SILValue trueValue; 
        auto TE = E->getThenExpr(); 
        FullExpr trueScope(SGF.Cleanups, CleanupLocation(TE)); 
        trueValue = visit(TE).forwardAsSingleValue(SGF, TE); 
        cond.exitTrue(SGF, trueValue); 
        SILValue falseValue;
        cond.enterFalse(SGF); 
        auto EE = E->getElseExpr(); 
        FullExpr falseScope(SGF.Cleanups, CleanupLocation(EE)); 
        falseValue = visit(EE).forwardAsSingleValue(SGF, EE);
        cond.exitFalse(SGF, falseValue);
    }
}

Specimen #3 Rustc

~360k lines of Rust, plus 1.2m lines LLVM. Self-hosting, bootstrapped from OCaml. Newer systems language. Two extra IRs (HIR, MIR). Multi-target, via LLVM. 2010-now, large multi-org team. Originally mostly Mozilla. And yes, I did a lot of the initial bring-up, so my name is attached to it forever; glad it worked out!

rust
fn expr_as_rvalue(
    &mut self,
    mut block: BasicBlock,
    scope: Option<region::Scope>,
    expr: Expr<'tcx>,
) -> BlockAnd<Rvalue<'tcx>> {
    debug!(
        "expr_as_rvalue(block={:?}, scope={:?}, expr={:?})",
        block, scope, expr
    );
    let this = self;
    let expr_span = expr.span;
    let source_info = this.source_info(expr_span);
    match expr.kind {
        ExprKind::Scope { region_scope, lint_level, value, .. } => {
            let region_scope = (region_scope, source_info);
            this.in_scope(region_scope, lint_level, block, |this| {
                this.as_rvalue(block, scope, value)
            })
        }
        _ => unimplemented!()
    }
}

Aside: What is this "LLVM"?

Notice the last 3 compilers all end in LLVM. "Low Level Virtual Machine": strongly typed IR, serialization format, library of optimizations, lowerings to many target architectures. "One-stop-shop" for compiler backends. 2003-now, UIUC at first, many industrial contributors now. A longstanding dream of the compiler engineering world, and possibly the most successful attempt at it yet.

Here is a funny diagram of modern compilers from Andi McClure:

  • Source Program → [Lex] → Tokens

  • Tokens → [Parse] → Reductions / Parsing Actions

    text
    ┌─────────────────────────────────────────────────────────┐
    │                     Source Program                      │
    └───────────────────────────┬─────────────────────────────┘
                                │
                                ▼
    ┌─────────────────────────────────────────────────────────┐
    │  Lexical Analysis                                       │
    │  ┌─────────┐       output       ┌───────────┐           │
    │  │   Lex   │ ────────────────→  │  Tokens   │           │
    │  └─────────┘                    └───────────┘           │
    └───────────────────────────┬─────────────────────────────┘
                                │
                                ▼
    ┌─────────────────────────────────────────────────────────┐
    │  Syntax Analysis                                        │
    │  ┌─────────┐       output       ┌─────────────────────┐ │
    │  │  Parse  │ ────────────────→  │ Reductions/Actions  │ │
    │  └─────────┘                    └─────────────────────┘ │
    └───────────────────────────┬─────────────────────────────┘
                                │
                                ▼
    ┌─────────────────────────────────────────────────────────┐
    │  Semantic Analysis & IR Generation                      │
    │  ┌───────────────────┐       output       ┌───────────┐ │
    │  │ (Semantic Check)  │ ────────────────→  │ Abstract  │ │
    │  └───────────────────┘                    │  Syntax   │ │
    │                                           └───────────┘ │
    └───────────────────────────┬─────────────────────────────┘
                                │
                                ▼
    ┌─────────────────────────────────────────────────────────┐
    │  Code Generation & Optimization                         │
    │  ┌───────────────────┐       output       ┌───────────┐ │
    │  │ (CodeGen + Opt)   │ ────────────────→  │ Machine   │ │
    │  └───────────────────┘                    │ Language  │ │
    │                                           └───────────┘ │
    └─────────────────────────────────────────────────────────┘

Specimen #4 GNU Compiler Collection (GCC)

~2.2m lines of mostly C, C++. 600k lines Ada. Self-hosting, bootstrapped from other C compilers. Multi-language (C, C++, ObjC, Ada, D, Go, Fortran), multi-target (21). Language- and target-agnostic TREE AST and RTL IR. Challenging to work on. 1987-present, large multi-org team. Generates quite fast code. Originally a political project to free software from proprietary vendors. Licensed somewhat protectively.

c
static int
find_reusable_reload (rtx *p_in, rtx out, enum reg_class rclass,
                      enum reload_type type, int opnum, int dont_share)
{
  rtx in = *p_in;
  int i;
  if (earlyclobber_operand_p (out))
    return n_reloads;
  for (i = 0; i < n_reloads; i++)
    if ((reg_class_subset_p (rclass, rld[i].rclass)
         || reg_class_subset_p (rld[i].rclass, rclass))
        && (rld[i].reg_rtx == 0
            && ((in != 0 && MATCHES (rld[i].in, in)
                 && !dont_share
                 && (rld[i].out == 0 || !earlyclobber_operand_p (rld[i].out))
                 && (small_register_class_p (rclass)
                     && MERGABLE_RELOADS (type, rld[i].when_needed, opnum, rld[i].opnum))
                 || TEST_HARD_REG_BIT (reg_class_contents[(int) rclass],
                                       true_regnum (rld[i].reg_rtx)))
            || (out != 0 && MATCHES (rld[i].out, out)
                || targetm.small_register_classes_for_mode_p (VOIDmode))
            && (out == 0 || rld[i].out == 0 || MATCHES (rld[i].out, out)))
        && (in == 0 || rld[i].in == 0 || MATCHES (rld[i].in, in))))
      return i;
  return n_reloads;
}

Part 2: Why So Big?

Size and Economics

Compilers get big because the development costs are seen as justified by the benefits, at least to the people paying the bills. Developer productivity: highly expressive languages, extensive diagnostics, IDE integration, legacy interop. Every drop of runtime performance: shipping on billions of devices or gigantic multi-warehouse fleets. Covering and exploiting all the hardware: someone makes a new chip, they pay for an industrial compiler to make use of it. Writing compilers in verbose languages: for all the usual reasons (compatibility, performance, familiarity).

Tradeoffs and Balance

This is ok! The costs and benefits are context dependent. Different contexts and weightings give different compilers. The remainder of the talk will explore those differences. Always remember: balancing cost tradeoffs by context. This is a totally biased subset of systems: stuff I think is interesting and worth knowing, which might give hope or inspire curiosity.

Part 3: Variations (This part is much longer)

Variation #1 Fewer Optimizations

In some contexts, "all the optimizations" is too much. Too slow to compile, too much memory, too much development and maintenance effort, too inflexible. Common in JITs, or in languages with lots of indirection anyway (dynamic dispatch, pointer chasing), where the optimizer can't do too well regardless.

Proebsting's Law

"Compiler Advances Double Computing Power Every 18 Years". Sarcastic joke and real comparison to Moore's law: hardware doubles power every 18 months, which swamps compilers. Empirical observation though! Optimizations seem to only win ~3-5x, after 60+ years of work. Less true as a language gains more abstractions to eliminate (i.e. specialize / de-virtualize). More true if lower-level.

Discussion

The results of our experiment suggest that Proebsting's Law is probably true. The reality is somewhat grimmer than Proebsting initially supposed. Research in optimizing compilers has been ongoing since 1955. The compiler technology developed over this 45-year period is able to improve the performance of integer intensive programs by a factor of 3.3. This corresponds to uniform performance improvements of about 2.8% per year. Even if we assume that the beginning of useful compiler optimization research began in the mid 1960's, the uniform performance improvement on integer intensive codes due to compiler optimization is still only 3.6% per year. This lies in stark contrast to the 60% per year performance improvements we can expect from hardware due to Moore's Law. The performance difference between optimized and unoptimized programs is larger for the floating-point intensive codes in SPECfp95. This indicates that compiler research has had a larger effect on improving the performance of scientific codes than on improving the performance of ordinary, integer intensive applications. Again, if we assume compiler research has been ongoing since 1955, we get a doubling of performance every 16 years. This corresponds to uniform performance improvements of about 4.9% per year over this 45 year period. This is only slightly better than the results for integer intensive programs.

Scott, Kevin. On Proebsting's Law. 2001
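The percentages in the quote can be related to doubling times with a little compound-growth arithmetic. The sketch below assumes uniform compounding (an assumption made here, not necessarily how the paper derived its figures), and the function names are invented for illustration. Under that assumption the results land near the quoted numbers rather than exactly on them.

```python
import math


def annual_rate_from_doubling(years_to_double: float) -> float:
    """Uniform yearly improvement implied by doubling every `years_to_double` years."""
    return 2.0 ** (1.0 / years_to_double) - 1.0


def annual_rate_from_total(speedup: float, years: float) -> float:
    """Uniform yearly improvement implied by a total `speedup` over `years` years."""
    return speedup ** (1.0 / years) - 1.0


def doubling_time(speedup: float, years: float) -> float:
    """Years needed to double, given a total `speedup` achieved over `years` years."""
    return years * math.log(2.0) / math.log(speedup)


# Proebsting's joke: compilers double performance every 18 years.
compilers_18y = annual_rate_from_doubling(18.0)
# Moore's law as stated in the talk: hardware doubles every 18 months.
hardware_18m = annual_rate_from_doubling(1.5)
# The quoted measurement: a factor of 3.3 on integer codes over 45 years.
integer_rate = annual_rate_from_total(3.3, 45.0)
integer_doubling = doubling_time(3.3, 45.0)

for label, rate in [("compilers (18 y)", compilers_18y),
                    ("hardware (18 mo)", hardware_18m),
                    ("integer codes", integer_rate)]:
    print(f"{label}: {rate * 100:.1f}% per year")
print(f"integer codes double every {integer_doubling:.1f} years")
```

Doubling every 18 years works out to roughly 4% a year, while doubling every 18 months is close to the 60% a year the quote attributes to hardware.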

Frances Allen Got All The Good Ones

1971: "A Catalogue of Optimizing Transformations". The ~8 passes to write if you're going to bother: Inline, Unroll (& Vectorize), CSE, DCE, Code Motion, Constant Fold, Peephole. That's it. You're welcome. Many compilers just do those and get ~80% of best-case perf. https://commons.wikimedia.org/wiki/File:Allen_mg_2528-3750K-b.jpg - CC BY-SA 2.0
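To make two of the passes named above concrete, here is a minimal sketch of constant folding followed by dead code elimination. The instruction format (`dest, op, arg1, arg2`), the two-operator table and the function names are all assumptions made for illustration; the code is straight-line, assigns each name once, and has no side effects, which is what lets both passes stay this short.

```python
# Each instruction is (dest, op, arg1, arg2); args are ints or variable names.
OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}


def constant_fold(code):
    """Replace operations whose operands are known constants with a "const"."""
    known = {}   # variable name -> constant value discovered so far
    out = []
    for dest, op, a, b in code:
        a = known.get(a, a)   # substitute known constants for variable names
        b = known.get(b, b)
        if op == "const":
            known[dest] = a
            out.append((dest, "const", a, None))
        elif isinstance(a, int) and isinstance(b, int):
            value = OPS[op](a, b)
            known[dest] = value
            out.append((dest, "const", value, None))
        else:
            out.append((dest, op, a, b))
    return out


def eliminate_dead_code(code, live_out):
    """Walk backwards, keeping only instructions whose result is still needed."""
    live = set(live_out)
    kept = []
    for dest, op, a, b in reversed(code):
        if dest in live:
            live.discard(dest)
            live.update(x for x in (a, b) if isinstance(x, str))
            kept.append((dest, op, a, b))
    kept.reverse()
    return kept


program = [
    ("x", "const", 4, None),
    ("y", "const", 5, None),
    ("t", "mul", "x", "y"),     # foldable: both operands are constants
    ("u", "add", "t", "n"),     # "n" is a runtime input, so this one stays
    ("dead", "add", "x", "y"),  # result is never used
]

optimized = eliminate_dead_code(constant_fold(program), live_out={"u"})
print(optimized)
```

Folding first is a deliberate ordering choice in this sketch: once `t` is known to be a constant, the instructions that produced it no longer feed anything, so the backward liveness walk can drop them.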

Specimen #5 V8

660k lines C++ including backends. Not self-hosting. JavaScript compiler in Chrome, Node. Multi-target (7), multi-tier JIT. Optimizations are a mix of classical stuff and dynamic language stuff from Smalltalk. Multiple generations of optimization and IRs. Always adjusting for the sweet spot of runtime perf vs. compile time, memory, maintenance cost, etc. Recently added a slower (non-JIT) interpreter tier, removed others. 2008-present, mostly Google, open source.

c++
// Shared routine for word comparison against zero.
void InstructionSelector::VisitWordCompareZero(Node* user, Node* value, FlagsContinuation* cont) {
    // Try to combine with comparisons against 0 by simply inverting the branch.
    while (value->opcode() == IrOpcode::kWord32Equal && CanCover(user, value)) {
        Int32BinopMatcher m(value);
        if (CanCover(user, value)) {
            user = value;
            value = m.left().node();
            cont->Negate();
            if(!m.right().Is(0)) break;
            switch (value->opcode()){
                case IrOpcode::kWord32Equal: 
                    cont->OverwriteAndNegateIfEqual(kEqual); 
                    return VisitWordCompare(this, value, kX64Cmp32, cont);
                case IrOpcode::kInt32LessThan: 
                    cont->OverwriteAndNegateIfEqual(kSignedLessThan); 
                    return VisitWordCompare(this, value, kX64Cmp32, cont);
                case IrOpcode::kInt32LessThanOrEqual: 
                    cont->OverwriteAndNegateIfEqual(kSignedLessThanOrEqual); 
                    return VisitWordCompare(this, value, kX64Cmp32, cont);
                case IrOpcode::kUint32LessThan: 
                    cont->OverwriteAndNegateIfEqual(kUnsignedLessThan);
                    return VisitWordCompare(this, value, kX64Cmp32, cont);
                case IrOpcode::kUint32LessThanOrEqual: 
                    cont->OverwriteAndNegateIfEqual(kUnsignedLessThanOrEqual); 
                    return VisitWordCompare(this, value, kX64Cmp32, cont);
            }
        }
    }
}

Variation #2 Compiler-friendly Implementation (and Input) Languages

Note: your textbook has 3 implementation flavours: Java, C, ML. No coincidence. ML was designed as the implementation language for a symbolic logic (expression-tree wrangling) system: LCF (1972). LCF was written in Lisp. Lisp was also designed as the implementation language for a symbolic logic system: Advice Taker (1959). Various family members: Haskell, OCaml, Scheme, Racket. All really good at defining and manipulating trees: ASTs, types, IRs, etc. Usually make for much smaller and simpler compilers.

Specimen #6 Glasgow Haskell Compiler (GHC)

180k lines Haskell, self-hosting, bootstrapped from the Chalmers Lazy ML compiler. Pure-functional language, very advanced type system. Several tidy IRs after AST: Core, STG, CMM. Custom backends. 1991-now, initially academic researchers, lately Microsoft after they were hired there.

haskell
stmtToInstrs :: CmmNode e x -> NatM InstrBlock
stmtToInstrs stmt = do
  dflags <- getDynFlags
  is32Bit <- is32BitPlatform
  case stmt of
    CmmComment s -> return (unitOL (COMMENT s))
    CmmTick {} -> return nilOL
    CmmUnwind regs -> do
      let to_unwind_entry :: (GlobalReg, Maybe CmmExpr) -> UnwindTable
          to_unwind_entry (reg, expr) = M.singleton reg (fmap toUnwindExpr expr)
          tbl = foldMap to_unwind_entry regs
      if M.null tbl
        then return nilOL
        else do
          lbl <- mkAsmTempLabel <$> getUniqueM
          return $ unitOL $ UNWIND lbl tbl
    CmmAssign reg src -> do
      let ty = cmmRegType dflags reg
          format = cmmTypeFormat ty
      if isFloatType ty
        then assignReg_FltCode format reg src
        else if is32Bit && isWord64 ty
             then assignReg_I64Code reg src
             else assignReg_IntCode format reg src
    CmmStore addr src -> do
      let ty = cmmExprType dflags src
          format = cmmTypeFormat ty
      if isFloatType ty
        then assignMem_FltCode format addr src
        else if is32Bit && isWord64 ty
             then assignMem_I64Code addr src
             else assignMem_IntCode format addr src
    _ -> return nilOL
  where
    isFloatType ty = cmmTypeIsFloating ty

Specimen #7 Chez Scheme

87k lines Scheme (a Lisp), self-hosting, bootstrapped from CScheme. 4 targets, good performance, incremental compilation. Written on the "nanopass framework" for compilers with many similar IRs. Chez has 27 different IRs! 1984-now, academic-industrial, mostly a single developer. We are getting down to the size range where a compiler is small enough to be one person's work.
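The idea of many tiny passes over nearly identical IRs can be sketched in a few lines. This is only in the spirit of the nanopass approach, not the actual nanopass framework: the nested-tuple IR, the three rules and every name below are assumptions made for illustration. Each pass does one small job, and later passes can rely on earlier ones (after `remove_sub`, no pass has to know that `"sub"` exists).

```python
def rewrite(expr, rule):
    """Apply `rule` bottom-up to every node of a nested-tuple expression."""
    if isinstance(expr, tuple):
        expr = (expr[0],) + tuple(rewrite(child, rule) for child in expr[1:])
    return rule(expr)


def remove_sub(e):
    # Pass 1: ("sub", a, b) becomes ("add", a, ("neg", b)).
    if isinstance(e, tuple) and e[0] == "sub":
        return ("add", e[1], ("neg", e[2]))
    return e


def fold_neg(e):
    # Pass 2: ("neg", constant) becomes the negated constant.
    if isinstance(e, tuple) and e[0] == "neg" and isinstance(e[1], int):
        return -e[1]
    return e


def fold_add(e):
    # Pass 3: ("add", constant, constant) becomes their sum.
    if (isinstance(e, tuple) and e[0] == "add"
            and isinstance(e[1], int) and isinstance(e[2], int)):
        return e[1] + e[2]
    return e


PASSES = [remove_sub, fold_neg, fold_add]


def run_passes(expr):
    """Run every pass once, in order, each over the output of the previous one."""
    for rule in PASSES:
        expr = rewrite(expr, rule)
    return expr


result = run_passes(("add", "x", ("sub", 5, 2)))
print(result)
```

Because each pass runs only once here, the order of `PASSES` matters; a real pipeline would typically have many more passes and a checked definition of the IR each one accepts and produces.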

scheme
(define asm-size
  (lambda (x)
    (case (car x)
      [(asm) 0]
      [(byte) 1]
      [(word) 2]
      [else 4])))

(define asm-move
  (lambda (code* dest src)
    (Trivit (dest src)
      (record-case src
        [(imm) (n)
         (if (and (eqv? n 0) (record-case dest [(reg) r #t] [else #f]))
             (emit xor dest dest code*)
             (emit movi src dest code*))]
        [(literal) stuff (emit movi src dest code*)]
        [else (emit mov src dest code*)]))))

(define-who asm-move/extend
  (lambda (op)
    (lambda (code* dest src)
      (Trivit (dest src)
        (case op
          [(sext8) (emit movsb src dest code*)]
          [(zext8) (emit movzb src dest code*)]
          [(sext16) (emit movsw src dest code*)]
          [(zext16) (emit movzw src dest code*)]
          [else (sorry! who "unexpected op ~s" op)])))))

Specimen #8 Poly/ML

44k lines SML, self-hosting. Single machine target (plus byte-code), AST + IR, classical optimizations. Textbook style. Standard platform for the symbolic logic packages Isabelle and HOL. 1986-now, academic, mostly a single developer.

sml
fun cgOp (PushToStack(RegisterArg reg)) =
  let
    val (rc, rx) = getReg reg
  in
    opCodeBytes (PUSH_R rc, if rx then SOME{w=false, b=true, x=false, r=false} else NONE)
  end
| cgOp (PushToStack (MemoryArg{base, offset, index})) =
  opAddressPlus2(Group5, LargeInt.fromInt offset, base, index, 0w6)
| cgOp (PushToStack(NonAddressConstArg constnt)) =
  if is8BitL constnt
  then opCodeBytes (PUSH_8, NONE) @ [Word8.fromLargeInt constnt]
  else if is32bit constnt
       then opCodeBytes (PUSH_32, NONE) @ int32Signed constnt
       else error "constant too large"
| cgOp (PushToStack(AddressConstArg _)) =
  case targetArch of
    Native64Bit =>
      let
        val opb = opCodeBytes(Group5, NONE)
        val mdrm = modrm(Based0, 0w6 (* push *), 0w5 (* PC rel *))
      in
        opb @ [mdrm] @ int32Signed(tag 0)
      end
  | Native32Bit => opCodeBytes(PUSH_32, NONE) @ int32Signed(tag 0)
  | ObjectId32Bit => opCodeBytes(PUSH_32, NONE) @ int32Signed(tag 0)

Specimen #9 CakeML

58k lines SML, 5 targets, self-hosting. 9 IRs, many simplifying passes. 160k lines of HOL proofs: verified! Language semantics proven to be preserved through compilation!!! Cannot emphasize enough. This was science fiction when I was young. CompCert was the first serious one, now there are several. 2012-now, deeply academic.

sml
val WordOp64_on_32_def = Define `
  WordOp64_on_32 (opw:opw) =
    dtcase opw of
      | Andw => list_Seq [Assign 29 (Const 0w);
                          Assign 27 (Const 0w);
                          Assign 33 (Op And [Var 13; Var 23]);
                          Assign 31 (Op And [Var 11; Var 21])]
      | Orw => list_Seq [Assign 29 (Const 0w);
                         Assign 27 (Const 0w);
                         Assign 33 (Op Or [Var 13; Var 23]);
                         Assign 31 (Op Or [Var 11; Var 21])]
      | Xorw => list_Seq [Assign 29 (Const 0w);
                          Assign 27 (Const 0w);
                          Assign 33 (Op Xor [Var 13; Var 23]);
                          Assign 31 (Op Xor [Var 11; Var 21])]
      | Addw => list_Seq [Assign 29 (Const 0w);
                          Assign 27 (Const 0w);
                          Inst (Arith (AddCarry 33 13 23 29));
                          Inst (Arith (AddCarry 31 11 21 29))]
      | Subw => list_Seq [Assign 29 (Const 1w);
                          Assign 27 (Op Xor [Const (-1w); Var 23]);
                          Inst (Arith (AddCarry 33 13 27 29));
                          Assign 27 (Op Xor [Const (-1w); Var 21]);
                          Inst (Arith (AddCarry 31 11 27 29))]`

Variation #3 Meta-languages

Notice that Lisp / ML code looks a bit like grammar productions: recursive, branching, tree-shaped type definitions and pattern matching. There's a language lineage that took that idea ("programs as grammars") to its logical conclusion: metacompilers (a.k.a. "compiler-compilers"). The ultimate in "compiler-friendly" implementation languages. More or less: a parser glued to an "un-parser". Many times half a metacompiler lurks in more-normal compilers:

  • YACCs ("yet another compiler-compiler"): parser-generators

  • BURGs ("bottom-up rewrite generators"): code-emitter-generators

See also: GCC ".md" files, LLVM TableGen. Common pattern!

Aside: SRI-ARC

Stanford Research Institute, Augmentation Research Center. US Air Force R&D project. Very famous for its NLS ("oNLine System"). The history of that project is too big to tell here. Highly influential in forms of computer-human interaction, hypertext, collaboration, visualization. Less well-known is their language tech: TREE-META and MPS/MPL.

SRI-ARC, 8 June 1972, 10575, "Highlights of 1971" Summary (header stamp: 18-AUG-72 10:02): At present the primary language systems developed and in use at ARC are the Tree-Meta Compiler-compiler System and the L10 Programming language system which was written in Tree-Meta. Work is currently progressing on a Modular Programming System (MPS) in collaboration with a group at the Xerox Palo Alto Research Center.

Specimen #10 TREE-META

184 lines of TREE-META. Bootstrapped from META-II. In the Schorre metacompiler family (META, META-II). SRI-ARC, 1967. Made to support language tools in the NLS project. "Syntax-directed translation": parse input to trees, un-parse to machine code. Only guided by grammars. Hard to provide diagnostics, typechecking, optimization, really anything other than straight translations. But: extremely small, simple compilers. A couple of pages. Ideal for the bootstrap phase.
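The "parse input to trees, un-parse to machine code" idea can be sketched without any metacompiler machinery. The example below is hand-written Python rather than TREE-META: the tiny expression grammar, the stack-machine mnemonics (`PUSH`, `ADD`, and so on) and all function names are hypothetical, chosen only to show a parser glued to an un-parser with nothing in between.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[-+*/()])")
MNEMONIC = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def tokenize(text):
    """Split the input into numbers, names, operators and parentheses."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens):
    """Parse: expr := term (+|- term)*, term := factor (*|/ factor)*."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def factor():
        tok = take()
        if tok == "(":
            node = expr()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok in MNEMONIC or tok == ")":
            raise SyntaxError(f"unexpected token {tok!r}")
        return ("leaf", tok)

    def term():
        node = factor()
        while peek() in ("*", "/"):
            node = (take(), node, factor())
        return node

    def expr():
        node = term()
        while peek() in ("+", "-"):
            node = (take(), node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError("trailing input")
    return tree


def unparse(node):
    """Un-parse: walk the tree and emit stack-machine code, operands first."""
    if node[0] == "leaf":
        return [f"PUSH {node[1]}"]
    op, left, right = node
    return unparse(left) + unparse(right) + [MNEMONIC[op]]


def translate(text):
    return unparse(parse(tokenize(text)))


print("\n".join(translate("a + 2 * (b - 1)")))
```

Nothing here checks types or optimizes: the output is a direct image of the tree, which mirrors the limitation noted above.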

meta
.META PROGM
[-,#1] =>#1');' *1;
'%IN;'%;
OUTPT[-,-] => % *1 ':' % '%PUSHJ;' % *2 '%POPJ;' % ;
AC[-,-,-] => *1 *3 ACX [*2,#1] #1 ':' % ;
ACX[AC[-,-,-],#1] => #1 ');' % *1:*1 *1:*3 ACX[*1:*2,#1]
[-,#1] => #1 ');' % *1 ;
BALTER[-] => '%SAV;' % *1 '%RSTR;' % ;
ERCODE [-,NUM] => *1 '%ERCHK;DATA(' OUTAB[*2] OUTAB[-] => <TN <- CONV[*1]; OUT[TN] > ');' % ;
ERR[-] => *1 '%ERCHK;DATA(0);' % ;
[-.-] => *1 '%ERSTR;DATA(' OPSTR[*2] ;
NDMK[-,-] => '%MKND;DATA(@' *1 ',' *2 ');' % ;
NDLB => '%NDLBL;DATA(@' *1 ');' % ;
MKNODE [-] => '%NDMK;DATA(' *1 ');' % ;
CALL[-] F/ => ' BF;DATA@'; :
OER[-,-] =>*1R;'*2
D00[-,-]=>*1*2;
NDLB OUTRE;'%;
'%SET;' PRIM[.ID] 1> '%'*1
[.SR] => OUTAB[-] STST[-]
[.SR] => '%SRP;DATA(' OPSTR[*1] ;
CALL[-] => '%CALL;DATA(@' *1 ');' % ;
SCODE[-]=> STST[-] => '%TST;DATA(' OPSTR[*1] ;
SCODE[-] => '%CHRCK;DATA(' OUTAB[*1] ;
ARB[-] => #1 ':' % *1 '%BT;DATA(@' #1 ');' % '%SET;' % ;
BEGINN[-.-] => <B <- 0 > *2 'ENTRY 0;%INIT;%CALL;DATA(@' *1 ');' %
BEGINN[-.-] ≈> bootstrapping from the old xs-9h0 to the new or-0. In the past

Specimen #11 (Segue) Mesa

42k lines of Mesa (bootstrapped from MPL, itself from TREE-META). One of my favourite languages! Strongly typed, modules with separate compilation and type-checked linking. Highly influential (Modula, Java). Co-designed language, OS, and byte-code VM implemented in CPU microcode, adapted to the compiler. Xerox PARC, 1976-1981; a small team left SRI-ARC and took MPL with them.

mesa
sCall: PROCEDURE [node: TreeIndex] RETURNS [nrets: CARDINAL]
BEGIN -- generates code for procedure call statement
  OPEN FOpCodes;
  ptsei:CSEIndex;
  operandtype:TYPE;
  computedtarget:BOOLEAN;
  inlineTree:TreeLink;
  nparms:CARDINAL;
  savestacksize: INTEGER;
  sei:ISEIndex;
  bti:CBTIndex;
  a:BitAddress;
  inlineCall: BOOLEAN;
  WITH (tb+node).son1 SELECT FROM
    symbol=> BEGIN
      sei := index;
      inlineCall := (seb+sei).constant AND (seb+sei).extended;
      computedtarget := (ctxb+(seb+sei).ctxnum).ctxlevel # 1G;
    END;
    ENDCASE -> BEGIN
      inlineCall := FALSE;
      computedtarget := TRUE;
      IF ~inlineCall THEN dumpstack[];
    END;
END;

XEROX PARC - CC BY 2.0

Variations #4, #5, and #6 Leverage Interpreters

Mesa and Xerox PARC are a nice segue into the next few points: all involve compilers interacting with interpreters. Interpreters and compilers actually have a long relationship! In fact, interpreters predate compilers. Let us travel back in time to the beginning, to illustrate!

Origins of "computer"

"计算机"的起源

1940s: First digital computers. Before: fixed-function machines and/or humans (largely women) doing job called "computer". Computing power literally measured in "kilo-girls" and "kilo-girl-hours".

20 世纪 40 年代:第一台数字计算机诞生。在此之前,完成计算工作的是专用机器和/或人工计算者(主要为女性),这份工作就叫"computer"。当时的计算能力甚至用"千个女性计算者"和"千个女性计算工时"来衡量。

ENIAC: General Hardware

埃尼阿克:通用硬件

1945: ENIAC built for US Army, Ordnance Corps. Artillery calculations in WWII. "Programmers" drawn from "computer" staff, all women. "Programming" meant physically rewiring per task.

1945 年:埃尼阿克为美国陆军军械部队研制,用于二战中的火炮弹道计算。"程序员"均从人工计算者中选拔,全部为女性。当时的"编程"意味着为每个任务重新进行物理接线。

Stored Programs

存储程序

1948: Jean Bartik leads team to convert ENIAC to "stored programs", instructions (called "orders") held in memory. Interpreted by hardware. Faster to reconfigure than rewiring; but ran slower. Subroutine concept developed for factoring stored programs.

1948 年:琼·巴蒂克带领团队将埃尼阿克改造为"存储程序"架构,指令(当时称为"命令")被存储在内存中,由硬件进行解释执行。这种方式比重新接线更易重构,但运行速度更慢。为了拆分存储程序,子程序的概念应运而生。

First Software Pseudo codes: Interpreters on ENIAC, BINAC, UNIVAC

首款软件伪代码:埃尼阿克、BINAC、UNIVAC 上的解释器

1949: "Short Code" software interpreters for higher-level "pseudo-code" instructions (non-HW-interpreted) that denote subroutine calls and expressions. ~50x slower than HW-interpreted.

1949 年:"短代码"软件解释器问世,用于执行高层级的伪代码指令(非硬件解释),这些指令代表子程序调用和表达式运算,运行速度比硬件解释慢约 50 倍。

Short Code example, step by step: substitute variables; substitute operators and parentheses; group into 12-byte words.

Note multiplication is represented by juxtaposition.

短代码示例的步骤:替换变量;替换运算符和括号;组合为 12 字节的字。

注意:乘法通过字符并列表示。

Specimen #12 A-0: The First Compiler

实例 12 A-0:首款编译器

Reads interpreter-like pseudocodes, then emits "compilation" program with all codes resolved to their subroutines. Result runs almost as fast as manually coded; but as easy to write-for as interpreter. An interpreter "fast mode". Rationale all about balancing time tradeoffs (coding-time, compiler execution-time, run-time). 1951, Grace Hopper, Univac

读取类解释器的伪代码,然后生成一个"编译后"程序,将所有伪代码解析为对应的子程序。生成的程序运行速度几乎与手工编写的代码相当,同时又和解释器一样易于编写,相当于解释器的"快速模式"。其设计是平衡各类时间成本(编码时间、编译器执行时间、运行时间)。1951 年,由格蕾丝·霍珀为 UNIVAC 计算机开发。
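
The "interpreter fast mode" idea can be illustrated with a minimal sketch. This is not A-0 itself: the subroutine table, the pseudo-code numbers, and the names `interpret` and `compile_program` are all hypothetical, chosen only to show the difference between looking up each code at run time and resolving every code to its subroutine once, ahead of time.

```python
# Hypothetical subroutine library: pseudo-code number -> routine.
SUBROUTINES = {
    1: lambda a, b: a + b,
    2: lambda a, b: a * b,
}

def interpret(pseudo_codes, x):
    """Interpreter: look up every pseudo-code each time the program runs."""
    for code, operand in pseudo_codes:
        x = SUBROUTINES[code](x, operand)
    return x

def compile_program(pseudo_codes):
    """'Compilation': resolve every pseudo-code to its subroutine once."""
    resolved = [(SUBROUTINES[code], operand) for code, operand in pseudo_codes]

    def run(x):
        # No table lookups remain here, only direct calls.
        for routine, operand in resolved:
            x = routine(x, operand)
        return x

    return run

program = [(1, 3), (2, 4)]           # computes (x + 3) * 4
compiled = compile_program(program)  # pay the lookup cost once
```

Both `interpret(program, 2)` and `compiled(2)` compute the same value; the compiled form simply moves the lookup work from run time to compile time, which is the time tradeoff the slide describes.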

Balance between interpretation and compilation is context dependent too!

解释和编译的平衡同样依赖于具体场景

Variation #4 Only Compile from Frontend to IR, Interpret Residual VM Code

形态 4 仅将前端代码编译为中间表示,解释执行剩余的虚拟机代码

Can stop before real machine code. Emit IR == "virtual machine" code.
• Can further compile or just interpret that VM code.
• Residual VM interpreter has several real advantages:
  • Easier to port to new hardware, or bootstrap compiler. "Just get something running".
  • Fast compilation & program startup, keeps interactive user engaged.
  • Simply easier to write, less labor. Focus your time on frontend semantics.
"As a cheap implementation device: bytecode interpreters offer 1/4 of the performance of optimizing native-code compilers, at 1/20 of the implementation cost." https://xavierleroy.org/talks/zam-kazam05.pdf

可在生成原生机器码前停止编译,将中间表示作为"虚拟机"代码输出。• 可对该虚拟机代码进一步编译,也可直接解释执行。• 基于剩余虚拟机代码的解释器具备诸多实际优势:• 更易移植到新硬件,也更易实现编译器自举,目标是"先让程序跑起来"。• 编译速度和程序启动速度更快,能提升交互式用户的体验。实现难度更低、工作量更小,可将开发精力集中在前端语义分析上。作为一种低成本的实现方案:字节码解释器的性能约为优化型原生代码编译器的 1/4,但实现成本仅为其 1/20。
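
To make the split concrete, here is a small sketch of a frontend that lowers arithmetic expressions to stack-VM code, plus the residual interpreter that runs it. The instruction set (`PUSH`, `ADD`, `SUB`, `MUL`) is invented for this example, and Python's own `ast` module stands in for a hand-written parser; none of this is taken from Roslyn, ECJ, or any other specimen.

```python
import ast
import operator

# Hypothetical stack-VM instruction set, made up for this sketch.
BINOPS = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}
IMPL = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}

def compile_expr(source):
    """Frontend: parse, then lower the AST to VM code (a list of tuples)."""
    code = []

    def lower(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            code.append(("PUSH", node.value))
        elif isinstance(node, ast.BinOp) and type(node.op) in BINOPS:
            lower(node.left)    # operands first (postfix order)
            lower(node.right)
            code.append((BINOPS[type(node.op)], None))
        else:
            raise SyntaxError(f"unsupported construct: {ast.dump(node)}")

    lower(ast.parse(source, mode="eval").body)
    return code

def run(code):
    """Residual interpreter: the only part that needs porting to a new machine."""
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(IMPL[op](left, right))
    return stack.pop()
```

For example, `run(compile_expr("(7 - 2) * 4"))` evaluates the expression through the VM. Everything language-specific lives in `compile_expr`; `run` knows nothing about source syntax.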

Specimen #13 Roslyn

实例 13 Roslyn

350k lines C#, 320k lines VB. Self-hosting, bootstrapped off previous gen. Multi-language framework (C#, VB.NET). Rich semantics, good diagnostics, IDE integration. Lowers from AST to CIL IR. Separate CLR project interprets or compiles IR. 2011-now, Microsoft, OSS.

约 35 万行 C#代码、32 万行 VB 代码,支持自举,基于上一代版本完成首次编译。多语言编译框架(支持 C#、VB.NET),语义分析能力丰富,诊断信息完善,深度集成各类集成开发环境。将抽象语法树降级为 CIL 中间表示,由独立的 CLR 项目对该中间表示进行解释或进一步编译。2011 年至今,由微软开发,开源维护。

private void EmitBinaryOperatorInstruction (BoundBinaryOperator expression)
{
    switch (expression.OperatorKind.Operator())
    {
        case BinaryOperatorKind.Multiplication:
            _builder.EmitOpCode(ILOpCode.Mul);
            break;
        case BinaryOperatorKind.Addition:
            _builder.EmitOpCode(ILOpCode.Add);
            break;
        case BinaryOperatorKind.Subtraction:
            _builder.EmitOpCode(ILOpCode.Sub);
            break;
        case BinaryOperatorKind.Division:
            if (IsUnsignedBinaryOperator(expression))
            {
                _builder.EmitOpCode(ILOpCode.Div_un);
            }
            else
            {
                _builder.EmitOpCode(ILOpCode.Div);
            }
            break;
    }
}

Specimen #14 Eclipse Compiler for Java (ECJ)

实例 14 Eclipse Java 编译器(ECJ)

146k lines Java, self-hosting, bootstrapped off Javac. In Eclipse! Also in many Java products (eg. IntelliJ IDEA). Rich semantics, good diagnostics, IDE integration. Lowers from AST to JVM IR. Separate JVM projects interpret or compile IR. 2001-now, IBM, OSS.

约 14.6 万行 Java 代码,支持自举,基于 Javac 完成首次编译。集成于 Eclipse 开发环境,同时也被多款 Java 产品采用(如 IntelliJ IDEA)。语义分析能力丰富,诊断信息完善,深度适配集成开发环境。将抽象语法树降级为 JVM 中间表示,由独立的 JVM 项目对该中间表示进行解释或进一步编译。2001 年至今,由 IBM 开发,开源维护。

/**
 * Code generation for the conditional operator ?:
 * @param currentScope org.eclipse.jdt.internal.compiler.lookup.BlockScope
 * @param codeStream org.eclipse.jdt.internal.compiler.codegen.CodeStream
 * @param valueRequired boolean
 */
@Override
public void generateCode(BlockScope currentScope, CodeStream codeStream, boolean valueRequired) {
    if (!valueRequired) return;
    int pc = codeStream.position;
    BranchLabel endifLabel, falseLabel;
    if (this.constant != Constant.NotAConstant) {
        codeStream.recordPositionsFrom(pc, this.sourceStart);
        codeStream.generateConstant(this.constant, this.implicitConversion);
        return;
    }
    Constant cst = this.condition.optimizedBooleanConstant();
    boolean needTruePart = !(cst != Constant.NotAConstant && cst.booleanValue() == false);
    boolean needFalsePart = !(cst != Constant.NotAConstant && cst.booleanValue() == true);
    endifLabel = new BranchLabel(codeStream);
}

Variation #5 Only Compile Some Functions, Interpret the Rest

形态 5 仅编译部分函数,解释执行其余代码

Cost of interpreter only bad at inner loops or fine-grain. Outer loops or coarse-grain (eg. function calls) similar to virtual dispatch. Design option: interpret by default, selectively compile hot functions ("fast mode") at coarse grain. Best of both worlds!
• Keep interpreter-speed immediate feedback to user.
• Interpreter may be low-effort, portable, can bootstrap.
• Defer effort on compiler until needed. Anything hard to compile, just call back to interpreter.

解释器的性能劣势仅体现在内层循环或细粒度代码执行中,对于外层循环或粗粒度操作(如函数调用),其性能与虚分派相近。一种设计方案为:默认采用解释执行,针对粗粒度的热点函数选择性编译(即"快速模式"),兼顾两种方式的优势!• 保留解释器的快速执行特性,为用户提供即时反馈。• 解释器的实现成本更低、可移植性更强,也能用于编译器自举。• 仅在需要时投入精力开发编译逻辑,对于难以编译的代码,直接回调解释器执行即可。
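
A rough sketch of this design, under stated assumptions: the VM opcodes, the `HOT_THRESHOLD` call count, and the "compile by generating a Python lambda" trick are all hypothetical stand-ins. Real systems such as Cog use far more refined heuristics and emit machine code.

```python
HOT_THRESHOLD = 3  # hypothetical tuning knob: calls before we try to compile

class Function:
    def __init__(self, name, code):
        self.name = name
        self.code = code        # list of (op, arg) stack-VM instructions
        self.calls = 0
        self.compiled = None    # None = not tried, False = gave up, else callable

def interpret(code, x):
    """Default path: interpret the function body instruction by instruction."""
    stack = []
    for op, arg in code:
        if op == "ARG":
            stack.append(x)
        elif op == "PUSH":
            stack.append(arg)
        elif op in ("ADD", "MUL"):
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op == "ADD" else a * b)
        else:
            raise ValueError(f"unknown op {op}")
    return stack.pop()

def compile_function(fn):
    """'Fast mode': turn VM code into one host-language expression.
    Returns False for anything it cannot handle, so the caller falls back."""
    stack = []
    for op, arg in fn.code:
        if op == "ARG":
            stack.append("x")
        elif op == "PUSH":
            stack.append(repr(arg))
        elif op in ("ADD", "MUL"):
            b, a = stack.pop(), stack.pop()
            stack.append(f"({a} {'+' if op == 'ADD' else '*'} {b})")
        else:
            return False        # hard to compile: stay in the interpreter
    return eval(f"lambda x: {stack.pop()}")

def call(fn, x):
    fn.calls += 1
    if fn.compiled is None and fn.calls >= HOT_THRESHOLD:
        fn.compiled = compile_function(fn)
    if fn.compiled:
        return fn.compiled(x)
    return interpret(fn.code, x)

# f(x) = x * 2 + 1
f = Function("f", [("ARG", None), ("PUSH", 2), ("MUL", None), ("PUSH", 1), ("ADD", None)])
```

The first two calls to `call(f, ...)` go through `interpret`; from the third call on, the compiled closure is used. The caller sees the same results either way, which is what makes the fallback to the interpreter safe.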

Specimen #15 Pharo/Cog

实例 15 Pharo/Cog

54k line VM interpreter and 18k line JIT: C code generated from Smalltalk metaprograms. Bootstrapped from Squeak. Smalltalk is what you'll actually hear people mention coming from Xerox PARC. • Very simple language. "Syntax fits on a postcard". Easy to interpret. • Complete GUI, IDE, powerful tools. • Standard Smalltalk style: interpret by default, JIT for "fast mode". Compiler bootstraps-from and calls-into VM whenever convenient. Targets ARM, x86, x64, MIPS. 2008-now, academic-industrial consortium.

包含 5.4 万行代码的虚拟机解释器和 1.8 万行代码的即时编译器,为基于 Smalltalk 元程序生成的 C 代码,基于 Squeak 完成自举。Smalltalk 正是大家熟知的、由施乐帕洛阿尔托研究中心研发的语言。• 语言设计极简,素有"语法能写在明信片上"的说法,易于实现解释执行。• 配套完整的图形用户界面、集成开发环境和强大的工具链。• 采用 Smalltalk 标准设计风格:默认解释执行,针对热点代码启用即时编译的"快速模式"。编译器基于虚拟机实现自举,并可在需要时灵活调用虚拟机的功能。支持 ARM、x86、x64、MIPS 架构。2008 年至今,由学界和业界联合开发维护。

genShortJumpIfFalse
    | distance target|
    distance := self v3: (self generatorAt: byte0) ShortForward: bytecodePC Branch: 0 Distance: methodObj.
    target:= distance+1+bytecodePC.
    self genJumpIf: objectMemory falseObject to: target

Specimen #16 Franz Lisp

实例 16 Franz Lisp

20k line C interpreter, 7,752 line Lisp compiler. Older command-line system, standard Unix Lisp for years. • Like Smalltalk: very simple language. Actually an AST/IR that escaped from the lab. Easy to interpret. • Frequent Lisp style: interpret by default; compile for "fast mode". Compiler bootstraps-from and calls-into interpreter whenever convenient. Targets m68k and VAX. 1978-1988, UC Berkeley.

包含 2 万行代码的 C 语言解释器和 7752 行代码的 Lisp 编译器,是早期的命令行式系统,曾长期作为 Unix 系统的标准 Lisp 语言环境。• 与 Smalltalk 类似:语言设计极简,本质上是从实验室走出的抽象语法树/中间表示,易于实现解释执行。• 采用 Lisp 语言的经典设计风格:默认解释执行,针对热点代码编译为"快速模式"。编译器基于解释器实现自举,并可在需要时灵活调用解释器的功能。支持 m68k、VAX 架构。1978-1988 年,由加州大学伯克利分校开发。

;--- e-move :: move value from one place to another
; this corresponds to d-move except the args are EIADRS
(defun e-move (from to)
  (if (and (dtpr from) (eq '$ (car from)) (eq 0 (cadr from)))
      (e-write2 'clrl to)
      (e-write3 'movl from to)))

;--- d-move :: emit instructions to move value from one place to another
(defun d-move (from to)
  (makecomment `(from ,(e-uncvt from) to ,(e-uncvt to)))
  #+(or for-vax for-tahoe)
  (cond ((eq 'Nil from) (e-move '($ 0) (e-cvt to)))
        (t (e-move (e-cvt from) (e-cvt to))))
  #+for-68k
  (let ((froma (e-cvt from)) (toa (e-cvt to)))
    (if (and (dtpr froma) (eq '$ (car froma))
             (and (> (cadr froma) -1) (< (cadr froma) 65))
             (atom toa) (eq 'd (nthchar toa 1)))
        ; it's a mov #immed,Dn, where 0 <= immed <= 64
        (e-write3 'moveq froma toa)
        (cond ((eq 'Nil froma) (e-write3 'movl '#.nil-reg toa))
              (t (e-write3 'movl froma toa))))))

Variation #6 Partial Evaluation Tricks

形态 6 部分求值技巧

Consider program in terms of parts that are static (will not change anymore) or dynamic (may change). Partial evaluator (a.k.a. "specializer") runs the parts that depend only on static info, emits residual program that only depends on dynamic info. Note: interpreter takes two inputs: program to interpret, and program's own input. First is static, but redundantly treated as dynamic. So: compiling is like partially evaluating an interpreter, eliminating the redundant dynamic treatment in its first input.

将程序划分为静态部分(运行过程中不再改变)和动态部分(运行过程中可能改变)。部分求值器(也称为"特化器")会先执行仅依赖静态信息的代码,生成仅包含动态信息相关逻辑的残余程序。需要注意的是:解释器接收两个输入,即待解释的程序和该程序的输入,前者属于静态信息,却被冗余地当作动态信息处理。因此,编译过程本质上相当于对解释器进行部分求值,消除其对第一个输入的冗余动态处理。

Futamura Projections

二村投影

Famous work relating programs P, interpreters I, partial evaluators E, and compilers C. The so-called "Futamura Projections":

这是一项关联了程序 P、解释器 I、部分求值器 E 和编译器 C 的经典研究,即所谓的"二村投影":

  1. E(I,P) → partially evaluate I(P) → emit C(P), a compiled program
    对解释器 I 和程序 P 执行部分求值 E(I,P),得到针对 P 的编译程序 C(P)
  2. E(E,I) → partially evaluate λP.I(P) → emit C, a compiler!
    对部分求值器 E 和解释器 I 执行部分求值 E(E,I),得到通用编译器 C!
  3. E(E,E) → partially evaluate λI.λP.I(P) → emit a compiler-compiler!
    用部分求值器 E 对其自身进行部分求值 E(E,E),得到编译器的编译器!

Futamura, Yoshihiko, 1971. Partial Evaluation of Computation Process: An Approach to a Compiler-Compiler. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.10.2747

Formal strategy for building compilers from interpreters and specializers.

该理论由二村良彦(Yoshihiko Futamura)于 1971 年提出,出自论文《计算过程的部分求值:一种构建编译器的编译器的方法》。这是一种基于解释器和特化器构建编译器的规范化策略。
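
The first projection can be sketched in a few lines, with one big caveat: a real partial evaluator E is generic and works on any program, while the `specialize` function below is hand-written for one specific toy interpreter. The instruction names (`add`, `mul`) and all function names are hypothetical. The sketch only shows the shape of the result: the dispatch on the static program disappears, and only the work on the dynamic input remains.

```python
def interp(program, x):
    """I: a tiny interpreter. `program` is static, `x` is dynamic."""
    acc = x
    for op, n in program:
        if op == "add":
            acc = acc + n
        elif op == "mul":
            acc = acc * n
        else:
            raise ValueError(op)
    return acc

def specialize(program):
    """Stand-in for E(I, P), hand-specialized to `interp` above.
    The loop and the dispatch on `op` run now, at specialization time;
    only arithmetic on the dynamic input `x` is left in the residual code."""
    lines = ["def residual(x):", "    acc = x"]
    for op, n in program:
        if op == "add":
            lines.append(f"    acc = acc + {n!r}")
        elif op == "mul":
            lines.append(f"    acc = acc * {n!r}")
        else:
            raise ValueError(op)
    lines.append("    return acc")
    source = "\n".join(lines)
    namespace = {}
    exec(source, namespace)   # let the host language compile the residual program
    return namespace["residual"], source

P = [("add", 3), ("mul", 4)]
compiled_P, residual_source = specialize(P)
```

`compiled_P(x)` agrees with `interp(P, x)` for every `x`, but `residual_source` contains no loop and no `if` on opcodes: it is P "compiled" into straight-line code.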

Specimen #17 Truffle/Graal

实例 17 Truffle/Graal

240k lines of Java for Graal (VM); 90k lines for Truffle (interpreter-writing framework). Actual real system based on first Futamura Projection. Seriously competitive! Potential future Oracle JVM. • Multi-language (JavaScript, Python, Ruby, R, JVM byte code, LLVM bitcode) multi-target (3). "Write an interpreter with some machinery to help the partial evaluator, get a compiler for free". Originally academic, now Oracle.

Graal(虚拟机)包含 24 万行 Java 代码,Truffle(解释器开发框架)包含 9 万行 Java 代码。这是基于二村第一投影实现的工业级系统,性能极具竞争力,有望成为甲骨文未来的 JVM 虚拟机。• 支持多语言编译(JavaScript、Python、Ruby、R、JVM 字节码、LLVM 位码),适配 3 种目标架构。设计理念为:"开发一个适配部分求值器的解释器,就能免费获得一款编译器"。最初为学术研究项目,现由甲骨文公司维护。

public Variable emitConditional(LogicNode node, Value trueValue, Value falseValue) {
    if (node instanceof IsNullNode) {
        IsNullNode isNullNode = (IsNullNode) node;
        LIRKind kind = gen.getLIRKind(isNullNode.getValue().stamp(NodeView.DEFAULT));
        Value nullValue = gen.emitConstant(kind, isNullNode.nullConstant());
        return gen.emitConditionalMove(kind.getPlatformKind(), operand(isNullNode.getValue()),
                nullValue, Condition.EQ, false, trueValue, falseValue);
    } else if (node instanceof CompareNode) {
        CompareNode compare = (CompareNode) node;
        PlatformKind kind = gen.getLIRKind(compare.getX().stamp(NodeView.DEFAULT)).getPlatformKind();
        return gen.emitConditionalMove(kind, operand(compare.getX()), operand(compare.getY()),
                compare.condition().asCondition(), compare.unorderedIsTrue(), trueValue, falseValue);
    } else if (node instanceof LogicConstantNode) {
        return gen.emitMove(((LogicConstantNode) node).getValue() ? trueValue : falseValue);
    } else if (node instanceof IntegerTestNode) {
        IntegerTestNode test = (IntegerTestNode) node;
        return gen.emitIntegerTestMove(operand(test.getX()), operand(test.getY()), trueValue, falseValue);
    } else {
        throw GraalError.unimplemented(node.toString());
    }
}

Variation #7 Forget IR and/or AST!

形态 7 摒弃中间表示和/或抽象语法树

In some contexts, even building an AST or IR is overkill. Small hardware, tight budget, one target, bootstrapping. Avoiding AST tricky, languages can be designed to help. So-called "single-pass" compilation, emit code line-at-a-time, while reading. Likely means no optimization aside from peephole.

在某些场景下,构建抽象语法树或中间表示属于过度设计,例如面向小型硬件、开发预算有限、仅适配单一目标架构、编译器自举阶段。规避抽象语法树的实现有一定难度,可通过语言设计来适配,即采用所谓的"单遍编译"模式:边读取代码边逐行生成目标代码。这种模式下,编译器通常仅能实现窥孔优化,无其他高级优化能力。
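
As an illustration of single-pass compilation, the sketch below is a recursive-descent parser that appends stack-machine instructions to the output the moment it recognizes each construct. No tree is ever built. The target instruction names (`push`, `add`, `sub`, `mul`) and the tiny expression grammar are assumptions for this example, not taken from Turbo Pascal or any other specimen; for brevity the tokenizer silently skips characters it does not recognize.

```python
import re

def compile_single_pass(source):
    """Emit stack-machine code while reading; no AST or IR is ever built."""
    tokens = re.findall(r"\d+|[-+*()]", source)
    pos = 0
    out = []

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        tok = advance()
        if tok == "(":
            expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
        elif tok.isdigit():
            out.append(f"push {tok}")      # emitted immediately
        else:
            raise SyntaxError(f"unexpected token {tok!r}")

    def term():
        factor()
        while peek() == "*":
            advance()
            factor()
            out.append("mul")              # operator emitted after its operands

    def expr():
        term()
        while peek() in ("+", "-"):
            op = advance()
            term()
            out.append("add" if op == "+" else "sub")

    expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return out
```

The parser's call stack plays the role the AST would otherwise play. That is also why optimization is hard here: by the time the compiler sees the end of an expression, the code for its beginning has already been emitted.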

Specimen #18 Turbo Pascal

实例 18 Turbo Pascal

14k instructions including editor. x86 assembly. 39kb on disk. Famous early personal-micro compiler. Single-pass, no AST or IR. Single target. Proprietary ($65) so I don't have source. Here's an ad! 1983-1992; lineage continues into modern Delphi compiler.

包含编辑器在内仅 1.4 万条指令,基于 x86 汇编开发,磁盘占用仅 39KB。是早期知名的个人微型计算机编译器,采用单遍编译模式,无抽象语法树和中间表示,仅适配单一目标架构。为商业闭源软件(售价 65 美元),因此无源码可供展示,以下是其产品广告!1983-1992 年开发,其技术脉络延续至现代 Delphi 编译器。
Ad slogan: "They said it couldn't be done. Borland Did It. Turbo Pascal 3.0"

Specimen #19 Manx Aztec C

实例 19 Manx Aztec C

21k instructions, 50kb on disk. Contemporary to Turbo Pascal, one of many competitors. Unclear if AST or not, no source. Probably no IR. Multi-target, Z80 and 8080. 1980-1990s, small team.

共 2.1 万条指令,磁盘占用 50KB,与 Turbo Pascal 同期开发,是其众多竞品之一。目前无法确认是否实现抽象语法树,无源码可供展示,大概率未设计中间表示。支持多目标架构,适配 Z80 和 8080。20 世纪 80-90 年代由小型开发团队打造。
Product label: AZTEC C, 'C' PROGRAM DEVELOPMENT SYSTEM, PORTABLE SOFTWARE, APPLE, CP/M, IBM

Specimen #20 Not just the past: 8cc

实例 20 并非过往:8cc

6,740 lines of C, self-hosting, compiles to ~110kb via clang, 220kb via self. Don't have to use assembly to get this small! Quite readable and simple. Works. Single target, AST but no IR, few diagnostics. 2012-2016, mostly one developer.

仅 6740 行 C 语言代码,支持自举,通过 Clang 编译后体积约 110KB,自编译后体积约 220KB。证明了无需汇编语言也能开发出超轻量级编译器!代码可读性强、结构简洁,可正常运行。仅适配单一目标架构,实现了抽象语法树但无中间表示,诊断信息较为简略。2012-2016 年主要由一位开发者独立开发。

static void emit_binop_int_arith(Node *node) {
    SAVE;
    char *op = NULL;
    switch (node->kind) {
        case '+': op = "add"; break;
        case '-': op = "sub"; break;
        case '*': op = "imul"; break;
        case '^': op = "xor"; break;
        case OP_SAL: op = "sal"; break;
        case OP_SAR: op = "sar"; break;
        case OP_SHR: op = "shr"; break;
        case '/': case '%': break;
        default: error("invalid operator '%d'", node->kind);
    }
    emit_expr(node->left);
    push("rax");
    emit_expr(node->right);
    emit("mov #rax, #rcx");
    pop("rax");
    if (node->kind == '/' || node->kind == '%') {
        if (node->ty->usig) {
            emit("xor #edx, #edx");
            emit("div #rcx");
        } else {
            emit("cqto");
            emit("idiv #rcx");
        }
        if (node->kind == '%')
            emit("mov #edx, #eax");
    } else if (node->kind == OP_SAL || node->kind == OP_SAR || node->kind == OP_SHR) {
        emit("mov #cl, #%s", get_int_reg(node->left->ty, 'c'));
        emit("%s #cl, #rax", op);
    } else {
        emit("%s #rcx, #rax", op);
    }
}

Grand Finale

压轴实例

Specimen #21 JonesForth

实例 21 JonesForth

692 instruction VM, 1,490 lines Forth for compiler, REPL, debugger, etc. • Educational implementation of Forth. • Forth, like Lisp, is nearly VM code at input (postfix not prefix). Minimal partial-compiler turns user "words" into chains of indirect-jumps. Machine-code primitive words. Interactive system with quote, eval, control flow, exceptions, debug inspector. Pretty high expressivity! • 2009, one developer.

虚拟机仅 692 条指令,编译器、交互式解释器、调试器等模块共 1490 行 Forth 代码。• 是 Forth 语言的教学型实现。• 与 Lisp 类似,Forth 的输入语法近乎虚拟机代码(采用后缀表示而非前缀表示)。轻量级部分编译器将用户定义的"单词"转换为间接跳转链,基础"单词"由机器码实现。打造了交互式系统,支持引用、求值、控制流、异常处理、调试检查,表达能力极强!• 2009 年由一位开发者独立开发。

\ IF is an IMMEDIATE word which compiles 0BRANCH followed by a dummy offset,
\ and places the address of the 0BRANCH on the stack. Later when we see THEN,
\ we pop that address off the stack, calculate the offset, and back-fill the offset.
: IF IMMEDIATE
  ' 0BRANCH ,  \ compile 0BRANCH
  HERE @       \ save location of the offset on the stack
  0 ,          \ compile a dummy offset
;

: THEN IMMEDIATE
  HERE @ SWAP -  \ calculate the offset from the address saved on the stack
  SWAP !         \ store the offset in the back-filled location
;

: ELSE IMMEDIATE
  ' BRANCH ,     \ definite branch to just over the false-part
  HERE @         \ save location of the offset on the stack
  0 ,            \ compile a dummy offset
  SWAP           \ now back-fill the original (IF) offset
  HERE @ SWAP -
  SWAP !
;

Coda

结语

There have been a lot of languages: http://hopl.info catalogues 8,945 programming languages from the 18th century to the present.

编程语言的数量浩如烟海,http://hopl.info 网站收录了从 18 世纪至今的 8945 种编程语言。

Go study them: past and present!

去研究它们吧,无论是过去的经典还是当下的新潮!

Many compilers possible!

编译器的实现形式千变万化!

Pick a future you like!

选择你所向往的未来,投身其中!

The End!

分享结束!

(I also probably ought to mention that due to using some CC BY-SA pictures, this talk is licensed CC BY-SA 4.0 international)

另需说明:因本次分享使用了部分知识共享-署名-相同方式共享协议的图片,本分享整体采用 CC BY-SA 4.0 国际协议授权。
