21 Compilers and 3 Orders of Magnitude in 60 Minutes
21 种编译器,60 分钟跨越三个数量级
A wander through a weird landscape to the heart of compilation Spring 2019
漫游编译器的奇妙领域,直击编译技术 2019 年春季
Hello! I am someone who has worked (for pay!) on some compilers: rustc, swiftc, gcc, clang, llvm, tracemonkey, etc. Ron asked if I could talk about compiler stuff I know, give perspective on the field a bit. This is for students who already know roughly how to write compilers, not going to cover that!
I like compilers! Relationship akin to "child with many toy dinosaurs". Some are bigger and scarier. We will look at them first. • Some are weird and wonderful. We will visit them along the way. Some are really tiny!
I expect gap between class projects and industrial compilers is overwhelming. Want to explore space between, demystify and make more design choices clear. Reduce terror, spark curiosity, encourage trying it as career! If I can compiler, so can you!
Describe a few of the giants. Talk a bit about what makes them so huge & complex. Wander through the wilderness (including history) looking for ways compilers can vary, and examining specimens. Also just point out stuff I think is cool / underappreciated.
I'm not a teacher or very good at giving talks. • Lots of material, not ideal to stop for questions unless you're absolutely lost. Gotta keep pace! But: time at end for questions and/or email followup. Happy to return to things you're curious about. Slides are numbered! Jot down any you want to ask about.
~2m lines of C++: 800k lines clang plus 1.2m LLVM. Self hosting, bootstrapped from GCC. C-language family (C, C++, ObjC), multi-target (23). Single AST + LLVM IR. 2007-now, large multi-org team. • Good diagnostics, fast code. Originally Apple, more permissively licensed than GCC.
LValue CodeGenFunction::EmitLValue(const Expr *E) {
ApplyDebugLocation DL(*this, E);
switch (E->getStmtClass()) {
default: return EmitUnsupportedLValue(E, "l-value expression");
case Expr::ObjCPropertyRefExprClass: llvm_unreachable("cannot emit a property reference directly");
case Expr::ObjCSelectorExprClass: return EmitObjCSelectorLValue(cast<ObjCSelectorExpr>(E));
case Expr::ObjCIsaExprClass: return EmitObjCIsaExpr(cast<ObjCIsaExpr>(E));
case Expr::BinaryOperatorClass: return EmitBinaryOperatorLValue(cast<BinaryOperator>(E));
case Expr::CompoundAssignOperatorClass: {
QualType Ty = E->getType();
if (const AtomicType *AT = Ty->getAs<AtomicType>())
Ty = AT->getValueType();
if (!Ty->isAnyComplexType())
return EmitCompoundAssignmentLValue(cast<CompoundAssignOperator>(E));
return EmitComplexCompoundAssignmentLValue(cast<CompoundAssignOperator>(E));
}
case Expr::CallExprClass:
case Expr::CXXMemberCallExprClass:
case Expr::CXXOperatorCallExprClass:
case Expr::UserDefinedLiteralClass:
return EmitCallExprLValue(cast<CallExpr>(E));
}
}
Specimen #2 Swiftc
实例 2 Swiftc
~530k lines of C++ plus 2m lines clang and LLVM. Many same authors. Not self-hosting. Newer app-dev language. Tightly integrated with clang, interop with C/ObjC libraries. • Extra SIL IR for optimizations. Multi-target, via LLVM. 2014-now, mostly Apple.
RValue RValueEmitter::visitIfExpr(IfExpr *E, SGFContext C) {
auto &lowering = SGF.getTypeLowering(E->getType());
if (lowering.isLoadable() || !SGF.silConv.useLoweredAddresses()) {
// If the result is loadable, emit each branch and forward its result into the destination block argument.
Condition cond = SGF.emitCondition(E->getCondExpr(), /*invertCondition*/ false, SGF.getLoweredType(E->getType()), NumTrueTaken, NumFalseTaken);
cond.enterTrue(SGF);
SILValue trueValue;
auto TE = E->getThenExpr();
FullExpr trueScope(SGF.Cleanups, CleanupLocation(TE));
trueValue = visit(TE).forwardAsSingleValue(SGF, TE);
cond.exitTrue(SGF, trueValue);
SILValue falseValue;
cond.enterFalse(SGF);
auto EE = E->getElseExpr();
FullExpr falseScope(SGF.Cleanups, CleanupLocation(EE));
falseValue = visit(EE).forwardAsSingleValue(SGF, EE);
cond.exitFalse(SGF, falseValue);
}
}
Specimen #3 Rustc
实例 3 Rustc
~360k lines of Rust, plus 1.2m lines LLVM. Self-hosting, bootstrapped from OCaml. Newer systems language. • Two extra IRs (HIR, MIR). Multi-target, via LLVM. 2010-now, large multi-org team. Originally mostly Mozilla. And yes I did a lot of the initial bring-up so my name is attached to it forever; glad it worked out!
Notice the last 3 languages all end in LLVM. "Low Level Virtual Machine" Strongly typed IR, serialization format, library of optimizations, lowerings to many target architectures. "One-stop-shop" for compiler backends. 2003-now, UIUC at first, many industrial contributors now. Longstanding dream of compiler engineering world, possibly most successful attempt at it yet. Here is a funny diagram of modern compilers from Andi McClure K r Lex Tokens Parse Reaciros Pasing them.
~2.2m lines of mostly C, C++. 600k lines Ada. Self-hosting, bootstrapped from other C compilers. Multi-language (C, C++, ObjC, Ada, D, Go, Fortran), multi-target (21). • Language & target-agnostic TREE AST and RTL IR. Challenging to work on. 1987-present, large multi-org team. • Generates quite fast code. • Originally political project to free software from proprietary vendors. Licensed somewhat protectively.
约 220 万行代码,主要为 C、C++语言编写,还有 60 万行 Ada 语言代码。支持自举,基于其他 C 语言编译器完成首次编译。多语言支持(C、C++、Objective-C、Ada、D、Go、Fortran),21 个编译目标平台。• 采用与语言、目标平台无关的 TREE 抽象语法树和 RTL 中间表示,开发维护难度较高。1987 年至今,由跨组织的大型开发团队维护。• 生成的代码执行效率极高。• 最初是一场带有理念的项目,旨在让软件摆脱专有厂商的控制,协议授权的保护性较强。
Compilers get big because the development costs are seen as justified by the benefits, at least to the people paying the bills. Developer productivity: highly expressive languages, extensive diagnostics, IDE integration, legacy interop. • Every drop of runtime performance: shipping on billions of devices or gigantic multi-warehouse fleets. Covering & exploiting all the hardware: someone makes a new chip, they pay for an industrial compiler to make use of it. Writing compilers in verbose languages: for all the usual reasons (compatibility, performance, familiarity).
This is ok! The costs and benefits are context dependent. Different contexts, weightings: different compilers. • Remainder of talk will be exploring those differences. Always remember: balancing cost tradeoffs by context. Totally biased subset of systems: stuff I think is interesting and worth knowing, might give hope / inspire curiosity.
In some contexts, "all the optimizations" is too much. Too slow to compile, too much memory, too much development / maintenance effort, too inflexible. Common in JITs, or languages with lots of indirection anyways (dynamic dispatch, pointer chasing): optimizer can't do too well anyways.
"Compiler Advances Double Computing Power Every 18 Years" Sarcastic joke / real comparison to Moore's law: hardware doubles power every 18 months. Swamps compilers. Empirical observation though! Optimizations seem to only win ~3-5x, after 60+ years of work. Less-true as language gains more abstractions to eliminate (i.e. specialize / de-virtualize). More true if lower-level.
The results of our experiment suggest that Proebsting's Law is probably true. The reality is somewhat grimmer than Proebsting initially supposed. Research in optimizing compilers has been ongoing since1955.The compiler technology developed over this 45-year period is able to improve the performance of integer intensive programs by a factor of 3.3. This corresponds to uniform performance improvements of about 2.8% per year. Even if we assume that the beginning of useful compiler optimization research began in the mid 1960's ,the uniform performance improvement on integer intensive codes due to compiler optimization is still only 3.6% per year. This lies in stark contrast to the 60% per year performance improvements we can expect from hardware due to Moore's Law. The performance difference between optimized and unoptimized programs is larger for the floating-point intensive codes in SPECfp95.This indicates that compiler research has had a larger effect on improving the performance of scientific codes than on improving the performance of ordinary, integer intensive applications. Again, if we assume compiler research has been ongoing since 1955, we get a doubling of performance every 16 years. This corresponds to uniform performance improvements of about 4.9% per year over this 45 year period. This is only slightly better than the results for integer intensive programs. Scott, Kevin. On Proebsting's Law. 2001
1971: "A Catalogue of Optimizing Transformations". The ~8 passes to write if you're going to bother. Inline, Unroll (& Vectorize), CSE, DCE, Code Motion, Constant Fold, Peephole. That's it. You're welcome. Many compilers just do those, get ~80% best-case perf. https://commons.wikimedia.org/wiki/File:Allen_mg_2528-3750K-b.jpg - CC BY-SA 2.0
660k lines C++ including backends. Not self-hosting. • JavaScript compiler in Chrome, Node. • Multi-target (7), multi-tier JIT. Optimizations mix of classical stuff and dynamic language stuff from Smalltalk. Multiple generations of optimization and IRs. Always adjusting for sweet spot of runtime perf vs. compile time, memory, maintenance cost, etc. Recently added slower (non-JIT) interpreter tier, removed others. 2008-present, mostly Google, open source.
// Shared routine for word comparison against zero.
void InstructionSelector::VisitWordCompareZero(Node* user, Node* value, FlagsContinuation* cont) {
// Try to combine with comparisons against 0 by simply inverting the branch.
while (value->opcode() == IrOpcode::kWord32Equal && CanCover(user, value)) {
Int32BinopMatcher m(value);
if (CanCover(user, value)) {
user = value;
value = m.left().node();
cont->Negate();
if(!m.right().Is(0)) break;
switch (value->opcode()){
case IrOpcode::kWord32Equal:
cont->OverwriteAndNegateIfEqual(kEqual);
return VisitWordCompare(this, value, kX64Cmp32, cont);
case IrOpcode::kInt32LessThan:
cont->OverwriteAndNegateIfEqual(kSignedLessThan);
return VisitWordCompare(this, value, kX64Cmp32, cont);
case IrOpcode::kInt32LessThanOrEqual:
cont->OverwriteAndNegateIfEqual(kSignedLessThanOrEqual);
return VisitWordCompare(this, value, kX64Cmp32, cont);
case IrOpcode::kUint32LessThan:
cont->OverwriteAndNegateIfEqual(kUnsignedLessThan);
return VisitWordCompare(this, value, kX64Cmp32, cont);
case IrOpcode::kUint32LessThanOrEqual:
cont->OverwriteAndNegateIfEqual(kUnsignedLessThanOrEqual);
return VisitWordCompare(this, value, kX64Cmp32, cont);
}
}
}
}
Variation #2 Compiler-friendly Implementation (and Input) Languages
形态 2 适配编译器开发的实现语言(及输入语言) Note : your textbook has 3 implementation flavours. Java, C, ML. No coincidence. ML designed as implementation language for symbolic logic (expression-tree wrangling) system: LCF (1972). LCF written in Lisp. Lisp also designed as implementation language for symbolic logic system: Advice Taker (1959). Various family members: Haskell, OCaml, Scheme, Racket. All really good at defining and manipulating trees. ASTs, types, IRs, etc. Usually make for much smaller/simpler compilers. 注意:大家的教材中会介绍三种编译器实现语言------Java、C、ML,这并非巧合。ML 语言最初是为符号逻辑(表达式树处理)系统 LCF(1972 年)设计的实现语言,而 LCF 本身是用 Lisp 语言编写的。Lisp 语言同样是为符号逻辑系统 Advice Taker(1959 年)设计的实现语言。这类语言的衍生版本有很多:Haskell、OCaml、Scheme、Racket。它们都擅长定义和处理树形结构,比如抽象语法树、类型、中间表示等,基于这类语言开发的编译器,通常体积更小、结构更简洁。
Specimen #6 Glasgow Haskell Compiler (GHC)
实例 6 格拉斯哥 Haskell 编译器(GHC)
180k lines Haskell, selfhosting, bootstrapped from Chalmers Lazy ML compiler. • Pure-functional language, very advanced type-system. Several tidy IRs after AST: Core, STG, CMM. Custom backends. 1991-now, initially academic researchers, lately Microsoft after they were hired there.
约 18 万行 Haskell 代码,支持自举,基于查尔默斯惰性 ML 编译器完成首次编译。• 为纯函数式编程语言打造,拥有极为先进的类型系统。在抽象语法树后设计了多款简洁的中间表示:Core、STG、CMM,配备自定义的编译器后端。1991 年至今,最初由学术研究者开发,后来团队被微软聘用,目前由微软维护。
haskell复制代码
stmtToInstrs :: CmmNode e x -> NatM InstrBlock
stmtToInstrs stmt = do
dflags <- getDynFlags
is32Bit <- is32BitPlatform
case stmt of
CmmComment s -> return (unitOL (COMMENT s))
CmmTick {} -> return nilOL
CmmUnwind regs -> do
let to_unwind_entry :: (GlobalReg, Maybe CmmExpr) -> UnwindTable
to_unwind_entry (reg, expr) = M.singleton reg (fmap toUnwindExpr expr)
tbl = foldMap to_unwind_entry regs
if M.null tbl
then return nilOL
else do
lbl <- mkAsmTempLabel <$> getUniqueM
return $ unitOL $ UNWIND lbl tbl
CmmAssign reg src -> do
let ty = cmmRegType dflags reg
format = cmmTypeFormat ty
if isFloatType ty
then assignReg_FltCode format reg src
else if is32Bit && isWord64 ty
then assignReg_I64Code reg src
else assignReg_IntCode format reg src
CmmStore addr src -> do
let ty = cmmExprType dflags src
format = cmmTypeFormat ty
if isFloatType ty
then assignMem_FltCode format addr src
else if is32Bit && isWord64 ty
then assignMem_I64Code addr src
else assignMem_IntCode format addr src
_ -> return nilOL
where
isFloatType ty = cmmTypeIsFloating ty
Specimen #7 Chez Scheme
实例 7 Chez Scheme
87k lines Scheme (a Lisp), selfhosting, bootstrapped from CScheme. 4 targets, good performance, incremental compilation. Written on "nanopass framework" for compilers with many similar IRs. Chez has 27 different IRs! 1984-now, academic-industrial, mostly single developer. Getting down to the size-range where a compiler is small enough to be that.
(define asm-size
(lambda (x)
(case (car x)
[(asm) 0]
[(byte) 1]
[(word) 2]
[else 4])))
(define asm-move
(lambda (code* dest src)
(Trivit (dest src)
(record-case src
[(imm) (n)
(if (and (eqv? n 0) (record-case dest [(reg) r #t] [else #f]))
(emit xor dest dest code*)
(emit movi src dest code*))]
[(literal) stuff (emit movi src dest code*)]
[else (emit mov src dest code*)]))))
(define-who asm-move/extend
(lambda (op)
(lambda (code* dest src)
(Trivit (dest src)
(case op
[(sext8) (emit movsb src dest code*)]
[(zext8) (emit movzb src dest code*)]
[(sext16) (emit movsw src dest code*)]
[(zext16) (emit movzw src dest code*)]
[else (sorry! who "unexpected op ~s" op)])))))
Specimen #8 Poly/ML
实例 8 Poly/ML
44k lines SML, self-hosting. Single machine target (plus byte-code), AST + IR, classical optimizations. Textbook style. Standard platform for symbolic logic packages Isabelle and HOL. 1986-now, academic, mostly single developer.
约 4.4 万行标准 ML 代码,支持自举。仅支持一个原生机器目标平台(外加字节码平台),采用抽象语法树+单一中间表示的架构,实现经典的编译器优化策略,是教科书式的编译器实现。是符号逻辑工具包 Isabelle 和 HOL 的标准运行平台。1986 年至今,由学术研究者开发,主要由一位开发者维护。
sml复制代码
fun cgOp (PushToStack(RegisterArg reg)) =
let
val (rc, rx) = getReg reg
in
opCodeBytes (PUSH_R rc, if rx then SOME{w=false, b=true, x=false, r=false} else NONE)
end
| cgOp (PushToStack (MemoryArg{base, offset, index})) =
opAddressPlus2(Group5, LargeInt.fromInt offset, base, index, 0w6)
| cgOp (PushToStack(NonAddressConstArg constnt)) =
if is8BitL constnt
then opCodeBytes (PUSH_8, NONE) @ [Word8.fromLargeInt constnt]
else if is32bit constnt
then opCodeBytes (PUSH_32, NONE) @ int32Signed constnt
else error "constant too large"
| cgOp (PushToStack(AddressConstArg _)) =
case targetArch of
Native64Bit =>
let
val opb = opCodeBytes(Group5, NONE)
val mdrm = modrm(Based0, 0w6 (* push *), 0w5 (* PC rel *))
in
opb @ [mdrm] @ int32Signed(tag 0)
end
| Native32Bit => opCodeBytes(PUSH_32, NONE) @ int32Signed(tag 0)
| ObjectId32Bit => opCodeBytes(PUSH_32, NONE) @ int32Signed(tag 0)
Specimen #9 CakeML
实例 9 CakeML
58k lines SML, 5 targets, selfhosting. 9 IRs, many simplifying passes. 160k lines HOL proofs: verified! • Language semantics proven to be preserved through compilation!!! Cannot emphasize enough. This was science fiction when I was young. CompCert first serious one, now several. 2012-now, deeply academic.
约 5.8 万行标准 ML 代码,支持 5 个编译目标平台,支持自举。拥有 9 种中间表示,设计了多款代码简化处理流程。配套 16 万行 HOL 定理证明代码,是经过形式化验证的编译器!• 经证明,编译过程中编程语言的语义完全保留!这一点再怎么强调都不为过,在我年轻时,这简直是科幻般的存在。CompCert 是首款成熟的形式化验证编译器,如今这类编译器已有多款。2012 年至今,属于纯学术研究型编译器。
Notice Lisp / ML code looks a bit like grammar productions: recursive branching tree-shaped type definitions, pattern matching. There's a language lineage that took that idea ("programs as grammars") to its logical conclusion: metacompilers (a.k.a. "compiler-compilers"). Ultimate in "compiler-friendly" implementation languages. More or less: parser glued to an "un-parser". Many times half a metacompiler lurks in more-normal compilers: • YACCs ("yet another compiler-compiler"): parser-generators BURGs ("bottom-up rewrite generators"): code-emitter-generators See also: GCC ".md" files, LLVM TableGen. Common pattern!
Stanford Research Institute - Augmentation Research Lab. US Air Force R&D project. Very famous for its NLS ("oNLine System"). History of that project too big to tell here. Highly influential in forms of computer-human interaction, hypertext, collaboration, visualization. Less well-known is their language tech: TREE-META and MPS/MPL.
SRI-ARC 18-AUG-72 10:02 SRI-ARC 8 JUNE 1972 10575 Highlights of1971 Summary 10575 At present the primary language systems developed and in use at ARC are the Tree-meta Compiler-compiler Systen and the Ll0 Programning language system wnich was written in Tree-Meta. 3cle5a Work is currentiy progressing on a Modular Programming System (MPS) in collaboration with a group at the Xerox Palo Alto Researen Center. 3cle5b
• 184 lines of TREE-META. Bootstrapped from META-II. • In the Schorre metacompiler family (META, META-II) SRI-ARC, 1967. Made to support language tools in the NLS project. "Syntax-directed translation": parse input to trees, un-parse to machine code. Only guided by grammars. Hard to provide diagnostics, typechecking, optimization, really anything other than straight translations. But: extremely small, simple compliers. Couple pages. Ideal for bootstrap phase.
42k lines of Mesa (bootstrapped from MPL, itself from TREE-META). One of my favourite languages! Strongly typed, modules with separate compilation and type checked linking. Highly influential (Modula, Java). Co-designed language, OS, and byte-code VM implemented in CPU microcode, adapted to compiler. Xerox PARC, 1976-1981, small team left SRI-ARC, took MPL.
Mesa and Xerox PARC is a nice segue into next few points: all involve compilers interacting with interpreters. Interpreters & compilers actually have a long relationship! • In fact interpreters predate compilers. Let us travel back in time to the beginning, to illustrate!
1940s: First digital computers. Before: fixed-function machines and/or humans (largely women) doing job called "computer". Computing power literally measured in "kilo-girls" and "kilo-girl-hours".
1945: ENIAC built for US Army, Ordnance Corps. Artillery calculations in WWII. "Programmers" drawn from "computer" staff, all women. "Programming" meant physically rewiring per task.
1948: Jean Bartik leads team to convert ENIAC to "stored programs", instructions (called "orders") held in memory. Interpreted by hardware. Faster to reconfigure than rewiring; but ran slower. Subroutine concept developed for factoring stored programs.
substitute operators and parentheses. substitute variables group into 12-byte words.
Note multiplication is represented by juxtaposition.
替换运算符和括号、替换变量,组合为 12 字节的指令字。
注意:乘法通过字符并列表示。
Specimen #12 A-0: The First Compiler
实例 12 A-0:首款编译器
Reads interpreter-like pseudocodes, then emits "compilation" program with all codes resolved to their subroutines. Result runs almost as fast as manually coded; but as easy to write-for as interpreter. An interpreter "fast mode". Rationale all about balancing time tradeoffs (coding-time, compiler execution-time, run-time). 1951, Grace Hopper, Univac
Balance between interpretation and compilation is context dependent too!
解释和编译的平衡同样依赖于具体场景
Variation #4 Only Compile from Frontend to IR, Interpret Residual VM Code
形态 4 仅将前端代码编译为中间表示,解释执行剩余的虚拟机代码
Can stop before real machine code. Emit IR == "virtual machine" code. • Can further compile or just interpret that VM code. • Residual VM interpreter has several real advantages: • Easier to port to new hardware, or bootstrap compiler. "Just get something running". • Fast compilation & program startup, keeps interactive user engaged. Simply easier to write, less labor. Focus your time on frontend semantics. As a cheap implementation device: bytecode interpreters offer 1/4 of the performance of optimizing native-code compilers, at 1/20 of the implementation cost. https://xavierleroy.org/talks/zam-kazam05.pdf
350k lines C#, 320k lines VB. Self-hosting, bootstrapped off previous gen. Multi-language framework (C#, VB.NET). Rich semantics, good diagnostics, IDE integration. Lowers from AST to CIL IR. Separate CLR project interprets or compiles IR. 2011-now, Microsoft, OSS.
private void EmitBinaryOperatorInstruction (BoundBinaryOperator expression)
{
switch (expression.OperatorKind.Operator())
{
case BinaryOperatorKind.Multiplication:
_builder.EmitOpCode(ILOpCode.Mul);
break;
case BinaryOperatorKind.Addition:
_builder.EmitOpCode(ILOpCode.Add);
break;
case BinaryOperatorKind.Subtraction:
_builder.EmitOpCode(ILOpCode.Sub);
break;
case BinaryOperatorKind.Division:
if (IsUnsignedBinaryOperator(expression))
{
_builder.EmitOpCode(ILOpCode.Div_un);
}
else
{
_builder.EmitOpCode(ILOpCode.Div);
}
break;
}
}
Specimen #14 Eclipse Compiler for Java (ECJ)
实例 14 Eclipse Java 编译器(ECJ)
146k lines Java, self-hosting, bootstrapped off Javac. In Eclipse! Also in many Java products (eg. IntelliJ IDEA). Rich semantics, good diagnostics, IDE integration. Lowers from AST to JVM IR. Separate JVM projects interpret or compile IR. 2001-now, IBM, OSS.
Variation #5 Only Compile Some Functions, Interpret the Rest
形态 5 仅编译部分函数,解释执行其余代码
Cost of interpreter only bad at inner loops or fine-grain. Outer loops or coarse-grain (eg. function calls) similar to virtual dispatch. Design option: interpret by default, selectively compile hot functions ("fast mode") at coarse grain. Best of both worlds! • Keep interpreter-speed immediate feedback to user. • Interpreter may be low-effort, portable, can bootstrap. • Defer effort on compiler until needed. Anything hard to compile, just call back to interpreter.
54k line VM interpreter and 18k line JIT: C code generated from Smalltalk metaprograms. Bootstrapped from Squeak. Smalltalk is what you'll actually hear people mention coming from Xerox PARC. • Very simple language. "Syntax fits on a postcard". Easy to interpret. • Complete GUI, IDE, powerful tools. • Standard Smalltalk style: interpret by default, JIT for "fast mode". Compiler bootstraps-from and calls-into VM whenever convenient. Targets ARM, x86, x64, MIPS. 2008-now, academic-industrial consortium.
20k line C interpreter, 7,752 line Lisp compiler. Older command-line system, standard Unix Lisp for years. • Like Smalltalk: very simple language. Actually an AST/IR that escaped from the lab. Easy to interpret. Frequent Lisp style: interpret by default; compile for "fast mode". Compiler bootstraps-from and callsinto interpreter whenever convenient. Targets m68k and VAX. 1978-1988, UC Berkeley.
;--- e-move :: move value from one place to another
; this corresponds to d-move except the args are EIADRS
(defun e-move (from to)
(if (and (dtpr from) (eq '$ (car from)) (eq 0 (cadr from)))
(e-write2 'clrl to)
(e-write3 'movl from to)))
;--- d-move :: emit instructions to move value from one place to another
(defun d-move (from to)
(makecomment `(from ,(e-uncvt from) to ,(e-uncvt to)))
#+(or for-vax for-tahoe)
(cond ((eq 'Nil from) (e-move '($ 0) (e-cvt to)))
(t (e-move (e-cvt from) (e-cvt to))))
#+for-68k
(let ((froma (e-cvt from)) (toa (e-cvt to)))
(if (and (dtpr froma) (eq '$ (car froma))
(and (> (cadr froma) -1) (< (cadr froma) 65))
(atom toa) (eq 'd (nthchar toa 1)))
; it's a mov #immed,Dn, where 0 <= immed <= 64
(e-write3 'moveq froma toa)
(cond ((eq 'Nil froma) (e-write3 'movl '#.nil-reg toa))
(t (e-write3 'movl froma toa))))))
Variation #6 Partial Evaluation Tricks
形态 6 部分求值技巧
Consider program in terms of parts that are static (will not change anymore) or dynamic (may change). Partial evaluator (a.k.a. "specializer") runs the parts that depend only on static info, emits residual program that only depends on dynamic info. Note: interpreter takes two inputs: program to interpret, and program's own input. First is static, but redundantly treated as dynamic. So: compiling is like partially evaluating an interpreter, eliminating the redundant dynamic treatment in its first input.
Famous work relating programs P, interpreters I, partial evaluators E, and compilers C. The so-called "Futamura Projections":
这是一项关联了程序 P、解释器 I、部分求值器 E 和编译器 C 的经典研究,即所谓的"二村投影":
E(I,P) → partially evaluate I§ → emit C§, a compiled program
对解释器 I 和程序 P 执行部分求值 E(I,P),得到针对 P 的编译程序 C§
E(E,I) → partially evaluate λP.I§ → emit C, a compiler!
对部分求值器 E 和解释器 I 执行部分求值 E(E,I),得到通用编译器 C!
E(E,E) → partially evaluate λI.λP.I§ → emit a compiler-compiler!
对部分求值器 E 自身执行两次部分求值 E(E,E),得到编译器的编译器!
Futamura, Yoshihiko, 1971. Partial Evaluation of Computation Process- An Approach to a Compiler-Compiler. http://citeseerx.ist.psu.edu/ viewdoc/summary?doi=10.1.1.10.2747
Formal strategy for building compilers from interpreters and specializers.
该理论由吉彦二村于 1971 年提出,出自论文《计算过程的部分求值------一种构建编译器的编译器的方法》。这是一种基于解释器和特化器构建编译器的规范化策略。
Specimen #17 Truffle/Graal
实例 17 Truffle/Graal
240k lines of Java for Graal (VM); 90k lines for Truffle (interpreter-writing framework). Actual real system based on first Futamura Projection. Seriously competitive! Potential future Oracle JVM. • Multi-language (JavaScript, Python, Ruby, R, JVM byte code, LLVM bitcode) multi-target (3). "Write an interpreter with some machinery to help the partial evaluator, get a compiler for free". Originally academic, now Oracle.
In some contexts, even building an AST or IR is overkill. Small hardware, tight budget, one target, bootstrapping. Avoiding AST tricky, languages can be designed to help. So-called "single-pass" compilation, emit code line-at-a-time, while reading. Likely means no optimization aside from peephole.
14k instructions including editor. x86 assembly. 39kb on disk. Famous early personal-micro compiler. Single-pass, no AST or IR. Single target. Proprietary ($65) so I don't have source. Here's an ad! 1983-1992; lineage continues into modern Delphi compiler.
包含编辑器在内仅 1.4 万条指令,基于 x86 汇编开发,磁盘占用仅 39KB。是早期知名的个人微型计算机编译器,采用单遍编译模式,无抽象语法树和中间表示,仅适配单一目标架构。为商业闭源软件(售价 65 美元),因此无源码可供展示,以下是其产品广告!1983-1992 年开发,其技术脉络延续至现代 Delphi 编译器。 广告标语:They said it couldn't be done. Borland Did It. Turbo Pascal 3.0
Specimen #19 Manx Aztec C
实例 19 Manx Aztec C
21k instructions, 50kb on disk. Contemporary to Turbo Pascal, one of many competitors. Unclear if AST or not, no source. Probably no IR. Multi-target, Z80 and 8080. 1980-1990s, small team.
共 2.1 万条指令,磁盘占用 50KB,与 Turbo Pascal 同期开发,是其众多竞品之一。目前无法确认是否实现抽象语法树,无源码可供展示,大概率未设计中间表示。支持多目标架构,适配 Z80 和 8080。20 世纪 80-90 年代由小型开发团队打造。 产品标识:AZTEC C-'C' PROGRAM DEVELOPMENT SYSTEM PORTABLE SOFTWARE APPLE CP/M IBM
Specimen #20 Not just the past: 8cc
实例 20 并非过往:8cc
6,740 lines of C, self-hosting, compiles to ~110kb via clang, 220kb via self. Don't have to use assembly to get this small! Quite readable and simple. Works. Single target, AST but no IR, few diagnostics. 2012-2016, mostly one developer.
仅 6740 行 C 语言代码,支持自举,通过 Clang 编译后体积约 110KB,自编译后体积约 220KB。证明了无需汇编语言也能开发出超轻量级编译器!代码可读性强、结构简洁,可正常运行。仅适配单一目标架构,实现了抽象语法树但无中间表示,诊断信息较为简略。2012-2016 年主要由一位开发者独立开发。
c复制代码
static void emit_binop_int_arith(Node *node) {
SAVE;
char *op = NULL;
switch (node->kind) {
case '+': op = "add"; break;
case '-': op = "sub"; break;
case '*': op = "imul"; break;
case '^': op = "xor"; break;
case OP_SAL: op = "sal"; break;
case OP_SAR: op = "sar"; break;
case OP_SHR: op = "shr"; break;
case '/': case '%': break;
default: error("invalid operator '%d'", node->kind);
}
emit_expr(node->left);
push("rax");
emit_expr(node->right);
emit("mov #rax, #rcx");
pop("rax");
if (node->kind == '/' || node->kind == '%') {
if (node->ty->usig) {
emit("xor #edx, #edx");
emit("div #rcx");
} else {
emit("cqto");
emit("idiv #rcx");
}
if (node->kind == '%')
emit("mov #edx, #eax");
} else if (node->kind == OP_SAL || node->kind == OP_SAR || node->kind == OP_SHR) {
emit("mov #cl, #%s", get_int_reg(node->left->ty, 'c'));
emit("%s #cl, #rax", op);
} else {
emit("%s #rcx, #rax", op);
}
}
Grand Finale
压轴实例
Specimen #21 JonesForth
实例 21 JonesForth
692 instruction VM, 1,490 lines Forth for compiler, REPL, debugger, etc. • Educational implementation of Forth. • Forth, like Lisp, is nearly VM code at input (postfix not prefix). Minimal partial-compiler turns user "words" into chains of indirect-jumps. Machine-code primitive words. Interactive system with quote, eval, control flow, exceptions, debug inspector. Pretty high expressivity! • 2009, one developer.
\ IF is an IMMEDIATE word which compiles 0BRANCH followed by a dummy offset,
\ and places the address of the 0BRANCH on the stack. Later when we see THEN,
\ we pop that address off the stack, calculate the offset, and back-fill the offset.
: IF IMMEDIATE
' 0BRANCH , \ compile 0BRANCH
HERE @ \ save location of the offset on the stack
0 , \ compile a dummy offset
;
: THEN IMMEDIATE
HERE @ SWAP - \ calculate the offset from the address saved on the stack
SWAP ! \ store the offset in the back-filled location
;
: ELSE IMMEDIATE
' BRANCH , \ definite branch to just over the false-part
HERE @ \ save location of the offset on the stack
0 , \ compile a dummy offset
SWAP \ now back-fill the original (IF) offset
HERE @ SWAP -
SWAP !
;
Coda
结语
There have been a lot of languages http://hopl.info catalogues 8,945 programming languages from the 18th century to the present.