Expression Problem

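Note: this article is a compilation of material on the "Expression Problem".

English quotations are machine translated and the translations have not been proofread.

Chinese quotations have been lightly rearranged.

If anything looks wrong, please consult the original sources.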


Expression Problem

表达式问题

The "expression problem" is a phrase used to describe a dual problem that neither Object Oriented Programming nor Functional Programming fully addresses.

「表达式问题」这一术语,用于描述一个面向对象编程与函数式编程均无法彻底解决的双重困境。

The basic problem can be expressed in terms of a simple example: suppose you wish to describe shapes, including squares and circles, and you wish to compute their areas.

这一问题的核心可以通过一个简单示例来阐述:假设你需要定义多种几何图形(包括正方形与圆形),并且需要实现计算这些图形面积的功能。

In Functional Programming, you would describe a data type such as:

在函数式编程范式中,你可以这样定义一个数据类型:

 type Shape = Square of side
            | Circle of radius 

Then you would define a single area function:

随后,你可以定义一个统一的面积计算函数:

 define area = fun x -> case x of
    Square of side => (side * side)
  | Circle of radius => (3.14 *  radius * radius)

In Object Oriented Programming, you would describe the data type such as:

在面向对象编程范式中,你可以这样定义该数据类型:

 class Shape <: Object
   virtual fun area : () -> double

 class Square <: Shape
   side : double
   area() =  side * side

 class Circle <: Shape
   radius : double
   area() = 3.14 * radius * radius

The 'Expression Problem' manifests when you wish to 'extend' the set of objects or functions.

当你希望对对象的种类集合或函数的功能集合进行扩展时,「表达式问题」就会显现出来。

  • If you want to add a 'triangle' shape:

    若你希望新增一种名为「三角形」的图形:

    • the Object Oriented Programming approach makes it easy (because you can simply create a new class)

      面向对象编程的实现方式十分简便(你只需新建一个类即可)

    • but Functional Programming makes it difficult (because you'll need to edit every function that accepts a 'Shape' parameter, including 'area')

      而函数式编程的实现方式则十分繁琐(你需要修改所有接收 Shape 类型参数的函数,其中也包含 area 函数)

  • On the other hand, if you want to add a 'perimeter' function:

    另一方面,若你希望新增一个名为「周长」的计算函数:

    • Functional Programming makes it easy (simply add a new function 'perimeter')

      函数式编程的实现方式十分简便(你只需新增一个名为 perimeter 的函数即可)

    • while Object Oriented Programming makes it difficult (because you'll need to edit every class to add 'perimeter()' to the interface).

      而面向对象编程的实现方式则十分繁琐(你需要修改每一个类,在其接口中新增 perimeter() 方法)

And that is the heart of the Expression Problem.

这,就是「表达式问题」的核心本质。
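The object-oriented half of the trade-off can be sketched in Java. This is only an illustration of the shapes example above; the class names and the `Triangle` fields (`base`, `height`) are hypothetical and not part of the original pseudocode.

```java
// Sketch of the OOP side of the trade-off (all names are illustrative).
interface Shape {
    double area();
    // Adding `double perimeter();` here would force edits to every class below.
}

class Square implements Shape {
    final double side;
    Square(double side) { this.side = side; }
    public double area() { return side * side; }
}

class Circle implements Shape {
    final double radius;
    Circle(double radius) { this.radius = radius; }
    public double area() { return Math.PI * radius * radius; }
}

// Adding a new shape is easy: one new class, no existing code is touched.
class Triangle implements Shape {
    final double base, height;
    Triangle(double base, double height) { this.base = base; this.height = height; }
    public double area() { return 0.5 * base * height; }
}
```

`Triangle` arrives without any change to `Square` or `Circle`. A new operation would instead have to be added to the `Shape` interface and then implemented in all three classes.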

The Expression Problem is a specific example of a larger class of problems known generally as Cross Cutting Concerns - in this particular case the relevant 'concerns' being 'the set of shapes' and 'the features of shapes'. Many languages include designs to help solve the Expression Problem. Open functions (functions that can be extended with new pattern-matches), open data-types (data types that can be extended with new patterns), and MultiMethods ('open' specialized polymorphic functions with 'open' set of classes), and Predicate Dispatching, are all viable approaches.

「表达式问题」是一类更宽泛问题的典型特例,这类问题被统称为横切关注点------在本案例中,对应的两个「关注点」分别是「图形的种类集合」与「图形的功能特性集合」。许多编程语言在设计时都内置了用于解决「表达式问题」的特性。可扩展函数(支持新增模式匹配逻辑的函数)、可扩展数据类型(支持新增匹配模式的数据类型)、多方法(即支持类集合扩展的开放式专用多态函数)以及谓词分派,均属于可行的解决方案。

More general solutions to Separation Of Concerns will also apply (e.g. IBM's Hyper Spaces motivating example specifically targets this problem).

针对关注点分离的通用解决方案同样适用于此问题(例如 IBM 的 HyperSpace 技术,其设计的核心示例就是专门针对该问题的)。

In Oop Arguments Debates And Discussion, the Expression Problem is mentioned briefly in a case against Domain Objects. The reasoning: the main purpose of Domain Objects is to associate a domain ObjectIdentifier (see ObjectIdentity) with a set of domain features (properties). Use of OOP methods (e.g. synchronous Message Passing with return values) to solve this problem generally requires: (a) classifying each object identifier into a 'class' that chooses a constructor for it (a problem that runs into Limits Of Hierarchies), then (b) anticipating which 'features' might need to be computed from these objects (e.g. stylized printing, verification, cost estimates, etc.) so that the features may be named in the hierarchy. Requirement (b) runs into the Expression Problem, since any missed features will cut across classes... and, as a corollary, any supported feature will increase the responsibilities embedded in code (generally making it less reusable across programs).

在面向对象编程的争议与讨论相关内容中,「表达式问题」在反对领域对象的论点中被简要提及。其核心逻辑为:领域对象的主要作用,是将领域内的对象标识符(参见对象标识)与一组领域功能特性(属性)相关联。若使用面向对象编程的方法(例如带返回值的同步消息传递)解决该问题,通常需要满足两个条件:(a)将每一个对象标识符归类至某个「类」中,由该类为其选择对应的构造函数(这一做法会面临层级结构的局限性相关问题);(b)预先判断这些对象可能需要计算的「功能特性」(例如格式化打印、验证、成本估算等),并在层级结构中为这些功能特性命名。条件(b)会直接引发「表达式问题」,因为任何被遗漏的功能特性都需要跨多个类实现......相应地,每新增一个受支持的功能特性,都会增加代码中嵌入的职责(这通常会降低代码在不同程序间的可复用性)。

As opposed to a more specific class of computation-domain objects (math, language, queues and stacks, functors, values, etc.) this is especially problematic in general Domain Objects because there is no complete set of 'defining' features for a given object, so one is never finished adding or removing features across the set of Domain Objects.

与数学、语言、队列与栈、函子、数值等特定计算领域的对象不同,这一问题在通用领域对象中表现得尤为突出。原因在于,对于任意一个给定的对象,不存在一套完整的「定义性」功能特性,因此开发者永远无法完成对领域对象集合的功能特性增删工作。

The Oop Arguments Debates And Discussion page does not argue that Functional Programming is a better answer, only that Domain Objects are one wrong answer that runs into this problem.

面向对象编程的争议与讨论相关页面并未主张函数式编程是更优解,仅指出领域对象是其中一种会引发该问题的错误方案。

Here is an implementation that shows how to sidestep the problem, in C++: http://www.reddit.com/r/programming/comments/mmrmj/the_expression_problem/c328mt3

以下是一个使用 C++ 规避该问题的实现方案:http://www.reddit.com/r/programming/comments/mmrmj/the_expression_problem/c328mt3

It's not sidestepping if you actually solve the problem. That looks like a promising start in C++.
如果你真的解决了这个问题,那就不能称之为「规避」了。这似乎是 C++ 领域内一个颇具前景的开端。

The Visitor Pattern in OOP lets you switch to the other side of the problem, by making it easy to add new operations and hard to add new objects. Though of course, it comes with the cost of setting up the pattern.

面向对象编程中的访问者模式可以让你切换到问题的另一维度------它会让新增操作变得简便,而新增对象变得繁琐。当然,使用该模式也需要付出一定的代价,即完成模式的搭建工作。
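A minimal Java sketch of how the Visitor Pattern flips the trade-off for the shapes example. The names `ShapeVisitor`, `Area`, and `Perimeter` are hypothetical and chosen only for illustration.

```java
// Visitor sketch: one visitor method per shape (all names are illustrative).
interface ShapeVisitor<R> {
    R square(Square s);
    R circle(Circle c);
}

interface Shape {
    <R> R accept(ShapeVisitor<R> v);
}

class Square implements Shape {
    final double side;
    Square(double side) { this.side = side; }
    public <R> R accept(ShapeVisitor<R> v) { return v.square(this); }
}

class Circle implements Shape {
    final double radius;
    Circle(double radius) { this.radius = radius; }
    public <R> R accept(ShapeVisitor<R> v) { return v.circle(this); }
}

class Area implements ShapeVisitor<Double> {
    public Double square(Square s) { return s.side * s.side; }
    public Double circle(Circle c) { return Math.PI * c.radius * c.radius; }
}

// A new operation is now one new class; no shape class is edited.
class Perimeter implements ShapeVisitor<Double> {
    public Double square(Square s) { return 4 * s.side; }
    public Double circle(Circle c) { return 2 * Math.PI * c.radius; }
}
```

The cost has moved rather than disappeared: adding a `Triangle` would now require a new method on `ShapeVisitor` and therefore an edit to both `Area` and `Perimeter`.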

Related: SwitchStatementsSmell

Last edit November 24, 2013


What is the 'expression problem'?

什么是"表达式问题"?

I have a rough idea about what this is but if someone has an explanation of the 'expression problem' that they think is succinct and intuitive I would love to hear it.

我对这个概念有一个大致的了解,但如果有人能给出一个简洁且直观的解释,我会非常乐意倾听。

edited Sep 23, 2010 at 21:22

Daniel

asked Aug 29, 2010 at 20:02

James

Given that it's a reasonably involved concept, I'm not certain you'll get very far with a "succinct and intuitive" explanation, although I'd be glad to be proved wrong!

鉴于这是一个相对复杂的概念,我不确定是否真的能给出一个"简洁且直观"的解释,不过如果有人能做到这一点,我会很乐意承认自己的判断有误!

-- Gian

Commented Aug 30, 2010 at 12:06

No joke: an incomplete (and possibly slightly inaccurate) but illuminating metaphor would suffice.

不是开玩笑的:即便是一个不够完整(甚至可能存在些许偏差)但能够带来启发的比喻,对我来说就足够了。

-- James

Commented Sep 14, 2010 at 9:17

See also Complete solutions to the Expression Problem?.

另可参考《表达式问题的完整解决方案有哪些?》一文。

-- Shelby Moore III

Commented Dec 7, 2011 at 3:50

Answers

Watch this lecture.

推荐观看这门课程

The idea is that your program is a combination of a datatype and operations over it. The problem asks for an implementation that allows adding new cases of the type and new operations without recompiling the old modules, while keeping static type safety (no casts or runtime type checks).

表达式问题的核心思想是:程序是由数据类型以及作用于该数据类型的操作共同构成的。这个问题要求程序的实现满足以下条件:能够在不重新编译旧有模块的前提下,新增数据类型的分支与新的操作,同时保持静态类型安全(不使用类型转换或运行时类型检查)。

It's interesting to notice that in functional programming languages it's easy to add new operations but hard to add cases to the datatype, while in an OO language it's the other way round. This is one of the big conceptual differences between the two programming paradigms.

一个有趣的现象是:在函数式语言中,新增操作很容易实现,但新增数据类型的分支却很困难;而在面向对象语言中,情况则恰恰相反。这是两种编程范式之间的一个重大概念差异。
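To see why the statement rules out casts and runtime type checks, here is a small Java sketch of the kind of "solution" that does not count. The classes `Lit`, `Add`, `Neg`, and `Evaluator` are hypothetical and exist only for this illustration.

```java
// A cast-based workaround that the problem statement disallows (names are illustrative).
interface Exp {}

class Lit implements Exp {
    final int value;
    Lit(int value) { this.value = value; }
}

class Add implements Exp {
    final Exp left, right;
    Add(Exp left, Exp right) { this.left = left; this.right = right; }
}

// Imagine this case being added later, in a separate module.
class Neg implements Exp {
    final Exp inner;
    Neg(Exp inner) { this.inner = inner; }
}

class Evaluator {
    // The operation lives outside the classes, so it can be added freely,
    // but only by inspecting runtime types and casting.
    static int eval(Exp e) {
        if (e instanceof Lit) return ((Lit) e).value;
        if (e instanceof Add) return eval(((Add) e).left) + eval(((Add) e).right);
        // Neg compiles fine but is only rejected here, at run time.
        throw new IllegalArgumentException("unhandled case: " + e.getClass().getSimpleName());
    }
}
```

Both new cases and new operations can be added without touching old code, but the compiler no longer reports a forgotten case: `Evaluator.eval(new Neg(new Lit(1)))` type-checks and then throws.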

edited Feb 16, 2024 at 2:39

Jared Updike

answered Sep 23, 2010 at 7:17

Daniel

elnygren

Here's a solution that shows how the expression problem can be solved in Clojure (functional programming): gist.github.com/elnygren/e34368a86d62f0cb75f04ba903f7834a

这里提供一个解决方案,展示了如何在 Clojure(函数式语言)中解决表达式问题:gist.github.com/elnygren/e34368a86d62f0cb75f04ba903f7834a

Christopher Done

@elnygren I forked your Gist and replaced your code with Haskell: gist.github.com/chrisdone/7e07b3a90474542c9d1ebef033c1ee6e

@elnygren 我复刻了你的代码片段,并将其中的代码替换为了 Haskell 实现:gist.github.com/chrisdone/7e07b3a90474542c9d1ebef033c1ee6e

yǝsʞǝla

One more fork with a Scala example here: gist.github.com/izmailoff/41c7f790eb97042c307885388754a0be

这里还有一个复刻版本,提供了 Scala 语言的实现示例:gist.github.com/izmailoff/41c7f790eb97042c307885388754a0be

Cnly

The original link for the lecture seems to have expired. I think this is a working one.

原课程链接似乎已经失效,我认为这个链接是可以正常访问的。

The idea behind the problem is that text is 1 dimensional. Even if you have lines and columns, you generally read it, word by word, line by line. So does the compiler.

这个问题背后的核心逻辑是:文本是一维的。即便文本存在行与列的划分,人们在阅读时依然是逐字、逐行地进行,编译器的处理方式也是如此。

And you try to represent some kind of data with 2 or more dimensions in it. For example, a table in row-major order looks like this:

但你却需要在这种一维的文本中,表示某种二维或更高维度的数据。例如,一个以行为主序存储的表格,在文本中的表现形式如下:

((A, B, C), (D, E, F), (G, H, I))

In this representation, it's quite easy to add a new row at the end, without touching the rest:

在这种表示方式下,在表格末尾新增一行是非常容易的,无需修改其他部分的内容:

((A, B, C), (D, E, F), (G, H, I), (J, K, L))

But adding a column is more of a problem: you need to touch 4 different places:

但新增一列却会变得很麻烦,你需要修改 4 个不同的位置:

((A, B, C, M), (D, E, F, N), (G, H, I, O), (J, K, L, P))

You generally run into this problem in practice, when dealing with abstract classes: it's quite easy to add a new subtype as a new module, but when you add a new abstract method, you'll need to touch all the modules and add it; you need to do the same thing in many places. Normally you make abstractions to protect against these repetitive things.

在实际开发中,当你处理抽象类时,通常会遇到类似的问题:新增一个子类型并将其作为独立模块是很容易的,但当你需要新增一个抽象方法时,就必须修改所有相关的模块,为每个子类型都实现这个方法;你需要在多个地方执行重复的操作。通常来说,我们进行抽象的目的,就是为了避免这类重复劳动。

There is no solution to this problem as long as you use 1D representation.

只要你采用一维的文本形式来表示数据,这个问题就无法从根本上得到解决。

The solution to this problem would be an editor that lets you edit these table-like things as a real table rather than as text (in an Excel-like view, where you can conveniently add new columns and rows).

解决这个问题的理想方案,是使用一款能够将表格类数据当作真实表格而非文本进行编辑的工具(例如类似 Excel 的可视化界面,在这种界面中,你可以方便地新增行与列)。

answered Mar 4, 2014 at 18:55

Calmarius


Three Solutions to the Expression Problem

帕萩莉

Fainéant wizard.

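1. The Expression Problem

The Expression Problem (EP) asks how to define a data type whose representation (its set of variants) and behavior (its set of operations) can both keep growing as requirements change, without modifying or recompiling existing code [1].

Constraints:

  1. Extensibility in both dimensions: a solution must allow adding new data variants and adding new operations

  2. No modification or duplication: existing code must be neither modified nor duplicated

  3. Separate compilation and type checking: type-safety checks and compilation must not be deferred to link time or run time

  4. Independent extensibility: independently developed extensions can be combined without conflict [2]

1.1 The Problem in Object-Oriented Programming (OOP)

In object-oriented programming, adding a new data constructor is easy, while adding a new operation is hard. A basic implementation: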

interface Exp { int eval(); }

class Lit implements Exp {
    int x;
    Lit(int x) { this.x = x; }
    public int eval() { return x; }
}

class Add implements Exp {
    Exp l, r;
    Add(Exp l, Exp r) { this.l = l; this.r = r; }
    public int eval() { return l.eval() + r.eval(); }
}

public class Main {
    public static void main(String[] args) {
        Exp e = new Add(new Lit(1), new Lit(2));
        System.out.println(e.eval());
    }
}

A new data constructor is added by writing one more class that implements the interface; no existing class changes:

interface Exp { int eval(); }

class Lit implements Exp {
    int x;
    Lit(int x) { this.x = x; }
    public int eval() { return x; }
}

class Add implements Exp {
    Exp l, r;
    Add(Exp l, Exp r) { this.l = l; this.r = r; }
    public int eval() { return l.eval() + r.eval(); }
}

class Mult implements Exp {
    Exp l, r;
    Mult(Exp l, Exp r) { this.l = l; this.r = r; }
    public int eval() { return l.eval() * r.eval(); }
}

public class Main {
    public static void main(String[] args) {
        Exp e = new Add(new Lit(1), new Lit(2));
        Exp r = new Mult(e, new Lit(4));
        System.out.println(r.eval());
    }
}

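Adding an operation is a different story: if a pretty-printing method is added to the Exp interface, every class that implements the interface must be edited to supply it, which violates the no-modification constraint.

1.1.1 An OOP Solution: Object Algebras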

An object algebra is a class that implements a generic abstract factory interface, which corresponds to a particular kind of algebraic signature [3].

对象代数是一类实现泛型抽象工厂接口的类,该接口对应某一特定类型的代数签名 [3]。

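Object algebras are closely related to the Visitor pattern, so the listing below starts from a plain Visitor. Note that Visitor on its own only flips the trade-off: new operations become easy, while new data constructors become hard.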

interface Exp { <A> A visit(Visitor<A> vis); }

interface Visitor<A> {
  A lit(Lit a);
  A add(Add a);
}

class Lit implements Exp {
  int value;
  public <A> A visit(Visitor<A> vis) { return vis.lit(this); }
}

class Add implements Exp {
  Exp left, right;
  public <A> A visit(Visitor<A> vis) {
    return vis.add(this);
  }
}

class Eval implements Visitor<Integer> {
  public Integer lit(Lit a) { return a.value; }
  public Integer add(Add a) { return a.left.visit(this) + a.right.visit(this); }
}

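With Visitor, a new operation such as PrettyPrint is just one new class. A new data constructor such as Mult is more invasive: the Visitor interface gains a method, so every existing visitor (here Eval) has to be edited. The complete listing after both changes: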

interface Exp { <A> A visit(Visitor<A> vis); }

interface Visitor<A> {
    A lit(Lit a);
    A add(Add a);
    A mult(Mult a);
}

class Lit implements Exp {
    int value;
    public Lit(int value) { this.value = value; }
    public <A> A visit(Visitor<A> vis) { return vis.lit(this); }
}

class Add implements Exp {
    Exp left, right;
    public Add(Exp left, Exp right) { this.left = left; this.right = right; }
    public <A> A visit(Visitor<A> vis) {
        return vis.add(this);
    }
}

class Mult implements Exp {
    Exp left, right;
    public Mult(Exp left, Exp right) { this.left = left; this.right = right; }
    public <A> A visit(Visitor<A> vis) {
        return vis.mult(this);
    }
}

class Eval implements Visitor<Integer> {
    public Integer lit(Lit a) { return a.value; }
    public Integer add(Add a) { return a.left.visit(this) + a.right.visit(this); }
    public Integer mult(Mult a) { return a.left.visit(this) * a.right.visit(this); }
}

class PrettyPrint implements Visitor<String> {
    public String lit(Lit a) { return Integer.toString(a.value); }
    public String add(Add a) { return "(" + a.left.visit(this) + " + " + a.right.visit(this) + ")"; }
    public String mult(Mult a) { return "(" + a.left.visit(this) + " * " + a.right.visit(this) + ")"; }
}
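The definition quoted in 1.1.1 describes an object algebra as a class implementing a generic abstract factory interface. A minimal sketch of that encoding, as opposed to the Visitor listing above, might look like the following. All names (`ExpAlg`, `EvalAlg`, `PrintAlg`, `MulAlg`, `Terms`) are hypothetical; this is an illustration in the spirit of [3], not code taken from it.

```java
// Generic abstract factory: one method per data variant (names are illustrative).
interface ExpAlg<E> {
    E lit(int value);
    E add(E left, E right);
}

// Each operation is one implementation of the factory.
class EvalAlg implements ExpAlg<Integer> {
    public Integer lit(int value) { return value; }
    public Integer add(Integer left, Integer right) { return left + right; }
}

class PrintAlg implements ExpAlg<String> {
    public String lit(int value) { return Integer.toString(value); }
    public String add(String left, String right) { return "(" + left + " + " + right + ")"; }
}

// A new variant extends the interface; ExpAlg, EvalAlg and PrintAlg stay untouched.
interface MulAlg<E> extends ExpAlg<E> {
    E mul(E left, E right);
}

class EvalMulAlg extends EvalAlg implements MulAlg<Integer> {
    public Integer mul(Integer left, Integer right) { return left * right; }
}

class PrintMulAlg extends PrintAlg implements MulAlg<String> {
    public String mul(String left, String right) { return "(" + left + " * " + right + ")"; }
}

class Terms {
    // The term (1 + 2) * 4, written once against any algebra that supports mul.
    static <E> E example(MulAlg<E> alg) {
        return alg.mul(alg.add(alg.lit(1), alg.lit(2)), alg.lit(4));
    }
}
```

A term is a generic method over the algebra instead of an object tree, so passing `new EvalMulAlg()` evaluates it and passing `new PrintMulAlg()` prints it. New operations are new classes and new variants are new sub-interfaces, with no casts involved.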

1.1.2 Exercise

PatternCraft - Visitor: https://www.codewars.com/kata/5682e646d5eddc1e21000017

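2. The Problem in Functional Programming (FP)

In functional programming the situation is reversed: adding a new operation over a data type is easy, but adding a new variant to the data type is hard. A basic implementation in Haskell: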

data Exp = Lit Int
         | Add Exp Exp
      -- | Sub Exp Exp -- supporting subtraction would require editing every function below

eval :: Exp -> Int
eval (Lit x) = x
eval (Add x y) = eval x + eval y

prettyprint :: Exp -> String
prettyprint (Lit x) = show x
prettyprint (Add x y) = "(" ++ prettyprint x ++ "+" ++ prettyprint y ++ ")"

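Haskell has no classes or objects in the OOP sense, so the object-algebra encoding does not carry over as written. Two well-established Haskell techniques address the expression problem instead.

2.1 FP Solution 1: Finally Tagless

2.1.1 Removing the tags

The idea is to turn the data constructors of the algebraic data type into methods of a type class. Each operation over the data then becomes an instance of that class. A basic implementation: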

class Semantics repr where
    val :: Int -> repr
    add :: repr -> repr -> repr
    mul :: repr -> repr -> repr

instance Semantics Int where
    val = id
    add = (+)
    mul = (*)

instance Semantics String where
    val = show
    add = \x y -> "(" ++ x ++ "+" ++ y ++ ")"
    mul = \x y -> "(" ++ x ++ "*" ++ y ++ ")"

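The style is called "tagless" because expressions are no longer built from the constructors of an algebraic data type (the tags) but from the methods of the type class. Annotating the result type selects the interpretation: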

-- >>> exp1
-- 10
exp1 :: Int
exp1 = add (mul (val 2) (val 3)) (val 4)

-- >>> exp2
-- "((2*3)+4)"
exp2 :: String
exp2 = add (mul (val 2) (val 3)) (val 4)

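2.1.2 A type-safety gap and how to close it

The basic encoding is not type safe: nothing prevents ill-typed combinations such as adding a Boolean to a number, and the compiler accepts them. For example: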

import Prelude hiding (and) -- the class below defines its own `and`

class Semantics repr where
    val :: Int -> repr
    add :: repr -> repr -> repr

    bool :: Bool -> repr
    and :: repr -> repr -> repr

instance Semantics Int where
    val = id
    add = (+)
    bool = fromEnum
    and = (*)

exp1 :: Int
exp1 = add (bool True) (bool False `and` bool True)

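The fix is to make repr a type constructor whose parameter records the type of value an expression denotes, so that the type checker enforces well-typedness: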

import Prelude hiding (and) -- the class below defines its own `and`

class Semantics repr where
    val :: Int -> repr Int
    add :: repr Int -> repr Int -> repr Int

    bool :: Bool -> repr Bool
    and :: repr Bool -> repr Bool -> repr Bool

newtype Eval a
    = Eval { runEval :: a
           }

instance Semantics Eval where
    val = Eval
    add (Eval x) (Eval y) = Eval (x + y)
    bool = Eval
    and (Eval x) (Eval y) = Eval (x && y)

newtype Pretty a
    = Pretty { runPretty :: String
             }

instance Semantics Pretty where
    val = Pretty . show
    add (Pretty x) (Pretty y) = Pretty (x <> " + " <> y)
    bool = Pretty . show
    and (Pretty x) (Pretty y) = Pretty (x <> " && " <> y)

Ill-typed combinations are now rejected at compile time:

exp1 :: Int
exp1 = runEval $ add (val 1) (val 2)

exp2 :: Bool
exp2 = runEval $ and (bool True) (bool False)

invalidExp :: Int
invalidExp = runEval $ add (val 1) (bool True) -- ^ Couldn't match type 'Bool' with 'Int'

2.1.3 Extension: building a syntax tree

The same encoding also makes it easy to build an explicit syntax tree:

data Tree a = Leaf String a | Node String [ Tree a ]
    deriving ( Show, Eq )

instance Semantics Tree where
    val x = Leaf "val" x
    add x y = Node "add" [ x, y ]
    bool x = Leaf "bool" x
    and x y = Node "and" [ x, y ]

-- >>> exp3
-- Node "add" [Node "add" [Leaf "val" 4,Leaf "val" 5],Node "add" [Leaf "val" 2,Leaf "val" 3]]
exp3 :: Tree Int
exp3 = add (add (val 4) (val 5)) (add (val 2) (val 3))

2.2 FP Solution 2: Data types à la carte

2.2.1 Splitting the data constructors (Fixing)

First, the single data type is split so that each constructor becomes its own functor:

data Val e = Val Int
    deriving ( Functor )

data Add e = Add e e
    deriving ( Functor )

Recursion is then reintroduced with the fixed-point operator Fix:

newtype Fix f = In { out :: f (Fix f) }

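Here the constructor In has type f (Fix f) -> Fix f and the projection out has type Fix f -> f (Fix f); together they express arbitrarily nested expression trees.

The coproduct operator :+: combines functors. Its definition and Functor instance: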

-- Extensions used: GADTs, KindSignatures, TypeOperators, InstanceSigs
import Data.Kind (Type)

infixr 1 :+:
data (:+:) :: (Type -> Type) -> (Type -> Type) -> Type -> Type where
    Inl :: f a -> (f :+: g) a
    Inr :: g a -> (f :+: g) a

instance ( Functor f, Functor g ) => Functor (f :+: g) where
    fmap :: (a -> b) -> (f :+: g) a -> (f :+: g) b
    fmap f (Inl x) = Inl (fmap f x)
    fmap f (Inr x) = Inr (fmap f x)

The fold foldFix reduces a nested structure using an F-algebra (F-algebra: https://en.wikipedia.org/wiki/F-algebra):

foldFix :: Functor f => (f a -> a) -> Fix f -> a
foldFix f = f . fmap (foldFix f) . out

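Here foldFix f :: Fix f -> a and fmap (foldFix f) :: f (Fix f) -> f a. The first argument is the algebra, which reduces a single layer; foldFix applies it recursively to every layer.

2.2.2 First operation: evaluation (Evaluating)

Evaluation is defined through a type class so that it works for any combination of functors: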

class Functor f => Eval f where
    evalAlgebra :: f Int -> Int

instance Eval Val where
    evalAlgebra :: Val Int -> Int
    evalAlgebra (Val x) = x

instance Eval Add where
    evalAlgebra :: Add Int -> Int
    evalAlgebra (Add x y) = x + y

instance ( Eval f, Eval g ) => Eval (f :+: g) where
    evalAlgebra :: (f :+: g) Int -> Int
    evalAlgebra (Inl x) = evalAlgebra x
    evalAlgebra (Inr x) = evalAlgebra x

eval :: Eval f => Fix f -> Int
eval = foldFix evalAlgebra

exp1 :: Int
exp1 = eval (In (Inr (Add (In (Inl (Val 1))) (In (Inl (Val 2))))))

-- >>> exp1
-- 3

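2.2.3 Improvement: automating injection (Automating Injection)

To avoid writing the nested In, Inl and Inr by hand, define a type class :<: stating that one functor is contained in another, and let instance resolution build the injection: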

class ( Functor sub, Functor sup ) => sub :<: sup where
    inj :: sub a -> sup a

instance Functor f => f :<: f where
    inj = id

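-- This instance overlaps with the next one when h = f, so it is marked OVERLAPPING
instance {-# OVERLAPPING #-} ( Functor f, Functor g ) => f :<: (f :+: g) where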
    inj = Inl

instance ( Functor f, Functor g, Functor h, f :<: g ) => f :<: (h :+: g) where
    inj = Inr . inj

Wrapping the injection in smart constructors gives a concise notation:

inject :: (g :<: f) => g (Fix f) -> Fix f
inject = In . inj

val :: (Val :<: f) => Int -> Fix f
val x = inject (Val x)

(<+>) :: (Add :<: f) => Fix f -> Fix f -> Fix f
x <+> y = inject (Add x y)

exp2 :: Fix (Val :+: Add)
exp2 = val 1 <+> val 2 <+> (val 3 <+> val 4)

-- >>> eval exp2
-- 10

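The symbol $<:$ (written :<: in the code above) is referred to as Subsume. The type class defines a subtyping relation in which sub is a subtype of sup and sup is a supertype of sub. Most of the polymorphism referred to in object-oriented programming (OOP) is subtype polymorphism.

In short: subtyping is a partial order on types; if type $A$ is a subtype of type $B$, then $A$ can be used wherever a $B$ is expected.

2.2.4 Non-invasive extension in both directions

New data constructors and new operations can now be added without touching existing code, which satisfies all the constraints of the expression problem: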

data Mult a = Mult a a
    deriving ( Functor )

instance Eval Mult where
    evalAlgebra :: Mult Int -> Int
    evalAlgebra (Mult x y) = x * y

mult :: (Mult :<: f) => Fix f -> Fix f -> Fix f
mult x y = inject (Mult x y)

exp3 :: Fix (Val :+: Add :+: Mult)
exp3 = val 1 <+> val 2 <+> (val 3 <+> val 4) <+> (val 5 `mult` val 6)

-- >>> eval exp3
-- 40

class Functor f => Display f where
    displayAlgebra :: f String -> String

instance Display Val where
    displayAlgebra :: Val String -> String
    displayAlgebra (Val x) = show x

instance Display Add where
    displayAlgebra :: Add String -> String
    displayAlgebra (Add x y) = "(" ++ x ++ " + " ++ y ++ ")"

instance Display Mult where
    displayAlgebra :: Mult String -> String
    displayAlgebra (Mult x y) = "(" ++ x ++ " * " ++ y ++ ")"

instance ( Display f, Display g ) => Display (f :+: g) where
    displayAlgebra :: (f :+: g) String -> String
    displayAlgebra (Inl x) = displayAlgebra x
    displayAlgebra (Inr x) = displayAlgebra x

display :: Display f => Fix f -> String
display = foldFix displayAlgebra

-- >>> display exp3
-- "(((1 + 2) + (3 + 4)) + (5 * 6))"

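2.2.5 Further refinement: type families

The type class version relies on overlapping instances and has limitations: the left-hand side of :<: must be a single functor, and the right-hand side must be a right-nested chain of coproducts. Closed type families remove both limitations, allowing compound left-hand sides and arbitrarily nested right-hand sides [5]: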

type family Or (a :: Bool) (b :: Bool) :: Bool where
    Or 'False 'False = 'False
    Or _ _ = 'True

data Pos = Here | L Pos | R Pos | Sum Pos Pos

data Res = Found Pos | NotFound | Ambiguous

type family Elem (f :: Type -> Type) (g :: Type -> Type) :: Res where
    Elem f f = 'Found 'Here
    Elem f (g :+: h) = Choose f (g :+: h) (Elem f g) (Elem f h)
    Elem f g = 'NotFound

type family Choose (f :: Type -> Type) (g :: Type -> Type) (a :: Res) (b :: Res) :: Res where
    Choose f g ('Found _) ('Found _) = 'Ambiguous
    Choose f g 'Ambiguous _ = 'Ambiguous
    Choose f g _ 'Ambiguous = 'Ambiguous
    Choose f g ('Found a) _ = 'Found ('L a)
    Choose f g _ ('Found b) = 'Found ('R b)
    Choose (f1 :+: f2) g x y = Sum' (Elem f1 g) (Elem f2 g)
    Choose f g _ _ = 'NotFound

type family Sum' (a :: Res) (b :: Res) :: Res where
    Sum' ('Found a) ('Found b) = 'Found ('Sum a b)
    Sum' 'Ambiguous _ = 'Ambiguous
    Sum' _ 'Ambiguous = 'Ambiguous
    Sum' _ _ = 'NotFound

class Subsume (res :: Res) f g where
    inj' :: Proxy res -> f a -> g a
    prj' :: Proxy res -> g a -> Maybe (f a)

instance Subsume ('Found 'Here) f f where
    inj' _ = id
    prj' _ = Just

instance Subsume ('Found a) f g => Subsume ('Found ('L a)) f (g :+: h) where
    inj' _ = Inl . inj' (Proxy @('Found a))
    prj' _ (Inl x) = prj' (Proxy @('Found a)) x
    prj' _ _       = Nothing

instance Subsume ('Found a) f h => Subsume ('Found ('R a)) f (g :+: h) where
    inj' _ = Inr . inj' (Proxy @('Found a))
    prj' _ (Inr x) = prj' (Proxy @('Found a)) x
    prj' _ _       = Nothing

instance ( Subsume ('Found a) f1 g, Subsume ('Found b) f2 g )
    => Subsume ('Found ('Sum a b)) (f1 :+: f2) g where
    inj' _ (Inl x) = inj' (Proxy @('Found a)) x
    inj' _ (Inr x) = inj' (Proxy @('Found b)) x
    prj' _ x = case prj' (Proxy @('Found a)) x of
        Just y  -> Just (Inl y)
        Nothing -> case prj' (Proxy @('Found b)) x of
            Just y  -> Just (Inr y)
            Nothing -> Nothing

inj :: forall f g a. Subsume (Elem f g) f g => f a -> g a
inj = inj' (Proxy @(Elem f g))

prj :: forall f g a. Subsume (Elem f g) f g => g a -> Maybe (f a)
prj = prj' (Proxy @(Elem f g))

Usage is unchanged, and containment is now resolved more robustly:

inject :: Subsume (Elem f g) f g => f (Fix g) -> Fix g
inject = In . inj

val :: Subsume (Elem Val g) Val g => Int -> Fix g
val = inject . Val

(<+>) :: Subsume (Elem Add g) Add g => Fix g -> Fix g -> Fix g
(<+>) x y = inject (Add x y)

-- >>> eval exp2
-- 3
exp2 :: Fix (Val :+: Add)
exp2 = val 1 <+> val 2

2.2.6 Exercises

  1. Finally tagless interpreter: https://www.codewars.com/kata/5424e3bc430ca2e577000048/

  2. Data Types à la Carte: https://www.codewars.com/kata/54808fc8ab03a23e82000a1f/

3. Further Reading

  1. The Expression Problem by Philip Wadler:
    https://homepages.inf.ed.ac.uk/wadler/papers/expression/expression.txt
  2. Freer Monads and Extensible Effects
    https://okmij.org/ftp/Haskell/extensible/

References

  1. Wadler, P. The Expression Problem[Z]. Email, 1998, 11: 1-10.

  2. Zenger, M., Odersky, M. Independently Extensible Solutions to the Expression Problem. https://www.scala-lang.org/docu/files/IC_TECH_REPORT_200433.pdf

  3. Oliveira, B.C.d.S., Cook, W.R. Extensibility for the Masses[C]//Noble, J. (eds) ECOOP 2012 -- Object-Oriented Programming. ECOOP 2012. Lecture Notes in Computer Science, vol 7313. Springer, Berlin, Heidelberg, 2012: 1-25. https://doi.org/10.1007/978-3-642-31057-7_2.

  4. SWIERSTRA, W. Data types à la carte[J]. Journal of Functional Programming, 2008, 18(4): 423-436. https://doi.org/10.1017/S0956796808006758.

  5. Bahr, P. Composing and decomposing data types[C]//Proceedings of the 10th ACM SIGPLAN Workshop on Generic Programming. 2014: 71-82. https://doi.org/10.1145/2633628.2633635.

  6. Carette, J., Kiselyov, O., Shan, Cc. Finally Tagless, Partially Evaluated[C]//Shao, Z. (eds) Programming Languages and Systems. APLAS 2007. Lecture Notes in Computer Science, vol 4807. Springer, Berlin, Heidelberg, 2007: 1-18. https://doi.org/10.1007/978-3-540-76637-7_15.

  7. Kiselyov, O. Typed Tagless Final Interpreters[C]//Gibbons, J. (eds) Generic and Indexed Programming. Lecture Notes in Computer Science, vol 7470. Springer, Berlin, Heidelberg, 2012: 1-20. https://doi.org/10.1007/978-3-642-32202-0_3.

  8. Kiselyov, O., Ishii, H. Freer Monads, More Extensible Effects[C]//Proceedings of the 2015 ACM SIGPLAN Symposium on Haskell. Vancouver, BC, Canada, 2015: 94--105. https://doi.org/10.1145/2804302.2804319.

Published 2023-05-12 15:23, Guangdong

