注:本文为 "数据抽象" 相关译文合辑。
英文引文,机翻未校。
如有内容异常,请看原文。
On Understanding Data Abstraction, Revisited
重探数据抽象的理解
William R. Cook
University of Texas at Austin
得克萨斯大学奥斯汀分校
Abstract
摘要
In 1985 Luca Cardelli and Peter Wegner, my advisor, published an ACM Computing Surveys paper called "On understanding types, data abstraction, and polymorphism". Their work kicked off a flood of research on semantics and type theory for object-oriented programming, which continues to this day. Despite 25 years of research, there is still widespread confusion about the two forms of data abstraction, abstract data types and objects. This essay attempts to explain the differences and also why the differences matter.
1985 年,卢卡·卡德利(Luca Cardelli)和我的导师彼得·韦格纳(Peter Wegner)在《ACM 计算调查》上发表了一篇题为《论类型、数据抽象和多态性的理解》的论文。他们的研究引发了关于面向对象程序设计语义学和类型理论的大量研究,这类研究至今仍在继续。尽管经过了 25 年的研究,人们对数据抽象的两种形式,即抽象数据类型(ADT)和对象,仍然存在广泛的困惑。本文试图解释二者的差异及其重要性。
Categories and Subject Descriptors
分类与主题描述符
D.3.3 [Programming Languages]: Language Constructs and Features-Abstract data types; D.3.3 [Programming Languages]: Language Constructs and Features-Classes and objects
D.3.3 [程序设计语言]:语言结构与特性 - 抽象数据类型;D.3.3 [程序设计语言]:语言结构与特性 - 类与对象
General Terms
通用术语
Languages
语言
Keywords
关键词
object, class, abstract data type, ADT
对象、类、抽象数据类型
1. Introduction
1. 引言
What is the relationship between objects and abstract data types (ADTs)? I have asked this question to many groups of computer scientists over the last 20 years. I usually ask it at dinner, or over drinks. The typical response is a variant of "objects are a kind of abstract data type".
对象与抽象数据类型之间存在怎样的关系?在过去 20 年里,我曾向多组计算机科学家提出过这个问题,通常是在晚餐或饮品时间。典型的回答大致是"对象是一种抽象数据类型"。
This response is consistent with most programming language textbooks. Tucker and Noonan [57] write "A class is itself an abstract data type". Pratt and Zelkowitz [51] intermix discussion of Ada, C++, Java, and Smalltalk as if they were all slight variations on the same idea. Sebesta [54] writes "the abstract data types in object-oriented languages... are called classes." He uses "abstract data types" and "data abstraction" as synonyms. Scott [53] describes objects in detail, but does not mention abstract data types other than giving a reasonable discussion of opaque types.
这一回答与大多数程序设计语言教材的观点一致。塔克(Tucker)和努南(Noonan)[57] 写道"类本身就是一种抽象数据类型"。普拉特(Pratt)和泽尔科维茨(Zelkowitz)[51] 将 Ada、C++、Java 和 Smalltalk 的相关讨论混为一谈,仿佛它们只是同一思想的细微变体。塞贝斯塔(Sebesta)[54] 写道"面向对象语言中的抽象数据类型......被称为类",他将"抽象数据类型"和"数据抽象"视为同义词。斯科特(Scott)[53] 详细描述了对象,但除了对不透明类型进行了合理讨论外,并未提及抽象数据类型。
So what is the point of asking this question? Everyone knows the answer. It's in the textbooks. The answer may be a little fuzzy, but nobody feels that it's a big issue. If I didn't press the issue, everyone would nod and the conversation would move on to more important topics. But I do press the issue. I don't say it, but they can tell I have an agenda.
那么,提出这个问题的意义何在?似乎人人都知道答案,因为教材上就是这么写的。答案可能有些模糊,但没人觉得这是个大问题。如果我不深究,大家只会点头认同,然后话题就会转向更重要的内容。但我总会深究下去。我虽不明说,但他们能看出我另有想法。
My point is that the textbooks mentioned above are wrong! Objects and abstract data types are not the same thing, and neither one is a variation of the other. They are fundamentally different and in many ways complementary, in that the strengths of one are the weaknesses of the other. The issues are obscured by the fact that most modern programming languages support both objects and abstract data types, often blending them together into one syntactic form. But syntactic blending does not erase fundamental semantic differences which affect flexibility, extensibility, safety and performance of programs. Therefore, to use modern programming languages effectively, one should understand the fundamental difference between objects and abstract data types.
我的观点是,上述教材的说法是错误的!对象和抽象数据类型并非同一事物,也不存在谁是谁的变体这一说法。它们在本质上截然不同,且在许多方面具有互补性:一方的优势正是另一方的劣势。大多数现代程序设计语言同时支持对象和抽象数据类型,且常常将它们融合为一种语法形式,这一事实掩盖了二者的本质区别。但语法上的融合并不能消除本质的语义差异,这些差异会影响程序的灵活性、可扩展性、安全性和性能。因此,要有效使用现代程序设计语言,就必须理解对象与抽象数据类型之间的根本区别。
While objects and ADTs are fundamentally different, they are both forms of data abstraction. The general concept of data abstraction refers to any mechanism for hiding the implementation details of data. The concept of data abstraction has existed long before the term "data abstraction" came into existence. In mathematics, there is a long history of abstract representations for data. As a simple example, consider the representation of integer sets. Two standard approaches to describe sets abstractly are as an algebra or as a characteristic function. An algebra has a sort, or collection of abstract values, and operations to manipulate the values¹. The characteristic function for a set maps a domain of values to a boolean value, which indicates whether or not the value is included in the set. These two traditions in mathematics correspond closely to the two forms of data abstraction in programming: algebras relate to abstract data types, while characteristic functions are a form of object.
尽管对象和抽象数据类型存在本质差异,但它们都是数据抽象的形式。数据抽象的一般概念指的是任何用于隐藏数据实现细节的机制。早在"数据抽象"这一术语出现之前,数据抽象的概念就已经存在了。在数学领域,数据的抽象表示有着悠久的历史。举一个简单的例子,考虑整数集的表示。抽象描述集合的两种标准方法是代数法和特征函数法。代数包含一个类型(或抽象值的集合)以及用于操作这些值的运算¹。集合的特征函数将一个值的域映射到布尔值,用于指示该值是否包含在集合中。数学中的这两种传统与程序设计中数据抽象的两种形式密切对应:代数对应抽象数据类型,而特征函数则是对象的一种形式。
¹ The sort, or carrier set, of an algebra is often described as a set, making this definition circular. Our goal is to define specific set abstractions with restricted operations, which may be based on and assume a more general concept of sets.
¹ 代数的类型(sort)或承载集(carrier set)通常被描述为一个集合,这使得该定义陷入循环。我们的目标是定义具有受限操作的特定集合抽象,这些抽象可以基于并假设一个更一般的集合概念。
注:
- sort:在代数规范(algebraic specification)中,"sort" 指代数据的类型或种类。
- carrier set:承载集,指代数结构中具体承载运算的底层集合。
In the rest of this essay, I elaborate on this example to explain the differences between objects and ADTs. The examples focus on non-mutable objects, because they are sufficient to explain the main points. Other topics, including inheritance and reflection, are also ignored in order to focus on the basic idea of data abstraction.
在本文的其余部分,我将详细展开这个例子,以解释对象与抽象数据类型之间的差异。示例将聚焦于不可变对象,因为它们足以阐明核心观点。为了突出数据抽象的基本思想,其他主题(包括继承和反射)也将暂不讨论。
When I'm inciting discussion of this topic over drinks, I don't tell the full story up front. It is more fun to keep asking questions as the group explores the topic. It is a lively discussion, because most of these ideas are documented in the literature and all the basic facts are known. What is interesting is that the conclusions to be drawn from the facts are not as widely known. Most groups eventually work through the differences between objects and ADTs, but I can tell they walk away feeling uneasy, as if some familiar signposts now point in different directions. One source of unease is that the fundamental distinctions are obscured, but not eliminated, in real programming languages. Also, the story is quite complex and multi-faceted. This essay is only an introduction to a large body of literature on the relationship between objects and ADTs.
当我在饮品时间引发关于这个话题的讨论时,我不会一开始就和盘托出。随着大家深入探讨,不断提出问题会更有趣。这类讨论往往十分热烈,因为这些思想大多已在文献中有所记载,所有基本事实都是已知的。有趣的是,从这些事实中得出的结论却并不为人所熟知。大多数讨论小组最终都能理清对象与抽象数据类型之间的差异,但我能看出他们离开时会感到不安,仿佛一些熟悉的路标突然指向了不同的方向。造成这种不安的一个原因是,在实际的程序设计语言中,这些根本区别被掩盖了,但并未消失。此外,这个话题相当复杂且多层面。本文仅对关于对象与抽象数据类型关系的大量文献进行初步介绍。
In my conversations about objects and ADTs, my next step is to push the discussion towards a more precise understanding of data abstraction. What is an abstract data type? What is an object? For abstract data types, there is general agreement.
在关于对象与抽象数据类型的讨论中,接下来我会引导大家更精确地理解数据抽象。什么是抽象数据类型?什么是对象?对于抽象数据类型,人们已经达成了普遍共识。
2. Abstract Data Types
2. 抽象数据类型
An abstract data type (ADT) has a public name, a hidden representation, and operations to create, combine, and observe values of the abstraction. The familiar built-in types in most languages, for example the integer and boolean data types in Algol, Pascal, ML, Java and Haskell, are abstract data types.
抽象数据类型具有一个公共名称、一个隐藏的表示形式,以及用于创建、组合和观察该抽象类型值的运算。大多数语言中常见的内置类型(例如 Algol、Pascal、ML、Java 和 Haskell 中的整数和布尔数据类型)都是抽象数据类型。
In addition to built-in abstract data types, some languages support user-defined abstract data types. User-defined abstract data types that resemble built-in data types were first realized in CLU [37, 36] and Alphard [61] in the 1970s. There were also strong connections to algebraic specification of data types [24, 7]. The core ideas introduced in CLU were adapted for ML [42], Ada [49], and Modula-2 [60]. As an example, Figure 1 defines an abstraction for integer sets, adapted from the CLU reference manual [36].
除了内置的抽象数据类型外,一些语言还支持用户定义的抽象数据类型。20 世纪 70 年代,CLU [37, 36] 和 Alphard [61] 首次实现了与内置数据类型相似的用户定义抽象数据类型。这与数据类型的代数规范也有着密切联系 [24, 7]。CLU 中引入的核心思想被应用到了 ML [42]、Ada [49] 和 Modula-2 [60] 中。例如,图 1 定义了一个整数集的抽象,改编自 CLU 参考手册 [36]。
set = cluster is empty, contains, insert, isEmpty, union
  rep = oneof[
    empty: null,
    pair: struct[
      first: int,
      rest: rep
    ]
  ]
  empty = proc() returns (cvt)
    return(rep$make_empty(nil))
  end empty;
  insert = proc(s: cvt, i: int) returns (cvt)
    if contains(up(s), i) then
      return(s)
    else
      return(rep$make_pair(pair$(first: i, rest: s)))
    end
  end insert;
  isEmpty = proc(s: cvt) returns (bool)
    typecase s
      tag empty: return(true)
      tag pair(p: pair): return(false)
    end
  end isEmpty;
  contains = proc(s: cvt, i: int) returns (bool)
    typecase s
      tag empty: return(false)
      tag pair(p: pair):
        if p.first = i then
          return(true)
        else
          return(contains(up(p.rest), i))
        end
    end
  end contains;
  union = proc(s1: cvt, s2: cvt) returns (cvt)
    typecase s1
      tag empty: return(s2)
      tag pair(p: pair):
        return(insert(union(up(p.rest), s2), p.first))
    end
  end union;
end set
Figure 1. CLU cluster for integer sets
图 1. 整数集的 CLU 集群
The representation type is a list of integers. In discussions of CLU, these values are called "objects" or "data objects", although they are not necessarily the same as objects in object-oriented programming.
其表示类型是整数列表。在 CLU 的相关讨论中,这些值被称为"对象"或"数据对象",但它们与面向对象程序设计中的对象并不一定相同。
CLU used explicit syntax and operators to manage the hiding of the representation. The cvt type represents the public view of the representation type, while the functions up and down convert between public and private views of the type. Rather than explain CLU in detail, it is easier to give the same abstraction in ML, as in Figure 2, where the hiding mechanism is simplified and type inference simplifies the types.
CLU 使用显式语法和运算符来管理表示的隐藏。cvt 类型表示表示类型的公共视图,而 up 和 down 函数则用于在该类型的公共视图和私有视图之间进行转换。与其详细解释 CLU,不如在 ML 中给出相同的抽象(如图 2 所示),其中隐藏机制得到了简化,类型推断也简化了类型定义。
Figure 3 gives the signature of the resulting abstract data type. A signature defines the type name (but not its representation) and the types of the operations. The signature can be extended with a full specification of the behavior of integer sets. Abstract data types support very powerful specification and verification techniques, including equational theories [20, 3, 7] and axiomatic specifications [26, 40, 17]. The specifications work well in this context; they are intuitive, elegant and sound.
图 3 给出了所得抽象数据类型的签名。签名定义了类型名称(但不包括其表示形式)和运算的类型。该签名可以扩展为整数集行为的完整规范。抽象数据类型支持非常强大的规范和验证技术,包括等式理论 [20, 3, 7] 和公理规范 [26, 40, 17]。这些规范在该场景下效果良好,既直观、简洁又可靠。
Clients can declare values of type set and use operations to manipulate the values.
客户端可以声明 set 类型的值,并使用运算来操作这些值。
(* Abstract set type: definition and operations *)
abstype set = EMPTY | INS of int * set
with
  (* The empty set *)
  val empty = EMPTY
  (* Test whether a set is empty *)
  fun isEmpty(s: set): bool =
    case s of
      EMPTY => true
    | INS _ => false
  (* Test membership; defined before insert, which calls it *)
  fun contains(s: set, i: int): bool =
    case s of
      EMPTY => false
    | INS(n, r) => if i = n then true else contains(r, i)
  (* Insert an element, skipping duplicates *)
  fun insert(s: set, i: int): set =
    if not (contains(s, i)) then
      INS(i, s)
    else
      s
  (* Union of two sets, skipping duplicates *)
  fun union(s1: set, s2: set): set =
    case s1 of
      EMPTY => s2
    | INS(n1, r1) => insert(union(r1, s2), n1)
end;
(* Test: build a set and check membership *)
let
  val a = empty
  val b = insert(a, 3)
in
  if contains(b, 2) then "yes" else "no"
end;
Figure 2. ML abstract data type (ADT) for integer sets
图 2. 整数集的 ML 抽象数据类型
(* 1. The public interface of the set module *)
signature SET =
sig
  type set
  val empty    : set
  val isEmpty  : set -> bool
  val insert   : set * int -> set
  val contains : set * int -> bool
  val union    : set * set -> set
end;
(* 2. A module implementing the interface *)
structure IntSet :> SET =
struct
  abstype set = EMPTY | INS of int * set
  with
    val empty = EMPTY
    fun isEmpty(s: set): bool =
      case s of
        EMPTY => true
      | INS _ => false
    fun contains(s: set, i: int): bool =
      case s of
        EMPTY => false
      | INS(n, r) => if i = n then true else contains(r, i)
    fun insert(s: set, i: int): set =
      if not (contains(s, i)) then
        INS(i, s)
      else
        s
    fun union(s1: set, s2: set): set =
      case s1 of
        EMPTY => s2
      | INS(n1, r1) => insert(union(r1, s2), n1)
  end
end;
(* 3. Test *)
let
  val a = IntSet.empty
  val b = IntSet.insert(a, 3)
in
  if IntSet.contains(b, 2) then "yes" else "no"
end;
Figure 3. Signature for integer set abstract data type
图 3. 整数集抽象数据类型的签名
But clients cannot inspect the representation. This is why the isEmpty function is needed, because the following program is illegal when written outside of the abstraction:
但客户端无法查看其表示形式。这就是需要 isEmpty 函数的原因,因为以下程序在抽象外部编写时是非法的:
fun test(a : set) = (a = EMPTY);
The function test is attempting to break the encapsulation of the data abstraction to peek at its internal representation. There is also no predefined notion of equality on integer sets. If equality is desired, it must be programmed and made explicit in the ADT interface.
test 函数试图破坏数据抽象的封装性,以窥探其内部表示形式。此外,整数集没有预定义的相等性概念。如果需要相等性判断,则必须通过编程在抽象数据类型接口中明确实现。
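To make the last point concrete, here is a minimal Python sketch of the list-based set ADT with an explicit equality operation added. Python cannot hide a representation, so the hiding here is by convention only. The names `subset` and `equal` are assumptions made for illustration and are not part of the essay's signature.

```python
# Sketch of the integer-set ADT as a group of operations (hiding by convention only).
# Representation: None for the empty set, or a pair (first, rest).
# `subset` and `equal` are hypothetical additions to the interface.

empty = None

def is_empty(s):
    return s is None

def contains(s, i):
    # Walk the linked pairs until the element is found or the list ends.
    while s is not None:
        first, rest = s
        if first == i:
            return True
        s = rest
    return False

def insert(s, i):
    # Add i at the front unless it is already a member.
    return s if contains(s, i) else (i, s)

def union(s1, s2):
    # Insert every element of s1 into s2.
    result = s2
    while s1 is not None:
        first, s1 = s1
        result = insert(result, first)
    return result

def subset(s1, s2):
    # True when every element of s1 is a member of s2.
    while s1 is not None:
        first, s1 = s1
        if not contains(s2, first):
            return False
    return True

def equal(s1, s2):
    # Set equality as mutual inclusion, independent of insertion order.
    return subset(s1, s2) and subset(s2, s1)

a = insert(insert(empty, 1), 3)   # representation (3, (1, None))
b = insert(insert(empty, 3), 1)   # representation (1, (3, None))
print(a == b)       # False: comparing raw representations is not set equality
print(equal(a, b))  # True: equality programmed as an ADT operation
```

The two values hold the same elements but differ structurally, which is why equality has to be an operation of the abstraction rather than a comparison of representations.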
2.1 Representation Independence
2.1 表示独立性
The name set is abstract because it has a public name but its details are hidden. This is a fundamental characteristic of abstraction: something is visible on the surface, but the details are hidden. In the case of type abstraction, the type name is public, but the representation is hidden. With procedural abstraction, the procedure interface (name and arguments) is public, but the operational details are hidden. Type abstraction is a technical mechanism that can be used to support data abstraction.
名称 set 是抽象的,因为它具有公共名称,但细节被隐藏。这是抽象的基本特征:表面上可见某些内容,但细节被隐藏。对于类型抽象,类型名称是公共的,但表示形式是隐藏的;对于过程抽象,过程接口(名称和参数)是公共的,但操作细节是隐藏的。类型抽象是一种可用于支持数据抽象的技术机制。
One of the practical benefits of data abstraction is that it allows internal implementation details to be changed without affecting the users of the abstraction. For example, we could modify the code for set to represent integer sets as hash tables or balanced binary trees. Figure 4 gives one such alternative implementation, based on a sorted list representation.
数据抽象的一个实际好处是,它允许修改内部实现细节,而不会影响抽象的使用者。例如,我们可以修改 set 的代码,将整数集表示为哈希表或平衡二叉树。例如,图 4 是一个基于排序列表表示的替代实现。
2.2 Optimization
2.2 优化
A different implementation opens up the possibility for optimizing some of the operations. For example, the union operation in Figure 2 is quite expensive to compute. With a sorted list representation, union is computed in linear time. Insertion is faster in some cases, but it may require copying more nodes. Deciding what representations to use, based on the associated algorithmic trade-offs, is a standard software engineering activity.
不同的实现为优化某些运算提供了可能。例如,图 2 中的 union 运算计算成本很高,而使用排序列表表示时,union 运算可以在线性时间内完成。在某些情况下,插入操作会更快,但可能需要复制更多节点。根据相关的算法权衡来决定使用何种表示形式,是一项标准的软件工程工作。
These optimizations depend critically upon an important feature of abstract data types: the ability to inspect the representation of more than one abstract value at the same time. Multiple representations are inspected in the union operation. There is nothing surprising about inspecting multiple representations. It is a natural side-effect of the type system and the fact that all values of type set belong to the abstract data type implementation that created them. As we shall see, the ability to inspect multiple representations does have some important consequences.
这些优化严重依赖于抽象数据类型的一个重要特性:能够同时查看多个抽象值的表示形式。union 运算中就需要查看多个表示形式。查看多个表示形式并不奇怪,这是类型系统的自然结果,也是所有 set 类型的值都属于创建它们的抽象数据类型实现这一事实的必然结果。正如我们将看到的,查看多个表示形式的能力确实会带来一些重要的影响。
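The linear-time union relies on walking both representations at once. The following Python sketch shows the idea on strictly increasing tuples. The function name `union_sorted` and the tuple representation are assumptions for illustration and differ from the recursive ML code in Figure 4.

```python
# Sketch: union of two sets, each represented as a strictly increasing tuple of ints.
# The merge inspects both representations together, so it runs in linear time.

def union_sorted(s1, s2):
    result = []
    i = j = 0
    while i < len(s1) and j < len(s2):
        if s1[i] == s2[j]:
            # Same element in both sets: keep one copy.
            result.append(s1[i])
            i += 1
            j += 1
        elif s1[i] < s2[j]:
            result.append(s1[i])
            i += 1
        else:
            result.append(s2[j])
            j += 1
    # At most one of these tails is non-empty.
    result.extend(s1[i:])
    result.extend(s2[j:])
    return tuple(result)

print(union_sorted((1, 3, 5), (2, 3, 6)))  # (1, 2, 3, 5, 6)
```

This only works because the operation can assume that both arguments use the same sorted representation.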
2.3 Unique Implementations
2.3 唯一实现
With ML abstypes, CLU clusters, Ada packages and Modula-2 modules there can only be one implementation of an abstract data type in any given program. The implementation is a construct that manages a collection of values that inhabit the type. All the values from a given implementation share the same representation type, although there can be multiple different representational variants within the type. This is usually accomplished by defining the representation type as a labeled sum. The type name set is a globally bound name that refers to a single hidden representation. The type system ensures that it is sound for the implementation to inspect any set value.
在 ML 的 abstype、CLU 的集群、Ada 的包和 Modula-2 的模块中,任何给定程序中一个抽象数据类型只能有一个实现。该实现是一个构造,用于管理属于该类型的一组值。来自同一实现的所有值共享相同的表示类型,尽管该类型内部可能存在多个不同的表示变体。这通常通过将表示类型定义为带标签的和类型来实现。类型名称 set 是一个全局绑定的名称,指向单个隐藏的表示形式。类型系统确保实现可以安全地查看任何 set 值。
Having only one implementation of a data abstraction is limiting. There is already a name clash between the definitions in Figures 2 and 4. One of them had to be given a different name, set2, even though they are really just two different versions of the same abstraction. Client programs have to be edited to choose one or the other implementation.
数据抽象只能有一个实现,这具有一定的局限性。图 2 和图 4 中的定义已经存在名称冲突。尽管它们实际上只是同一抽象的两个不同版本,但其中一个必须被赋予不同的名称(set2)。客户端程序必须通过编辑来选择其中一个实现。
ADTs are also frequently used in C programming [32], using header files as a simple module system. The signature of the type is given in a header file as a forward reference to a structure that is only defined in the implementation file. An example header file for integer sets is given in Figure 5. This trick works because the C compiler does not need to know the format of the representation type; it only needs to know the size of a pointer to the representation.
抽象数据类型也经常在 C 程序设计 [32] 中使用,头文件被用作简单的模块系统。类型的签名在头文件中以向前引用的形式给出,指向仅在实现文件中定义的结构。图 5 给出了一个整数集的头文件示例。这种方法之所以有效,是因为 C 编译器不需要知道表示类型的格式,只需要知道指向该表示的指针的大小。
(* Sorted set: elements kept in ascending order without duplicates *)
abstype set2 = EMPTY | INS of int * set2
with
  (* The empty set *)
  val empty = EMPTY
  (* Insert in ascending order, skipping duplicates *)
  fun insert(s: set2, i: int): set2 =
    case s of
      EMPTY => INS(i, s)
    | INS(n, r) =>
        if i = n then
          s                       (* already present: return the set unchanged *)
        else if i < n then
          INS(i, s)               (* smaller than the head: put it in front *)
        else
          let
            val t = insert(r, i)  (* insert into the tail *)
          in
            if r = t then s else INS(n, t)  (* reuse s when nothing changed *)
          end
  (* Test whether a set is empty *)
  fun isEmpty(s: set2): bool =
    case s of
      EMPTY => true
    | INS _ => false
  (* Membership test that uses the ordering to stop early *)
  fun contains(s: set2, i: int): bool =
    case s of
      EMPTY => false
    | INS(n, r) =>
        if i = n then
          true
        else if i < n then
          false                   (* every remaining element is larger than i *)
        else
          contains(r, i)          (* i is larger than the head: search the tail *)
  (* Merge two sorted sets *)
  fun union(s1: set2, s2: set2): set2 =
    case s1 of
      EMPTY => s2                 (* first set empty: return the second *)
    | INS(n1, r1) =>
        case s2 of
          EMPTY => s1             (* second set empty: return the first *)
        | INS(n2, r2) =>
            if n1 = n2 then
              insert(union(r1, r2), n1)  (* equal heads: merge both tails, keep one copy *)
            else if n1 < n2 then
              insert(union(r1, s2), n1)  (* n1 is smaller: merge the tail of s1 with s2 *)
            else
              insert(union(s1, r2), n2)  (* n2 is smaller: merge s1 with the tail of s2 *)
end;
Figure 4. Integer set ADT with sorted list representation
图 4. 基于排序列表表示的整数集抽象数据类型
#ifndef SET_H
#define SET_H
// Needed for the bool type
#include <stdbool.h>
// 1. Opaque structure: declared here, defined only in the .c file,
//    so clients cannot inspect the representation (compare the ML abstype).
typedef struct set_rep set_rep;
typedef set_rep* set;
// 2. Operations on sets
/**
 * Create an empty set.
 * @return pointer to an empty set
 */
set empty(void);
/**
 * Test whether a set is empty.
 * @param s the set to test
 * @return true if s is empty, false otherwise
 */
bool isEmpty(set s);
/**
 * Insert an integer into a set (no duplicates).
 * @param s the original set
 * @param i the integer to insert
 * @return pointer to the resulting set
 */
set insert(set s, int i);
/**
 * Test whether an integer is a member of a set.
 * @param s the set to search
 * @param i the integer to look for
 * @return true if i is in s, false otherwise
 */
bool contains(set s, int i);
/**
 * Compute the union of two sets (no duplicates).
 * Note: union is a reserved word in C, so the function is named set_union.
 * @param s1 the first set
 * @param s2 the second set
 * @return pointer to the resulting set
 */
set set_union(set s1, set s2);
#endif // SET_H
Figure 5. Abstract data type in C header file
图 5. C 头文件中的抽象数据类型
(* Existential type SetImp: packages the hidden representation rep with its operations *)
type SetImp = ∃ rep . {
  empty    : rep,
  isEmpty  : rep → bool,
  insert   : rep × int → rep,
  contains : rep × int → bool,
  union    : rep × rep → rep
}
or
(* The same type written in ASCII notation *)
type SetImp = exists rep . {
  empty    : rep,
  isEmpty  : rep -> bool,
  insert   : (rep * int) -> rep,
  contains : (rep * int) -> bool,
  union    : (rep * rep) -> rep
}
Figure 6. Type of first-class ADT set implementations
图 6. 一等抽象数据类型集实现的类型
2.4 Module Systems
2.4 模块系统
The problem of unique implementation is solved by putting abstract data types into modules. ML [39] has a module system that allows multiple implementations for a given signature. The signature of an abstraction can be defined once, and multiple implementations written in separate modules. A client program can then be parameterized over the signature, so that a particular implementation can be selected during module binding. There can be multiple implementations in a software repository, but one implementation is used in a given program.
唯一实现的问题可以通过将抽象数据类型放入模块中来解决。ML [39] 具有一个模块系统,允许为给定签名提供多个实现。抽象的签名可以定义一次,而多个实现可以写在不同的模块中。然后,客户端程序可以根据该签名进行参数化,以便在模块绑定期间选择特定的实现。软件仓库中可以有多个实现,但在给定程序中只能使用一个实现。
Allowing multiple implementations is good, but it is still not as flexible as might be desired. Consider a case where one part of a program needs to use the sorted list representation for integer sets, and another part of the program needs to use a binary tree representation. Having two different implementations for an abstraction is possible in ML, Ada, or Modula-2. However, the two different parts of the program cannot interoperate. The different parts of the program cannot exchange integer sets. As a result the following program is illegal:
允许多个实现是很好的,但它仍然没有达到期望的灵活性。考虑这样一种情况:程序的一部分需要使用整数集的排序列表表示,而另一部分需要使用二叉树表示。在 ML、Ada 或 Modula-2 中,为一个抽象提供两个不同的实现是可能的。然而,程序的这两个不同部分无法互操作,也无法交换整数集。因此,以下程序是非法的:
fun f(a : set, b : set2) = union(a, b)
There is no union operation to combine a set with a set2. Given the signature we have defined, it is not even possible to write such an operation.
不存在将 set 和 set2 组合起来的 union 运算。根据我们定义的签名,甚至无法编写这样的运算。
The ML module system also allows multiple inter-related abstract types to be defined in a single module. For example, a personnel application might have data abstractions Employee and Department with operations to associate employees with departments.
ML 模块系统还允许在单个模块中定义多个相互关联的抽象类型。例如,一个人事应用程序可能包含 Employee(员工)和 Department(部门)这两个数据抽象,以及用于将员工与部门相关联的运算。
2.5 Formal Models
2.5 形式模型
Formal models of abstract data types are based on existential types [44]. In this model, ADT implementations are first class values with existential type, as defined in Figure 6.
抽象数据类型的形式模型基于存在类型 [44]。在该模型中,抽象数据类型实现是具有存在类型的一等值,如图 6 所定义。
A value of type SetImp is not a set, it is an implementation of a set abstraction. This two-level structure is essential to abstract data types: the first level is an implementation (SetImp) which publishes an abstract type name and a set of operations. Within that implementation, at the second level, are the values that represent elements of the named abstract type (set).
SetImp 类型的值不是一个集合,而是一个集合抽象的实现。这种两级结构对于抽象数据类型至关重要:第一级是实现(SetImp),它公布一个抽象类型名称和一组运算;在该实现内部的第二级是表示所命名抽象类型(set)元素的值。
This existential type is nearly identical to the signature in Figure 3. Intuitively, it asserts that "a type locally identified as rep exists such that the following operations are defined...".
这种存在类型几乎与图 3 中的签名完全相同。直观地说,它断言"存在一个本地标识为 rep 的类型,使得以下运算被定义......"。
Most practical languages do not support the full generality of first-class ADT implementations. Thus existential values and their usage are not familiar to most programmers. Explaining the mechanics of existential types is beyond the scope of this essay. They are described in Cardelli and Wegner's paper [10], and also covered thoroughly in Pierce's book, Types and Programming Languages [50].
大多数实际语言并不支持一等抽象数据类型实现的全部通用性。因此,大多数程序员对存在值及其用法并不熟悉。解释存在类型的机制超出了本文的范围。卡德利(Cardelli)和韦格纳(Wegner)的论文 [10] 中对此进行了描述,皮尔斯(Pierce)的著作《类型与程序设计语言》[50] 中也进行了详细阐述。
To use an existential value, it must be opened to declare a name for the representation type and access the operations. Each time an existential value is opened, it creates a completely new type name. Thus if an ADT implementation is opened twice, the values from one instance cannot be mixed with values from the other instance. In practice, it is standard to open all ADTs once in the global scope of the program. The ML module system has more sophisticated sharing mechanisms that allow multiple implementations to coexist, while allowing interoperability between multiple uses of the same abstractions. Even in this case values from the two different implementations cannot be mixed.
要使用存在值,必须将其打开,以声明表示类型的名称并访问运算。每次打开存在值时,都会创建一个全新的类型名称。因此,如果一个抽象数据类型实现被打开两次,那么来自一个实例的值不能与来自另一个实例的值混合使用。在实践中,标准做法是在程序的全局作用域中一次性打开所有抽象数据类型。ML 模块系统具有更复杂的共享机制,允许多个实现共存,同时支持同一抽象的多次使用之间的互操作性。即使在这种情况下,来自两个不同实现的值也不能混合使用。
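The two-level structure can be approximated in Python by treating an implementation as a record of operations and passing that record to client code. This is only a sketch: Python has no existential types, so nothing enforces that a client leaves the representation alone, and the names `SetImp`, `list_imp`, `sorted_imp`, and `client` are assumptions for illustration.

```python
import bisect
from typing import Any, Callable, NamedTuple

class SetImp(NamedTuple):
    """One ADT implementation: operations over a representation clients treat as opaque."""
    empty: Any
    is_empty: Callable[[Any], bool]
    insert: Callable[[Any, int], Any]
    contains: Callable[[Any, int], bool]
    union: Callable[[Any, Any], Any]

# Implementation 1: unsorted tuple without duplicates.
def _list_insert(s, i):
    return s if i in s else (i,) + s

def _list_union(s1, s2):
    for i in s1:
        s2 = _list_insert(s2, i)
    return s2

list_imp = SetImp((), lambda s: not s, _list_insert, lambda s, i: i in s, _list_union)

# Implementation 2: strictly increasing tuple, searched by bisection.
def _sorted_contains(s, i):
    k = bisect.bisect_left(s, i)
    return k < len(s) and s[k] == i

def _sorted_insert(s, i):
    if _sorted_contains(s, i):
        return s
    k = bisect.bisect_left(s, i)
    return s[:k] + (i,) + s[k:]

def _sorted_union(s1, s2):
    for i in s1:
        s2 = _sorted_insert(s2, i)
    return s2

sorted_imp = SetImp((), lambda s: not s, _sorted_insert, _sorted_contains, _sorted_union)

def client(imp):
    """Client code parameterized over an implementation; it uses only the operations."""
    a = imp.insert(imp.insert(imp.empty, 3), 1)
    b = imp.union(a, imp.insert(imp.empty, 7))
    return imp.contains(b, 1) and imp.contains(b, 7) and not imp.contains(b, 2)

print(client(list_imp), client(sorted_imp))  # True True

# Mixing values across implementations is exactly what the type system rules out.
mixed = list_imp.insert(list_imp.insert(list_imp.empty, 1), 3)  # representation (3, 1)
print(sorted_imp.contains(mixed, 1))  # False: a wrong answer, since (3, 1) is not sorted
```

The last two lines show why the restriction matters: an operation that assumes one representation silently misbehaves when handed a value built by another implementation.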
2.6 Summary
2.6 总结
An abstract data type is a structure that implements a new type by hiding the representation of the type and supplying operations to manipulate its values. There are several ways in which abstract data types seem fundamentally right.
抽象数据类型是一种通过隐藏类型的表示形式并提供用于操作其值的运算来实现新类型的结构。在多个方面,抽象数据类型似乎在本质上是合理的。
- They work just like built-in types.
  它们的工作方式与内置类型相同。
- They have sound proof techniques.
  它们具有可靠的证明技术。
- ADTs can be implemented efficiently, even for complex operations that require inspection of multiple abstract values.
  即使对于需要查看多个抽象值的复杂运算,抽象数据类型也可以高效实现。
- From a type theory viewpoint, abstract data types have a fundamental model based on existential types. Existential types are the dual of universal types, which are the basis for parametric polymorphism (called generics in Java and C#). The duality of universal and existential types is fundamental, and it leaves little room for any other alternative. What else could there be?
  从类型理论的角度来看,抽象数据类型具有基于存在类型的基本模型。存在类型是全称类型的对偶,而全称类型是参数多态性(在 Java 和 C# 中称为泛型)的基础。全称类型和存在类型的对偶性是根本性的,几乎没有其他替代方案的空间。还能有别的形式吗?
- There is a solid connection to mathematics. An ADT has the same form as an abstract algebra: a type name representing an abstract set of values together with operations on the values. The operations can be unary, binary, multi-ary, or nullary (that is, constructors) and they are all treated uniformly.
  与数学有着紧密的联系。抽象数据类型与抽象代数具有相同的形式:一个表示抽象值集合的类型名称,以及作用于这些值的运算。这些运算可以是一元、二元、多元或零元(即构造函数),并且都被统一处理。
All of these observations lead to the general conclusion that abstract data types are the way to define data abstractions. This belief is so deep-seated, so obviously correct, that it is almost impossible to think of any alternative. Many people take "abstract data type" and "data abstraction" as synonyms.
所有这些观察结果都得出了一个普遍结论:抽象数据类型是定义数据抽象的方式。这种信念根深蒂固,显然是正确的,以至于几乎无法想到任何替代方案。许多人将"抽象数据类型"和"数据抽象"视为同义词。
But abstract data types are not the only way to define data abstractions. The alternative is fundamentally different.
但抽象数据类型并不是定义数据抽象的唯一方式。另一种方式在本质上截然不同。
3. Objects
3. 对象
Object-oriented programming has its origin in the language Simula 67 [16]. Zilles published a paper describing a form of objects [62] before he started working with Liskov and switched his focus to ADTs. At the same time, Smalltalk [55, 28], Actors [25] and Scheme [56, 1] all explored objects in an untyped setting. Smalltalk especially formulated these ideas into a philosophically and practically compelling language and environment for object-oriented programming. As these languages were all dynamically typed, they did not immediately contribute to the ongoing dialog about statically typed data abstraction in the form of ADTs.
面向对象程序设计起源于 Simula 67 语言 [16]。齐勒斯(Zilles)在开始与利斯科夫(Liskov)合作并将研究重点转向抽象数据类型之前,发表了一篇描述对象形式的论文 [62]。与此同时,Smalltalk [55, 28]、Actors [25] 和 Scheme [56, 1] 都在无类型环境中探索了对象。特别是 Smalltalk,将这些思想构建成了一种在哲学和实践上都极具吸引力的面向对象程序设计语言和环境。由于这些语言都是动态类型的,它们并未立即为关于静态类型数据抽象(以抽象数据类型形式存在)的持续讨论做出贡献。
There is not a single universally accepted model of object-oriented programming. The model that I present here is recognized as valid by experts in the field, although there certainly are other valid models. In particular, I present objects in a denotational style which I believe exposes their core concepts in an intuitive way. I believe that operational approaches obscure the essential insights.
面向对象程序设计没有一个被普遍接受的单一模型。我在此呈现的模型得到了该领域专家的认可,尽管肯定还存在其他有效的模型。具体来说,我以指称语义的风格呈现对象,我认为这种风格能够以直观的方式揭示其核心概念。我认为操作语义的方法会掩盖关键的见解。
In this section I discuss a pure form of object-oriented programming with interfaces [9, 8]. The practical realities of popular languages are discussed in Section 5.
在本节中,我将讨论一种带有接口的纯面向对象程序设计形式 [9, 8]。流行语言的实际情况将在第 5 节中讨论。
To begin with, let us reconsider the idea of integer sets. One alternative way to formulate integer sets is as the characteristic function:
首先,让我们重新考虑整数集的概念。一种替代的表述方式是将整数集表示为特征函数:
$$\text{type ISet} = \text{Int} \to \text{Boolean}$$
The type $\text{Int} \to \text{Boolean}$ is the type of functions from integer to boolean. It is clear that this is a different way to think about sets than the abstract data types presented in the previous section. Consider a few values of this type:
类型 $\text{Int} \to \text{Boolean}$ 是从整数到布尔值的函数类型。显然,这与上一节中介绍的抽象数据类型对集合的理解方式不同。考虑该类型的几个值:
$$
\begin{align*}
\text{Empty} &= \lambda i.\ \text{false} \\
\text{Insert}(s, n) &= \lambda i.\ (i = n \text{ or } s(i)) \\
\text{Union}(s_1, s_2) &= \lambda i.\ (s_1(i) \text{ or } s_2(i))
\end{align*}
$$
The expression $\lambda i.\ e$ represents a function with a parameter named $i$ and a result expression $e$. The empty set is just a function that always returns false. Inserting $n$ into a set $s$ creates a function that tests for equality with $n$ or membership in the functional set $s$. Given these definitions, it is easy to create and manipulate sets:
表达式 $\lambda i.\ e$ 表示一个带有参数 $i$ 和结果表达式 $e$ 的函数。空集是一个始终返回 false 的函数。将 $n$ 插入集合 $s$ 会创建一个函数,该函数检查输入值是否等于 $n$ 或是否属于函数式集合 $s$。根据这些定义,创建和操作集合非常简单:
$a = \text{Insert}(\text{Empty}, 1)$
$b = \text{Insert}(a, 3)$
print $b(3)$ -- results in true
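The characteristic-function definitions translate almost directly into closures. The Python sketch below is one way to write them; the lower-case names `empty`, `insert`, and `union` are assumptions standing in for Empty, Insert, and Union.

```python
# Sketch: integer sets as characteristic functions (int -> bool), using closures.

def empty(i):
    # The empty set contains nothing.
    return False

def insert(s, n):
    # A new set whose membership test checks n first, then defers to s.
    return lambda i: i == n or s(i)

def union(s1, s2):
    # A new set that is a member test on either operand.
    return lambda i: s1(i) or s2(i)

a = insert(empty, 1)
b = insert(a, 3)
print(b(3))  # True
print(b(2))  # False
```

Each set here is nothing more than the function that answers membership queries, so there is no stored list or tree to inspect.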
In what sense could ISet be understood as defining a data abstraction for integer sets? We have been conditioned to think in terms of representations and operations. But these concepts do not apply in this case. One might say that this approach represents sets as functions from integers to booleans. But this 'representation' looks like an interface, not a concrete representation.
在什么意义上可以将 ISet 理解为定义了整数集的数据抽象?我们习惯于从表示形式和运算的角度思考,但这些概念在这种情况下并不适用。有人可能会说,这种方法将集合表示为从整数到布尔值的函数。但这种"表示形式"看起来更像一个接口,而非具体的表示形式。
Note that there is no "contains" operation, because the set itself is the contains operation. Although it may not seem like it, the characteristic function is the pure object-oriented approach to defining integer sets. You may not accept this statement immediately, because I have not talked about any classes, methods, or inheritance, which are supposed to be characteristic of objects.
需要注意的是,这里没有"contains"运算,因为集合本身就是 contains 运算。尽管看起来可能并非如此,但特征函数是定义整数集的纯面向对象方法。你可能不会立即接受这一说法,因为我还没有讨论类、方法或继承,而这些被认为是对象的特征。
3.1 Object Interfaces
3.1 对象接口
ISet is an object-oriented interface to an integer set data abstraction. The function is an observation of the set, and a set is 'represented' by the observations that can be performed upon it. One problem with this interface is that there is no way to tell if the set is empty. A more complete interface is given in Figure 7. It is a record type with four components corresponding to methods. The field names of the record are in lowercase, to distinguish them from other uses of the same names. The result is a standard object-oriented interface for immutable integer set objects.
ISet 是整数集数据抽象的面向对象接口。函数是对集合的一种观察,而集合通过可对其执行的观察来"表示"。该接口的一个问题是无法判断集合是否为空。图 7 给出了一个更完整的接口,它是一个包含四个对应方法的记录类型。记录的字段名称为小写,以区别于这些名称的其他用法。结果是一个用于不可变整数集对象的标准面向对象接口。
interface ISet = {
isEmpty : bool,
contains : int → bool,
insert : int → ISet,
union : ISet → ISet
}
Figure 7. Object-oriented integer set interface
图 7. 面向对象的整数集接口
(* Sets defined as recursive values: μ this binds the object itself, giving a self-referential record *)
Empty = μ this. {
isEmpty = true,
contains = λ i. false,
insert = λ i. Insert(this, i),
union = λ s. s
}
Insert(s, n) = if s.contains(n) then
s
else
μ this. {
isEmpty = false,
contains = λ i. (i = n or s.contains(i)),
insert = λ i. Insert(this, i),
union = λ s'. Union(this, s')
}
Union(s1, s2) = μ this. {
isEmpty = s1.isEmpty and s2.isEmpty,
contains = λ i. (s1.contains(i) or s2.contains(i)),
insert = λ i. Insert(this, i),
union = λ s'. Union(this, s')
}
Figure 8. Object-oriented integer set implementations
图 8. 面向对象的整数集实现
An essential observation is that object interfaces do not use type abstraction: there is no type whose name is known but representation is hidden. The type ISet is defined as a record type containing functions from known types to known types. Instead, objects use procedural abstraction to hide behavior. This difference has significant consequences for use of the two forms of data abstraction.
一个关键的观察结果是,对象接口不使用类型抽象:不存在名称已知但表示形式隐藏的类型。ISet 类型被定义为包含从已知类型到已知类型的函数的记录类型。相反,对象使用过程抽象来隐藏行为。这种差异对这两种数据抽象形式的使用产生了重大影响。
Object interfaces are essentially higher-order types, in the same sense that passing functions as values is higher-order. Any time an object is passed as a value, or returned as a value, the object-oriented program is passing functions as values and returning functions as values. The fact that the functions are collected into records and called methods is irrelevant. As a result, the typical object-oriented program makes far more use of higher-order values than many functional programs.
对象接口本质上是高阶类型,就像将函数作为值传递是高阶的一样。每当对象作为值传递或作为值返回时,面向对象程序都是在传递函数作为值并返回函数作为值。函数被收集到记录中并称为方法,这一事实并不重要。因此,典型的面向对象程序比许多函数式程序更频繁地使用高阶值。
The empty operation in the ADT is not part of the object-oriented ISet interface. This is because it is not an observation on sets, it is a constructor of sets.
抽象数据类型中的 empty 运算不是面向对象 ISet 接口的一部分。这是因为它不是对集合的观察,而是集合的构造函数。
3.2 Classes
3.2 类
Several implementations for the ISet interface are defined in Figure 8. The contains method is the same as the simple functions given above. The definitions have the same types, after redefining ISet.
图 8 定义了 ISet 接口的多个实现。contains 方法与上面给出的简单函数相同。在重新定义 ISet 后,这些定义具有相同的类型。
The special symbol $\mu$ is used to define recursive values [50]. The syntax $\mu x.f$ defines a recursive value where the name $x$ can appear in the expression $f$. The meaning of $\mu x.f$ is the value of $f$ where occurrences of $x$ represent recursive references within $f$ to itself. Objects are almost always self-referential values, so every object definition uses $\mu$. As a convention, we use this as the name $x$, but any name could be used. The bound name $x$ corresponds to self in Smalltalk or this in C++.
特殊符号 $\mu$ 用于定义递归值 [50]。语法 $\mu x.f$ 定义了一个递归值,其中名称 $x$ 可以出现在表达式 $f$ 中。$\mu x.f$ 的含义是 $f$ 的值,其中 $x$ 的出现表示 $f$ 内部对自身的递归引用。对象几乎都是自引用值,因此每个对象定义都使用 $\mu$。按照惯例,我们使用 this 作为名称 $x$,但也可以使用任何其他名称。绑定名称 $x$ 对应于 Smalltalk 中的 self 或 C++ 中的 this。
Each of these definitions corresponds to a class in object-oriented programming. In this encoding, classes are only used to construct objects. The use of classes as types is discussed later.
这些定义中的每一个都对应于面向对象程序设计中的一个类。在这种编码方式中,类仅用于构造对象。类作为类型的使用将在后面讨论。
The definition of class state, or member variables, is different from Java [21]. In this encoding, the member variables are listed as parameters on the class, as in Scala [47].
类状态(或成员变量)的定义与 Java [21] 不同。在这种编码方式中,成员变量被列为类的参数,类似于 Scala [47]。
Several of the method bodies are repeated in these definitions. The insert method simply invokes the Insert class to create a new ISet object with one more member. Inheritance could be used to reuse a single method definition. Inheritance is often mentioned as one of the essential characteristics of object-oriented programming. However, inheritance will not be used in this section because it is neither necessary for, nor specific to, object-oriented programming [13].
这些定义中重复了多个方法体。insert 方法仅调用 Insert 类来创建一个多了一个成员的新 ISet 对象。可以使用继承来重用单个方法定义。继承通常被认为是面向对象程序设计的基本特征之一。然而,本节不会使用继承,因为它既不是面向对象程序设计所必需的,也不是其特有的 [13]。
A client of these classes looks just like a Java program, with the familiar method invocation style:
这些类的客户端看起来与 Java 程序非常相似,具有熟悉的方法调用风格:
Empty.insert(3).union(Empty.insert(1))
.insert(5).contains(4)
Selecting a function to invoke from a record containing function values is usually called dynamic binding. This term is not a very intuitive description of what is essentially an invocation of a higher-order function.
从包含函数值的记录中选择要调用的函数通常称为动态绑定。这个术语对于本质上是高阶函数调用的操作来说,并不是一个非常直观的描述。
Just as the ADT version of integer sets had two levels (set implementations and set values), the object-oriented version has two levels as well: interfaces and classes. A class is a procedure that returns a value satisfying an interface. Although Java allows class constructors to be overloaded with more than one definition, it is clear that one of the primary purposes of a class is to construct objects.
正如整数集的抽象数据类型版本有两个层次(集实现和集值)一样,面向对象版本也有两个层次:接口和类。类是一个返回满足接口的值的过程。尽管 Java 允许类构造函数重载多个定义,但显然类的主要目的之一是构造对象。
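The encoding of Figures 7 and 8 can be approximated in Python by building each object as a record of closures. This is a sketch under stated assumptions: `SimpleNamespace` stands in for the record type, the local variable `this` plays the role of the name bound by μ, and the snake_case field name `is_empty` and the helper `make_empty` are choices made here.

```python
from types import SimpleNamespace


def make_empty():
    this = SimpleNamespace()          # plays the role of "μ this"
    this.is_empty = True
    this.contains = lambda i: False
    this.insert = lambda i: Insert(this, i)
    this.union = lambda s: s          # union with the empty set is the identity
    return this


Empty = make_empty()


def Insert(s, n):
    """Class as constructor: returns s itself if n is already a member."""
    if s.contains(n):
        return s
    this = SimpleNamespace()
    this.is_empty = False
    this.contains = lambda i: i == n or s.contains(i)
    this.insert = lambda i: Insert(this, i)
    this.union = lambda other: Union(this, other)
    return this


def Union(s1, s2):
    """Class as constructor: only the public fields of s1 and s2 are used."""
    this = SimpleNamespace()
    this.is_empty = s1.is_empty and s2.is_empty
    this.contains = lambda i: s1.contains(i) or s2.contains(i)
    this.insert = lambda i: Insert(this, i)
    this.union = lambda other: Union(this, other)
    return this


# The client expression from the text, in method-invocation style.
result = Empty.insert(3).union(Empty.insert(1)).insert(5).contains(4)
print(result)  # False: the set holds 1, 3 and 5, but not 4
```

Each "class" here is literally a procedure that returns a value satisfying the interface, and selecting `s.contains` from the record is all that dynamic binding amounts to in this encoding.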
3.3 Autognosis
3.3 自认知性
A careful examination of the union operator in the object interface, in Figure 7, reveals that the parameter is typed by an interface. This means that the union method in a set object cannot know the representation of the other set being unioned. Fortunately, the union operator does not need to know the representation of other sets, it just needs to be able to test membership. The Union class in Figure 8 constructs an object that represents the union of two sets $s_1$ and $s_2$.
仔细观察图 7 中对象接口的 union 运算符,可以发现其参数由接口类型化。这意味着集合对象中的 union 方法无法知道要进行并集运算的另一个集合的表示形式。幸运的是,union 运算符不需要知道其他集合的表示形式,只需要能够测试成员资格即可。图 8 中的 Union 类构造了一个表示两个集合 $s_1$ 和 $s_2$ 并集的对象。
To me, the prohibition of inspecting the representation of other objects is one of the defining characteristics of object-oriented programming. I term this the autognostic principle:
在我看来,禁止查看其他对象的表示形式是面向对象程序设计的定义特征之一。我将其称为自认知原则:
An object can only access other objects through their public interfaces.
对象只能通过其他对象的公共接口访问它们。
Autognosis means 'self knowledge'. An autognostic object can only have detailed knowledge of itself. All other objects are abstract.
自认知(Autognosis)的意思是"自我认知"。具有自认知性的对象只能详细了解自身,所有其他对象都是抽象的。
The converse is quite useful: any programming model that allows inspection of the representation of more than one abstraction at a time is not object-oriented.
其逆命题也非常有用:任何允许同时查看多个抽象表示形式的程序设计模型都不是面向对象的。
One of the most pure object-oriented programming models yet defined is the Component Object Model (COM) [5, 22]. It enforces all of these principles rigorously. Programming in COM is very flexible and powerful as a result. There is no built-in notion of equality. There is no way to determine if an object is an instance of a given class.
组件对象模型(COM)[5, 22] 是迄今为止定义的最纯粹的面向对象程序设计模型之一。它严格执行所有这些原则。因此,COM 程序设计具有非常高的灵活性和强大的功能。COM 中没有内置的相等性概念,也无法确定一个对象是否是某个给定类的实例。
Autognosis has a profound impact on the software engineering properties of a system. In particular, an autognostic system is much more flexible. But at the same time, it can be more difficult to optimize operations. More significantly, there can be subtle relationships between the public interface of a class and the ability to implement behavior, as discussed in Section 3.5.
自认知性对系统的软件工程特性具有深远影响。特别是,具有自认知性的系统更加灵活。但同时,它也可能更难优化运算。更重要的是,类的公共接口与行为实现能力之间可能存在微妙的关系,这将在第 3.5 节中讨论。
3.4 Flexibility
3.4 灵活性
Object interfaces do not prescribe a specific representation for values, but instead accept any value that implements the required methods. As a result, objects are flexible and extensible with new representations. The flexibility of object interfaces can be illustrated easily by defining several new kinds of set. For example, the set of all even integers, and the set of all integers, are easily defined:
对象接口不为值规定特定的表示形式,而是接受任何实现了所需方法的值。因此,对象具有灵活性,并且可以通过新的表示形式进行扩展。通过定义几种新的集合类型,可以很容易地说明对象接口的灵活性。例如,所有偶数的集合和所有整数的集合可以轻松定义:
Even = µ this. {
isEmpty = false,
contains = λ i. (i mod 2 = 0),
insert = λ i. Insert(this, i),
union = λ s. Union(this, s)
}
Full = µ this. {
isEmpty = false,
contains = λ i. true,
insert = λ i. this,
union = λ s. this
}
The Full set returns itself as the result of any insert or union operation. This example also illustrates that objects can easily represent infinite sets.
Full 集合在任何插入或并集运算后都返回自身。这个示例还表明,对象可以轻松地表示无限集合。
These new sets can be intermixed with the sets defined above. Other specialized sets can also be defined, including the set of prime numbers or sets representing intervals.
这些新集合可以与上面定义的集合混合使用。还可以定义其他专用集合,包括素数集合或表示区间的集合。
Interval(n, m) = µ this. {
isEmpty = (n > m),
contains = λ i. (n ≤ i and i ≤ m),
insert = λ i. Insert(this, i),
union = λ s. Union(this, s)
}
There is no direct equivalent to this kind of flexibility when using abstract data types. This difference is fundamental: abstract data types have a private, protected representation type that prohibits tampering or extension. Objects have behavioral interfaces which allow definition of new implementations at any time.
使用抽象数据类型时,没有直接对应的这种灵活性。这种差异是根本性的:抽象数据类型具有私有、受保护的表示类型,禁止篡改或扩展;而对象具有行为接口,允许随时定义新的实现。
The extensibility of objects does not depend upon inheritance, but rather is an inherent property of object interfaces.
对象的可扩展性不依赖于继承,而是对象接口的固有属性。
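To see that new representations mix freely with existing ones, the definitions of this section can be sketched in Python as records of closures. The use of `SimpleNamespace`, the field name `is_empty`, and the helpers `make_even` and `make_full` are assumptions made for this illustration.

```python
from types import SimpleNamespace


def Insert(s, n):
    if s.contains(n):
        return s
    this = SimpleNamespace(is_empty=False)
    this.contains = lambda i: i == n or s.contains(i)
    this.insert = lambda i: Insert(this, i)
    this.union = lambda other: Union(this, other)
    return this


def Union(s1, s2):
    this = SimpleNamespace(is_empty=s1.is_empty and s2.is_empty)
    this.contains = lambda i: s1.contains(i) or s2.contains(i)
    this.insert = lambda i: Insert(this, i)
    this.union = lambda other: Union(this, other)
    return this


def make_even():
    this = SimpleNamespace(is_empty=False)
    this.contains = lambda i: i % 2 == 0
    this.insert = lambda i: Insert(this, i)
    this.union = lambda other: Union(this, other)
    return this


def make_full():
    this = SimpleNamespace(is_empty=False)
    this.contains = lambda i: True
    this.insert = lambda i: this      # inserting into the full set changes nothing
    this.union = lambda other: this   # so does any union
    return this


def Interval(n, m):
    this = SimpleNamespace(is_empty=n > m)
    this.contains = lambda i: n <= i <= m
    this.insert = lambda i: Insert(this, i)
    this.union = lambda other: Union(this, other)
    return this


Even = make_even()
Full = make_full()

# An infinite set, an interval and a single inserted element, combined.
mixed = Even.union(Interval(1, 3)).insert(7)
print([i for i in range(10) if mixed.contains(i)])  # [0, 1, 2, 3, 4, 6, 7, 8]
print(Full.insert(5) is Full)                       # True
```

Neither `Insert` nor `Union` had to change to accept `Even`, `Full`, or `Interval` values: they only ever call the public fields of their arguments.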
3.5 Interface Trade-Offs
3.5 接口权衡
The choice of interfaces to an object can affect which operations are efficient, which are slow, and also which operations are impossible to define.
对象接口的选择会影响哪些运算高效、哪些运算缓慢,以及哪些运算无法定义。
For example, it is not possible to augment the integer set interface with an intersect operation, because it is not possible to determine if the intersection of two sets is empty without iterating over the sets. It is commonplace to include iterator methods in collection classes like the ones given here. But iterators do not interact well with infinite sets. Significant software engineering decisions must be made when designing interfaces, but these issues are rarely discussed in programming language textbooks.
例如,无法为整数集接口添加 intersect(交集)运算,因为如果不遍历集合,就无法确定两个集合的交集是否为空。在类似这里给出的集合类中包含迭代器方法是很常见的,但迭代器与无限集合的交互效果不佳。设计接口时必须做出重要的软件工程决策,但这些问题在程序设计语言教材中很少被讨论。
One problem with object interfaces is that efficiency considerations often allow implementation issues to influence the design of interfaces. Adding public methods that inspect the hidden representation can significantly improve efficiency. But it also restricts the flexibility and extensibility of the resulting interface.
对象接口的一个问题是,出于效率考虑,实现问题往往会影响接口的设计。添加查看隐藏表示形式的公共方法可以显著提高效率,但这也会限制所得接口的灵活性和可扩展性。
3.6 Optimization
3.6 优化
The optimization of the union method based on sorted lists is not possible in the object-oriented implementation, without modifying the interfaces. The optimization would be possible if the interfaces included a method to iterate the set contents in sorted order. Extending an object interface with more public methods can significantly improve performance, but it also tends to reduce flexibility. If the sets used a more sophisticated representation, optimizations might require more representational details to be exposed in the public interface.
在面向对象实现中,如果不修改接口,就无法基于排序列表对 union 方法进行优化。如果接口包含一个按排序顺序迭代集合内容的方法,则可以进行这种优化。通过添加更多公共方法来扩展对象接口可以显著提高性能,但这也往往会降低灵活性。如果集合使用更复杂的表示形式,优化可能需要在公共接口中暴露更多的表示细节。
There are several optimizations in the object implementation in Figure 8. The first is that the union method on empty sets is the identity function. The second is that the Insert class does not always construct a new value. It only creates a new value if the number being inserted is not in the set already.
图 8 中的对象实现包含几项优化。第一项是,空集上的 union 方法是恒等函数;第二项是,Insert 类并不总是构造新值,只有当要插入的数字不在集合中时才会创建新值。
It is not necessary to include insert and union as methods inside the object interface, because they can be defined as classes that operate on any sets. The optimization of union in the empty set class is one reason why it is useful to internalize the creation operations in the object interface.
不需要将 insert 和 union 作为方法包含在对象接口中,因为它们可以被定义为作用于任何集合的类。空集类中 union 方法的优化是将创建运算内置到对象接口中的一个原因。
3.7 Simulation
3.7 模拟
Object-oriented programming was first invented in the context of the simulation language Simula [16, 4]. The original intent was to simulate real-world systems, but I believe that simulation also allows one object to simulate, or pretend to be, another object.
面向对象程序设计最初是在模拟语言 Simula [16, 4] 的背景下发明的。其初衷是模拟现实世界的系统,但我认为模拟还允许一个对象模拟(或伪装成)另一个对象。
For example, the set Interval(2, 5) simulates a set that has integers 2 through 5 inserted into it. According to the principle of autognosis, there should be no way for any part of the program to distinguish between the interval and the inserted set. There are many operations that violate this principle, including pointer equality and instanceof tests.
例如,集合 Interval(2, 5) 模拟了一个插入了整数 2 到 5 的集合。根据自认知原则,程序的任何部分都不应该能够区分这个区间集合和插入了这些整数的集合。但有许多运算违反了这一原则,包括指针相等性判断和 instanceof 测试。
Simulation also provides a basis for verification of object-oriented programs. If two objects simulate each other, forming a bisimulation, then they are equivalent [41]. Simulation and bisimulation are powerful mathematical concepts for analyzing behavior.
模拟还为面向对象程序的验证提供了基础。如果两个对象相互模拟,形成互模拟关系,那么它们是等价的 [41]。模拟和互模拟的概念是分析行为的强大数学工具。
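The interval example can be probed with a small Python sketch. It uses a reduced interface (`is_empty`, `contains`, `insert`; `union` is left out for brevity), and the helper `agree_on` is a hypothetical name introduced here. A finite set of probes can only fail to distinguish two objects; it does not establish a bisimulation.

```python
from types import SimpleNamespace


def make_empty():
    this = SimpleNamespace(is_empty=True)
    this.contains = lambda i: False
    this.insert = lambda i: Insert(this, i)
    return this


def Insert(s, n):
    if s.contains(n):
        return s
    this = SimpleNamespace(is_empty=False)
    this.contains = lambda i: i == n or s.contains(i)
    this.insert = lambda i: Insert(this, i)
    return this


def Interval(n, m):
    this = SimpleNamespace(is_empty=n > m)
    this.contains = lambda i: n <= i <= m
    this.insert = lambda i: Insert(this, i)
    return this


def agree_on(a, b, probes):
    """Bounded check: a and b give the same answers through the interface."""
    return a.is_empty == b.is_empty and all(
        a.contains(i) == b.contains(i) for i in probes
    )


interval = Interval(2, 5)
inserted = make_empty().insert(2).insert(3).insert(4).insert(5)

print(agree_on(interval, inserted, range(-10, 11)))  # True
print(interval is inserted)                          # False
```

Through the interface the two sets look the same on every probe, while the identity test `is` (Python's counterpart of pointer equality) tells them apart, which is exactly the kind of observation the autognostic principle rules out.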
3.8 Specifications and Verification
3.8 规范与验证
Object-oriented programming has caused significant problems for verification efforts [34, 45, 2]. This is not surprising if you understand that object-oriented programming is higher-order procedural programming; objects are a form of first-class procedure value, which are passed as arguments and returned as values everywhere. It is difficult to verify programs that combine first-class higher-order functions and imperative state.
面向对象程序设计给验证工作带来了重大问题 [34, 45, 2]。如果理解了面向对象程序设计是高阶过程式程序设计,这一点就不足为奇了:对象是一等过程值的一种形式,在程序中被到处作为参数传递和作为值返回。验证结合了一等高阶函数和命令式状态的程序是很困难的。
A common complaint is that it is impossible to determine what code will execute when invoking a method. This is no different from common uses of first-class functions. If this objection is taken seriously, then similar complaints must be leveled against ML and Haskell, because it is impossible (in general) to determine what code will run when invoking a function value.
一个常见的抱怨是,无法确定调用方法时将执行什么代码。这与一等函数的常见用法没有区别。如果认真对待这一反对意见,那么也必须对 ML 和 Haskell 提出类似的抱怨,因为(通常情况下)无法确定调用函数值时将运行什么代码。
More significantly, it is possible to create bad objects easily. For example, the following object does not meet the specification for integer sets:
更重要的是,很容易创建不良对象。例如,以下对象不符合整数集的规范:
Bad = µ this. {
isEmpty = (random() > 0.5),
contains = λ i. (time() mod i = 1),
insert = λ i. this,
union = λ s. Insert(s, 3)
}
It reports that it is empty 50% of the time, and includes integers randomly based on time of day. Object interfaces can be given behavioral specifications, which can be verified to prohibit bad objects.
它有 50% 的概率报告自己为空集,并且根据一天中的时间随机包含整数。可以为对象接口提供行为规范,通过验证来禁止不良对象。
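One behavioral law that a specification for integer sets might include is that `s.insert(n).contains(n)` holds and that the result is not empty. The Python sketch below checks that law on sample inputs. Everything in it is hypothetical: `make_good_empty` is a well-behaved reference object, `make_lossy` is a deterministic stand-in for a bad object (chosen so the outcome is repeatable, unlike the random Bad object above), and sampling is testing rather than verification.

```python
from types import SimpleNamespace


def make_good_empty():
    """A well-behaved set object (reduced interface, no union)."""
    def build(members):
        this = SimpleNamespace(is_empty=not members)
        this.contains = lambda i: i in members
        this.insert = lambda i: build(members | {i})
        return this
    return build(frozenset())


def make_lossy():
    """Structurally compatible but wrong: insert silently drops the element."""
    this = SimpleNamespace(is_empty=True)
    this.contains = lambda i: False
    this.insert = lambda i: this
    return this


def obeys_insert_law(s, samples):
    """Sampled check of: s.insert(n).contains(n) and not s.insert(n).is_empty."""
    for n in samples:
        t = s.insert(n)
        if not t.contains(n) or t.is_empty:
            return False
    return True


print(obeys_insert_law(make_good_empty(), range(5)))  # True
print(obeys_insert_law(make_lossy(), range(5)))       # False
```

Both objects have the same shape, so no structural check separates them; only the behavioral law does.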
A more subtle problem is that objects do not necessarily encapsulate state effectively [27]. The problem arises when the state of an object is itself a collection of objects. There is a tendency for the internal objects to leak out and become external, at which point the abstract boundary is lost. This problem motivates the ongoing research effort on ownership types [6].
一个更微妙的问题是,对象并不一定能有效地封装状态 [27]。当一个对象的状态本身是一组对象时,就会出现这个问题。内部对象可能会泄露到外部,此时抽象边界就会消失。这个问题推动了关于所有权类型的持续研究 [6]。
One particularly difficult problem is that methods can be re-entered while they are running [46]. This causes problems for the standard Hoare-style approach to verification. In this approach, the class enforces an invariant, and every procedure (method) is given a precondition and a post-condition. The problem is that any method calls within the body of the method may loop back around and invoke some other method of the object being verified. In this case the other method may be called while the object is in an inconsistent state. It may also modify the object state, to invalidate the assumptions used to verify the original method.
一个特别困难的问题是,方法在运行过程中可能会被重入 [46]。这给标准的霍尔(Hoare)风格验证方法带来了问题。在这种方法中,类强制执行一个不变式,每个过程(方法)都有一个前置条件和一个后置条件。问题在于,方法体内部的任何方法调用都可能循环回来,调用正在被验证的对象的其他方法。在这种情况下,其他方法可能会在对象处于不一致状态时被调用,并且可能会修改对象状态,从而破坏验证原始方法所依据的假设。
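The re-entrancy hazard fits in a few lines of Python. The `Ledger` class, its invariant, and the listener callback are all hypothetical, introduced only to make the scenario concrete.

```python
class Ledger:
    """Invariant (expected between method calls): total == sum(entries)."""

    def __init__(self, listener):
        self.entries = []
        self.total = 0
        self.listener = listener

    def invariant_holds(self):
        return self.total == sum(self.entries)

    def add(self, amount):
        self.entries.append(amount)
        # The invariant is temporarily broken at this point ...
        self.listener(self)       # ... and this call may re-enter the object
        self.total += amount      # invariant restored


observed = []
ledger = Ledger(lambda l: observed.append(l.invariant_holds()))
ledger.add(10)

print(observed)                  # [False]: the callback saw an inconsistent state
print(ledger.invariant_holds())  # True: consistent again once add has returned
```

A proof of `add` that assumes the invariant at every method entry is unsound here, because `invariant_holds` is entered while `add` is still running.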
Abstract data types do not usually have this problem because they are built in layers; each layer invokes lower layers, but lower layers do not invoke higher layers. Not all systems can be organized in this fashion, however. Complex systems often require notifications, or call-backs, which allow lower layers to call into higher layers. This can cause problems for verification if call-backs are included in ADTs.
抽象数据类型通常不会有这个问题,因为它们是分层构建的:每一层调用更低的层,但更低的层不会调用更高的层。然而,并非所有系统都能以这种方式组织。复杂系统通常需要通知(或回调),允许较低层调用较高层。如果抽象数据类型中包含回调,这也会给验证带来问题。
Object-oriented programming is designed to be as flexible as possible. It is almost as if it were designed to be as difficult to verify as possible.
面向对象程序设计旨在尽可能灵活,这几乎就像是为了尽可能难以验证而设计的。
3.9 Some More Theory
3.9 更多理论
The object interface has some interesting relationships to the abstract data type signature in Figures 3 and 6. First, the methods have one fewer argument than the corresponding operations in the ADT signature. In each case, the rep argument is missing. Second, the rep in the ADT operations corresponds to a recursive reference to ISet in each method of the object interface. The similarity can be expressed by the following type function:
对象接口与图 3 和图 6 中的抽象数据类型签名存在一些有趣的关系。首先,方法比抽象数据类型签名中的相应运算少一个参数,在每种情况下,都缺少 rep 参数。其次,抽象数据类型运算中的 rep 对应于对象接口每个方法中对 ISet 的递归引用。这种相似性可以通过以下类型函数表示:
type F(t) = {
isEmpty : bool,
contains : int → bool,
insert : int → t,
union : t → t
}
The types given above can be rewritten in terms of $F$:
上述类型可以用 $F$ 重写为:
$ISet = F(ISet)$
$SetImp = \exists rep.\ rep \times (rep \to F(rep))$
The original definition of SetImp is isomorphic to this new definition. To see the relationship, note that in $rep \to F(rep)$ the function type with domain rep supplies the missing argument that appears in all the ADT operations. The cartesian product with rep supplies the empty constructor.
SetImp 的原始定义与这个新定义是同构的。要理解这种关系,请注意在 $rep \to F(rep)$ 中,以 rep 为域的函数类型提供了所有抽象数据类型运算中缺失的参数。与 rep 的笛卡尔积提供了空构造函数。
The definition of SetImp above is the encoding of a final coalgebra $X \to F(X)$ into the polymorphic λ-calculus [19]. The only problem is that $F$ is not a covariant functor, because of the union method. This encoding also corresponds to the greatest fixedpoint of $F$, which corresponds to the recursive type ISet. The relationship between coalgebra and objects is an active research topic [29].
上面 SetImp 的定义是将最终余代数 $X \to F(X)$ 编码到多态 λ 演算中 [19]。唯一的问题是,由于 union 方法的存在,$F$ 不是一个协变函子。这种编码也对应于 $F$ 的最大不动点,而最大不动点对应于递归类型 ISet。余代数与对象之间的关系是一个活跃的研究课题 [29]。
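The remark about variance can be spelled out by writing $F$ as a product and looking at where $t$ occurs. This is a standard observation about function types, given here as a sketch rather than as part of the original argument; the maps $f$, $g$, and $h$ are names introduced for the illustration.

```latex
F(t) \;=\; \mathit{bool} \times (\mathit{int} \to \mathit{bool})
        \times (\mathit{int} \to t) \times (t \to t)

% Given f : A \to B, a covariant action F(f) : F(A) \to F(B) works componentwise.
% insert component: post-composition with f is enough.
(\mathit{int} \to A) \longrightarrow (\mathit{int} \to B), \qquad h \mapsto f \circ h

% union component: a map g : A \to A must become a map B \to B.
% Post-composing gives f \circ g : A \to B, but the argument of type B
% would first have to be turned into an A, and f alone provides no map B \to A.
(A \to A) \longrightarrow (B \to B) \quad \text{is not definable from } f
```

The occurrence of $t$ to the left of the arrow in union is the only obstacle; the other three components are covariant in $t$.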
3.10 Summary
3.10 总结
An object is a value exporting a procedural interface to data or behavior. Objects use procedural abstraction for information hiding, not type abstraction. Objects and their types are often recursive. Objects provide a simple and powerful form of data abstraction. They can be understood as closures, first-class modules, records of functions, or processes. Objects can also be used for procedural abstraction.
对象是一个导出数据或行为的过程接口的值。对象使用过程抽象进行信息隐藏,而不是类型抽象。对象及其类型通常是递归的。对象提供了一种简单而强大的数据抽象形式,可以将其理解为闭包、一等模块、函数记录或进程。对象也可以用于过程抽象。
Unlike abstract data types, many people find objects to be deeply disturbing. They are fundamentally higher-order, unlike abstract data types. With an object, you are never quite certain what it is going to do: What method is being called? What kind of object is it really?
与抽象数据类型不同,许多人认为对象令人深感困惑。它们在本质上是高阶的,这与抽象数据类型不同。对于一个对象,你永远无法完全确定它会做什么:将调用什么方法?它实际上是什么类型的对象?
On the other hand, many people find objects to be deeply appealing in their simplicity and flexibility. They do not require complex type systems. Inheritance allows recursive values to be extended in powerful ways.
另一方面,许多人被对象的简单性和灵活性深深吸引。它们不需要复杂的类型系统,继承允许以强大的方式扩展递归值。
The fact that objects are autognostic, so that they can only know themselves, is also confusing. On the one hand, it interferes with desirable optimizations that require inspection of multiple representations. One solution is to expose representational details in the object's interface, which limits flexibility. The benefits of autognosis are often subtle and only realized as a system grows and evolves.
对象具有自认知性(只能了解自身)这一事实也令人困惑。一方面,它会干扰需要查看多个表示形式的理想优化。一种解决方案是在对象的接口中暴露表示细节,但这会限制灵活性。自认知性的好处通常是微妙的,只有在系统不断发展和演进过程中才能体现出来。
Finally, as part of a long and rich tradition of abstraction, objects too (not just ADTs) are fundamentally grounded in mathematics.
最后,作为漫长而丰富的抽象传统的一部分,对象(而不仅仅是抽象数据类型)在本质上也植根于数学。
4. Relationships between ADTs and OOP
4. 抽象数据类型与面向对象程序设计(OOP)的关系
Although object-oriented programming and abstract data types are two distinct forms of data abstraction, there are many relationships between them. Many simple abstractions can be implemented in either style, although the usage of the resulting programs is quite different.
尽管面向对象程序设计和抽象数据类型是两种不同的数据抽象形式,但它们之间存在许多联系。许多简单的抽象可以用两种风格中的任何一种实现,尽管所得程序的用法有很大不同。
4.1 Static Versus Dynamic Typing
4.1 静态类型与动态类型
One of the most significant differences between abstract data types and objects is that objects can be used to define data abstractions in a dynamically typed language.
抽象数据类型和对象之间最显著的差异之一是,对象可以用于在动态类型语言中定义数据抽象。
Objects do not depend upon a static type system; all they need is some form of first-class functions or processes.
对象不依赖于静态类型系统,它们只需要某种形式的一等函数或进程。
Abstract data types depend upon a static type system to enforce type abstraction. It is not an accident that dynamic languages use objects instead of user-defined abstract data types. Dynamic languages typically support built-in abstract data types for primitive types; the type abstraction here is enforced by the runtime system.
抽象数据类型依赖于静态类型系统来强制执行类型抽象。动态语言使用对象而不是用户定义的抽象数据类型并非偶然。动态语言通常为基本类型支持内置的抽象数据类型,这里的类型抽象由运行时系统强制执行。
Type systems only enforce structural properties of programs; they do not ensure conformance to a specification. But with ADTs, the type system can ensure that if the ADT implementation is correct, then all programs based on it will operate correctly. The type system prevents outside clients from tampering with the implementation. Pure object interfaces allow any structurally compatible implementation, thus the type system does not prohibit bad implementations from being used.
类型系统仅强制执行程序的结构属性,不保证符合规范。但对于抽象数据类型,如果抽象数据类型实现正确,类型系统可以确保所有基于它的程序都能正确运行。类型系统防止外部客户端篡改实现。纯对象接口允许任何结构兼容的实现,因此类型系统不禁止使用不良实现。
4.2 Simple and Complex Operations
4.2 简单运算与复杂运算
One point of overlap between objects and abstract data types is that simple data abstractions can be implemented equally well in either style. The difference between simple and complex data abstractions is whether or not they have operations, like the union operation in the set ADT, that inspect the representation of multiple abstract values.
对象和抽象数据类型的一个重叠之处是,简单的数据抽象可以用两种风格同样好地实现。简单数据抽象和复杂数据抽象的区别在于,它们是否具有像集合抽象数据类型中的 union 运算那样查看多个抽象值表示形式的运算。
In this essay I call an operation "complex" if it inspects multiple representations. In some of the literature complex operations are called "binary". Literally speaking, a binary operation is one that accepts two inputs of the abstract type. For an object, a binary method is one that takes a second value of the abstract type, in addition to the abstract value whose method is being invoked. According to these definitions, union is always binary.
在本文中,如果一个运算会查看多个表示形式,我就称之为"复杂运算"。在部分文献中,复杂运算也被称为"二元运算"。从字面意义上讲,二元运算是指接受两个抽象类型输入的运算。对于对象而言,二元方法是指除了调用该方法的抽象值之外,还接受另一个抽象类型值作为参数的方法。根据这些定义,union 运算始终是二元的。
However, not all binary methods are complex. This depends on how the operation is implemented. A binary operation can be implemented by invoking public methods on the abstract arguments. Doing so does not require the representation of the two values to be inspected. The union operations in Figures 1 and 2 are simple. But the union operation in Figure 4 is complex.
然而,并非所有二元方法都是复杂的。这取决于运算的实现方式。二元运算可以通过调用抽象参数的公共方法来实现,这种实现方式不需要查看两个值的表示形式。图 1 和图 2 中的 union 运算属于简单运算,但图 4 中的 union 运算则是复杂运算。
Pure object-oriented programming does not support complex operations. Doing so requires inspection of another object's representation, using instanceof or similar means.
纯面向对象程序设计不支持复杂运算。若要实现复杂运算,需要通过 instanceof 或类似方式查看另一个对象的表示形式。
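The distinction between a complex and a simple binary operation can be sketched in Python. The sorted-tuple representation and the names `adt_union` and `object_union` are assumptions made for this illustration; they are not the code of the figures.

```python
from types import SimpleNamespace


def adt_union(a, b):
    """Complex: walks the representation (sorted tuples) of BOTH arguments."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            out.append(a[i])
            i += 1
        elif b[j] < a[i]:
            out.append(b[j])
            j += 1
        else:                      # equal elements are kept once
            out.append(a[i])
            i += 1
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return tuple(out)


def object_union(s1, s2):
    """Simple: uses only the public contains method of each argument."""
    return SimpleNamespace(contains=lambda i: s1.contains(i) or s2.contains(i))


print(adt_union((1, 3, 5), (2, 3, 6)))  # (1, 2, 3, 5, 6)

evens = SimpleNamespace(contains=lambda i: i % 2 == 0)
small = SimpleNamespace(contains=lambda i: 0 <= i < 3)
both = object_union(evens, small)
print(both.contains(1), both.contains(5))  # True False
```

The merge runs in time linear in the two tuples but works only for that one representation; the object version accepts any value with a `contains` field, including infinite sets, and pays for that by deferring all work to membership queries.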
Any abstract data type with only simple operations can be implemented without loss of functionality, but more simply and extensibly, with objects.
任何仅包含简单运算的抽象数据类型都可以用对象实现,且不会损失功能,同时实现方式更简洁、可扩展性更强。
Consider an ADT implementation with the following type, where $t$ does not appear in $\sigma_i$, $\tau_j$, $\rho_j$, or $\delta_k$.
考虑一个具有以下类型的抽象数据类型实现,其中 $t$ 未出现在 $\sigma_i$、$\tau_j$、$\rho_j$ 或 $\delta_k$ 中。
F(t) = {
  c_i : σ_i → t,
  o_j : t × τ_j → ρ_j,
  m_k : t × δ_k → t
}
ADT : ∃t. F(t)
The methods have been partitioned into constructors, observations and mutators. The constructors $c_i$ create values of type $t$. The observations take an input of type $t$ with additional arguments and produce values of some other type. The mutators take an input of type $t$ and produce a result of type $t$. These patterns are exhaustive, because there are no complex methods. $\tau_j$ or $\delta_k$ is unit if there are no other arguments besides $t$ for a given operation.
这些方法被划分为构造函数、观察函数和修改函数。构造函数 $c_i$ 用于创建 $t$ 类型的值;观察函数接受一个 $t$ 类型输入和其他额外参数,并返回其他类型的值;修改函数接受一个 $t$ 类型输入,并返回一个 $t$ 类型结果。由于不存在复杂方法,这些模式涵盖了所有可能情况。如果某个运算除了 $t$ 之外没有其他参数,则 $\tau_j$ 或 $\delta_k$ 为单位类型。
Create a new type I to represent the object interface:
定义一个新类型 $I$ 来表示对象接口:
interface I = {
  o_j : τ_j → ρ_j,
  m_k : δ_k → I
}
For the constructors, define a family of functions that invoke a wrap function that creates the object. The notation for this example is that of Pierce's book Types and Programming Languages [50].
对于构造函数,定义一组函数,这些函数调用一个 wrap 函数来创建对象。本示例使用的符号源自皮尔斯(Pierce)的著作《类型与程序设计语言》[50]。
C_i : σ_i → I
C_i(x : σ_i) =
  let {*t, p} = ADT in
  wrap[t](p, p.c_i(x))
wrap : ∀t. F(t) × t → I
wrap[t](p, x) = {
  o_j = λa:τ_j. p.o_j(x, a);
  m_k = λa:δ_k. wrap[t](p, p.m_k(x, a));
}
The constructors first open the ADT, construct an appropriate value of type t and then wrap it as an object. This transformation is a direct corollary of the basic definitions of ADTs [44] and objects [13].
构造函数首先打开抽象数据类型,构造一个合适的 $t$ 类型值,然后将其包装为对象。这种转换是抽象数据类型 [44] 和对象 [13] 基本定义的直接推论。
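The wrapping transformation can be tried on a small example in Python. The sketch assumes a hypothetical ADT `set_adt` with one constructor (`empty`), one observation (`contains`), and one mutator (`insert`), using a frozenset as the hidden representation; Python does not enforce the type abstraction, so the hiding is by convention only.

```python
from types import SimpleNamespace

# A hypothetical ADT: a record of operations over a representation type t.
set_adt = SimpleNamespace(
    empty=lambda: frozenset(),           # constructor  c : unit -> t
    contains=lambda rep, n: n in rep,    # observation  o : t x int -> bool
    insert=lambda rep, n: rep | {n},     # mutator      m : t x int -> t
)


def wrap(p, x):
    """Package the ADT operations p and a representation value x as an object."""
    return SimpleNamespace(
        contains=lambda a: p.contains(x, a),
        insert=lambda a: wrap(p, p.insert(x, a)),  # re-wrap the new representation
    )


def Empty():
    """Constructor function: open the ADT, build a value, wrap it."""
    return wrap(set_adt, set_adt.empty())


s = Empty().insert(1).insert(3)
print(s.contains(3), s.contains(2))  # True False
```

Each method of the resulting object has one fewer argument than the ADT operation it came from: the representation `x` is captured by the closure instead of being passed by the client.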
The converse, however, is not necessarily true. It is possible to take any fixed set of object-oriented classes that implement an interface and convert them to an ADT. One simple way to do it is to use objects as the representation type for the ADT, but rewriting the abstractions is always possible. However, the result is no longer extensible, so the conversion incurs a loss of flexibility.
然而,其逆命题并不一定成立。可以将任何一组实现了某个接口的固定面向对象类转换为抽象数据类型。一种简单的方法是将对象用作抽象数据类型的表示类型,但始终可以通过重写抽象来实现转换。不过,转换后的结果将不再具有可扩展性,因此这种转换会导致灵活性的损失。
4.3 Extensibility Problem
4.3 可扩展性问题
When implementing data abstractions, there are two important dimensions of extensibility. New representational variants can be added, or new operations can be added. This observation suggests it is natural to organize the behaviors into a matrix with representations on one axis and observations/actions on the other. Then extensibility can be viewed as adding a column or row to the matrix.
在实现数据抽象时,可扩展性包含两个重要维度:可以添加新的表示变体,或者添加新的运算。这一观察结果表明,将行为组织成一个矩阵是很自然的:矩阵的一个轴为表示形式,另一个轴为观察/操作。此时,可扩展性可以看作是向矩阵中添加一列(新运算)或一行(新表示形式)。
In the 1970s, as work began on understanding data abstraction, Reynolds published a prophetic paper that identified the key differences between objects and abstract data types [52, 23], although I think he did not realize he was describing objects. Reynolds noticed that abstract data types facilitate adding new operations, while "procedural data values" (objects) facilitate adding new representations. Since then, this duality has been independently discovered at least three times [18, 14, 33].
20 世纪 70 年代,随着人们开始深入理解数据抽象,雷诺兹(Reynolds)发表了一篇具有前瞻性的论文,指出了对象与抽象数据类型之间的关键差异 [52, 23],尽管我认为他当时并未意识到自己描述的是对象。雷诺兹发现,抽象数据类型便于添加新运算,而"过程式数据值"(即对象)便于添加新表示形式。此后,这一双重性至少被独立发现了三次 [18, 14, 33]。
This duality has practical implications for programming [14]. Abstract data types define operations that collect together the behaviors for a given action. Objects organize the matrix the other way, collecting together all the actions associated with a given representation. It is easier to add new operations in an ADT, and new representations using objects. Although not discussed in detail here, object-oriented programs can use inheritance to add new operations [14].
这种双重性对程序设计具有实际意义 [14]。抽象数据类型定义的运算会将特定操作相关的行为集中在一起;而对象则以相反的方式组织矩阵,将特定表示形式相关的所有操作集中在一起。在抽象数据类型中添加新运算更简单,而使用对象添加新表示形式更便捷。尽管本文未详细讨论,但面向对象程序可以通过继承来添加新运算 [14]。
Wadler later gave the problem a catchy name, the "Expression Problem", based on the well-known canonical example of a data abstraction for expressions with operations to print, evaluate, or perform other actions [58].
后来,瓦德勒(Wadler)基于一个著名的标准示例(带有打印、求值或其他操作的表达式数据抽象)为这个问题起了一个生动的名称:"表达式问题"[58]。
The extensibility problem has been solved in numerous ways, and it still inspires new work on extensibility of data abstractions [48, 15]. Multi-methods are another approach to this problem [11]. More complex variations, involving integration of independent extensions, have still not been completely resolved.
可扩展性问题已经有多种解决方案,并且仍然启发着关于数据抽象可扩展性的新研究 [48, 15]。多方法是解决该问题的另一种途径 [11]。而涉及独立扩展集成的更复杂变体,至今仍未完全解决。
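The two ways of slicing the matrix can be compared on the expression example in Python. This is a sketch: the tagged-tuple representation and the names `evaluate`, `show`, `Lit`, `Add`, and `Neg` are assumptions made here, and inheritance-based solutions are not shown.

```python
from types import SimpleNamespace

# ADT style: one representation (tagged tuples); each operation covers all variants.
def evaluate(e):
    tag = e[0]
    if tag == "lit":
        return e[1]
    if tag == "add":
        return evaluate(e[1]) + evaluate(e[2])
    raise ValueError(f"unknown variant: {tag}")


# Adding an operation is one new function; existing code is untouched.
def show(e):
    tag = e[0]
    if tag == "lit":
        return str(e[1])
    if tag == "add":
        return f"({show(e[1])} + {show(e[2])})"
    raise ValueError(f"unknown variant: {tag}")


# Object style: each variant bundles all of its own operations.
def Lit(n):
    return SimpleNamespace(evaluate=lambda: n, show=lambda: str(n))


def Add(left, right):
    return SimpleNamespace(
        evaluate=lambda: left.evaluate() + right.evaluate(),
        show=lambda: f"({left.show()} + {right.show()})",
    )


# Adding a variant is one new constructor; existing code is untouched.
def Neg(e):
    return SimpleNamespace(
        evaluate=lambda: -e.evaluate(),
        show=lambda: f"-{e.show()}",
    )


print(show(("add", ("lit", 1), ("lit", 2))))  # (1 + 2)
print(Add(Lit(1), Neg(Lit(2))).evaluate())    # -1
```

Going the other way is the costly direction in each style: a new variant in the tuple encoding means editing every function, and a new operation in the object encoding means editing every constructor.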
4.4 Imperative State and Polymorphism
4.4 命令式状态与多态性
Issues of imperative state and polymorphism have been avoided in this essay because they are, for the most part, orthogonal to the issues of data abstraction. The integer sets discussed in this paper can be generalized to polymorphic sets, set<t>. This generalization can be carried out for either abstract data types or objects. While there is significant work involved in doing so, the issues of polymorphism do not interact very much with the issues relating to data abstraction.
本文回避了命令式状态和多态性相关问题,因为它们在很大程度上与数据抽象问题相互独立。本文讨论的整数集可以推广为多态集 set<t>,这种推广既适用于抽象数据类型,也适用于对象。尽管推广过程涉及大量工作,但多态性相关问题与数据抽象相关问题的交互并不多。
Both abstract data types and objects can be defined in either a pure functional or imperative style. Pure functional objects are quite common, although not as common as they could be. Issues of state are largely orthogonal from a language design viewpoint. However, imperative programming has a significant impact on verification.
抽象数据类型和对象都可以采用纯函数式或命令式风格定义。纯函数式对象相当常见,尽管其普及程度仍有提升空间。从语言设计的角度来看,状态相关问题在很大程度上是独立的。然而,命令式程序设计会对验证工作产生重大影响。
5. Reality
5. 实际应用现状
The reality in practical programming languages is not so pure and simple. It turns out that statically typed object-oriented languages all support both pure objects and also a form of abstract data types. They also support various hybrids.
实际程序设计语言的情况并非如此纯粹和简单。事实证明,静态类型的面向对象语言均同时支持纯对象和某种形式的抽象数据类型,还支持多种混合形式。
5.1 Object-Oriented Programming in Java
5.1 Java 中的面向对象程序设计
While Java is not a pure object-oriented language, it is possible to program in a pure object-oriented style by obeying the following rules:
尽管 Java 并非纯面向对象语言,但通过遵循以下规则,可以采用纯面向对象风格进行程序设计:
- Classes only as constructors: A class name may only be used after the keyword new.
- 类仅作为构造函数:类名只能在关键字 new 之后使用。
- No primitive equality: The program must not use primitive equality (==). Primitive equality exposes representation and prevents simulation of one object by another.
- 禁用原生相等性:程序不得使用原生相等性判断(==)。原生相等性会暴露表示形式,阻碍一个对象对另一个对象的模拟。
In particular, classes may not be used as types to declare members, method arguments or return values. Only interfaces may be used as types. Also, classes may not be used in casts or to test with instanceof.
具体而言,类不能用作声明成员、方法参数或返回值的类型,只能使用接口作为类型。此外,类不能用于强制类型转换或 instanceof 测试。
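As an illustration of these rules, here is a minimal sketch of integer sets written in the pure object-oriented style. The names ISet, Empty, Insert, and Union are assumptions made for this example and do not come from a library. Only the interface is used as a type, class names appear only after new, and element comparison goes through equals rather than ==.

```java
// Only this interface is ever used as a type.
interface ISet {
    boolean isEmpty();
    boolean contains(int i);
    ISet insert(int i);
    ISet union(ISet other);
}

// The class names below appear only after `new`, never as types.
final class Empty implements ISet {
    public boolean isEmpty() { return true; }
    public boolean contains(int i) { return false; }
    public ISet insert(int i) { return new Insert(this, i); }
    public ISet union(ISet other) { return other; }
}

final class Insert implements ISet {
    private final ISet rest;   // member declared with the interface type
    private final Integer n;
    Insert(ISet rest, int n) { this.rest = rest; this.n = n; }
    public boolean isEmpty() { return false; }
    // equals on the element instead of primitive ==
    public boolean contains(int i) { return n.equals(i) || rest.contains(i); }
    public ISet insert(int i) { return new Insert(this, i); }
    public ISet union(ISet other) { return new Union(this, other); }
}

final class Union implements ISet {
    private final ISet left, right;
    Union(ISet left, ISet right) { this.left = left; this.right = right; }
    public boolean isEmpty() { return left.isEmpty() && right.isEmpty(); }
    public boolean contains(int i) { return left.contains(i) || right.contains(i); }
    public ISet insert(int i) { return new Insert(this, i); }
    public ISet union(ISet other) { return new Union(this, other); }
}
```

Because union accepts any ISet, a set built by some other class that implements the interface can be combined with these without changing them.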
This is generally considered good object-oriented style. But what if you were forced to follow this style, because the language you were using required it? Smalltalk comes close. Since Smalltalk is dynamically typed, classes are only used as constructors. It does support an equivalent of instanceof (the isKindOf: message), although it is rarely used.
这通常被认为是良好的面向对象风格。但如果由于所使用的语言要求,你不得不遵循这种风格,会怎么样呢?Smalltalk 与此非常接近。由于 Smalltalk 是动态类型的,类仅用作构造函数。它确实支持与 instanceof 等价的机制(isKindOf: 消息),尽管很少被使用。
One other way to break encapsulation in Java is through the use of reflection, although this is not common when writing most programs. Reflection is useful when writing metatools (e.g. debuggers) and program generators. However, use of reflection appears to be growing more widespread. More research is needed to quantify the effect of reflection on data abstraction and encapsulation.
在 Java 中,另一种破坏封装性的方式是使用反射,尽管在编写大多数程序时这种情况并不常见。反射在编写元工具(如调试器)和程序生成器时非常有用。然而,反射的使用似乎越来越广泛。需要开展更多研究来量化反射对数据抽象和封装性的影响。
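To make the point about reflection concrete, the following sketch reads a private field from outside its class. Account and ReflectionPeek are hypothetical names invented for this example; the reflective calls (getDeclaredField, setAccessible, getInt) are standard java.lang.reflect methods. It assumes both classes live in the same module, where setAccessible is permitted.

```java
import java.lang.reflect.Field;

// Hypothetical class whose representation is meant to be hidden.
final class Account {
    private int balance = 42;
    boolean isOverdrawn() { return balance < 0; }
}

final class ReflectionPeek {
    // Reads the private field by name, bypassing the class's own interface.
    static int peekBalance(Object account) {
        try {
            Field f = account.getClass().getDeclaredField("balance");
            f.setAccessible(true);   // suppresses the language-level access check
            return f.getInt(account);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A client written this way depends on the field name and type, so one implementation can no longer be substituted for another.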
5.2 ADTs in Java
5.2 Java 中的抽象数据类型
It takes a little more work to encode abstract data types in statically typed object-oriented programming languages.
在静态类型的面向对象程序设计语言中,编码实现抽象数据类型需要更多的工作。
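```java
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

class ASet {
    // declare representation fields (a TreeSet is an assumed choice)
    // 声明表示字段
    private final TreeSet<Integer> rep;
    // no public constructor
    // 无公共构造函数
    private ASet(Collection<Integer> a, Collection<Integer> b) {
        rep = new TreeSet<>(a);
        rep.addAll(b);
    }
    static ASet empty() { return new ASet(List.of(), List.of()); }
    static ASet insert(ASet s, int n) { return new ASet(s.rep, List.of(n)); }
    static boolean contains(ASet s, int n) { return s.rep.contains(n); }
    static ASet union(ASet a, ASet b) { return new ASet(a.rep, b.rep); }
}
```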
Using a class name as a type introduces type abstraction. A class hides its representation. Object-oriented languages do not always support the sums-of-products data structures found in other languages, but such types can be simulated using an abstract class with a subclass for each variant in the sum type. Pattern matching on these types can then be implemented by using instanceof and appropriate casts. One direct encoding, the ASet class above, uses static methods for all the ADT operations, and the class just holds the representation. The CSet class below differs only in form: empty remains static, while insert, contains, and union become instance methods whose receiver replaces the first argument.
将类名用作类型会引入类型抽象。类会隐藏其表示形式。面向对象语言并不总是支持其他语言中存在的积之和数据结构,但可以通过抽象类(为和类型中的每个变体定义一个子类)来模拟此类类型。然后,可以通过 instanceof 和适当的强制类型转换来实现对这些类型的模式匹配。一种直接的编码方式(即上文的 ASet 类)是为所有抽象数据类型运算使用静态方法,而类仅用于存储表示形式。下文的 CSet 类只是形式不同:empty 仍为静态方法,而 insert、contains 和 union 变为实例方法,由接收者取代第一个参数。
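```java
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

class CSet {
    // declare representation fields (a TreeSet is an assumed choice)
    // 声明表示字段
    private final TreeSet<Integer> rep;
    // no public constructor
    // 无公共构造函数
    private CSet(Collection<Integer> a, Collection<Integer> b) {
        rep = new TreeSet<>(a);
        rep.addAll(b);
    }
    static CSet empty() { return new CSet(List.of(), List.of()); }
    CSet insert(Integer n) { return new CSet(rep, List.of(n)); }
    boolean contains(Integer n) { return rep.contains(n); }
    CSet union(CSet b) { return new CSet(rep, b.rep); }
}
```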
To summarize, when a class name is used as a type, it represents an abstract data type.
总而言之,当类名被用作类型时,它表示的是抽象数据类型。
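The simulation of sum types mentioned above can be sketched as follows. IntList, Nil, Cons, and sum are illustrative names chosen for this example. The abstract class stands for the sum type, each nested subclass is one variant, and sum performs the "pattern match" with instanceof and a cast.

```java
// A sum type with two variants, simulated by an abstract class
// with one subclass per variant.
abstract class IntList {
    private IntList() {}   // only the nested classes can extend IntList

    static final class Nil extends IntList {}

    static final class Cons extends IntList {
        final int head;
        final IntList tail;
        Cons(int head, IntList tail) { this.head = head; this.tail = tail; }
    }

    // Case analysis over the variants using instanceof and a cast.
    static int sum(IntList xs) {
        if (xs instanceof Nil) {
            return 0;
        } else if (xs instanceof Cons) {
            Cons c = (Cons) xs;
            return c.head + sum(c.tail);
        }
        throw new IllegalArgumentException("unknown variant");
    }
}
```

Here the class names are used as types and in instanceof tests, so in the terms of this section the code is in the ADT style rather than the pure object-oriented style.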
5.3 Haskell Type Classes
5.3 Haskell 类型类
Type classes in Haskell [30] are a powerful mechanism for parameterization and extensibility [59]. A type class is an algebraic signature that associates a group of operations with one or more type names. A type class for integer sets, defined below, is very similar to the existential type in Figure 6, but in this case uses curried functions:
Haskell 中的类型类 [30] 是一种强大的参数化和可扩展机制 [59]。类型类是一种代数签名,将一组运算与一个或多个类型名相关联。下面定义的整数集类型类与图 6 中的存在类型非常相似,但此处使用的是柯里化函数:
```haskell
class Set s where
  empty    :: s
  isEmpty  :: s -> Bool
  insert   :: s -> Int -> s
  contains :: s -> Int -> Bool
  union    :: s -> s -> s
```
Functions can be written using the generic operations:
可以使用通用运算编写函数:
```haskell
test :: Set s => s -> Bool
test s = contains (union (insert s 3) (insert empty 4)) 5
```
The qualification on the type of test indicates that the type s is any instance of Set. Any type can be made an instance of Set by defining the appropriate operations:
test 类型的限定条件表明,类型 s 是 Set 的任意实例。通过定义相应的运算,任何类型都可以成为 Set 的实例:
```haskell
-- Under Haskell 98/2010 this instance head needs GHC's FlexibleInstances extension
instance Set [Int] where
  empty    = []
  isEmpty  = (== [])
  insert   = flip (:)
  contains = flip elem
  union    = (++)
```
Instance definitions can connect type classes with actual types that come from different libraries, and all three parts can be written without prearranged knowledge of the others. As a result, type classes are flexible and extensible.
实例定义可以将类型类与来自不同库的实际类型相关联,且这三部分(类型类、实际类型、实例定义)的编写无需预先知晓彼此。因此,类型类具有灵活性和可扩展性。
A type can only be an instance of a class in one way. For example, there is no way to define sorted lists and lists as both being different instances of Set. This restriction can always be bypassed by creating a new type that is a tagged or labeled version of an existing type, although this can introduce undesirable bookkeeping when tagging values.
一种类型只能以一种方式成为某个类的实例。例如,无法将排序列表和普通列表同时定义为 Set 的不同实例。不过,始终可以通过创建现有类型的带标签版本(一种新类型)来规避这一限制,尽管在为值添加标签时可能会引入不必要的记录工作。
Type classes are similar to object interfaces in allowing a method to operate on any value that has the necessary operations.
类型类与对象接口相似,都允许方法作用于任何具有所需运算的值。
On the other hand, type classes are based on algebraic signatures as in abstract data types. The main difference is that type classes do not enforce any hiding of representations. As a result, they provide parametric abstraction over type signatures, without the information hiding aspect of ADTs. Given the success of Haskell, one might argue that encapsulation is somewhat overrated.
另一方面,类型类与抽象数据类型一样,都基于代数签名。主要区别在于,类型类不强制隐藏表示形式。因此,它们提供了对类型签名的参数化抽象,却不具备抽象数据类型的信息隐藏特性。考虑到 Haskell 的成功,有人可能会认为封装性在一定程度上被高估了。
Type classes are not autognostic. When a function is qualified by a type class, the same type instance must be used for all values within that function. Type classes do not allow different instances to interoperate. There are other ways in which Haskell provides abstraction and information hiding, for example, by parametricity.
类型类不具有自认知性。当函数被类型类限定后,该函数内的所有值都必须使用相同的类型实例。类型类不允许不同实例之间互操作。Haskell 还通过其他方式提供抽象和信息隐藏,例如参数性(parametricity)。
On the other hand, the object-oriented data abstractions given here can also be coded in Haskell. In addition, an existential type can be used to combine the type class operations with a value to create a form of object [31]. In this encoding, the type class acts as a method table for the value.
另一方面,本文给出的面向对象数据抽象也可以在 Haskell 中编码实现。此外,可以使用存在类型将类型类运算与值组合起来,创建一种形式的对象 [31]。在这种编码方式中,类型类充当该值的方法表。
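The idea of packaging a value with its table of operations can be approximated in Java, which may help readers who do not know Haskell. This is only an analogy, not the Haskell encoding itself. SetOps plays the role of the type class, ListSetOps the role of an instance, and Packed the role of the existential package; all four names, including SetObject, are assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the type class: an explicit table of operations over S.
interface SetOps<S> {
    S empty();
    S insert(S s, int n);
    boolean contains(S s, int n);
    S union(S a, S b);
}

// Stand-in for an instance declaration: lists of integers used as sets.
final class ListSetOps implements SetOps<List<Integer>> {
    public List<Integer> empty() { return new ArrayList<>(); }
    public List<Integer> insert(List<Integer> s, int n) {
        List<Integer> r = new ArrayList<>(s);
        r.add(n);
        return r;
    }
    public boolean contains(List<Integer> s, int n) { return s.contains(n); }
    public List<Integer> union(List<Integer> a, List<Integer> b) {
        List<Integer> r = new ArrayList<>(a);
        r.addAll(b);
        return r;
    }
}

// Object interface seen by clients; the representation type is not mentioned.
interface SetObject {
    boolean contains(int n);
    SetObject insert(int n);
}

// Packs a value together with its operation table; S is hidden from clients.
final class Packed<S> implements SetObject {
    private final SetOps<S> ops;   // plays the role of the method table
    private final S value;
    Packed(SetOps<S> ops, S value) { this.ops = ops; this.value = value; }
    public boolean contains(int n) { return ops.contains(value, n); }
    public SetObject insert(int n) { return new Packed<>(ops, ops.insert(value, n)); }
}
```

The sketch leaves union out of SetObject on purpose: once S is hidden, two packed sets can only interact through the object interface, which is the autognosis restriction discussed in this essay.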
5.4 Smalltalk
5.4 Smalltalk 语言
There are many interesting aspects of the Smalltalk language and system. One curious fact is that Smalltalk has no built-in control flow and very few built-in types. To see how this works, consider the Smalltalk implementation of Booleans.
Smalltalk 语言和系统有许多有趣的特性。一个奇特的事实是,Smalltalk 没有内置的控制流,且内置类型非常少。为了理解其工作原理,我们来看看 Smalltalk 中布尔类型的实现。
There are two Boolean classes in Smalltalk, named True and False. They both implement a two-argument method called ifTrue:ifFalse:.
Smalltalk 中有两个布尔类,分别名为 True 和 False。它们都实现了一个带有两个参数的方法 ifTrue:ifFalse:。
```smalltalk
class True
    ifTrue: a ifFalse: b
        ^ a value

class False
    ifTrue: a ifFalse: b
        ^ b value
```
Method names in Smalltalk are sequences of keyword labels, where each keyword identifies a parameter. The body of the True method returns the result of sending the value message to the first argument, a. The body of the False method likewise returns the result of sending value to the second argument, b.
Smalltalk 中的方法名是关键字标签的序列,每个关键字对应一个参数。True 类的方法体返回向第一个参数 a 发送 value 消息的结果;False 类的方法体同样返回向第二个参数 b 发送 value 消息的结果。
The value method is needed because a and b represent thunks or functions with a single dummy argument. A thunk is created by enclosing statements in square brackets. A conditional is implemented by sending two thunks to a Boolean value.
需要 value 方法是因为 a 和 b 表示的是块(thunk)或带有单个哑参数的函数。块通过将语句括在方括号中创建。条件语句通过向布尔值传递两个块来实现。
```smalltalk
(x > y) ifTrue: [ x print ] ifFalse: [ y print ]
```
The implementation of Booleans and conditionals in Smalltalk is exactly the same as for Church booleans in the λ-calculus [12]. Given that objects are the only way to implement data abstraction in an untyped language, it makes sense that the same kind of data would be used in Smalltalk and the untyped λ-calculus. It would be possible to implement a RandomBoolean class that acts as true or false based on the flip of a coin, or a LoggingBoolean that traced how many computations were performed. These booleans could be used anywhere that the standard booleans are used, including in low-level system code.
Smalltalk 中布尔类型和条件语句的实现与 λ 演算中的邱奇布尔值完全相同 [12]。由于在无类型语言中,对象是实现数据抽象的唯一方式,因此 Smalltalk 和无类型 λ 演算使用相同类型的数据是合理的。我们可以实现一个 RandomBoolean 类,其值根据抛硬币结果为真或假;或者实现一个 LoggingBoolean 类,用于跟踪执行的计算次数。这些布尔类型可以在标准布尔类型适用的任何地方使用,包括底层系统代码。
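The same construction can be sketched in Java to show why alternative booleans are interchangeable with the standard ones. Bool, True, False, and the method name ifTrueIfFalse are assumptions made for this example; Supplier stands in for a Smalltalk block, and LoggingBoolean here counts the conditionals it decides, which is one possible reading of the tracing idea above.

```java
import java.util.function.Supplier;

// A boolean is defined purely by how it answers ifTrueIfFalse.
interface Bool {
    <T> T ifTrueIfFalse(Supplier<T> a, Supplier<T> b);
}

final class True implements Bool {
    // Evaluates only the first thunk.
    public <T> T ifTrueIfFalse(Supplier<T> a, Supplier<T> b) { return a.get(); }
}

final class False implements Bool {
    // Evaluates only the second thunk.
    public <T> T ifTrueIfFalse(Supplier<T> a, Supplier<T> b) { return b.get(); }
}

// Hypothetical wrapper that counts how many conditionals it has decided.
final class LoggingBoolean implements Bool {
    private final Bool inner;
    private int uses = 0;
    LoggingBoolean(Bool inner) { this.inner = inner; }
    public <T> T ifTrueIfFalse(Supplier<T> a, Supplier<T> b) {
        uses++;
        return inner.ifTrueIfFalse(a, b);
    }
    int uses() { return uses; }
}
```

Code that receives a Bool cannot tell a LoggingBoolean from a True or False, because the only thing it can do is send the one message.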
Smalltalk numbers are not Church numerals, although they share some characteristics. In particular, numbers in Smalltalk implement iteration, just as they do in the Church encoding. Similarly, Smalltalk collections implement a reduce operator analogous to the Church encoding of lists.
Smalltalk 中的数字并非邱奇数字,尽管它们具有一些相似特性。特别是,Smalltalk 中的数字实现了迭代功能,这与邱奇编码中的数字类似。同样,Smalltalk 中的集合实现了一个归约运算符,与列表的邱奇编码类似。
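The link between numbers and iteration can be made concrete with a Church-style encoding written as Java objects. This is a sketch of the encoding idea only and is not how Smalltalk implements its numbers; Nat, Zero, Succ, and iterate are names assumed for the example.

```java
import java.util.function.UnaryOperator;

// A natural number is whatever can apply a function that many times.
interface Nat {
    <T> T iterate(UnaryOperator<T> f, T start);
}

final class Zero implements Nat {
    // Applies f zero times.
    public <T> T iterate(UnaryOperator<T> f, T start) { return start; }
}

final class Succ implements Nat {
    private final Nat pred;
    Succ(Nat pred) { this.pred = pred; }
    // Applies f once more than the predecessor does.
    public <T> T iterate(UnaryOperator<T> f, T start) {
        return f.apply(pred.iterate(f, start));
    }
}
```

A client can recover an ordinary integer by iterating an increment from zero, or repeat any other step without ever inspecting how the number is represented.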
The Smalltalk system does include a primitive integer type, implemented as an ADT for efficiency. The primitive types are wrapped in high-level objects, which communicate with each other through an ingenious interface to perform coercions and implement both fixed and infinite precision arithmetic. Even with these wrappers, I claim that Smalltalk is not truly "objects all the way down" because the implementation depends upon primitive ADTs. It may be that objects are simply not the best way to implement numbers. More analysis is needed to determine the efficiency costs and whether the resulting flexibility is useful in practice.
Smalltalk 系统确实包含一个基本整数类型,为了提高效率,该类型以抽象数据类型的形式实现。这些基本类型被包装在高级对象中,这些对象通过巧妙的接口相互通信,以执行类型转换并实现固定精度和无限精度算术。尽管有这些包装,我认为 Smalltalk 并非真正意义上的"全程对象",因为其实现依赖于基本抽象数据类型。或许对象并不是实现数字的最佳方式。需要开展更多分析来确定其效率成本,以及由此带来的灵活性在实践中是否有用。
One conclusion you could draw from this analysis is that the untyped λ-calculus was the first object-oriented language.
从这一分析中可以得出一个结论:无类型 λ 演算是第一种面向对象语言。
6. Discussion
6. 讨论
Academic computer science has generally not accepted the fact that there is another form of data abstraction besides abstract data types. Hence the textbooks give the classic stack ADT and then say "objects are another way to implement abstract data types". Sebesta focuses on imperative data abstractions without complex methods, using stacks as an example, so it is not surprising that he does not see any difference between objects and ADTs [54]. Tucker and Noonan also illustrate data abstraction with stacks [57]. But they also provide a Java implementation of a type-checker and evaluator that appears to have been translated directly from ML case statements, implemented using instanceof in Java. The resulting program is a poor illustration of the capabilities of object-oriented programming.
学术界的计算机科学领域普遍没有接受"除抽象数据类型外,还存在另一种数据抽象形式"这一事实。因此,教材中通常会给出经典的栈抽象数据类型示例,然后称"对象是实现抽象数据类型的另一种方式"。塞贝斯塔(Sebesta)以栈为例,重点讨论了不包含复杂方法的命令式数据抽象,因此他没有发现对象与抽象数据类型之间的差异并不奇怪 [54]。塔克(Tucker)和努南(Noonan)也使用栈来阐释数据抽象 [57]。但他们提供的类型检查器和求值器的 Java 实现,似乎是直接从 ML 的 case 语句翻译而来,并使用 Java 的 instanceof 实现。由此产生的程序未能很好地展现面向对象程序设计的能力。
Some textbooks do better than others. Louden [38] and Mitchell [43] have the only books I found that describe the difference between objects and ADTs, although Mitchell does not go so far as to say that objects are a distinct kind of data abstraction.
有些教材的表现相对较好。劳登(Louden)[38] 和米切尔(Mitchell)[43] 的著作是我发现的仅有的两本描述了对象与抽象数据类型之间差异的书籍,尽管米切尔并未明确指出对象是一种独特的数据抽象形式。
The rise of objects interrupted a long-term project in academia to create a formal model of data based on ADTs. Several widely used languages were designed with ADTs as their fundamental form of data abstraction: ML, Ada, and Modula-2. As object-oriented programming became more prominent, these languages have adopted or experimented with objects.
对象的兴起中断了学术界一项长期项目------基于抽象数据类型创建数据的形式化模型。有几种广泛使用的语言在设计时将抽象数据类型作为其基本的数据抽象形式,例如 ML、Ada 和 Modula-2。随着面向对象程序设计的日益普及,这些语言也开始采用或尝试引入对象。
Object-oriented programming has also been subject to extensive academic research. However, I believe the academic community as a whole has not adopted objects as warmly as they were received in industry. I think there are three reasons for this situation. One is that the conceptual foundations for objects, discussed here, are not widely known. The second is that academics tend to be more interested in correctness than flexibility. Finally, programming language researchers tend to work with data abstractions that are more natural as ADTs.
面向对象程序设计也受到了广泛的学术研究。然而,我认为整个学术界对对象的接受程度并不像工业界那样热烈。我认为造成这种情况的原因有三点:第一,本文所讨论的对象的概念基础尚未被广泛知晓;第二,学者们往往更关注正确性,而非灵活性;第三,程序设计语言研究人员通常研究的是更适合作为抽象数据类型的数据抽象。
There are significant design decisions involved in choosing whether to implement a given abstraction with ADTs or with objects. In her history of CLU [35], Barbara Liskov discussed many of these issues, and gave good arguments for her choice of the ADT style. For example, she writes that "although a program development support system must store many implementations of a type..., allowing multiple implementations within a single program seems less important." This may be true if the types in question are stacks and integer sets, but when the abstractions are windows, file systems, or device drivers, it is essential to allow multiple implementations running within the same system.
选择使用抽象数据类型还是对象来实现某个特定抽象,涉及到重要的设计决策。芭芭拉·利斯科夫(Barbara Liskov)在其关于 CLU 的历史回顾 [35] 中,讨论了许多相关问题,并为其选择抽象数据类型风格提供了充分的理由。例如,她写道:"尽管程序开发支持系统必须存储一种类型的多个实现......但在单个程序中允许多个实现似乎并不那么重要。"如果所讨论的类型是栈和整数集,这可能是正确的;但当抽象是窗口、文件系统或设备驱动程序时,允许在同一系统中运行多个实现则至关重要。
To me it is unfortunate that Liskov also wrote that "CLU is an object-oriented language in the sense that it focuses attention on the properties of data objects and encourages programs to be developed by considering abstract properties of data." I believe that there is no technically or historically meaningful sense in which CLU is an object-oriented language. I do believe that modern object-oriented languages have been influenced by CLU (especially in the encapsulation of representation) but this does not make CLU into an object-oriented language.
令我感到遗憾的是,利斯科夫还写道:"从关注数据对象的属性并鼓励通过考虑数据的抽象属性来开发程序这一意义上来说,CLU 是一种面向对象语言。"我认为,无论从技术上还是历史意义上,CLU 都不能被称为面向对象语言。我确实认为现代面向对象语言受到了 CLU 的影响(特别是在表示形式的封装方面),但这并不意味着 CLU 本身就是一种面向对象语言。
Acknowledgements
致谢
There are too many people to thank individually for all their discussions on the topic of this essay. I thank Olivier Danvy, Shriram Krishnamurthi, Doug Lea, Yannis Smaragdakis, Kasper Osterbye, and Gilad Bracha for their comments on the essay itself.
有太多人参与了关于本文主题的讨论,无法逐一致谢。我要感谢奥利维耶·丹维(Olivier Danvy)、希拉姆·克里希纳穆尔蒂(Shriram Krishnamurthi)、道格·李(Doug Lea)、亚尼斯·斯马拉加基斯(Yannis Smaragdakis)、卡斯珀·奥斯特拜(Kasper Osterbye)和吉拉德·布拉查(Gilad Bracha)对本文本身提出的宝贵意见。
7. Conclusion
7. 结论
Objects and abstract data types (ADTs) are two different forms of data abstraction. They can both implement simple abstractions without complex methods, but objects are extensible while ADTs are easier to verify. Significant differences arise when implementing abstractions with complex operations, for example comparisons or composition operators. Object interfaces support the same level of flexibility, but often force a trade-off between interface simplicity and efficiency. Abstract data types support clean interfaces, optimization, and verification, but do not allow mixing or extending the abstractions. Mathematically oriented types, including numbers and sets, typically involve complex operations that manipulate multiple abstract values, and are best defined using ADTs. Most other types, including files, device drivers, and graphic objects, often do not require optimized complex operations, and so are best implemented as objects.
对象和抽象数据类型是两种不同的数据抽象形式。它们都可以实现不包含复杂方法的简单抽象,但对象具有可扩展性,而抽象数据类型更易于验证。当实现包含复杂运算(例如比较运算或组合运算符)的抽象时,二者会呈现出显著差异。对象接口支持同等程度的灵活性,但通常需要在接口简洁性和效率之间进行权衡。抽象数据类型支持清晰的接口、优化和验证,但不允许混合或扩展抽象。面向数学的类型(包括数字和集合)通常涉及操作多个抽象值的复杂运算,最适合使用抽象数据类型定义。而大多数其他类型(包括文件、设备驱动程序、图形对象)通常不需要优化的复杂运算,因此最适合实现为对象。
Modern object-oriented languages support a mixture of object-oriented and ADT functionality, allowing programmers to choose ADT style for specific situations. In modern object-oriented languages, the issue boils down to whether or not classes are used as types. In a pure object-oriented style, classes are only used to construct objects, and interfaces are used for types. When classes are used as types, the programmer is implicitly choosing to use a form of abstract data type. The decision affects how easy it is for the program to be extended and maintained over time, and also how easy it is to optimize complex operations. Understanding the fundamental differences between objects and ADTs can help in choosing to use them wisely.
现代面向对象语言同时支持面向对象和抽象数据类型的功能,允许程序员根据特定情况选择抽象数据类型风格。在现代面向对象语言中,核心问题归结为是否将类用作类型。在纯面向对象风格中,类仅用于构造对象,而接口用作类型。当类被用作类型时,程序员实际上是在选择使用一种形式的抽象数据类型。这一决策会影响程序的可扩展性和可维护性,以及复杂运算的优化难度。理解对象与抽象数据类型之间的根本差异,有助于明智地选择使用它们。
References
参考文献
\[1\] N. Adams and J. Rees. Object-oriented programming in Scheme. In Proceedings of the ACM Conf. on Lisp and Functional Programming, pages 277--288, 1988.
\[2\] P. America. A behavioral approach to subtyping object-oriented programming languages. In Proceedings of the REX Workshop/School on the Foundations of Object-Oriented Languages, volume 173 of Lecture Notes in Computer Science, 1990.
\[3\] J. Bergstra and J. Tucker. Initial and final algebra semantics for data type specifications: Two characterisation theorems. Research Report IW 142, Stichting Mathematisch Centrum, 1980.
\[4\] G. M. Birtwistle. DEMOS: a system for discrete event modelling on Simula. Springer-Verlag, 1987.
\[5\] D. Box. Essential COM (DevelopMentor Series). Addison-Wesley Professional, 1998.
\[6\] C. Boyapati, B. Liskov, and L. Shrira. Ownership types for object encapsulation. SIGPLAN Notices, 38(1):213--223, 2003.
\[7\] R. Burstall and J. Goguen. Putting theories together to make specifications. In International Joint Conferences on Artificial Intelligence, pages 1045--1058. Department of Computer Science, Carnegie-Mellon University, 1977.
\[8\] P. Canning, W. Cook, W. Hill, and W. Olthoff. Interfaces for strongly-typed object-oriented programming. In Proceedings of ACM Conf. on Object-Oriented Programming, Systems, Languages and Applications, pages 457--467, 1989.
\[9\] L. Cardelli. A semantics of multiple inheritance. In Semantics of Data Types, volume 173 of Lecture Notes in Computer Science, pages 51--68. Springer-Verlag, 1984.
\[10\] L. Cardelli and P. Wegner. On understanding types, data abstraction, and polymorphism. Computing Surveys, 17(4):471--522, 1986.
\[11\] C. Chambers. Object-oriented multi-methods in Cecil. In ECOOP '92: Proceedings of the European Conference on Object-Oriented Programming, pages 33--56. Springer-Verlag, 1992.
\[12\] A. Church. The Calculi of Lambda Conversion. Princeton University Press, 1941.
\[13\] W. Cook. A Denotational Semantics of Inheritance. PhD thesis, Brown University, 1989.
\[14\] W. Cook. Object-oriented programming versus abstract data types. In Proceedings of the REX Workshop/School on the Foundations of Object-Oriented Languages, volume 173 of Lecture Notes in Computer Science, 1990.
\[15\] B. C. d. S. Oliveira. Modular visitor components: A practical solution to the expression families problem. In S. Drossopoulou, editor, 23rd European Conference on Object Oriented Programming (ECOOP), 2009.
\[16\] O.-J. Dahl, B. Myhrhaug, and K. Nygaard. The SIMULA 67 common base language. Technical report, Norwegian Computing Center, 1970. Publication S-22.
\[17\] H.-D. Ehrich. On the theory of specification, implementation and parameterization of abstract data types. J. ACM, 29(1):206--227, 1982.
\[18\] A. Filinski. Declarative continuations and categorical duality. Master's thesis DIKU Report 89/11, University of Copenhagen, 1989.
\[19\] J. Gibbons. Unfolding abstract datatypes. In MPC '08: Proceedings of the 9th international conference on Mathematics of Program Construction, pages 110--133, 2008.
\[20\] J. Goguen, J. Thatcher, and E. Wagner. An initial algebra approach to the specification, correctness, and implementation of abstract data types. Current Trends in Programming Methodology, IV:80--149, 1978.
\[21\] J. Gosling, B. Joy, G. Steele, and G. Bracha. Java™ Language Specification. Addison-Wesley Professional, 2005.
\[22\] D. N. Gray, J. Hotchkiss, S. LaForge, A. Shalit, and T. Weinberg. Modern languages and Microsoft's Component Object Model. Commun. ACM, 41(5):55--65, 1998.
\[23\] C. A. Gunter and J. C. Mitchell, editors. Theoretical aspects of object-oriented programming: types, semantics, and language design. MIT Press, 1994.
\[24\] J. Guttag. The Specification and Application to Programming of Abstract Data Types. Report, University of Toronto, Computer Science Department, 1975.
\[25\] C. Hewitt, P. Bishop, I. Greif, B. Smith, T. Matson, and R. Steiger. Actor induction and meta-evaluation. In POPL '73: Proceedings of the 1st annual ACM SIGACT-SIGPLAN symposium on Principles of programming languages, pages 153--168. ACM, 1973.
\[26\] C. A. R. Hoare. Proof of correctness of data representation. Acta Informatica, 1:271--281, 1972.
\[27\] J. Hogg, D. Lea, A. Wills, D. deChampeaux, and R. Holt. The Geneva convention on the treatment of object aliasing. SIGPLAN OOPS Messenger, 3(2):11--16, 1992.
\[28\] D. Ingalls. The Smalltalk-76 programming system. In POPL, pages 9--16, 1978.
\[29\] B. Jacobs. Objects and classes, co-algebraically. In Object orientation with parallelism and persistence, pages 83--103. 1996.
\[30\] S. P. Jones. Haskell 98 Language and Libraries: The Revised Report. Cambridge University Press, 2003.
\[31\] S. P. Jones. Classes, Jim, but not as we know them. type classes in Haskell: what, why, and whither. ECOOP Keynote, 2009.
\[32\] B. W. Kernighan and D. Ritchie. C Programming Language (2nd Edition). Prentice Hall PTR, 1988.
\[33\] S. Krishnamurthi, M. Felleisen, and D. P. Friedman. Synthesizing object-oriented and functional design to promote re-use. In European Conference on Object-Oriented Programming, pages 91--113. Springer, 1998.
\[34\] B. Liskov. Keynote address - data abstraction and hierarchy. In OOPSLA '87: Addendum to the Proceedings on Object-oriented programming systems, languages and applications (Addendum), pages 17--34, 1987.
\[35\] B. Liskov. A history of CLU. In History of programming languages-II, pages 471--510. ACM, 1996.
\[36\] B. Liskov, R. Atkinson, T. Bloom, E. Moss, J. C. Schaffert, R. Scheifler, and A. Snyder. CLU Reference Manual. Springer-Verlag, 1981.
\[37\] B. Liskov and S. Zilles. Programming with abstract data types. SIGPLAN Notices, 9(4):50--59, 1974.
\[38\] K. C. Louden. Programming Languages: Principles and Practice. Wadsworth Publ. Co., 1993.
\[39\] D. B. MacQueen. Modules for Standard ML. In Conference on LISP and Functional Programming, 1984.
\[40\] B. Mahr and J. Makowsky. An axiomatic approach to semantics of specification languages. In Proceedings of the 6th Conference on Theoretical Computer Science, volume 145 of Lecture Notes in Computer Science, pages 211--219. Springer-Verlag, 1983.
\[41\] R. Milner. Communication and Concurrency. Prentice-Hall, 1989.
\[42\] R. Milner, M. Tofte, and R. Harper. The definition of Standard ML. MIT Press, 1990.
\[43\] J. C. Mitchell. Concepts in Programming Languages. Cambridge University Press, 2001.
\[44\] J. C. Mitchell and G. D. Plotkin. Abstract types have existential type. In Proceedings of the ACM Symp. on Principles of Programming Languages. ACM, 1985.
\[45\] P. Müller, A. Poetzsch-Heffter, and G. T. Leavens. Modular invariants for layered object structures. Sci. Comput. Program., 62(3):253--286, 2006.
\[46\] D. A. Naumann. Observational purity and encapsulation. Theor. Comput. Sci., 376(3):205--224, 2007.
\[47\] M. Odersky, L. Spoon, and B. Venners. Programming in Scala: A Comprehensive Step-by-step Guide. Artima Inc, 2008.
\[48\] M. Odersky and M. Zenger. Independently extensible solutions to the expression problem. In Proceedings FOOL 12, 2005. http://homepages.inf.ed.ac.uk/wadler/fool.
\[49\] U. S. D. of Defense. Reference manual for the Ada programming language. ANSI/MIL-STD-1815 A, 1983.
\[50\] B. C. Pierce. Types and Programming Languages. MIT Press, 2002.
\[51\] T. W. Pratt and M. V. Zelkowitz. Programming languages: design and implementation. Prentice-Hall, 1995.
\[52\] J. C. Reynolds. User-defined types and procedural data structures as complementary approaches to data abstraction. In New Advances in Algorithmic Languages, pages 157--168. INRIA, 1975.
\[53\] M. L. Scott. Programming Language Pragmatics. Morgan Kaufmann, 2000.
\[54\] R. Sebesta. Concepts of Programming Languages, Eighth Edition. Addison-Wesley, 2007.
\[55\] J. F. Shoch. An overview of the programming language Smalltalk-72. SIGPLAN Notices, 14(9):64--73, 1979.
\[56\] G. Steele. LAMBDA: The ultimate declarative. Technical Report AIM-379, MIT AI LAB, 1976.
\[57\] A. B. Tucker and R. E. Noonan. Programming Languages: Principles and Paradigms, Second Edition. McGraw-Hill Higher Education, 2007.
\[58\] P. Wadler. The expression problem. Mail to the javagenericity mailing list, 1998.
\[59\] P. Wadler and S. Blott. How to make ad-hoc polymorphism less ad hoc. In POPL '89: Proceedings of the 16th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 60--76. ACM, 1989.
\[60\] N. Wirth. Programming in Modula-2. Springer-Verlag, 1983.
\[61\] W. A. Wulf, R. L. London, and M. Shaw. An introduction to the construction and verification of Alphard programs. IEEE Transactions on Software Engineering, SE-24(4), 1976.
\[62\] S. N. Zilles. Procedural encapsulation: A linguistic protection mechanism. SIGPLAN Notices, 8(9):142--146, 1973.
* * *
## via:
* On Understanding Data Abstraction, Revisited 2009
[https://www.cs.utexas.edu/\~wcook/Drafts/2009/essay.pdf](https://www.cs.utexas.edu/~wcook/Drafts/2009/essay.pdf)
* On Understanding Types, Data Abstraction, and Polymorphism 1985