前言

晚上床上玩手机有点上头，导致一整天下午之前都没怎么精神刷课，于是调整了下学这套课程的排期，多排了一天嘻嘻，这下轻松多了。

本文是学习standford CS145 Introduction to Databases系列视频的第三篇笔记，主要包括第七章到第九章的内容：

关系型数据库设计
XML的查询
统一建模语言(Unified Modeling Language， or UML)

个人博客页观感更加喵，踩一踩留个脚印谢谢喵：CS145 Intro to databases 学习笔记3------关系型数据库设计、XML的查询、UML | Andrew的个人博客 (andreww1219.github.io)

一、关系型数据库设计

1. 基本概念

现有关系表 <math xmlns="http://www.w3.org/1998/Math/MathML"> R R </math>R，关系表的所有属性的集合为 <math xmlns="http://www.w3.org/1998/Math/MathML"> U U </math>U

设 <math xmlns="http://www.w3.org/1998/Math/MathML"> X X </math>X、 <math xmlns="http://www.w3.org/1998/Math/MathML"> Y Y </math>Y和 <math xmlns="http://www.w3.org/1998/Math/MathML"> Z Z </math>Z皆为 <math xmlns="http://www.w3.org/1998/Math/MathML"> R R </math>R上的一组属性，即 <math xmlns="http://www.w3.org/1998/Math/MathML"> X , Y , Z ⊆ U X, Y, Z \subseteq U </math>X,Y,Z⊆U

1.1 函数依赖(Functional Dependencies)

(1) 定义

对 <math xmlns="http://www.w3.org/1998/Math/MathML"> ∀ t , u ∈ R \forall t, u \in R </math>∀t,u∈R，如果 <math xmlns="http://www.w3.org/1998/Math/MathML"> t [ X ] = u [ X ] t[X] = u[X] </math>t[X]=u[X]，则有 <math xmlns="http://www.w3.org/1998/Math/MathML"> t [ Y ] = u [ Y ] t[Y] = u[Y] </math>t[Y]=u[Y]，其中 <math xmlns="http://www.w3.org/1998/Math/MathML"> t [ X ] t[X] </math>t[X]代表取元组 <math xmlns="http://www.w3.org/1998/Math/MathML"> t t </math>t所有 <math xmlns="http://www.w3.org/1998/Math/MathML"> X X </math>X中的属性

则称， <math xmlns="http://www.w3.org/1998/Math/MathML"> Y Y </math>Y依赖于 <math xmlns="http://www.w3.org/1998/Math/MathML"> X X </math>X，记为 <math xmlns="http://www.w3.org/1998/Math/MathML"> X → Y X \rightarrow Y </math>X→Y

讲人话就是确定了 <math xmlns="http://www.w3.org/1998/Math/MathML"> X X </math>X就确定了 <math xmlns="http://www.w3.org/1998/Math/MathML"> Y Y </math>Y ，称 <math xmlns="http://www.w3.org/1998/Math/MathML"> Y Y </math>Y依赖于 <math xmlns="http://www.w3.org/1998/Math/MathML"> X X </math>X

(2) Trivial or Nontrivial

trivial意为平常的、琐碎的、不重要的，所以trivial的Functional Dependency（以下简称FD）就是在讲废话，不能传递更多信息。

Trivial FD：当 <math xmlns="http://www.w3.org/1998/Math/MathML"> Y ⊆ X Y \subseteq X </math>Y⊆X，我们称 <math xmlns="http://www.w3.org/1998/Math/MathML"> X → Y X \rightarrow Y </math>X→Y是 Trivial FD。因为确定了 <math xmlns="http://www.w3.org/1998/Math/MathML"> X X </math>X肯定就确定了它的子集 <math xmlns="http://www.w3.org/1998/Math/MathML"> Y Y </math>Y（讲废话

Nontivial FD：相对地，当 <math xmlns="http://www.w3.org/1998/Math/MathML"> Y ⊈ X Y \nsubseteq X </math>Y⊈X，我们称 <math xmlns="http://www.w3.org/1998/Math/MathML"> X → Y X \rightarrow Y </math>X→Y是 Nontrivial FD。

Complete nontrivial FD：当 <math xmlns="http://www.w3.org/1998/Math/MathML"> X X </math>X和 <math xmlns="http://www.w3.org/1998/Math/MathML"> Y Y </math>Y完全不重合时，即 <math xmlns="http://www.w3.org/1998/Math/MathML"> Y ∩ X = ∅ Y \cap X = \varnothing </math>Y∩X=∅，我们称 <math xmlns="http://www.w3.org/1998/Math/MathML"> X → Y X \rightarrow Y </math>X→Y是 Complete nontrivial FD

(3) 属性的闭包(Closure of Attributes)

给定许多依赖 <math xmlns="http://www.w3.org/1998/Math/MathML"> S S </math>S，和属性集 <math xmlns="http://www.w3.org/1998/Math/MathML"> X X </math>X

设集合 <math xmlns="http://www.w3.org/1998/Math/MathML"> s e t = X set = X </math>set=X

重复以下步骤直到 <math xmlns="http://www.w3.org/1998/Math/MathML"> s e t set </math>set不发生改变:

如果有 <math xmlns="http://www.w3.org/1998/Math/MathML"> A → B A \rightarrow B </math>A→B并且 <math xmlns="http://www.w3.org/1998/Math/MathML"> A A </math>A在 <math xmlns="http://www.w3.org/1998/Math/MathML"> s e t set </math>set当中，那么 <math xmlns="http://www.w3.org/1998/Math/MathML"> s e t = s e t ∪ B set = set \cup B </math>set=set∪B

最后得到的 <math xmlns="http://www.w3.org/1998/Math/MathML"> s e t set </math>set成为 <math xmlns="http://www.w3.org/1998/Math/MathML"> X X </math>X在 <math xmlns="http://www.w3.org/1998/Math/MathML"> S S </math>S上的闭包(Closure)，记作 <math xmlns="http://www.w3.org/1998/Math/MathML"> X + X^+ </math>X+

讲人话，闭包就是已知许多依赖 <math xmlns="http://www.w3.org/1998/Math/MathML"> S S </math>S，确定了 <math xmlns="http://www.w3.org/1998/Math/MathML"> X X </math>X后，能够确定的所有属性

1.2 部分依赖

当 <math xmlns="http://www.w3.org/1998/Math/MathML"> X → Y X \rightarrow Y </math>X→Y，取 <math xmlns="http://www.w3.org/1998/Math/MathML"> X X </math>X的一子集 <math xmlns="http://www.w3.org/1998/Math/MathML"> X ′ X' </math>X′，若 <math xmlns="http://www.w3.org/1998/Math/MathML"> X ′ → Y X' \rightarrow Y </math>X′→Y，则称 <math xmlns="http://www.w3.org/1998/Math/MathML"> Y Y </math>Y部分依赖于 <math xmlns="http://www.w3.org/1998/Math/MathML"> X X </math>X

讲人话就是确定了 <math xmlns="http://www.w3.org/1998/Math/MathML"> X X </math>X中一部分值就能确定了 <math xmlns="http://www.w3.org/1998/Math/MathML"> Y Y </math>Y ，称 <math xmlns="http://www.w3.org/1998/Math/MathML"> Y Y </math>Y部分依赖于 <math xmlns="http://www.w3.org/1998/Math/MathML"> X X </math>X

1.3 传递依赖

当 <math xmlns="http://www.w3.org/1998/Math/MathML"> X → Y X \rightarrow Y </math>X→Y，并且 <math xmlns="http://www.w3.org/1998/Math/MathML"> Y → Z Y \rightarrow Z </math>Y→Z，并且 <math xmlns="http://www.w3.org/1998/Math/MathML"> Y ↛ X Y \nrightarrow X </math>Y↛X，则称 <math xmlns="http://www.w3.org/1998/Math/MathML"> Z Z </math>Z传递依赖于 <math xmlns="http://www.w3.org/1998/Math/MathML"> X X </math>X

讲人话就是确定了 <math xmlns="http://www.w3.org/1998/Math/MathML"> X X </math>X能确定 <math xmlns="http://www.w3.org/1998/Math/MathML"> Y Y </math>Y，确定了 <math xmlns="http://www.w3.org/1998/Math/MathML"> Y Y </math>Y又能确定 <math xmlns="http://www.w3.org/1998/Math/MathML"> Z Z </math>Z ，并且确定了 <math xmlns="http://www.w3.org/1998/Math/MathML"> Y Y </math>Y不能确定 <math xmlns="http://www.w3.org/1998/Math/MathML"> X X </math>X （在范式比较中会提到为什么这么定义），就说 <math xmlns="http://www.w3.org/1998/Math/MathML"> Z Z </math>Z传递依赖于 <math xmlns="http://www.w3.org/1998/Math/MathML"> X X </math>X

1.4 多值依赖(Multivalued Dependencies, or MVD)

设 <math xmlns="http://www.w3.org/1998/Math/MathML"> Z = U − X − Y Z = U - X - Y </math>Z=U−X−Y，即 <math xmlns="http://www.w3.org/1998/Math/MathML"> Z Z </math>Z为属性集 <math xmlns="http://www.w3.org/1998/Math/MathML"> U U </math>U中除了 <math xmlns="http://www.w3.org/1998/Math/MathML"> X X </math>X和 <math xmlns="http://www.w3.org/1998/Math/MathML"> Y Y </math>Y的剩余所有属性

(1) 定义

对 <math xmlns="http://www.w3.org/1998/Math/MathML"> ∀ t 1 , t 2 ∈ R \forall t_1, t_2 \in R </math>∀t1,t2∈R，如果 <math xmlns="http://www.w3.org/1998/Math/MathML"> t 1 [ X ] = t 2 [ X ] t_1[X] = t_2[X] </math>t1[X]=t2[X]，则 <math xmlns="http://www.w3.org/1998/Math/MathML"> ∃ v ∈ R \exists v \in R </math>∃v∈R，使得 <math xmlns="http://www.w3.org/1998/Math/MathML"> v [ X ] = t 1 [ X ] = t 2 [ X ] v[X] = t_1[X] = t_2[X] </math>v[X]=t1[X]=t2[X]，并且 <math xmlns="http://www.w3.org/1998/Math/MathML"> v [ Y ] = t 1 [ Y ] v[Y] = t_1[Y] </math>v[Y]=t1[Y]，并且 <math xmlns="http://www.w3.org/1998/Math/MathML"> v [ Z ] = t 2 [ Z ] v[Z] = t_2[Z] </math>v[Z]=t2[Z]，则称 <math xmlns="http://www.w3.org/1998/Math/MathML"> Y Y </math>Y多值依赖 <math xmlns="http://www.w3.org/1998/Math/MathML"> X X </math>X

讲人话就是，如果有 <math xmlns="http://www.w3.org/1998/Math/MathML"> ( x , y 1 , z 1 ) (x, y_1, z_1) </math>(x,y1,z1)和 <math xmlns="http://www.w3.org/1998/Math/MathML"> ( x , y 1 , z 2 ) (x, y_1, z_2) </math>(x,y1,z2)，则必有 <math xmlns="http://www.w3.org/1998/Math/MathML"> ( x , y 1 , z 2 ) (x, y_1, z_2) </math>(x,y1,z2)和 <math xmlns="http://www.w3.org/1998/Math/MathML"> ( x , y 2 , z 1 ) (x, y_2, z_1) </math>(x,y2,z1)（Y和Z是对称的），则称 <math xmlns="http://www.w3.org/1998/Math/MathML"> Y Y </math>Y多值依赖 <math xmlns="http://www.w3.org/1998/Math/MathML"> X X </math>X，记作 <math xmlns="http://www.w3.org/1998/Math/MathML"> Y ↠ X Y \twoheadrightarrow X </math>Y↠X

也就是说，确定了 <math xmlns="http://www.w3.org/1998/Math/MathML"> X X </math>X为 <math xmlns="http://www.w3.org/1998/Math/MathML"> x x </math>x时， <math xmlns="http://www.w3.org/1998/Math/MathML"> Y Y </math>Y和 <math xmlns="http://www.w3.org/1998/Math/MathML"> Z Z </math>Z会有很多种取值 <math xmlns="http://www.w3.org/1998/Math/MathML"> y 1 , y 2 , . . . , y m y_1, y_2, ..., y_m </math>y1,y2,...,ym和 <math xmlns="http://www.w3.org/1998/Math/MathML"> z 1 , z 2 , . . . , z n z_1, z_2, ..., z_n </math>z1,z2,...,zn， <math xmlns="http://www.w3.org/1998/Math/MathML"> Y Y </math>Y和 <math xmlns="http://www.w3.org/1998/Math/MathML"> Z Z </math>Z的取值的所有组合都会出现

从定义中，我们可以看出，判断 <math xmlns="http://www.w3.org/1998/Math/MathML"> Y Y </math>Y是否多值依赖 <math xmlns="http://www.w3.org/1998/Math/MathML"> X X </math>X，还需要考虑全属性集 <math xmlns="http://www.w3.org/1998/Math/MathML"> U U </math>U。也就是说，同样的属性集 <math xmlns="http://www.w3.org/1998/Math/MathML"> Y Y </math>Y和 <math xmlns="http://www.w3.org/1998/Math/MathML"> X X </math>X，在不同的全属性集 <math xmlns="http://www.w3.org/1998/Math/MathML"> U U </math>U中，它们可能多值依赖的关系不同。

(2) Trivial or Nontrivial

Trivial MVD：当 <math xmlns="http://www.w3.org/1998/Math/MathML"> X ∪ Y = U X \cup Y = U </math>X∪Y=U或 <math xmlns="http://www.w3.org/1998/Math/MathML"> Y ⊆ X Y \subseteq X </math>Y⊆X时，称 <math xmlns="http://www.w3.org/1998/Math/MathML"> X ↠ Y X \twoheadrightarrow Y </math>X↠Y为 Trivial MVD。

如果有 <math xmlns="http://www.w3.org/1998/Math/MathML"> ( x , y 1 , z 1 ) (x, y_1, z_1) </math>(x,y1,z1)和 <math xmlns="http://www.w3.org/1998/Math/MathML"> ( x , y 2 , z 2 ) (x, y_2, z_2) </math>(x,y2,z2)

当 <math xmlns="http://www.w3.org/1998/Math/MathML"> X ∪ Y = U X \cup Y = U </math>X∪Y=U时， <math xmlns="http://www.w3.org/1998/Math/MathML"> Z = ∅ Z = \varnothing </math>Z=∅，得 <math xmlns="http://www.w3.org/1998/Math/MathML"> z 1 = z 2 = N U L L z_1 = z_2 = NULL </math>z1=z2=NULL。那么 <math xmlns="http://www.w3.org/1998/Math/MathML"> ( x , y 1 , z 2 ) = ( x , y 1 , N U L L ) (x, y_1, z_2) = (x, y_1, NULL) </math>(x,y1,z2)=(x,y1,NULL)， <math xmlns="http://www.w3.org/1998/Math/MathML"> ( x , y 2 , z 1 ) = ( x , y 2 , N U L L ) (x, y_2, z_1) = (x, y_2, NULL) </math>(x,y2,z1)=(x,y2,NULL)已经在表中，自行满足了多值依赖的条件，

当 <math xmlns="http://www.w3.org/1998/Math/MathML"> Y ⊆ X Y \subseteq X </math>Y⊆X时，有 <math xmlns="http://www.w3.org/1998/Math/MathML"> X → Y X \rightarrow Y </math>X→Y，得 <math xmlns="http://www.w3.org/1998/Math/MathML"> y 1 = y 2 y_1 = y_2 </math>y1=y2。那么 <math xmlns="http://www.w3.org/1998/Math/MathML"> ( x , y 1 , z 2 ) = ( x , y 2 , z 2 ) (x, y_1, z_2) = (x, y_2, z_2) </math>(x,y1,z2)=(x,y2,z2)， <math xmlns="http://www.w3.org/1998/Math/MathML"> ( x , y 2 , z 1 ) = ( x , y 1 , z 1 ) (x, y_2, z_1) = (x, y_1, z_1) </math>(x,y2,z1)=(x,y1,z1)已经在表中自行满足了多值依赖的条件

所以满足以上其中一种条件的多值依赖不能传递更多的信息，可称为Trivial MVD

Nontrivial MVD：当 <math xmlns="http://www.w3.org/1998/Math/MathML"> Y Y </math>Y和 <math xmlns="http://www.w3.org/1998/Math/MathML"> X X </math>X不满足以上条件，则称 <math xmlns="http://www.w3.org/1998/Math/MathML"> X ↠ Y X \twoheadrightarrow Y </math>X↠Y为Nontrivial MVD

2. 分解关系模式(Decompostion of Relational Schema)

2.1 分解的原则

将关系 <math xmlns="http://www.w3.org/1998/Math/MathML"> R ( U ) R(U) </math>R(U)分解为 <math xmlns="http://www.w3.org/1998/Math/MathML"> R 1 ( U 1 ) R_1(U_1) </math>R1(U1)和 <math xmlns="http://www.w3.org/1998/Math/MathML"> R 2 ( U 2 ) R_2(U_2) </math>R2(U2)后，应当满足：

<math xmlns="http://www.w3.org/1998/Math/MathML"> U 1 ∪ U 2 = U U_1 \cup U_2 = U </math>U1∪U2=U
<math xmlns="http://www.w3.org/1998/Math/MathML"> R 1 ⋈ R 2 = R R_1 \bowtie R_2 = R </math>R1⋈R2=R （无损连接性）

实际上满足第二条就会满足第一条（小声

"Good" Decomposition，也就是一个好的分解，应当满足无损连接性 （能通过自然连接得到原始表）和依赖保留性（能在每个表中直接得到原来的所有依赖），详见前言中的参考资料

2.2 BCNF分解

(1) BCNF的定义

注意：在实际情况中，可能不止有一组元素能够作为主键，比如 <math xmlns="http://www.w3.org/1998/Math/MathML"> X + = U X^+ = U </math>X+=U， <math xmlns="http://www.w3.org/1998/Math/MathML"> Y + = U Y^+ = U </math>Y+=U，那么称 <math xmlns="http://www.w3.org/1998/Math/MathML"> X X </math>X和 <math xmlns="http://www.w3.org/1998/Math/MathML"> Y Y </math>Y都为超键，去掉超键中任意属性会使其不再为超键的，叫做候选键 ，我们可以在多个候选键中选一个叫主键。

假设 <math xmlns="http://www.w3.org/1998/Math/MathML"> X X </math>X和 <math xmlns="http://www.w3.org/1998/Math/MathML"> Y Y </math>Y都是候选键， <math xmlns="http://www.w3.org/1998/Math/MathML"> X ∪ Y X \cup Y </math>X∪Y也就是 <math xmlns="http://www.w3.org/1998/Math/MathML"> X X </math>X和 <math xmlns="http://www.w3.org/1998/Math/MathML"> Y Y </math>Y中的所有元素都称作主属性。而不是只有我们选定的主键中的元素才叫做主属性

对 <math xmlns="http://www.w3.org/1998/Math/MathML"> R R </math>R中任一nontrivial FD <math xmlns="http://www.w3.org/1998/Math/MathML"> X → Y X \rightarrow Y </math>X→Y， <math xmlns="http://www.w3.org/1998/Math/MathML"> X X </math>X都是候选键的超集，那么就称 <math xmlns="http://www.w3.org/1998/Math/MathML"> R R </math>R满足鲍依斯-科得范式(Boyce Codd Normal Form, or BCNF)

(2) BCNF分解算法

找到 <math xmlns="http://www.w3.org/1998/Math/MathML"> R R </math>R中的候选键

重复以下步骤直到所有的关系都满足BCNF:

关系 <math xmlns="http://www.w3.org/1998/Math/MathML"> R ′ R' </math>R′的依赖 <math xmlns="http://www.w3.org/1998/Math/MathML"> A → B A \rightarrow B </math>A→B不满足BCNF
将 <math xmlns="http://www.w3.org/1998/Math/MathML"> R ′ R' </math>R′ 分解为 <math xmlns="http://www.w3.org/1998/Math/MathML"> R 1 = ( A , B ) R_1 = (A, B) </math>R1=(A,B)和 <math xmlns="http://www.w3.org/1998/Math/MathML"> R 2 ( A , r e s t ) R_2(A, rest) </math>R2(A,rest)， <math xmlns="http://www.w3.org/1998/Math/MathML"> r e s t rest </math>rest为 <math xmlns="http://www.w3.org/1998/Math/MathML"> A A </math>A和 <math xmlns="http://www.w3.org/1998/Math/MathML"> B B </math>B之外的剩余部分
找到 <math xmlns="http://www.w3.org/1998/Math/MathML"> R 1 R_1 </math>R1和 <math xmlns="http://www.w3.org/1998/Math/MathML"> R 2 R_2 </math>R2中的FD和候选键

2.3 4NF分解

(1) 4NF的定义

对 <math xmlns="http://www.w3.org/1998/Math/MathML"> R R </math>R中任一nontrivial MVD <math xmlns="http://www.w3.org/1998/Math/MathML"> X ↠ Y X \twoheadrightarrow Y </math>X↠Y， <math xmlns="http://www.w3.org/1998/Math/MathML"> X X </math>X都是候选键的超集，那么就称 <math xmlns="http://www.w3.org/1998/Math/MathML"> R R </math>R满足4NF

(2) 4NF分解算法

找到 <math xmlns="http://www.w3.org/1998/Math/MathML"> R R </math>R中的候选键

重复以下步骤直到所有的关系都满足4NF:

关系 <math xmlns="http://www.w3.org/1998/Math/MathML"> R ′ R' </math>R′的依赖 <math xmlns="http://www.w3.org/1998/Math/MathML"> A ↠ B A \twoheadrightarrow B </math>A↠B不满足4NF
将 <math xmlns="http://www.w3.org/1998/Math/MathML"> R ′ R' </math>R′ 分解为 <math xmlns="http://www.w3.org/1998/Math/MathML"> R 1 = ( A , B ) R_1 = (A, B) </math>R1=(A,B)和 <math xmlns="http://www.w3.org/1998/Math/MathML"> R 2 ( A , r e s t ) R_2(A, rest) </math>R2(A,rest)， <math xmlns="http://www.w3.org/1998/Math/MathML"> r e s t rest </math>rest为 <math xmlns="http://www.w3.org/1998/Math/MathML"> A A </math>A和 <math xmlns="http://www.w3.org/1998/Math/MathML"> B B </math>B之外的剩余部分
找到 <math xmlns="http://www.w3.org/1998/Math/MathML"> R 1 R_1 </math>R1和 <math xmlns="http://www.w3.org/1998/Math/MathML"> R 2 R_2 </math>R2中的MVD和候选键

3. 范式之间的比较

3.1 每一层范式解决的问题

1NF：每个元组的每个属性的值不能是集合，不可再分关系。1NF是 <math xmlns="http://www.w3.org/1998/Math/MathML"> R R </math>R作为关系模型的最基本的条件
2NF：在1NF的基础上消除了非主属性对候选键的部分依赖，也就是非主属性不能只依赖于一部分主属性，应该完全依赖于所有主属性。
3NF：在2NF的基础上消除了非主属性对候选键的传递依赖。也就是非主属性不能依赖于其他非主属性，应该直接完全依赖于所有主属性。

前面我们讲到了传递依赖的定义：

当 <math xmlns="http://www.w3.org/1998/Math/MathML"> X → Y X \rightarrow Y </math>X→Y，并且 <math xmlns="http://www.w3.org/1998/Math/MathML"> Y → Z Y \rightarrow Z </math>Y→Z，并且 <math xmlns="http://www.w3.org/1998/Math/MathML"> Y ↛ X Y \nrightarrow X </math>Y↛X，则称 <math xmlns="http://www.w3.org/1998/Math/MathML"> Z Z </math>Z传递依赖于 <math xmlns="http://www.w3.org/1998/Math/MathML"> X X </math>X

这里要求 <math xmlns="http://www.w3.org/1998/Math/MathML"> Y ↛ X Y \nrightarrow X </math>Y↛X是因为，如果 <math xmlns="http://www.w3.org/1998/Math/MathML"> Y → X Y \rightarrow X </math>Y→X，那么 <math xmlns="http://www.w3.org/1998/Math/MathML"> Z Z </math>Z部分依赖于 <math xmlns="http://www.w3.org/1998/Math/MathML"> Y Y </math>Y，而部分依赖已经在2NF中消除了。
3NF究竟还遗留了哪些问题？

我们现在只解决了主属性和非主属性之间的冗余，而BCNF希望解决多个候选键之间，也就是主属性之间的冗余。

当然，如果关系中只有一个候选键，那么3NF和BCNF其实没什么区别。要举出有多个后续键的例子比较困难，所以很难区分喵qwq

BCNF：在3NF的基础上消除了候选键之间的部分依赖和传递依赖。

至此，我们已经解决的冗余都是针对依赖关系而言，而依赖是一对一的。

倘若我们有 <math xmlns="http://www.w3.org/1998/Math/MathML"> X ↠ Y X \twoheadrightarrow Y </math>X↠Y，剩余部分为 <math xmlns="http://www.w3.org/1998/Math/MathML"> Z Z </math>Z，其中 <math xmlns="http://www.w3.org/1998/Math/MathML"> X X </math>X对 <math xmlns="http://www.w3.org/1998/Math/MathML"> Y Y </math>Y是1对m， <math xmlns="http://www.w3.org/1998/Math/MathML"> X X </math>X对 <math xmlns="http://www.w3.org/1998/Math/MathML"> Z Z </math>Z是1对n，他们的组合就有 <math xmlns="http://www.w3.org/1998/Math/MathML"> m × n m \times n </math>m×n个元组，而实际上用两个表 <math xmlns="http://www.w3.org/1998/Math/MathML"> R 1 R_1 </math>R1和 <math xmlns="http://www.w3.org/1998/Math/MathML"> R 2 R_2 </math>R2共 <math xmlns="http://www.w3.org/1998/Math/MathML"> m + n m + n </math>m+n个元组就能表示这两个一对多的关系，而且 <math xmlns="http://www.w3.org/1998/Math/MathML"> R 1 ⋈ R 2 = R R_1 \bowtie R_2 = R </math>R1⋈R2=R也符合分解的原则。

4NF：在BCNF的基础上消除了nontrivial MVD。

3.2 BCNF和4NF的不足

BCNF和4NF会对所有FD和MVD做检查和分解，当分解出来的表的属性过少时，我们需要对多个表做自然连接才能验证我们原来的FD和MVD，对查询不是很友好，所以应该尽量先选择能保留更多属性的分解。然后具体问题具体分析，权衡好查询负载(Query Workload)和过分解(Over Decomposition)之间的关系。

二、XML的查询

1. XPath

视频中关于XPath的语法都能在这里找到：XPath 教程 | 菜鸟教程 (runoob.com)

主要列举一些示例方便快速复习：

xpath 复制代码

// 谓语(Predicates)：将用于查找某些特定节点的条件嵌入到方括号[]中，将其称之为谓语
// 查询价格小于90且存在作者的姓是Ullman的一本书
// 注意：[]中的条件只针对存在
doc("BookstoreQ.xml")/Bookstore/Book[@Price < 90 and
    Authors/Author/Last_Name ="Ullman" ]/Title

xpath 复制代码

// 内置函数(Built-in Functions)
// 查询Remark属性中包含"great"的书
doc("BookstoreQ.xml")//Book[contains(Remark, "great" )]/Title

xpath 复制代码

// self join
doc("BookstoreQ.xml")//Magazine[Title
    doc("BookstoreQ.xml")//Book/Title]

xpath 复制代码

// axes(轴)
// 查询所有父节点不为Bookstore和不为Book的节点
doc("BookstoreQ.xml")/Bookstore//*[name(parent::*)!="Bookstore"
    and name(parent::*)!="Book"]
// 查询所有跟同层级节点有相同Title的Book或Magzine
doc("BookstoreQ.xml")/Bookstore/(Book|Magazine)[Title = 
    following-sibling::*/Title or Title = preceding-sibling::*/Title]

xpath 复制代码

由于谓语只能表示"存在"(exists)关系，当我们想表示"任意"(forall)时，需要使用内置函数count
// 查询所有作者的First_Name都包含'J'的书
doc("BookstoreQ.xml")//Book[
    count(Authors/Author[contains(First_Name,"J")]) = 
    count(Authors/Author/First_Name)]

2. XQuery

视频中大多数的语法能在这里找到：XQuery 教程 | 菜鸟教程 (runoob.com)

内置函数需要在XPath中查询：XPath、XQuery 以及 XSLT 函数 | 菜鸟教程 (runoob.com)

限定表达式等在上述链接找不到，可以在这里看：限定表达式 (XQuery) - SQL Server | Microsoft Learn

2.1 FLWOR表达式

xquery 复制代码

For $var in expr
Let $var := expr
Where condition
Order by expr
Return expr

2.2 代码示例

xquery 复制代码

// 查询价格小于90且存在作者的姓是Ullman的一本书
for $b in doc("BookstoreQ.xml")/Bookstore/Book
where $b/@Price < 90 
 and $b/Authors/Author/Last_Name = "Ullman"
return $b/Title

xquery 复制代码

for $b in doc("BookstoreQ.xml")/Bookstore/Book

// 限定表达式 ( some | every ) <variable> in <Expression> (,...) satisfies <Expression>  
// some表示"存在"  every表示"任意"
// 查询Title中存在作者的First_Name的Book
where some $fn in $b/Authors/Author/First_Name satisfies
    contains($b/Title, $fn)
    
// 构造返回结果    
return <Book>
        { $b/Title }
        { $b/Authors/Author/First_Name }
    </Book>

xquery 复制代码

// 聚合函数
// 查询所有Book的Price的平均值
<Average>
{ let $plist := doc("BookstoreQ.xml")/Bookstore/Book/@Price
    return avg($plist)}
</Average>

xquery 复制代码

for $b in doc("BookstoreQ.xml")/Bookstore/Book
order by $b/@Price
return <Book>
         { $b/Title }
         <Price>{ $b/data(@Price))</Price>
       </Book>

xquery 复制代码

// 过滤重复值
// distinct-values返回 值的列表，而不是 节点的列表
for $n in distinct-values(doc("Bookstore0.xml" )//Last_Name)
return <Last_Name> {$n} </Last_Name>

3. XSLT(EXtensible Stylesheet Language)

XSLT支持将XML转换为其他文档，比如XHTML。对XML的转换也可以看作是查询和构造结果

示例如下：

xml 复制代码

<xsl:stylesheet version="2.0" xmins:xsl="http://mmr.w3.org/1999/XSl/Transform"
<xsl:output method="xml" indent="yes" onit-xml-declaration-"yes" />
//外层是基本格式

// template + match匹配特定节点
<xsl:template match="Book">
<BookTitle>
    // 选择特定节点中的子节点并构造返回结果
    <xsl:value-of select="Title"/></BookTitle>
</xsl:template>

<xsl:template match="Magazine">
    <MagazineTitle> <xsl:value-of select="Title" /> </MagazineTitle>
</xsl:template>

</xsl:stylesheet>

xml 复制代码

// 不等号需要使用转义 小于号：$lt;  大于号$gt;
// copy-of + select返回选择到的节点
<xsl:template match="Book[@Price &lt; 90]">
    <xsl:copy-of select="." />
<xsl:template>

// 不满足所有匹配的节点和属性会成串返回，因此得作以下处理
<xsl:template match="text()" />

// 更加具体的限定条件所构造的匹配，优先度更高
// 当节点同时满足两个优先度相同的匹配规则时（最好不要，会报错），以最后一个匹配规则为结果

xml 复制代码

// apply-templates + select 将 template + select 的结果嵌入其中
<xsl:template match="*|@*|text()"
    <xsl:copy>
        <xsl:apply-templates select="*|@*|text()/>
    </xsl:copy>
</xsl:template>

xml 复制代码

<xsl:template match="/"> 
<html> 
    <body> 
    <h2>My CD Collection</h2> 
    <table border="1"> 
        <tr bgcolor="#9acd32"> 
            <th>Title</th> 
            <th>Artist</th>
            <th>Price</th> 
        </tr> 
        // for-each + select 遍历选择到的所有节点
        <xsl:for-each select="catalog/cd"> 
            // if + test 选择满足条件的节点
            <xsl:if test="price &gt; 10"> 
                <tr>
                    <td><xsl:value-of select="title"/></td>
                    <td><xsl:value-of select="artist"/></td>
                    <td><xsl:value-of select="price"/></td> 
                </tr> 
            </xsl:if> 
        </xsl:for-each> 
    </table> 
    </body> 
</html> 
</xsl:template>

三、统一建模语言(UML)

UML用图形化的表示方法描述对数据模型作建模，可以直接翻译为关系模型

1. 类(Classes)

关系模型中的每一个表可以描述为一个了类，如

Student(sID, sName, GPA)

可以与下述模型互相转换：

plaintext 复制代码

+-----------+
|  Student  |
+-----------+
|   sID pk  |
|   sName   |
|    GPA    |
+-----------+
| <methods> |     
+-----------+

2. 关联和关联类(Associations and Association Classes)

一个类跟另一个类有联系，即有关联，可以有一条线将两个类相连来表示，而关联所产生的附加信息我们可以用一个关联类表示，如：

plaintext 复制代码

+-------+               +-------+  
|   C1  |  Association  |   C2  |
+-------+-------+-------+-------+
|   A1  |       |       |   A2  |
+-------+       |       +-------+
                |
             +--+----+ 
             |  C3   |
             +-------+
             |  A3   | 
             +-------+

其中C1和C2关联到对方的数量可能不同，比如一个学生只能选一个专业，记为1..1，一个专业有至少一个学生修读，记为1..*，如下：

plaintext 复制代码

+-------+               +-------+  
|   C1  |  Association  |   C2  |
+-------+-------+-------+-------+
|   A1  |1..*   |   1..1|   A2  |
+-------+       |       +-------+
                |
             +--+----+ 
             |  C3   |
             +-------+
             |  A3   | 
             +-------+

实际上，如果对应的数量是0..1或者1..1，那么关联表是多余的，可以将关联的属性归到C1中，如下：

plaintext 复制代码

+-------+               +-------+  
|   C1  |  Association  |   C2  |
+-------+-------+-------+-------+
|   A1  |1..*       1..1|   A2  |
|   A3  |               +-------+
+-------+

3. 子类(Subclasses)

一个父类可以有很多个子类，子类拥有父类的主键和额外的属性或关联，如下：

plaintext 复制代码

            +------+
            |  C1  |
            +------+
            |  K pk|
            |  A1  |
            +--+---+
               |
 +-------------+------------+
 |             |            |
 |             |            |
+--+--+    +--+--+         +-------+               +-------+  
| C2  |    |  C3 |         |   C4  |  Association  |   C5  |
+-----+    +-----+         +-------+---------------+-------+
| A2  |    |  A3 |         |   A4  |               |   A5  |
+-----+    +-----+         +-------+               +-------+

不完整/完整(incomplete /complete)：

当子类能够表示所有的情况时，我们说这些子类是complete的，否则则是incomplete的

重叠/不相交 (overlapping /disjoint)：

当有对象能够同时属于多个子类时，我们说这些子类是overlapping的，否则则是disjoint的

将带子类的UML翻译为关系模型有多种办法：

父类建表，每一个子类就是一个表，只包含子类的属性和父类的主键，适用于disjoint + incomplete的情况
父类可不建表，每一个子类就是一个表，它不仅包含子类的属性和父类的主键，也包含父类的属性，适用于disjoint + complete的情况
用一个表包含父类和所有子类的属性：适用于heavily overlapping的情况

Subclass relations contain superclass key+ specialized attrs

Subclass relations contain all attributes

One relation containing all superclass + subclass attrs.

4. 组合和聚合(Composition and Aggregation)

组合和聚合都是关联的特例，强调整体和部分的关系，在转换为关系模型时跟关联没有区别，但是在UML中的描述存在语义上的区别。

组合中的整体和部分具有强依赖，整体的对象负责部分的对象的生命周期，如鸟和翅膀

而聚合的整体和部分可以独立存在，如汽车和轮胎，部门和员工

详见前言中的参考资料。

CS145 Intro to databases 学习笔记3——关系型数据库设计、XML的查询、UML

前言

一、关系型数据库设计

1. 基本概念

1.1 函数依赖(Functional Dependencies)

(1) 定义

(2) Trivial or Nontrivial

(3) 属性的闭包(Closure of Attributes)

1.2 部分依赖

1.3 传递依赖

1.4 多值依赖(Multivalued Dependencies, or MVD)

(1) 定义

(2) Trivial or Nontrivial

2. 分解关系模式(Decompostion of Relational Schema)

2.1 分解的原则

2.2 BCNF分解

(1) BCNF的定义

(2) BCNF分解算法

2.3 4NF分解

(1) 4NF的定义

(2) 4NF分解算法

3. 范式之间的比较

3.1 每一层范式解决的问题

3.2 BCNF和4NF的不足

二、XML的查询

1. XPath

2. XQuery

2.1 FLWOR表达式

2.2 代码示例

3. XSLT(EXtensible Stylesheet Language)

三、统一建模语言(UML)

1. 类(Classes)

2. 关联 和 关联类(Associations and Association Classes)

3. 子类(Subclasses)

4. 组合和聚合(Composition and Aggregation)

2. 关联和关联类(Associations and Association Classes)