Query Processing

Query Processing Overview
• Objective: provide the correct answer to a query (almost) as efficiently as possible

Query Processing Operations
• Query processing involves several operations:
--- Lexical & syntactic analysis -- transform SQL into an internal form
--- Normalisation -- collecting AND and OR predicates
--- Semantic analysis -- i.e., does the query make sense?
--- Simplification -- e.g., remove common or redundant sub-expressions
--- Generating an execution plan -- query optimisation
--- Executing the plan and returning results to the client
• To describe most of these, we need to use Relational Algebra

Sets and Cartesian Product
• Set -- a collection of objects characterized by some defining property
--- e.g., a column in a database table, such as the last names of all staff
• Cartesian product of sets (×) -- one of the operations on sets
--- e.g., consider two sets in the staff table
--- set of all first names
◦ fName = {Mary, David}
--- set of all last names
◦ lName = {Howe, Ford}
--- their Cartesian product
◦ fName × lName = {(Mary, Howe), (Mary, Ford), (David, Howe), (David, Ford)}

Relation
• Relation -- defined between two sets; it is a subset of the Cartesian product of those two sets
--- e.g., FirstNameOf = {(Mary, Howe), (David, Ford)}
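The two definitions above can be checked directly with Python sets. This is only a small sketch: the variable names are illustrative, and `itertools.product` stands in for the × operator.

```python
from itertools import product

# The two sets from the staff table example
f_name = {"Mary", "David"}
l_name = {"Howe", "Ford"}

# Cartesian product: every possible (first name, last name) pairing
pairs = set(product(f_name, l_name))

# A relation is any subset of the Cartesian product
first_name_of = {("Mary", "Howe"), ("David", "Ford")}

print(sorted(pairs))
print(first_name_of <= pairs)  # subset test
```

The product has |fName| × |lName| pairs, while the relation keeps only the pairs that actually belong together.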

Relational Model
• The name 'relational model' comes from this mathematical notion of a relation
--- A relation is a set (collection) of tuples that hold related objects, such as the first name and last name of the same person
--- e.g., (fName, lName) is a relation
• We can have relations over any number of sets
--- e.g., (staffNo, fName, lName, position)
• In general, we can denote a relation as (A, B, C, D, ..., Z), where A, B, C, ..., Z are its attribute sets

Introducing Relational Algebra
• What is relational algebra (RA) and why is it useful?
--- RA is a symbolic, formal way of describing relational operations
--- RA says how, as well as what (order is important)
--- Can use re-write rules to simplify and optimise complex queries...
• Maths example:
--- $a + b \cdot x + c \cdot x^2 + d \cdot x^3$; 3 adds, 3 multiplies, 2 powers
--- $a + x \cdot (b + x \cdot (c + x \cdot d))$; 3 adds, 3 multiplies
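The maths example can be tried out as two Python functions that compute the same polynomial. The function names are made up for illustration; the point is that a re-write keeps the result but lowers the operation count, which is the same idea behind RA re-write rules.

```python
def naive(a, b, c, d, x):
    # 3 additions, 3 multiplications, 2 powers
    return a + b * x + c * x**2 + d * x**3


def rewritten(a, b, c, d, x):
    # Nested form: 3 additions, 3 multiplications, no powers
    return a + x * (b + x * (c + x * d))


# Both forms agree on every input tried here
for x in range(-3, 4):
    assert naive(1, 2, 3, 4, x) == rewritten(1, 2, 3, 4, x)
```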

Basic Relational Algebra Operators
• The basic RA operators are:
--- Selection σ (Sigma); Projection π (Pi); Rename ρ (Rho)
• Examples

Figure: relational algebra notation
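As a rough illustration of the three basic operators, here is a minimal sketch that treats a relation as a list of dictionaries. The helper names `select`, `project` and `rename`, and the `position` values in the sample rows, are assumptions made for this example only.

```python
# Toy Staff relation; the position values are made up for illustration
staff = [
    {"staffNo": "S1", "fName": "Mary", "lName": "Howe", "position": "Manager"},
    {"staffNo": "S2", "fName": "David", "lName": "Ford", "position": "Assistant"},
]


def select(relation, predicate):
    """σ: keep the tuples (rows) that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """π: keep only the named attributes and drop duplicate tuples."""
    seen, result = set(), []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


def rename(relation, mapping):
    """ρ: rename attributes according to mapping {old_name: new_name}."""
    return [{mapping.get(k, k): v for k, v in row.items()} for row in relation]


managers = select(staff, lambda r: r["position"] == "Manager")
names = project(staff, ["fName", "lName"])
renamed = rename(names, {"fName": "firstName"})
```

Selection works on rows, projection works on columns, and rename changes only attribute names, never the data.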

Query Processing Example
• Example: find all managers who work at a London branch:

• There are at least 3 ways of writing this in RA notation.
• One of these will be the most efficient -- but which?
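To see why the choice matters, the sketch below runs three equivalent strategies over toy data and counts the intermediate tuples each one builds. Everything here is an assumption for illustration: the `Staff(staffNo, position, branchNo)` and `Branch(branchNo, city)` layouts, the sample rows, and the three plans themselves, which are not necessarily the three expressions the notes refer to.

```python
from itertools import product

# Hypothetical toy data; table and column names are assumptions
staff = [
    {"staffNo": "S1", "position": "Manager", "branchNo": "B1"},
    {"staffNo": "S2", "position": "Assistant", "branchNo": "B1"},
    {"staffNo": "S3", "position": "Manager", "branchNo": "B2"},
    {"staffNo": "S4", "position": "Assistant", "branchNo": "B2"},
]
branch = [
    {"branchNo": "B1", "city": "London"},
    {"branchNo": "B2", "city": "Glasgow"},
]


def plan_1():
    """Selection over the full Cartesian product Staff × Branch."""
    pairs = list(product(staff, branch))
    result = [s["staffNo"] for s, b in pairs
              if s["branchNo"] == b["branchNo"]
              and s["position"] == "Manager" and b["city"] == "London"]
    return result, len(pairs)


def plan_2():
    """Join on branchNo first, then select on position and city."""
    joined = [(s, b) for s in staff for b in branch
              if s["branchNo"] == b["branchNo"]]
    result = [s["staffNo"] for s, b in joined
              if s["position"] == "Manager" and b["city"] == "London"]
    return result, len(joined)


def plan_3():
    """Select managers and London branches first, then join the small inputs."""
    managers = [s for s in staff if s["position"] == "Manager"]
    london = [b for b in branch if b["city"] == "London"]
    joined = [(s, b) for s in managers for b in london
              if s["branchNo"] == b["branchNo"]]
    return [s["staffNo"] for s, b in joined], len(managers) + len(london)


for plan in (plan_1, plan_2, plan_3):
    result, intermediate = plan()
    print(plan.__name__, result, "intermediate tuples:", intermediate)
```

All three plans return the same answer; they differ only in how many intermediate tuples are materialised along the way. The counts are a crude stand-in for real cost, which would be measured in disc block accesses.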

Lexical & Syntactical Analysis & Query Trees
• Lexical & syntactical analysis involves:
--- identifying keywords & literals
--- identifying table names & aliases
--- mapping aliases to table names
--- identifying column names
--- checking columns exist in tables
• The output of this phase is a relational algebra tree (RAT)
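The steps listed above can be sketched for a very small SQL subset. This is a toy, not how any particular DBMS does it: the keyword list, the `SCHEMA` dictionary and all function names are assumptions, and the sketch stops before building the RAT itself.

```python
import re

KEYWORDS = {"SELECT", "FROM", "WHERE", "AND", "OR", "AS"}  # illustrative subset

# Hypothetical catalogue used for the "columns exist in tables" check
SCHEMA = {
    "Staff": {"staffNo", "position", "branchNo"},
    "Branch": {"branchNo", "city"},
}

TOKEN_RE = re.compile(r"""
    (?P<literal>'[^']*'|\d+)            # quoted string or integer literal
  | (?P<word>[A-Za-z_][A-Za-z_0-9]*)    # keyword or identifier
  | (?P<symbol>[=<>*,.()])              # operators and punctuation
  | (?P<space>\s+)                      # whitespace, skipped
""", re.VERBOSE)


def tokenize(sql):
    """Split SQL text into (kind, text) tokens: keyword, identifier, literal, symbol."""
    tokens, pos = [], 0
    while pos < len(sql):
        m = TOKEN_RE.match(sql, pos)
        if m is None:
            raise ValueError(f"unexpected character {sql[pos]!r} at position {pos}")
        pos = m.end()
        kind, text = m.lastgroup, m.group()
        if kind == "space":
            continue
        if kind == "word":
            kind = "keyword" if text.upper() in KEYWORDS else "identifier"
        tokens.append((kind, text))
    return tokens


def table_aliases(tokens):
    """Map alias -> table name for a 'FROM table [AS] alias, ...' list."""
    aliases = {}
    i = next(k for k, (kind, text) in enumerate(tokens)
             if kind == "keyword" and text.upper() == "FROM") + 1
    while i < len(tokens) and tokens[i][0] == "identifier":
        table = alias = tokens[i][1]
        i += 1
        if i < len(tokens) and tokens[i][0] == "keyword" and tokens[i][1].upper() == "AS":
            i += 1
        if i < len(tokens) and tokens[i][0] == "identifier":
            alias = tokens[i][1]
            i += 1
        aliases[alias] = table
        if i < len(tokens) and tokens[i] == ("symbol", ","):
            i += 1
        else:
            break
    return aliases


def check_columns(tokens, aliases, schema):
    """Return qualified references (alias.column) that do not exist in the schema."""
    missing = []
    for (k1, a), (_, dot), (k3, col) in zip(tokens, tokens[1:], tokens[2:]):
        if (k1, dot, k3) == ("identifier", ".", "identifier"):
            table = aliases.get(a)
            if table is None or col not in schema.get(table, set()):
                missing.append(f"{a}.{col}")
    return missing


sql = ("SELECT s.staffNo FROM Staff s, Branch b "
       "WHERE s.branchNo = b.branchNo AND s.position = 'Manager' AND b.city = 'London'")
tokens = tokenize(sql)
aliases = table_aliases(tokens)
print(aliases, check_columns(tokens, aliases, SCHEMA))
```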

Semantic Analysis
• Does the query make sense?
--- Is the query legal SQL?
--- Is the RAT connected? -- if not, the query is incomplete!
• Can the query be simplified? -- for example:

Normalisation & Normal Forms

• Why is this useful? -- sometimes a query might best be split into subqueries (remember set operations?):
• Disjunctions suggest union:

• Conjunctions suggest intersection:
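A small check of both statements, using Python sets as relations. The sample rows and the two predicates (position is Manager, branch is B1) are made up for illustration.

```python
# Toy Staff relation as a set of (staffNo, position, branchNo) tuples
staff = {
    ("S1", "Manager", "B1"),
    ("S2", "Assistant", "B1"),
    ("S3", "Manager", "B2"),
    ("S4", "Assistant", "B2"),
}

# Two simple subqueries, one per predicate
managers = {row for row in staff if row[1] == "Manager"}
at_b1 = {row for row in staff if row[2] == "B1"}

# Disjunction (OR) and conjunction (AND) evaluated in one pass
either = {row for row in staff if row[1] == "Manager" or row[2] == "B1"}
both = {row for row in staff if row[1] == "Manager" and row[2] == "B1"}

assert either == managers | at_b1   # OR  -> union of the subqueries
assert both == managers & at_b1     # AND -> intersection of the subqueries
```

Splitting like this lets each subquery be evaluated on its own, for instance with a different index, before the results are combined.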

Some RA Equivalence Rules (Re-Write Rules)
• There are many equivalence rules (see C&B pp. 736--739). Here are a few:
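A few commonly cited equivalences are written out below as a reminder. They are standard RA identities, but they are not necessarily the ones shown in the original notes. Here $R$ and $S$ are relations and $p$, $q$, $r$ are predicates.

```latex
% Cascade of selections: a conjunction can be split into a chain of selections
\sigma_{p \land q}(R) = \sigma_{p}(\sigma_{q}(R))

% Commutativity of selection
\sigma_{p}(\sigma_{q}(R)) = \sigma_{q}(\sigma_{p}(R))

% A selection on a Cartesian product is a join
\sigma_{p}(R \times S) = R \bowtie_{p} S

% Commutativity of join
R \bowtie_{p} S = S \bowtie_{p} R

% Pushing a selection through a join, when p uses only attributes of R
\sigma_{p}(R \bowtie_{r} S) = (\sigma_{p}(R)) \bowtie_{r} S
```

The third and fifth rules are the ones behind the heuristics "re-write selection on a cartesian product as a join" and "perform selections as early as possible".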

Generating Query Plans
• Most RDBMSs generate candidate query plans by using RA re-write rules to generate alternate RATs and to move operations around each tree:

• For complex queries, there may be a very large number of candidate plans...

Heuristic Query Optimisation Rules
To avoid considering all possible plans, many DBMSs use heuristic rules:
• keep together selections (σ) on the same table
• perform selections as early as possible
• re-write selection on a cartesian product as a join
• perform "small joins" first
• keep together projections (π) on the same relation
• apply projections as early as possible
• if duplicates are to be eliminated, use a sort algorithm

Cost-Based Query Optimisation
• Remember, accessing disc blocks is expensive!
• Ideally, the query optimiser should take into account:
--- the size (cardinality) of each table
--- which tables have indexes
--- the type of each index -- clustered, non-clustered
--- which predicates can be evaluated using an index
--- how much memory the query will need -- and for how long
--- whether the query can be split over multiple CPUs
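To make the first few factors concrete, here is a deliberately simplified cost model that counts disc block accesses for one selection. The formulas are rough textbook-style approximations and every name and default value (`cheapest_access_path`, `index_height=2`, and so on) is an assumption for this sketch; real optimisers use far more detailed statistics.

```python
import math


def cheapest_access_path(n_rows, rows_per_block, matching_rows,
                         index=None, index_height=2):
    """Return (access path, estimated block accesses) for one selection.

    index is None, "clustered" or "non-clustered" (toy model only).
    """
    n_blocks = math.ceil(n_rows / rows_per_block)
    costs = {"full scan": n_blocks}  # read every block of the table
    if index == "clustered":
        # Matching rows are stored together: descend the index, then read them in order
        costs["clustered index"] = index_height + math.ceil(matching_rows / rows_per_block)
    elif index == "non-clustered":
        # Matching rows may be scattered: assume one block access per matching row
        costs["non-clustered index"] = index_height + matching_rows
    return min(costs.items(), key=lambda item: item[1])


# 10,000 rows at 100 rows per block gives a 100-block table
print(cheapest_access_path(10_000, 100, 50, index="non-clustered"))
print(cheapest_access_path(10_000, 100, 5_000, index="non-clustered"))
print(cheapest_access_path(10_000, 100, 5_000, index="clustered"))
```

Under these assumptions a non-clustered index only pays off when the predicate is selective; once many rows match, a full scan reads fewer blocks.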