Query Processing Overview • Objective: Provide correct answer to query (almost) as efficiently as possible 查询处理的目的是 高效准确查询Query Processing Operations • Query processing involves several operations: --- Lexical & syntactic analysis -- transform SQL into an internal form --- Normalisation -- collecting AND and OR predicates --- Semantic analysis -- i.e., does the query make sense? --- Simplification -- e.g., remove common or redundant sub-expressions --- Generating an execution plan -- query optimisation --- Executing the plan and returning results to the client • To describe most of these, we need to use Relational Algebra - 词法和语法分析 - 将 SQL 转换为内部形式 - 规范化 - 收集 AND 和 OR 谓词 - 语义分析 即查询是否有意义? - 简化--例如,删除常见或多余的子表达式 - 生成执行计划--查询优化 - 执行计划并将结果返回给客户端 - 要描述其中的大部分操作,我们需要使用关系代数
Sets and Cartesian Product • Set -- a collection of objects characterized by some defining property --- e.g., a column in a database table such as last names of all staff • Cartesian Product of sets (×) -- one of the operations on sets --- e.g., consider two sets in the staff table --- set of all first names ◦ fName = {Mary, David} --- set of all last names ◦ lName = Howe, Ford --- their cartesian product ◦ fName × lName = (Mary,Howe), (Mary,Ford), (David, Howe), (David, Ford)集合 - 以某些定义属性为特征的对象集合 笛卡尔乘积
Relation Relation -- defined between two sets and is a subset of cartesian product between those two sets • e.g., FirstNameOf = (Mary, Howe), (David, Ford)关系--在两个集合之间定义,是这两个集合的卡特积子集
Relational Model • The name 'relational model' comes from this mathematical notion of relation --- Where a relation is a set (collection) of tuples that have related objects such as first name and last name of the same person --- e.g., (fName, lName) is a relation • We can have relations over any number of sets --- e.g., (staffNo, fName, lName, position) • In general we can denote a relation as (A,B,C,D,... ,Z) where A, B, C and Z are all its attribute sets 关系是具有相关对象(如同一个人的名字和姓氏)的元组集合(集合)
Introducing Relational Algebra • What is relational algebra (RA) and why is it useful? --- RA is a symbolic formal way of describing relational operations --- RA says how, as well as what (order is important) --- Can use re-write rules to simplify and optimise complex queries... • Maths example: --- a + b · x + c · x 2 + d · x 3 ; 3 adds, 3 multiplies, 2 powers; --- a + x · (b + x · (c + x · d)); 3 adds, 3 multiplies;关系代数是描述关系操作的一种符号化的正式方式 关系代数说明如何操作以及操作什么(顺序很重要) - 可以使用重写规则来简化和优化复杂的查询 简而言之 关系代数就是描述关系操作符号的含义
Basic Relational Algebra Operators • The basic RA operators are: --- Selection σ (Sigma); Projection π (Pi); Rename ρ (Rho) • Examples基本关系代数运算符- 选择 σ (Sigma);投影 π (Pi);重命名 ρ (Rho)
关系代数符号
Query Processing Example • Example: find all managers who work at a London Branch:
• There are at least 3 ways of writing this in RA notation:One of these will be the most efficient -- but which??
Lexical & Syntactical Analysis & Query Trees • Lexical & syntactical analysis involves: --- identifying keywords & literals --- identifying table names & aliases --- mapping aliases to table names --- identifying column names --- checking columns exist in tables • The output of this phase is a relational algebra tree (RAT)词法和句法分析与查询树 - 词法和句法分析包括: - 识别关键字和文字 - 识别表名和别名 - 将别名映射到表名 - 识别列名 - 检查列是否存在于表中 - 该阶段的输出是关系代数树 (RAT)
Semantic Analysis • Does the query make sense? --- Is the query legal SQL? --- Is the RAT connected? -- if not, query is incomplete! • Can the query be simplified? -- for example:查询的意义 是否为合法SQL RAT连接情况(没有连接查询不完整) 查询的简化
Normalisation & Normal Forms 正则化和正则表达式
• Why is this useful? -- sometimes a query might best be split into subqueries (remember set operations?): 将查询拆分成子查询• Disjunctions suggest union:分词代表并集
• Conjunctions suggest intersection:连词代表交集
Some RA Equivalences Rules (Re-Write Rules) • There are many equivalence rules (see C&B pp.736--739). Here are a few:RA等价规则
Generating Query Plans • Most RDBMSs generate candidate query plans by using RA re-write rules to generate alternate RATs and to move operations around each tree:生成查询计划--大多数 RDBMS 通过使用 RA 重写规则生成备用 RAT 并在每个树上移动操作,从而生成候选查询计划
• For complex queries, there may be a very large number of candidate plans...复杂查询可能有多种候选计划
Heuristic Query Optimisation Rules To avoid considering all possible plans, many DBMSs use heuristic rules: • keep together selections (σ) on the same table • perform selections as early as possible • re-write selection on a cartesian product as a join • perform "small joins" first • keep together projections (π) on the same relation • apply projections as early as possible • if duplicates are to be eliminated, use a sort algorithm启发式查询优化规则 为避免考虑所有可能的计划,许多 DBMS 使用启发式规则: - 将同一表上的选择 (σ) 保持在一起 - 尽早执行选择 - 将卡特积上的选择重写为连接 - 先执行 "小连接" - 将同一关系上的投影 (π) 保持在一起 - 尽早应用投影 - 如果要消除重复,则使用排序算法
Cost-Based Query Optimisation • Remember, accessing disc blocks is expensive! • Ideally, the query optimiser should take into account: --- the size (cardinality) of each table --- which tables have indexes --- the type of each index -- clustered, non-clustered --- which predicates can be evaluated using an index --- how much memory query will need -- and for how long --- whether the query can be split over multiple CPUs基于成本的查询优化 - 理想情况下,查询优化器应考虑以下因素 - 每个表的大小(cardinality) - 哪些表有索引 - 每个索引的类型 聚类、非聚类 - 哪些谓词可以使用索引进行评估 -查询需要多少内存,以及需要多长时间 - 查询是否可以在多个 CPU 上进行分割