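How Oracle B*tree Indexes Are Maintained. In a multi-column index, the column values are combined into a single entry that points to the ROWID.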

Although this is an older document, the information below is still relevant to later versions.

Scope

This document is intended to help users who are trying to understand how Oracle
B*tree indexes are maintained.

Oracle version 8 provides five indexing schemes:

B*tree indexes
B*tree cluster indexes IOT?
hash cluster indexes
reverse key indexes
bitmap indexes

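This article covers only B*tree indexes, which are currently the most commonly
used. The theory of B*tree indexes is beyond the scope of this article; for
more information, refer to computer science textbooks dealing with data
structures.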

Format of Index Blocks

Within a B*tree index, index blocks are either branch blocks, the upper blocks
within the B*tree index, or leaf blocks, the lowest level index blocks. Branch
blocks contain index data that point to lower level index blocks. Leaf blocks
contain every indexed data value and a corresponding ROWID used to locate the
actual row.
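
The two block types described above can be sketched as a small Python model. This is only an illustration: the class names `LeafBlock` and `BranchBlock`, the helper `find_rowids`, and the string ROWIDs are hypothetical, and the sketch assumes all entries for a key sit in a single leaf.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class LeafBlock:
    # Sorted (key, rowid) pairs; ROWIDs are plain strings in this sketch.
    entries: list = field(default_factory=list)


@dataclass
class BranchBlock:
    # separators[i] is the smallest key reachable through children[i + 1].
    separators: list
    children: list


def find_rowids(block, key):
    """Walk from the root down to a leaf and return the ROWIDs stored for key."""
    while isinstance(block, BranchBlock):
        block = block.children[bisect_right(block.separators, key)]
    return [rowid for k, rowid in block.entries if k == key]


left = LeafBlock([(10, "AAA1"), (20, "AAA2")])
right = LeafBlock([(30, "AAA3"), (40, "AAA4")])
root = BranchBlock(separators=[30], children=[left, right])
print(find_rowids(root, 30))  # ['AAA3']
```

The branch block holds no row data of its own; it only steers the search toward the leaf that can contain the key.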

               Index Block Format

|-----------------------------------------------------|
|                                                     |
|            Index Block Header                       |
|                                                     |
|-----------------------------------------------------|
|                                                     |
|             Space reserved for future updates       |
|             and inserts of new rows with the        |
|             appropriate key values                  |
|                                                     |
|-----------------------------------------------------| <- PCTFREE say 10
|                                                     |
|             Index Key Data                          |
|                                                     |
|                                                     |
|                                                     |
|                                                     |
|                                                     |
|                                                     |
|                                                     |
|                                                     |
|                                                     |
|-----------------------------------------------------|

B*tree Index Creation

When a B*tree index is created using the CREATE INDEX statement, the parameter
PCTFREE can be specified. PCTFREE specifies the percentage of space to leave
free for future updates and insertions to an index block. A value of 0 reserves
no space for future inserts and updates. It allows the entire data area of the
block to be filled when the index is created. If PCTFREE is not specified, it
defaults to 10. This value reserves 10% of each block for updates to existing
key values, and inserts of new key values.

Thus PCTFREE is only relevant at initial index creation. It causes optimal
splitting of the B*-tree in preparation for subsequent growth. The idea is to
do as much splitting as possible during initial creation and avoid having to
pay the penalty later during insertion into the table. This is what a high
PCTFREE setting on an index gives you. However, if your inserted keys are
monotonically increasing (say a date/time field) a PCTFREE=0 is best. Only the
rightmost index leaf block will be inserted into, so there's no point leaving
room in the other leaf blocks at creation time.
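
As a rough illustration of what PCTFREE does at creation time, the sketch below packs sorted keys into leaf blocks while leaving a percentage of each block empty. The function `build_leaves` and the idea of measuring block capacity in entries (real blocks are sized in bytes) are assumptions made for the example.

```python
def build_leaves(sorted_keys, block_capacity, pctfree):
    """Pack sorted keys into leaf blocks, leaving pctfree percent of each block empty."""
    per_block = max(1, block_capacity * (100 - pctfree) // 100)
    return [sorted_keys[i:i + per_block]
            for i in range(0, len(sorted_keys), per_block)]


keys = list(range(100))
print(len(build_leaves(keys, block_capacity=10, pctfree=0)))   # 10 leaves, all full
print(len(build_leaves(keys, block_capacity=10, pctfree=10)))  # 12 leaves, 9 keys in each full one
```

With PCTFREE 0 the index is as compact as possible, which suits monotonically increasing keys; with PCTFREE 10 every leaf starts with room for one more entry in this model before it has to split.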

Following index creation, an index block can accommodate keys up to the full
available data area including space for ITLs. Thus an index block will not
require splitting until the available data area is fully used. The bottom line
is PCTFREE is not looked at once you pass the index creation phase. One thing
to remember is that each row in the index has only one correct block it can
live in, based on the key value.

Inserting an index entry after index creation

After index creation, a new table row will create a new index entry. This entry
is inserted into the appropriate index leaf block based on the index key values
until the leaf block is full. If on insert the index leaf block is full, then
an index block split will occur putting half of the index entries into each new
index leaf block. Within an index data block, space is reserved for the index
block header. The rest of the block is available for index keys.

(That is, a 50%-50% split.)

Where an index key value has multiple entries, these entries are made into the
leaf block in ROWID order. So all the blocks for the index key value are
scanned in turn until an entry is found with a greater ROWID than the new row,
and the row is inserted into that block (which may cause extra block splitting).
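
The insert path described above can be sketched as follows. This is a simplified model, not Oracle's implementation: leaves are plain Python lists kept in key order, `CAPACITY` is a hypothetical entry limit, and `insert_entry` is a name invented for the example. Entries are `(key, rowid)` tuples, so duplicates of a key are ordered by ROWID.

```python
from bisect import insort

CAPACITY = 4  # hypothetical number of entries a leaf can hold


def insert_entry(leaves, key, rowid):
    """Insert (key, rowid) into the one leaf whose range covers it, splitting if full."""
    entry = (key, rowid)
    # Choose the last leaf whose first entry is <= the new entry (else the first leaf).
    idx = 0
    for i, leaf in enumerate(leaves):
        if leaf and leaf[0] <= entry:
            idx = i
    leaf = leaves[idx]
    if len(leaf) >= CAPACITY:
        # Leaf is full: split it, putting half of the entries into each new leaf.
        mid = len(leaf) // 2
        left, right = leaf[:mid], leaf[mid:]
        leaves[idx:idx + 1] = [left, right]
        leaf = right if entry >= right[0] else left
    insort(leaf, entry)  # tuples sort by key first, then by rowid


leaves = [[]]
for key, rowid in [(5, "R3"), (5, "R1"), (7, "R2"), (9, "R4"), (5, "R2")]:
    insert_entry(leaves, key, rowid)
print(leaves)
# [[(5, 'R1'), (5, 'R2'), (5, 'R3')], [(7, 'R2'), (9, 'R4')]]
```

The fifth insert finds the only leaf full, splits it in half, and then places the new entry between the existing entries for key 5 in ROWID order.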

The reason for this is for queries such as:

SELECT some_columns FROM table WHERE COL_A = valueA and COL_B = valueB;

If COL_A and COL_B both have non-unique indexes, then because the entries for a
given key value in each index are stored in ROWID order, it is easy to find the
ROWIDs that occur in both indexes. Otherwise we would have to sort the ROWIDs
before we could find the ROWIDs that occur in both indexes.
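
Why ROWID ordering helps can be seen in a short sketch: two already sorted lists can be intersected in a single pass with no sort step. The function name `intersect_sorted` and the short string ROWIDs are hypothetical.

```python
def intersect_sorted(rowids_a, rowids_b):
    """Return ROWIDs present in both lists; both inputs must already be sorted."""
    i = j = 0
    common = []
    while i < len(rowids_a) and j < len(rowids_b):
        if rowids_a[i] == rowids_b[j]:
            common.append(rowids_a[i])
            i += 1
            j += 1
        elif rowids_a[i] < rowids_b[j]:
            i += 1
        else:
            j += 1
    return common


col_a_rowids = ["R1", "R3", "R5", "R8"]  # entries for COL_A = valueA
col_b_rowids = ["R2", "R3", "R8", "R9"]  # entries for COL_B = valueB
print(intersect_sorted(col_a_rowids, col_b_rowids))  # ['R3', 'R8']
```

Each list is read once, so the cost grows with the number of entries rather than with the cost of sorting them first.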

Updating an index entry

There is really no concept of an UPDATE to an index. When a table row is
updated, the old index key is deleted and a new key inserted (at the correct
location in the B*tree).
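
A minimal sketch of this delete-then-insert behaviour, modelling the whole index as one sorted list of `(key, rowid)` entries. The function name `update_indexed_value` is hypothetical.

```python
from bisect import insort


def update_indexed_value(entries, old_key, new_key, rowid):
    """Model an UPDATE of an indexed column as a delete followed by an insert."""
    entries.remove((old_key, rowid))    # delete the old index entry
    insort(entries, (new_key, rowid))   # insert the new entry at its sorted position


entries = [(10, "R1"), (20, "R2"), (30, "R3")]
update_indexed_value(entries, 10, 25, "R1")
print(entries)  # [(20, 'R2'), (25, 'R1'), (30, 'R3')]
```

The entry for row R1 does not change in place; it disappears from the position for key 10 and reappears at the position for key 25.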

Deleting an index entry

When an index entry is to be deleted, the row is deleted from the index leaf
block and the space within the index leaf block released to the block for
further inserts with the appropriate key range. If a leaf block has even one
entry it is still part of the tree, hence only entries that belong in it
positionally can be accommodated. Once a leaf block is completely empty it is
put on the free list, at which point it can be used to service an index block
split.
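
The free-list rule can be sketched as below. The model is deliberately simple and the names `delete_entry` and `free_list` are assumptions: leaves are lists of `(key, rowid)` entries, and the free list records the positions of leaves that have become completely empty.

```python
def delete_entry(leaves, free_list, key, rowid):
    """Remove one entry; a leaf that becomes completely empty goes on the free list."""
    entry = (key, rowid)
    for block_no, leaf in enumerate(leaves):
        if entry in leaf:
            leaf.remove(entry)
            if not leaf:
                free_list.append(block_no)
            return True
    return False


leaves = [[(10, "R1")], [(20, "R2"), (30, "R3")]]
free_list = []
delete_entry(leaves, free_list, 20, "R2")  # leaf 1 still holds (30, 'R3'), so it stays in use
delete_entry(leaves, free_list, 30, "R3")  # leaf 1 is now empty and is put on the free list
print(leaves, free_list)  # [[(10, 'R1')], []] [1]
```

After the first delete the freed space in leaf 1 is only usable by keys that belong in that leaf's range; only after the second delete does the block become available for reuse elsewhere.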

Index Fragmentation

To ascertain index fragmentation, the following SQL statement can be used:

ANALYZE INDEX &&index_name VALIDATE STRUCTURE;

col name heading 'Index Name' format a30
col del_lf_rows heading 'Deleted|Leaf Rows' format 99999999
col lf_rows_used heading 'Used|Leaf Rows' format 99999999
col ibadness heading '% Deleted|Leaf Rows' format 999.99999

SELECT name,
       del_lf_rows,
       lf_rows - del_lf_rows lf_rows_used,
       to_char(del_lf_rows / (lf_rows)*100,'999.99999') ibadness
FROM index_stats
WHERE name = upper('&&index_name');

undefine index_name

As a rule of thumb if 10-15% of the table data changes, then you should
consider rebuilding the index.

B*Tree Balancing

Oracle indexes are implemented as B* Trees which are always balanced.

In an Oracle B*tree the root of the tree is at level 0. In a very small B*tree
the root block can also be a Leaf block.

In most cases, blocks on levels 0 through N-2 (where N is the height of the
tree) are Branch blocks. Branch blocks do not contain data; they simply contain
separators which are used to navigate from the root block to the Leaf
blocks.

All Leaf blocks are at level N-1 in Oracle B*trees. All data stored in a B*tree
is stored in the Leaf blocks.

The definition of a 'Balanced Tree' is that all the data is on the same level,
which means that the path to all data in the tree is the same length. Since all
the data is stored in Leaf blocks and all the Leaf blocks are on the same
level, B*trees are always balanced. There is no way to unbalance a B*tree.
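
The balance property can be checked mechanically on a toy tree. In this sketch, which uses a representation chosen only for the example, a branch block is a Python list of child blocks and a leaf block is a tuple of keys; `leaf_levels` is a hypothetical helper.

```python
def leaf_levels(block, level=0):
    """Return the set of levels at which leaf blocks occur below block."""
    if isinstance(block, tuple):   # leaf block
        return {level}
    levels = set()
    for child in block:            # branch block
        levels |= leaf_levels(child, level + 1)
    return levels


# Height N = 3: root at level 0, branch blocks at level 1, leaves at level 2.
root = [[(1, 2), (3, 4)], [(5, 6), (7,)]]
print(leaf_levels(root))  # {2}
```

A balanced tree yields a single level, here N-1 = 2, no matter how unevenly the keys are spread across the leaves.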

Deletion of many rows in the B*Tree

Suppose a table has 100,000 rows, and 99,999 of those rows and their index
entries are deleted. How is the index balanced?

In this case the rows are deleted from the index, and the empty blocks inserted
onto the index free list. These blocks are still part of the index and will
need to be accessed when traversing the index.

Thus if an index has one entry in it, there is only one block (a root/leaf
block) in the tree, and only one block needs to be accessed to answer a query.
Now suppose you load a B*tree with 100,000 rows and get a tree with, say, 3
levels. Levels zero and one are Branch blocks used to access the data in the
Leaf blocks on level 2. When querying this tree, you first access the root
block, using the search key to find the correct Branch block in level one of
the tree. Next you use the search key and the Branch block to find the correct
Leaf block that should contain the key being sought. So it takes three block
accesses to answer the same query. Now if 99,999 rows are deleted from the
tree, the rows are removed from the index but the index is not collapsed. In
this case you still have a 3-level index storing the one row, and it still
takes three block accesses to reach that row. The fact that we need three block
accesses instead of one does not mean this tree is unbalanced. The tree is
still balanced; it just contains a lot of empty blocks.

The reason for doing this is that, when data is inserted into a Leaf block and
there is no room for the insert, a very expensive operation called a split
occurs. The split creates a new Leaf block and possibly new Branch blocks as
well to maintain the balance of the tree. The split operation is by far the
most expensive operation done in the maintenance of B*trees, so we go to great
lengths to avoid it. By not collapsing the now unused levels out of the B*tree
after large deletes, these levels (and the splits that created them) do not
have to be recreated during future inserts.

If most of the rows in a table are going to be deleted, and not refilled soon
after, it is advisable to drop the index, delete the rows, and then recreate
the index.

By dropping the index you save the index maintenance that needs to be done
during the delete thus speeding up the delete. By recreating the index after
the delete you create an index of optimal height and with the Leaf blocks
filled to an optimal level.
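
The three-versus-one block access comparison, and the effect of recreating the index, can be worked through with a small calculation. The figures `ROWS_PER_LEAF` and `FANOUT` are hypothetical values picked so that 100,000 rows give a 3-level tree; `tree_height` is a name invented for the example and expects a fanout of at least 2.

```python
def tree_height(leaf_blocks, fanout):
    """Levels needed so that a single root block can reach every leaf block."""
    height = 1
    blocks = leaf_blocks
    while blocks > 1:
        blocks = (blocks + fanout - 1) // fanout  # blocks needed one level up
        height += 1
    return height


ROWS_PER_LEAF = 100  # hypothetical entries per leaf block
FANOUT = 100         # hypothetical child pointers per branch block

built = tree_height(100_000 // ROWS_PER_LEAF, FANOUT)
after_delete = built                     # deletes do not collapse levels
after_recreate = tree_height(1, FANOUT)  # one remaining row fits in one root/leaf block
print(built, after_delete, after_recreate)  # 3 3 1
```

Under these assumptions a lookup of the one surviving row costs three block accesses until the index is recreated, and one afterwards.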

Search Words

delete insert rebuild
