Oracle B-tree Index Maintenance

Bulk DML cannot bypass index maintenance

Applies To

All Users
Oracle WebCenter Content - Version 12.2.1.0.0 and later

Summary

We are ingesting a large amount of scanned content, and are passing the DirectReleaseNewCheckinDoc=1 parameter to allow it to bypass Indexing.

We are also performing large batches of deletes, but are experiencing an Indexer slowdown.

Is there a similar parameter that could be set to allow the Indexer to be bypassed on Delete?

Solution

This type of setting does not currently exist out of the box.

Note, however, that if such a setting existed and deletes bypassed the index, the deleted items would persist in the search index.

The only way to recover would then be a full index rebuild, which would take even more time. It is highly recommended not to bypass the index for deletes during ingestion, and to allow them to be processed as usual.

If a setting is still desired, please note the following Enhancement Request has been submitted.

Bug 28788265 - Request for Flag/Setting to Bypass Index for Deletes.

*Please use this note to follow the ER: FAQ2076 How To Monitor Bugs And Enhancement Requests Via My Oracle Support

Applies To

All Users

Summary

This article explains how Oracle B*tree indexes are maintained.

Although this is an older document, the information below is still relevant to later versions.

It is intended to assist users who are trying to understand how an Oracle B*tree index is maintained.

Solution

Oracle version 8 provides five indexing schemes:

B*tree indexes

B*tree cluster indexes

hash cluster indexes

reverse key indexes

bitmap indexes

This article is only concerned with B*tree indexes, which are currently the most commonly used. The theory of B*tree indexes is beyond the scope of this article; for more information, refer to computer science texts dealing with data structures.

Format of Index Blocks

Within a B*tree index, index blocks are either branch blocks, the upper blocks within the B*tree index, or leaf blocks, the lowest level index blocks. Branch blocks contain index data that point to lower level index blocks. Leaf blocks contain every indexed data value and a corresponding ROWID used to locate the actual row.

               Index Block Format

|-----------------------------------------------------|
|                                                     |
|            Index Block Header                       |
|                                                     |
|-----------------------------------------------------|
|                                                     |
|             Space reserved for future updates       |
|             and inserts of new rows with the        |
|             appropriate key values                  |
|                                                     |
|-----------------------------------------------------| <- PCTFREE say 10
|                                                     |
|             Index Key Data                          |
|                                                     |
|                                                     |
|                                                     |
|                                                     |
|                                                     |
|                                                     |
|                                                     |
|                                                     |
|                                                     |
|-----------------------------------------------------|

B*tree Index Creation

When a B*tree index is created using the CREATE INDEX statement, the parameter PCTFREE can be specified. PCTFREE specifies the percentage of space to leave free for future updates and insertions to an index block. A value of 0 reserves no space for future inserts and updates. It allows the entire data area of the block to be filled when the index is created. If PCTFREE is not specified, it defaults to 10. This value reserves 10% of each block for updates to existing key values, and inserts of new key values.

PCTFREE is therefore relevant only at initial index creation. It causes optimal splitting of the B*tree in preparation for subsequent growth. The idea is to do as much splitting as possible during initial creation and avoid paying the penalty later, during inserts into the table. This is what a high PCTFREE setting on an index gives you. However, if the inserted keys are monotonically increasing (say, a date/time field), PCTFREE=0 is best: only the rightmost index leaf block will be inserted into, so there is no point leaving room in the other leaf blocks at creation time.
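
As a rough illustration of the two cases described above, the following sketch creates one index with extra free space and one with none. The table orders and its columns customer_id and order_date are hypothetical names chosen for this example, and the value 30 is an arbitrary illustration rather than a recommendation.

```sql
-- Hypothetical table: orders(customer_id, order_date, ...).

-- Keys arrive in no particular order: leave room in every leaf block at
-- creation time so that early inserts are less likely to force block splits.
CREATE INDEX orders_customer_idx ON orders (customer_id) PCTFREE 30;

-- Keys increase monotonically: new entries go to the rightmost leaf block,
-- so free space reserved in the other leaf blocks would mostly go unused.
CREATE INDEX orders_date_idx ON orders (order_date) PCTFREE 0;
```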

Following index creation, an index block can accommodate keys up to the full available data area, including space for ITLs. An index block will therefore not require splitting until the available data area is fully used. The bottom line is that PCTFREE is not looked at once the index creation phase has passed. One thing to remember is that each row in the index has only one correct block it can live in, based on the key value.

Inserting an index entry after index creation

After index creation, a new table row will create a new index entry. This entry is inserted into the appropriate index leaf block based on the index key values until the leaf block is full. If on insert the index leaf block is full, then an index block split will occur, putting half of the index entries into each new index leaf block. Within an index data block, space is reserved for the index block header. The rest of the block is available for index keys.

Where an index key value has multiple entries, these entries are placed in the leaf blocks in ROWID order. All the blocks for the index key value are therefore scanned in turn until an entry is found with a greater ROWID than the new row, and the row is inserted into that block (which may cause extra block splitting).

This ordering exists to help queries such as:

SELECT some_columns FROM table WHERE COL_A = valueA and COL_B = valueB;

If COL_A and COL_B both have non-unique indexes, then because the entries in each index are stored in ROWID order, it is easy to find the ROWIDs that occur in both indexes. Otherwise we would have to sort the ROWIDs before we could find the ones that occur in both indexes.

Updating an index entry

There is really no concept of an UPDATE to an index. When a table row is updated, the old index key is deleted and a new key inserted (at the correct location in the B*tree).
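
One way to observe this delete-plus-insert behavior is to update an indexed column and then inspect INDEX_STATS. The sketch below uses hypothetical objects (t_upd, t_upd_val_idx) created only for the demonstration; the exact counts reported can vary, since deleted entries may be cleaned out by later activity in the same leaf block.

```sql
-- Hypothetical demo objects; drop them when finished.
CREATE TABLE t_upd (id NUMBER, val NUMBER);
CREATE INDEX t_upd_val_idx ON t_upd (val);

INSERT INTO t_upd VALUES (1, 100);
COMMIT;

-- Changing the indexed column removes the entry for 100 and adds one for 200.
UPDATE t_upd SET val = 200 WHERE id = 1;
COMMIT;

-- VALIDATE STRUCTURE populates INDEX_STATS for the current session.
ANALYZE INDEX t_upd_val_idx VALIDATE STRUCTURE;

-- LF_ROWS counts leaf entries, including deleted ones that still occupy
-- space; DEL_LF_ROWS counts only the deleted entries.
SELECT lf_rows, del_lf_rows FROM index_stats;
```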

Deleting an index entry

When an index entry is to be deleted, the row is deleted from the index leaf block and the space is released to the block for further inserts within the appropriate key range. If a leaf block has even one entry, it is still part of the tree, so only entries that belong in it positionally can be accommodated. Once a leaf block is completely empty, it is put on the free list, at which point it can be used to service an index block split.

Index Fragmentation

To ascertain index fragmentation, the following SQL statement can be used:

ANALYZE INDEX &&index_name VALIDATE STRUCTURE;
col name         heading 'Index Name'          format a30
col del_lf_rows  heading 'Deleted|Leaf Rows'   format 99999999
col lf_rows_used heading 'Used|Leaf Rows'      format 99999999
col ibadness     heading '% Deleted|Leaf Rows' format 999.99999
-- NULLIF avoids a divide-by-zero error when the index has no leaf rows.
SELECT name,
       del_lf_rows,
       lf_rows - del_lf_rows lf_rows_used,
       to_char(del_lf_rows / NULLIF(lf_rows, 0) * 100, '999.99999') ibadness
FROM   index_stats
WHERE  name = upper('&&index_name');
undefine index_name

As a rule of thumb, if 10-15% of the table data changes, then you should consider rebuilding the index.
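
If the reported percentage suggests that a rebuild is worthwhile, it might look like the following minimal sketch. The name my_index is a placeholder for the index being examined; re-running the validation afterwards is simply one way to confirm the effect.

```sql
-- my_index is a placeholder; substitute the index reported by the query.
ALTER INDEX my_index REBUILD;

-- Optional check: re-validate and look at the deleted leaf row count again.
ANALYZE INDEX my_index VALIDATE STRUCTURE;
SELECT name, lf_rows, del_lf_rows FROM index_stats;
```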

B*Tree Balancing

Oracle indexes are implemented as B* Trees which are always balanced.

In an Oracle B*tree, the root of the tree is at level 0. In a very small B*tree, the root block can also be a Leaf block.

In most cases, blocks on levels 0 through N-2 (where N is the height of the tree) are Branch blocks. Branch blocks do not contain data; they simply contain separators, which are used to navigate from the root block to the Leaf blocks.

All Leaf blocks are at level N-1 in Oracle B*trees. All data stored in a B*tree is stored in the Leaf blocks.

The definition of a 'Balanced Tree' is that all the data is on the same level, which means that the path to all data in the tree is the same length. Since all the data is stored in Leaf blocks, and all the Leaf blocks are on the same level, B*trees are always balanced. There is no way to unbalance a B*tree.
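
To see how many levels a particular index has, the data dictionary can be queried along the following lines. This is a sketch that assumes optimizer statistics have been gathered for the index; MY_INDEX is a placeholder name.

```sql
-- BLEVEL is the depth from the root block to the leaf blocks, with 0
-- meaning the root block is also the leaf block. The height N used in
-- the text is therefore BLEVEL + 1.
SELECT index_name,
       blevel,
       blevel + 1 AS height,
       leaf_blocks
FROM   user_indexes
WHERE  index_name = 'MY_INDEX';   -- placeholder; names are stored in upper case
```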

Deletion of many rows in the B*Tree

Suppose a table has 100,000 rows, and 99,999 of those rows and their index entries are deleted. How is the index balanced?

In this case the rows are deleted from the index, and the empty blocks are inserted onto the index free list. These blocks are still part of the index and will need to be accessed when traversing the index.

Thus if an index has one entry in it, there is only one block (a root/leaf block) in the tree. When searching the tree, only one block needs to be accessed to answer the query. If you load a B*tree with 100,000 rows, you might get a tree with, say, 3 levels. Levels 0 and 1 are Branch blocks used to access the data in the Leaf blocks on level 2. When querying this tree, you first access the root block, using the search key to find the correct Branch block in level 1 of the tree. Next you use the search key and the Branch block to find the correct Leaf block that should contain the key being sought. So it takes three block accesses to answer the same query. Now if 99,999 rows were deleted from the tree, the rows are removed from the index but the index is not collapsed. In this case you still have a 3 level index to store the one row, and it will still take three block accesses to access that row. The fact that we need three block accesses instead of one does not mean this tree is unbalanced. The tree is still balanced; it just contains a lot of empty blocks.
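
The scenario above can be tried out with a sketch like the following. The objects t_del and t_del_idx are hypothetical, and the row generator assumes a release that supports CONNECT BY LEVEL against DUAL. The actual height depends on block size and key length, so it may not be 3; the point of interest is whether HEIGHT is the same before and after the delete.

```sql
-- Hypothetical demo table with 100,000 rows.
CREATE TABLE t_del AS
  SELECT LEVEL AS id FROM dual CONNECT BY LEVEL <= 100000;
CREATE INDEX t_del_idx ON t_del (id);

-- Height and leaf row counts before the delete.
ANALYZE INDEX t_del_idx VALIDATE STRUCTURE;
SELECT height, lf_rows, del_lf_rows FROM index_stats;

-- Remove 99,999 of the 100,000 rows.
DELETE FROM t_del WHERE id > 1;
COMMIT;

-- Height and leaf row counts after the delete.
ANALYZE INDEX t_del_idx VALIDATE STRUCTURE;
SELECT height, lf_rows, del_lf_rows FROM index_stats;
```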

The reason for doing this is that when data is inserted into a Leaf block and there is no room for the insert, a very expensive operation called a split occurs. The split creates a new Leaf block, and possibly new Branch blocks as well, to maintain the balance of the tree. The split is by far the most expensive operation in the maintenance of B*trees, so we go to great lengths to avoid it. By not collapsing the now unused levels out of the B*tree after large deletes, these levels (and the splits that built them) do not have to be recreated during future inserts.

If most of the rows in a table are going to be deleted, and not refilled soon after, it is advisable to drop the index, delete the rows, and then recreate the index.

By dropping the index, you save the index maintenance that would otherwise be done during the delete, which speeds up the delete. By recreating the index after the delete, you create an index of optimal height, with the Leaf blocks filled to an optimal level.
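
The sequence described above can be sketched as follows. All names (big_table, big_table_cust_idx, customer_id, created_on) and the cutoff date are placeholders for illustration. Note that an index enforcing a primary key or unique constraint cannot simply be dropped this way, so the sketch assumes an ordinary non-unique index.

```sql
-- 1. Drop the index so the delete does not have to maintain it.
DROP INDEX big_table_cust_idx;

-- 2. Perform the large delete (placeholder predicate).
DELETE FROM big_table
WHERE  created_on < DATE '2000-01-01';
COMMIT;

-- 3. Recreate the index over the remaining rows.
CREATE INDEX big_table_cust_idx ON big_table (customer_id);
```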

Search Words

delete insert rebuild