This note helps with diagnosing excessive redo generation.
First, some background on redo generation:
What is the Online Redo Log?
The most crucial structure for recovery operations is the online redo log,
which consists of two or more pre-allocated files that store all changes made
to the database as they occur. Every instance of an Oracle database has an
associated online redo log to protect the database in case of an instance
failure.
What are the contents of the Online Redo Log?
Online redo log files are filled with redo records. A redo record, also called
a redo entry, is made up of a group of change vectors, each of which is a
description of a change made to a single block in the database. For example,
if you change a salary value in an employee table, you generate a redo record
containing change vectors that describe changes to the data segment block for
the table, the rollback segment data block, and the transaction table of the
rollback segments.
Redo entries record data that you can use to reconstruct all changes made to
the database, including the rollback segments. Therefore, the online redo log
also protects rollback data. When you recover the database using redo data,
Oracle reads the change vectors in the redo records and applies the changes
to the relevant blocks.
TROUBLESHOOTING STEPS
Scope & Application
You experience a large amount of redo being generated while performing a DML
even if "NOLOGGING" is used.
For assistance with determining what session is causing excessive redo, please see
Note:167492.1 titled "How to Find Sessions Generating Lots of Redo".
How to diagnose excessive redo generation.
Check the following:
- Confirm if the table or index has "NOLOGGING" set.
Issue the following statement.
select table_name, logging from all_tables where table_name = '<table name>';
-or-
select index_name, logging from all_indexes where index_name = '<index name>';
- Table has no triggers that might cause some indirect DML on other tables.
- Auditing is not the contributor for this excessive redo generation.
- The tablespace is not in hot backup mode.
- Note that only the following operations can make use of NOLOGGING mode:
  - direct load (SQL*Loader)
  - direct-load INSERT
  - CREATE TABLE ... AS SELECT
  - CREATE INDEX
  - ALTER TABLE ... MOVE PARTITION
  - ALTER TABLE ... SPLIT PARTITION
  - ALTER INDEX ... SPLIT PARTITION
  - ALTER INDEX ... REBUILD
  - ALTER INDEX ... REBUILD PARTITION
  - INSERT, UPDATE, and DELETE on LOBs in NOCACHE NOLOGGING mode stored out of line
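The trigger, auditing, and hot backup checks above can be sketched with dictionary queries such as the following. This is a minimal sketch and not part of the original note: '<owner>' and '<table name>' are placeholders, and the queries assume the session can read ALL_TRIGGERS, V$PARAMETER, V$BACKUP and V$DATAFILE.

```sql
-- Triggers defined on the table (placeholders must be replaced)
select owner, trigger_name, triggering_event, status
from all_triggers
where table_owner = '<owner>'
and table_name = '<table name>';

-- Current auditing setting
select name, value
from v$parameter
where name = 'audit_trail';

-- Datafiles currently in hot backup mode (no rows means none)
select b.file#, d.name, b.status, b.time
from v$backup b, v$datafile d
where b.file# = d.file#
and b.status = 'ACTIVE';
```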
Consider the following illustration.
Both tables below have "nologging" set at table level.
SQL> desc redo1
Name Null? Type
X NUMBER
Y NUMBER
SQL> desc redotesttab
Name Null? Type
X NUMBER
Y NUMBER
begin
for x in 1..10000 loop
insert into scott.redotesttab values(x,x+1);
-- or
-- insert /*+ APPEND */ into scott.redotesttab values(x,x+1);
end loop;
end;
Note: This will generate redo even if you provide the hint because this
is not a direct-load insert.
Now, consider the following bulk inserts, direct and simple.
SQL> select name,value from v$sysstat where name like '%redo size%';
NAME VALUE
redo size 27556720
SQL> insert into scott.redo1 select * from scott.redotesttab;
50000 rows created.
SQL> select name,value from v$sysstat where name like '%redo size%';
NAME VALUE
redo size 28536820
SQL> insert /*+ APPEND */ into scott.redo1 select * from scott.redotesttab;
50000 rows created.
SQL> select name,value from v$sysstat where name like '%redo size%';
NAME VALUE
redo size 28539944
You will notice that the redo generated via the simple insert is "980100" while
a direct insert generates only "3124".
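Because v$sysstat is instance-wide, the deltas above also include redo generated by any other sessions active at the time. A sketch of the same before/after measurement restricted to the current session, assuming SELECT access to v$mystat and v$statname:

```sql
-- Run before and after the statement under test, in the same session,
-- and subtract the two values to get the redo (in bytes) it generated.
select n.name, m.value
from v$mystat m, v$statname n
where m.statistic# = n.statistic#
and n.name = 'redo size';
```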
The purpose of this document is to show how to identify the causes of excessive redo generation and what can be done to mitigate the problem.
TROUBLESHOOTING STEPS
First of all, note that high redo generation is always a consequence of certain activity in the database and is normally expected behavior. Oracle is optimized for redo generation, and a high redo rate is not in itself a sign of a bug.
The main cause of high redo generation is usually high DML activity during a certain period of time. It is good practice to first examine modifications at both the database level (parameters, any maintenance operations, ...) and the application level (deployment of a new application, modifications in the code, an increase in the number of users, ...).
What we need to examine:
- Is supplemental logging enabled? The amount of redo generated when supplemental logging is enabled is quite high when compared to when supplemental logging is disabled.
What Causes High Redo When Supplemental Logging is Enabled (Doc ID 1349037.1)
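Whether supplemental logging is enabled can be checked with queries along these lines. This is a sketch, not taken from the referenced note; it assumes access to V$DATABASE and DBA_LOG_GROUPS.

```sql
-- Database-level supplemental logging settings ('NO' means disabled)
select supplemental_log_data_min, supplemental_log_data_pk,
       supplemental_log_data_ui, supplemental_log_data_fk,
       supplemental_log_data_all
from v$database;

-- Table-level supplemental log groups, if any
select owner, table_name, log_group_name, log_group_type
from dba_log_groups;
```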
- Are a lot of indexes being used? Reducing the number of indexes, or using the NOLOGGING attribute for the operations that support it, can reduce the redo considerably.
- Do all the operations really need LOGGING? From the application we can reduce redo by making use of the NOLOGGING clause. Note that only the following operations can make use of NOLOGGING mode:
  - direct load (SQL*Loader)
  - direct-load INSERT
  - CREATE TABLE ... AS SELECT
  - CREATE INDEX
  - ALTER TABLE ... MOVE PARTITION
  - ALTER TABLE ... SPLIT PARTITION
  - ALTER INDEX ... SPLIT PARTITION
  - ALTER INDEX ... REBUILD
  - ALTER INDEX ... REBUILD PARTITION
  - INSERT, UPDATE, and DELETE on LOBs in NOCACHE NOLOGGING mode stored out of line
To confirm whether the table or index has "NOLOGGING" set, issue one of the following statements:
select table_name, logging from all_tables where table_name = '<table name>';
-or-
select index_name, logging from all_indexes where index_name = '<index name>';
- Do tables have triggers that might cause some indirect DML on other tables?
- Is auditing enabled and contributing to this excessive redo generation?
- Are tablespaces in hot backup mode?
- Examine the log switches:
select lg.group#, lg.bytes/1024/1024 mb, lg.status, lg.archived, lf.member
from v$logfile lf, v$log lg where lg.group# = lf.group# order by 1, 2;
select to_char(first_time,'YYYY-MON-DD') "Date", to_char(first_time,'DY') day,
to_char(sum(decode(to_char(first_time,'HH24'),'00',1,0)),'999') "00",
to_char(sum(decode(to_char(first_time,'HH24'),'01',1,0)),'999') "01",
to_char(sum(decode(to_char(first_time,'HH24'),'02',1,0)),'999') "02",
to_char(sum(decode(to_char(first_time,'HH24'),'03',1,0)),'999') "03",
to_char(sum(decode(to_char(first_time,'HH24'),'04',1,0)),'999') "04",
to_char(sum(decode(to_char(first_time,'HH24'),'05',1,0)),'999') "05",
to_char(sum(decode(to_char(first_time,'HH24'),'06',1,0)),'999') "06",
to_char(sum(decode(to_char(first_time,'HH24'),'07',1,0)),'999') "07",
to_char(sum(decode(to_char(first_time,'HH24'),'08',1,0)),'999') "08",
to_char(sum(decode(to_char(first_time,'HH24'),'09',1,0)),'999') "09",
to_char(sum(decode(to_char(first_time,'HH24'),'10',1,0)),'999') "10",
to_char(sum(decode(to_char(first_time,'HH24'),'11',1,0)),'999') "11",
to_char(sum(decode(to_char(first_time,'HH24'),'12',1,0)),'999') "12",
to_char(sum(decode(to_char(first_time,'HH24'),'13',1,0)),'999') "13",
to_char(sum(decode(to_char(first_time,'HH24'),'14',1,0)),'999') "14",
to_char(sum(decode(to_char(first_time,'HH24'),'15',1,0)),'999') "15",
to_char(sum(decode(to_char(first_time,'HH24'),'16',1,0)),'999') "16",
to_char(sum(decode(to_char(first_time,'HH24'),'17',1,0)),'999') "17",
to_char(sum(decode(to_char(first_time,'HH24'),'18',1,0)),'999') "18",
to_char(sum(decode(to_char(first_time,'HH24'),'19',1,0)),'999') "19",
to_char(sum(decode(to_char(first_time,'HH24'),'20',1,0)),'999') "20",
to_char(sum(decode(to_char(first_time,'HH24'),'21',1,0)),'999') "21",
to_char(sum(decode(to_char(first_time,'HH24'),'22',1,0)),'999') "22",
to_char(sum(decode(to_char(first_time,'HH24'),'23',1,0)),'999') "23" ,
count(*) Total
from v$log_history
group by to_char(first_time,'YYYY-MON-DD'), to_char(first_time,'DY')
order by to_date(to_char(first_time,'YYYY-MON-DD'),'YYYY-MON-DD');
This will give us an idea of the times when the peaks of redo generation are happening.
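Log switch counts alone do not show how much redo was written. If the database runs in ARCHIVELOG mode, the archived volume per hour can be estimated as in the sketch below. The filter dest_id = 1 is an assumption to avoid double counting when several archive destinations are configured.

```sql
-- Archived redo volume per hour, in MB.
-- dest_id = 1 is an assumption; adjust it for your environment.
select to_char(first_time,'YYYY-MM-DD HH24') log_hour,
       count(*) archived_logs,
       round(sum(blocks * block_size)/1024/1024) mb
from v$archived_log
where dest_id = 1
group by to_char(first_time,'YYYY-MM-DD HH24')
order by 1;
```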
- Examine the AWR report:
The next step is to examine the AWR report for the hour with the highest number of log switches, and to confirm from the redo size that these log switches are actually caused by heavy redo generation.
In the AWR report we can also see the SQL statements with the most gets/executions, which gives an idea of the activity that is generating redo. We can also see the segments with the largest number of block changes and the sessions performing these changes.
Another way to find these sessions is described in SQL: How to Find Sessions Generating Lots of Redo or Archive logs (Doc ID 167492.1)
To find these segments we can also use the following queries:
SELECT to_char(begin_interval_time,'YY-MM-DD HH24') snap_time,
dhso.object_name,
sum(db_block_changes_delta) BLOCK_CHANGED
FROM dba_hist_seg_stat dhss,
dba_hist_seg_stat_obj dhso,
dba_hist_snapshot dhs
WHERE dhs.snap_id = dhss.snap_id
AND dhs.instance_number = dhss.instance_number
AND dhss.obj# = dhso.obj#
AND dhss.dataobj# = dhso.dataobj#
-- Modify the time window below to the hour in which the most redo log switches happened, as per the query above (keep the interval to 1 hour)
AND begin_interval_time BETWEEN to_date('11-01-28 13:00','YY-MM-DD HH24:MI')
AND to_date('11-01-28 14:00','YY-MM-DD HH24:MI')
GROUP BY to_char(begin_interval_time,'YY-MM-DD HH24'),
dhso.object_name
HAVING sum(db_block_changes_delta) > 0
ORDER BY sum(db_block_changes_delta) desc ;
-- Then : What SQL was causing redo log generation :
SELECT to_char(begin_interval_time,'YYYY_MM_DD HH24') WHEN,
dbms_lob.substr(sql_text,4000,1) SQL,
dhss.instance_number INST_ID,
dhss.sql_id,
executions_delta exec_delta,
rows_processed_delta rows_proc_delta
FROM dba_hist_sqlstat dhss,
dba_hist_snapshot dhs,
dba_hist_sqltext dhst
WHERE upper(dhst.sql_text) LIKE '%<segment_name>%' -- update the segment name as per the result of the previous query
AND ltrim(upper(dhst.sql_text)) NOT LIKE 'SELECT%'
AND dhss.snap_id=dhs.snap_id
AND dhss.instance_number=dhs.instance_number
AND dhss.sql_id=dhst.sql_id
-- update the time frame below as required
AND begin_interval_time BETWEEN to_date('11-01-28 13:00','YY-MM-DD HH24:MI')
AND to_date('11-01-28 14:00','YY-MM-DD HH24:MI');
- Finally, to troubleshoot the issue further and find the exact commands recorded during that time frame, we can use LogMiner to mine the archived logs from the period concerned. The archived logs generated during that time frame can be found in v$archived_log.
How To Determine The Cause Of Lots Of Redo Generation Using LogMiner (Doc ID 300395.1)
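The LogMiner step can be sketched as follows. This is an illustrative outline rather than the procedure from the referenced note: the time window reuses the example dates above, '<archived log name>' is a placeholder for a file name returned by the first query, and using the online catalog as the dictionary assumes the logs are mined on the database that generated them.

```sql
-- 1. Find the archived logs that cover the time frame of interest.
select name, thread#, sequence#, first_time, next_time
from v$archived_log
where first_time <= to_date('11-01-28 14:00','YY-MM-DD HH24:MI')
and next_time >= to_date('11-01-28 13:00','YY-MM-DD HH24:MI')
order by thread#, sequence#;

-- 2. Register one of those files and start LogMiner.
begin
  dbms_logmnr.add_logfile(
    logfilename => '<archived log name>',  -- placeholder: a NAME from step 1
    options     => dbms_logmnr.new);
  dbms_logmnr.start_logmnr(
    options => dbms_logmnr.dict_from_online_catalog);
end;
/

-- 3. Query v$logmnr_contents in the same session, then end LogMiner.
select operation, count(*) from v$logmnr_contents group by operation;

begin
  dbms_logmnr.end_logmnr;
end;
/
```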
SYMPTOMS
- After migrating from 11.2 to 12c, a large redo size is generated when autotrace is set on for a query.
12c:
Statistics
----------------------------------------------------------
15458 recursive calls
2 db block gets
486798 consistent gets
792525 physical reads
26519728 redo size *<<<<<<<<<<----------
912915 bytes sent via SQL*Net to client
1254 bytes received via SQL*Net from client
68 SQL*Net roundtrips to/from client
37 sorts (memory)
1 sorts (disk)
1000 rows processed
11.2:
Statistics
----------------------------------------------------------
2719 recursive calls
2 db block gets
410794 consistent gets
758747 physical reads
0 redo size *<<<<<<<<<<----------
899477 bytes sent via SQL*Net to client
1226 bytes received via SQL*Net from client
68 SQL*Net roundtrips to/from client
0 sorts (memory)
1 sorts (disk)
1000 rows processed
- The large redo size is still generated even though unified auditing was disabled
- The LogMiner results show some INTERNAL operations besides the INSERT and DELETE on PLAN_TABLE$
For example:
SCN         TIMESTAMP       SEG_NAME     SEG_OWNER  TABLE_NAME   TABLE_SPACE  OPERATION  SQL_REDO
1.1115E+13  8/2/2018 10:05  PLAN_TABLE$  SYS        PLAN_TABLE$  SYSTEM       INSERT     /* No SQL_REDO for temporary tables */
1.1115E+13  8/2/2018 10:05  PLAN_TABLE$  SYS        PLAN_TABLE$  SYSTEM       DELETE     /* No SQL_REDO for temporary tables */
1.1115E+13  8/2/2018 10:05                                                    INTERNAL
- This problem can be reproduced with a simple query as follows. The v$mystat statistics show that the redo is generated for lost write detection:
SQL> conn <USERNAME>/<Password> as sysdba
Connected.
SQL> SELECT name, value
FROM v$mystat, v$statname
WHERE v$mystat.statistic#=v$statname.statistic#
and v$statname.name like '%redo size%'
ORDER BY 1;
NAME VALUE
---------------------------------------------------------------- ----------
redo size 0
redo size for direct writes 0
redo size for lost write detection 0
SQL> select count(*) from t1;
COUNT(*)
----------
316777
SQL> SELECT name, value
FROM v$mystat, v$statname
WHERE v$mystat.statistic#=v$statname.statistic#
and v$statname.name like '%redo size%'
ORDER BY 1;
NAME VALUE
---------------------------------------------------------------- ----------
redo size 502756
redo size for direct writes 0
redo size for lost write detection 502756
CHANGES
In the 12c environment, the initialization parameter DB_LOST_WRITE_PROTECT has been changed from its default value of NONE to a non-default value.
CAUSE
From 11.1 onwards, if DB_LOST_WRITE_PROTECT is set to a non-default value of TYPICAL or FULL, lost write detection is enabled. Buffer cache reads are then logged, which leads to redo being generated for SELECT statements. This is necessary for lost write detection and is expected behavior. For further information on lost write detection, please refer to the following note and online documentation.
Best Practices for Corruption Detection, Prevention, and Automatic Repair - in a Data Guard Configuration Document 1302539.1
Database Reference
DB_LOST_WRITE_PROTECT
SOLUTION
No action needs to be taken if lost write detection is needed.
If the environment is not in a Data Guard Configuration or lost write detection is not needed, please set DB_LOST_WRITE_PROTECT to default value of NONE.
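Before changing the parameter, its current value can be checked. A minimal sketch, assuming SELECT access to V$PARAMETER:

```sql
-- Current setting; ISDEFAULT shows whether it has been changed from the default
select name, value, isdefault
from v$parameter
where name = 'db_lost_write_protect';
```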
SQL> conn <USERNAME>/<PASSWORD> as sysdba
Connected.
SQL> alter system set DB_LOST_WRITE_PROTECT=none;
System altered.
SQL> conn <USERNAME>/<PASSWORD> as sysdba
Connected.
SQL> SELECT name, value
FROM v$mystat, v$statname
WHERE v$mystat.statistic#=v$statname.statistic#
and v$statname.name like '%redo size%'
ORDER BY 1;
NAME VALUE
redo size 0
redo size for direct writes 0
redo size for lost write detection 0
SQL> select count(*) from t1;
COUNT(*)
316777
SQL> SELECT name, value
FROM v$mystat, v$statname
WHERE v$mystat.statistic#=v$statname.statistic#
and v$statname.name like '%redo size%'
ORDER BY 1;
NAME VALUE
redo size 0
redo size for direct writes 0
redo size for lost write detection 0
--- How to determine the cause of lots of redo generation using LogMiner ---
Using OPERATION Codes to Understand Redo Information
Several operation codes can generate redo information. Using the following guidelines you can identify the operation codes that are causing the high redo generation and take appropriate action to reduce it.
NOTE:
Redo records are not all equally sized. So remember that just because certain statements show up a lot in the LogMiner output, this does not guarantee that you have found the area of functionality generating the excessive redo.
What are these OPERATION codes ?
- INSERT / UPDATE / DELETE -- Operations performed on SYS objects are also considered Internal operations.
- COMMIT -- This is also an "Internal" operation; you will get the line "commit;" in the column sql_redo.
- START -- This is also an "Internal" operation; you will get the line "set transaction read write;" in sql_redo.
- INTERNAL -- Dictionary updates
- SELECT_FOR_UPDATE -- This is also an Internal operation; Oracle generates redo information for "select" statements that have a "for update" clause.
In general INTERNAL operations are not relevant, so to query the relevant data, use "seg_owner='<owner>'" in the "where" clause.
Examples:
How to extract relevant information from the view v$logmnr_contents?
- This SQL lists operations performed by user SCOTT
SQL> select distinct operation,username,seg_owner from v$logmnr_contents where seg_owner='SCOTT';
OPERATION USERNAME SEG_OWNER
DDL SCOTT SCOTT
DELETE SCOTT SCOTT
INSERT SCOTT SCOTT
UPDATE SCOTT SCOTT
- This SQL lists the undo and redo associated with operations that user SCOTT performed
SQL> select seg_owner,operation,sql_redo,sql_undo from v$logmnr_contents where SEG_owner='SCOTT';
SCOTT DDL
create table LM1 (c1 number, c2 varchar2(10));
SCOTT INSERT
insert into "SCOTT"."LM1"("C1","C2") values ('101','AAAA');
delete from "SCOTT"."LM1" where "C1" = '101' and "C2" = 'AAAA'
and ROWID = 'AAAHfBAABAAAMUqAAA';
SCOTT UPDATE update "SCOTT"."LM1" set "C2" = 'YYY'
where "C2" = 'EEE' and ROWID = 'AAAHfBAABAAAMUqAAE';
update "SCOTT"."LM1" set "C2" = 'EEE' where "C2" = 'YYY'
and ROWID = 'AAAHfBAABAAAMUqAAE';
INSERT / UPDATE / DELETE operations performed on SYS objects are also considered Internal operations.
- This SQL lists undo and redo generated for UPDATE statements issued by user SCOTT
SQL> select username, seg_owner,operation,sql_redo,sql_undo from v$logmnr_contents where operation ='UPDATE' and USERNAME='SCOTT';
UNAME SEG_OW OPERATION SQL_REDO SQL_UNDO
SCOTT SYS UPDATE update "SYS"."OBJ$" set "OBJ#" = '1'..... update ....
SCOTT SYS UPDATE update "SYS"."TSQ$" set "GRANTO..... update .......
SCOTT SYS UPDATE update "SYS"."SEG$" set "TYPE#" = '5'.. update......
As the result above shows, user SCOTT has updated SYS objects, so querying on USERNAME may give misleading results. It is better to query v$logmnr_contents on SEG_OWNER.
- Identifying Operation Counts
Run the following query to see the OPERATION code row count from v$logmnr_contents, to understand which OPERATION code has generated lots of redo information.
SQL> select operation,count(*) from v$logmnr_contents group by operation;
OPERATION COUNT(*)
COMMIT 22236
DDL 2
DELETE 1
INSERT 11
INTERNAL 11
SELECT_FOR_UPDATE 32487
START 22236
UPDATE 480
8 rows selected
- Identifying User Counts
Run the following query to check user activity and operation counts:
SQL> select seg_owner,operation,count(*) from v$logmnr_contents group by seg_owner,operation;
SEG_OWNER OPERATION COUNT(*)
SCOTT COMMIT 22236
SCOTT DDL 2
SCOTT DELETE 1
...
BILLY COMMIT 12899
BILLY DDL 5
BILLY DELETE 2
...
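Once an owner stands out, the same count can be broken down per segment. The following is a sketch only: SCOTT is the example owner used above, and, as noted earlier, row counts are not the same as redo volume because redo records are not equally sized.

```sql
-- Operation counts per segment for one owner, busiest first
select seg_owner, seg_name, operation, count(*)
from v$logmnr_contents
where seg_owner = 'SCOTT'
group by seg_owner, seg_name, operation
order by count(*) desc;
```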
NOTE:
Be aware of the following known issue:
If you are not using "select for update" statements often in your application and yet find a high operation count for operation code "SELECT_FOR_UPDATE" then you might be hitting a known issue.
To confirm this check whether SQL_REDO shows select,update statements on AQ_QUEUE_TABLE_AFFINITIES and AQ_QUEUE_TABLES.
If you see these selects and updates, then check the value of the Init.ora parameter AQ_TM_PROCESSES. The default value is AQ_TM_PROCESSES = 0 meaning that the queue monitor is not created.
If you are not using Advanced Queuing, then set AQ_TM_PROCESSES back to zero to avoid lots of redo generation on objects AQ_QUEUE_TABLE_AFFINITIES and AQ_QUEUE_TABLES.
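The parameter check and change described above can be sketched as follows, assuming SELECT access to V$PARAMETER and the ALTER SYSTEM privilege. Run the second statement only if Advanced Queuing is not in use.

```sql
-- Current value of AQ_TM_PROCESSES
select name, value
from v$parameter
where name = 'aq_tm_processes';

-- Only if Advanced Queuing is not used
alter system set aq_tm_processes = 0;
```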
How to find sessions generating lots of redo
- fact: Oracle Server - Enterprise Edition 8
- fact: Oracle Server - Enterprise Edition 9
- fact: Oracle Server - Enterprise Edition 10
SOLUTION
To find sessions generating lots of redo, you can use either of the following methods. Both methods examine the amount of undo generated. When a transaction
generates undo, it will automatically generate redo as well.
The methods are:
- Query V$SESS_IO. This view contains the column BLOCK_CHANGES, which indicates how many blocks have been changed by the session. High values indicate a
session generating lots of redo.
The query you can use is:
SQL> SELECT s.sid, s.serial#, s.username, s.program,
i.block_changes
FROM v$session s, v$sess_io i
WHERE s.sid = i.sid
ORDER BY 5 desc, 1, 2, 3, 4;
Run the query multiple times and examine the delta between each occurrence of BLOCK_CHANGES. Large deltas indicate high redo generation by the session.
- Query V$TRANSACTION. This view contains information about the amount of undo blocks and undo records accessed by the transaction (as found in the
USED_UBLK and USED_UREC columns).
The query you can use is:
SQL> SELECT s.sid, s.serial#, s.username, s.program,
t.used_ublk, t.used_urec
FROM v$session s, v$transaction t
WHERE s.taddr = t.addr
ORDER BY 5 desc, 6 desc, 1, 2, 3, 4;
Run the query multiple times and examine the delta between each occurrence of USED_UBLK and USED_UREC. Large deltas indicate high redo generation by
the session.
You use the first query when you need to check for programs generating lots of redo when these programs activate more than one transaction. The latter query
can be used to find out which particular transactions are generating redo.
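Both methods above use block changes or undo as a proxy for redo. As an alternative that is not part of this note, the 'redo size' statistic can be read per session from V$SESSTAT. A sketch, assuming SELECT access to the three views; the value is cumulative since the session connected, so compare deltas between runs as with the queries above.

```sql
-- Cumulative redo bytes per connected session, highest first
select s.sid, s.serial#, s.username, s.program, st.value redo_size
from v$session s, v$sesstat st, v$statname n
where s.sid = st.sid
and st.statistic# = n.statistic#
and n.name = 'redo size'
order by st.value desc;
```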