Massive library cache lock waits on application connections after an Oracle restart

1. Symptom
After the database and the front-end application were restarted, a large number of sessions waited on the library cache lock event.
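To confirm the symptom on a live system, a query along these lines can count the sessions currently stuck on library cache waits. This is only a sketch: it assumes access to GV$SESSION, and BLOCKING_SESSION is not always populated for this wait event.

```sql
-- Count sessions currently waiting on library cache events, per instance.
select inst_id, event, count(*) as waiting_sessions
from   gv$session
where  event in ('library cache lock', 'library cache pin')
and    state = 'WAITING'
group  by inst_id, event
order  by waiting_sessions desc;

-- List the individual waiters with the statement they are running.
select inst_id, sid, serial#, username, sql_id,
       blocking_session, seconds_in_wait
from   gv$session
where  event = 'library cache lock'
order  by seconds_in_wait desc;
```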
2. Analysis and resolution
The cause of this incident was Cause 3 (library cache object invalidations).
3. Detailed analysis of each possible cause
Cause 1: Unshared SQL due to literals (hard parsing)
1 Check the library cache hit ratio in the AWR report, and the "Misses in the library cache" figure in TKProf output.
2 Check whether the SQL statements are being hard parsed.
Solution 1: Use bind variables.
Solution 2: Use the CURSOR_SHARING initialization parameter (it can affect performance).
1 CURSOR_SHARING makes Oracle automatically replace literal values in statements with bind variables. Its settings are:
EXACT: Leave the statement as it was written, with literals (default value).
FORCE: Substitute all literals with binds (as much as possible).
SIMILAR: Substitute literals with binds only if the query's execution plan won't change (i.e., safe literal replacement).
Replacing literals with binds may give some statements a worse execution plan.
Recommendation: measure the parameter's effect on performance before relying on it.
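As a rough way to find statements that differ only in their literals, cursors in V$SQL can be grouped by FORCE_MATCHING_SIGNATURE. The threshold of 20 is an arbitrary assumption. The bind-variable example uses SQL*Plus syntax and a hypothetical column named id on cog.t1, whose real column names are not given in this note.

```sql
-- Statements that would be identical after literal replacement
-- share the same FORCE_MATCHING_SIGNATURE.
select force_matching_signature,
       count(*)    as cursor_count,
       min(sql_id) as sample_sql_id
from   v$sql
where  force_matching_signature <> 0
group  by force_matching_signature
having count(*) > 20          -- assumed threshold, tune as needed
order  by cursor_count desc;

-- SQL*Plus sketch of the same query written with a bind variable.
-- The column name "id" is hypothetical.
variable v_id number
exec :v_id := 1
select * from cog.t1 where id = :v_id;
```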


Cause 2: Shared SQL being aged out
1 The shared pool is too small, so many shareable statements are aged out of the library cache and reloaded later. Every reload requires a hard parse and costs CPU and latches.
2 Check whether the "Library Cache statistics" section of the AWR report shows high reloads (usually several thousand per hour) with little or no invalidations, which points to an undersized shared pool.
Solution 1: Increase the size of the shared pool (risk: swapping).
Solution 2 (10g and later): Use the Automatic Shared Memory Manager (ASMM) to adjust the shared pool size.
  You will need to set a reasonable value for SGA_MAX_SIZE and SGA_TARGET to enable ASMM.
Solution 3: Keep ("pin") frequently used large PL/SQL and cursor objects in the shared pool (keeping too many objects risks ORA-4031 errors).
  Use the DBMS_SHARED_POOL.KEEP() procedure to mark large, frequently used PL/SQL and SQL objects in the shared pool and avoid them being aged out.
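The checks and the KEEP call above might look roughly like this. The package name APP_OWNER.BIG_PACKAGE is hypothetical, and DBMS_SHARED_POOL may need to be installed and granted before it can be called.

```sql
-- Reloads and invalidations per library cache namespace (cumulative since startup).
select namespace, gets, gethitratio, pins, pinhitratio, reloads, invalidations
from   v$librarycache
order  by reloads desc;

-- Current memory settings that matter for ASMM.
select name, value
from   v$parameter
where  name in ('sga_target', 'sga_max_size', 'shared_pool_size');

-- Largest PL/SQL objects in the shared pool that are not yet kept.
select *
from  (select owner, name, type, sharable_mem, loads, executions
       from   v$db_object_cache
       where  type in ('PACKAGE', 'PACKAGE BODY', 'PROCEDURE', 'FUNCTION')
       and    kept = 'NO'
       order  by sharable_mem desc)
where rownum <= 20;

-- Keep one package in the shared pool ('P' = package/procedure/function).
-- APP_OWNER.BIG_PACKAGE is a hypothetical name.
begin
  dbms_shared_pool.keep('APP_OWNER.BIG_PACKAGE', 'P');
end;
/
```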


Cause 3: Library cache object invalidations
When objects (like tables or views) are altered via DDL or by collecting statistics, the cursors that depend on them are invalidated. Each invalidated cursor is hard parsed the next time it is executed, which costs CPU and latches.
TKProf:
Look at the top statements and determine whether they are being hard parsed;
these will have "Misses in the library cache" equal or close to the total number of parses.
AWR or Statspack reports:
Symptom 1: The Library Cache statistics section shows that reloads are high (usually several thousand per hour) and invalidations are high.
Symptom 2: "% SQL with executions>1" is over 60%, meaning statements are being shared.
Symptom 3: The Dictionary Statistics section of the report shows non-zero values in the Modification Requests column, meaning that DDL occurred on some objects.
Solution 1: Do not perform DDL operations during busy periods; defer DDL to a quiet period.
  DDL usually invalidates library cache objects, and the invalidation can cascade to many dependent objects such as cursors. Invalidations weigh heavily on the library cache, shared pool, row cache, and CPU, because many hard parses may be needed at the same time.
Solution 2: Do not collect optimizer statistics during busy periods.
  Collecting statistics (with ANALYZE or DBMS_STATS) invalidates library cache objects, and the invalidation can cascade to many dependent objects such as cursors, with the same cost in hard parses as DDL.
Solution 3: Do not perform TRUNCATE operations during busy periods; defer them to a quiet period.
  Document 123214.1 Truncate - Causes Invalidations in the LIBRARY CACHE
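-- Test case: truncate the table, reload one row, then run a tagged query.
truncate table cog.t1;
insert into cog.t1 values(1,'2');
commit;
select /* test_t1_truncate */ * from cog.t1;
-- LOADS and INVALIDATIONS show whether the tagged cursor was invalidated and reloaded.
select sql_text,version_count,loads,invalidations,parse_calls from v$sqlarea where sql_text like '%test_t1_truncate%';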
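To tie a burst of invalidations to a specific change, it can help to list objects that recently had DDL or fresh statistics. A minimal sketch, assuming a one-hour look-back window and access to the DBA views:

```sql
-- Objects touched by DDL in the last hour (window is an assumption).
select owner, object_name, object_type, last_ddl_time
from   dba_objects
where  last_ddl_time > sysdate - 1/24
order  by last_ddl_time desc;

-- Tables whose optimizer statistics were gathered in the last hour.
select owner, table_name, last_analyzed
from   dba_tab_statistics
where  last_analyzed > sysdate - 1/24
order  by last_analyzed desc;

-- Cursors with the most invalidations currently in the shared pool.
select *
from  (select sql_id, invalidations, loads, parse_calls
       from   v$sqlarea
       order  by invalidations desc)
where rownum <= 10;
```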



Cause 4: Objects being compiled across sessions (PL/SQL)
One or more sessions are compiling objects (typically PL/SQL) while another session wants to pin the same object before executing or compiling it.
The waiting sessions wait on library cache pin in Share mode (if they only want to execute the object) or eXclusive mode (if they want to compile or change it).
Solution 1: Avoid compiling objects in different sessions at the same time or during busy times.
  Do not compile interdependent objects across concurrent sessions or during peak usage.
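A quick way to see whether compilation is in play is to look for invalid objects and for sessions waiting on library cache pin. This is a sketch only. For this wait event P1 is the library cache handle address, which is why P1RAW is selected.

```sql
-- Invalid objects that sessions may be trying to recompile.
select owner, object_type, count(*) as invalid_objects
from   dba_objects
where  status = 'INVALID'
group  by owner, object_type
order  by invalid_objects desc;

-- Sessions currently waiting on library cache pin.
select inst_id, sid, serial#, username, sql_id,
       p1raw as handle_address, seconds_in_wait
from   gv$session
where  event = 'library cache pin'
order  by seconds_in_wait desc;
```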




Cause 5: Auditing is turned on
Auditing increases the need to acquire library cache locks and can increase contention for them. This is especially true in a RAC environment, where library cache locks become database-wide (across all instances).
Cause: The audit_trail parameter is set to something other than "none".
Solution: Evaluate whether auditing is needed, and consider disabling it if it is not absolutely necessary.
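The current audit configuration can be checked as sketched below. The ALTER SYSTEM line is left commented out on purpose: audit_trail is a static parameter, so the change only takes effect after an instance restart and should be agreed with whoever owns the audit requirements.

```sql
-- Is auditing enabled?
select name, value
from   v$parameter
where  name = 'audit_trail';

-- Which statement-level audit options are configured?
select user_name, audit_option, success, failure
from   dba_stmt_audit_opts
order  by user_name, audit_option;

-- Disable auditing (requires a restart to take effect):
-- alter system set audit_trail = none scope = spfile;
```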




Cause 6: Unshared SQL in a RAC environment
Library cache lock waits may occur in RAC environments when applications are not sharing SQL. In single-instance environments, library cache and shared pool latch contention is the typical symptom of unshared SQL. In RAC, however, the main symptom may be library cache lock contention.
Symptom 1: Many statements are hard parsed, and library cache lock waits occur as part of a hard parse.
Symptom 2: The AWR report shows a low "% SQL with executions>1" (less than 60%).
Symptom 3: In the "Instance Efficiency Percentages" section of the AWR report, check "Execute to Parse %:" and "Soft Parse %:"; the soft parse ratio is below 80%.
Solution 1: Rewrite the SQL to use bind values.
Solution 2: Use the CURSOR_SHARING initialization parameter (EXACT, FORCE, SIMILAR). Use it with caution and test the effect in the application first.
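Without an AWR report at hand, the two ratios can be approximated from the dynamic views. This is a sketch: the figures are cumulative since instance startup, not per AWR interval, and the second query only approximates "% SQL with executions>1" from the cursors still in the shared pool.

```sql
-- Soft parse percentage per RAC instance, from cumulative statistics.
select t.inst_id,
       t.value as total_parses,
       h.value as hard_parses,
       round(100 * (1 - h.value / nullif(t.value, 0)), 2) as soft_parse_pct
from   gv$sysstat t
join   gv$sysstat h on h.inst_id = t.inst_id
where  t.name = 'parse count (total)'
and    h.name = 'parse count (hard)'
order  by t.inst_id;

-- Approximate share of cached cursors that were executed more than once.
select round(100 * sum(case when executions > 1 then 1 else 0 end)
             / nullif(count(*), 0), 2) as pct_sql_exec_gt_1
from   v$sql;
```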



Cause 7: Extensive use of row level triggers
When row level triggers are fired frequently, higher than usual library cache activity may occur, because of the need to check whether mutating tables are being read. During trigger execution, the application may try to read mutating tables, i.e., tables that are being modified by the statement that caused the trigger to fire. As this may lead to inconsistencies, it is not allowed, and the application should receive the error ORA-4091. The mechanism that detects this error involves one library cache lock acquisition per table referenced in each select statement executed.
The extent of the problem depends on how many times the row triggers fire rather than on how many row triggers have been created (i.e., one trigger that fires 10000 times will cause more problems than 100 triggers that fire once).
Symptom: Evidence of a row level trigger firing (maybe some recursive SQL related to a trigger).
Solution: Evaluate the need for the row trigger.
   Sometimes row triggers aren't needed to accomplish the functionality. Consider whether there is an alternative.



Cause 8: Excessive number of child cursors
A large number of child cursors are being created for some SQL statements. This activity causes contention among the sessions that are creating child cursors concurrently, or with other sessions that need similar resources (latches and mutexes).
Symptom 1: The "SQL ordered by Version Count" section of the AWR report lists SQL with more than 500 versions.
Symptom 2: The top statements by version count can be listed with:
select * from (select sql_id,version_count,loads,invalidations,parse_calls from gV$SQLAREA order by version_count desc) where rownum<=10;
Symptom 3: Run @sqlnoshare sql_id, or query V$SQL_SHARED_CURSOR, to see the reasons why the SQL isn't being shared.
Typical cause: Inappropriate use of the CURSOR_SHARING parameter set to SIMILAR. The fix is to move away from SIMILAR, either by changing the application SQL or by switching to FORCE.
 Risk: Depends on the change made. Changing the CURSOR_SHARING initialization parameter to FORCE is risky if done at the database instance level, but less risky at the session level. Changing the application SQL is not as risky, since only the single statement is affected.
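For readers without the @sqlnoshare script, the view can be queried directly. The sketch below selects only a handful of the mismatch columns; the full list varies by release. &sql_id is a SQL*Plus substitution variable to be filled in with the statement of interest.

```sql
-- One row per child cursor; a 'Y' marks the reason it could not be shared.
select sql_id, child_number,
       bind_mismatch,
       optimizer_mismatch,
       language_mismatch,
       auth_check_mismatch,
       translation_mismatch
from   v$sql_shared_cursor
where  sql_id = '&sql_id'
order  by child_number;
```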
 
  The difference between SIMILAR and FORCE is that SIMILAR makes similar statements share the SQL area only where doing so does not degrade execution plans, whereas FORCE makes similar statements share the SQL area even if execution plans may degrade.
  
When literal replacement is enabled with CURSOR_SHARING set to SIMILAR, one of the cursor sharing criteria is that the bind value must match the initial bind value whenever the execution plan could change depending on the value of the literal. The reason is that reusing the same cursor might give a sub-optimal plan. This typically happens when the optimizer would choose a different plan depending on the value of the literal.
For example, with a " > " predicate, each execution with a different bind value results in a new child cursor, because that ensures the plan does not change (a range predicate influences cost and plans). With an equality predicate, the same child cursor is always shared.
Avoiding CURSOR_SHARING set to SIMILAR entails either rewriting the SQL in the application so that it uses bind values and still gets a good plan (hints, profiles, or outlines may be needed), or setting CURSOR_SHARING to FORCE, which avoids generating child cursors but can cause plans to be sub-optimal.

Changing the CURSOR_SHARING initialization parameter to FORCE is easy; changing the application to use binds will take more effort.
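Since the session level is described as the less risky place to try FORCE, a trial might be sketched as follows. The workload step is left as a placeholder, and results should be compared against a baseline before any instance-wide change.

```sql
-- Switch only the current session to FORCE.
alter session set cursor_sharing = force;

-- Confirm the value the session is using.
select value
from   v$parameter
where  name = 'cursor_sharing';

-- ... run the representative workload here and compare plans and timings ...

-- Return the session to the default setting.
alter session set cursor_sharing = exact;
```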