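Before the point of failure, the ASM alert log had not reported any TNS errors, but at the time of the failure a TNS connection-closed error also appeared in ASM, so a network problem cannot be ruled out as the cause of the ASM instance failure. Shortly after 4 o'clock there was a "NOTE: ASMB process exiting due to lack of ASM file activity for 305 seconds". According to the document "NOTE ASMB process exiting due to lack of ASM file activity (Doc ID 754110.1)", this NOTE can be ignored. At 8:46, the ASM instance started normally along with the database restart.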
In the end, I found a MOS document related to this failure. Bug 13914613 is most likely what caused the problem:
Bug 13914613 Excessive time holding shared pool latch in kghfrunp with auto memory management
This note gives a brief overview of bug 13914613.
The content was last updated on: 17-DEC-2014
Affects:
  Product (Component): Oracle Server (Rdbms)
  Range of versions believed to be affected: (Not specified)
  Versions confirmed as being affected:
  Platforms affected: Generic (all / most platforms affected)
Fixed:
The fix for 13914613 is first included in
Description
A session may spend excessive time holding the shared pool latch under kghfrunp when auto memory management is used. This can ultimately result in an instance crash due to other sessions holding critical resources too long, e.g. database crash due to ORA-240 and ORA-15064.

Rediscovery Notes
Session wait chains show "latch: shared pool" waits.
Wait times for shared pool latch over 10 seconds for the same holder.
Call stacks of shared pool latch holder show kghfrunp as a currently executing function.
Auto memory management is in use such that "duration" subheaps are used, e.g. ASMM (sga_target) or AMM (memory_target) being used.

Workaround
Set init.ora parameter _enable_shared_pool_durations=false

Getting a Fix
Use one of the "Fixed" versions listed above (for Patch Sets / bundles use the latest version available as contents are cumulative; the "Fixed" version listed above is the first version where the fix is included).
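Some of the rediscovery symptoms in the note can be checked from SQL*Plus. The queries below are only a rough sketch, not part of the MOS note: they assume a live instance you can query, and they cover only the wait event and the memory-management settings. Confirming that the latch holder is executing kghfrunp still needs call stacks from a hang analysis or systemstate dump, which these queries do not provide.

```sql
-- Sessions currently waiting on the shared pool latch, longest waits first.
-- The note talks about waits of more than 10 seconds for the same holder.
SELECT sid, serial#, event, state, seconds_in_wait, blocking_session
FROM   v$session
WHERE  event = 'latch: shared pool'
ORDER  BY seconds_in_wait DESC;

-- Is ASMM (sga_target) or AMM (memory_target) in use?
-- A non-zero value means automatic memory management is enabled.
SELECT name, value
FROM   v$parameter
WHERE  name IN ('sga_target', 'memory_target');
```

Since the instance in this case had already crashed, queries like these are mainly useful for watching for a recurrence before the workaround or the patch is in place.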
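Having analyzed the cause of the failure, I gave the customer the following two-part solution:
1. Set the hidden parameter "_enable_shared_pool_durations"=false
2. Upgrade to the 11.2.0.4 patchset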
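The first item could be applied roughly as follows. This is a minimal sketch under several assumptions not stated above: the instance uses an spfile, you are connected as SYSDBA, and SID = '*' is used to cover all RAC instances (drop it for a single instance). To my knowledge the parameter is static, so the change only takes effect after an instance restart. Hidden parameters should normally be changed only on the advice of Oracle Support.

```sql
-- Check the current value of the hidden parameter (requires SYS, reads x$ tables).
SELECT a.ksppinm AS name, b.ksppstvl AS value
FROM   x$ksppi a, x$ksppcv b
WHERE  a.indx = b.indx
AND    a.ksppinm = '_enable_shared_pool_durations';

-- Record the workaround in the spfile; the double quotes are required
-- because the parameter name starts with an underscore.
ALTER SYSTEM SET "_enable_shared_pool_durations" = FALSE SCOPE = SPFILE SID = '*';

-- Restart the instance(s) afterwards, then rerun the first query to confirm.
```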