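From the entries 'Thu Mar 31 14:29:18 2016' and 'WARNING: Waited 15 secs for write IO to PST disk 1 in group 3' above, we can see that at 2016-03-31 14:29:18 the ASM PST heartbeat against the disks of a normal- or high-redundancy disk group was delayed, and the delay exceeded the 15-second threshold, so the ASM instance dismounted the disk group. External-redundancy disk groups are handled differently: heartbeat delays are essentially ignored for them. The ASM instance stops issuing further PST heartbeats until PST revalidation succeeds, but a PST heartbeat delay does not dismount an external-redundancy disk group. This situation can have the following causes:
1. Some physical paths of a multipath device went offline or were lost.
2. The multipath software performed a path failover.
3. High server load, or maintenance on the storage, multipath software, or operating system. This cause can be ruled out here, because the load was not high and no maintenance was being carried out.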
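When this happens, it can be addressed in the following ways:
1. Check the disk I/O response time at both the operating system and the storage level.
2. Keep the disk response time below 15 seconds wherever possible. This depends on several factors, including the operating system, the multipath software, and kernel parameters, each of which has to be checked in turn.
3. If the disk response time cannot be kept below 15 seconds, set the hidden parameter _asm_hbeatiowait on the ASM instance. If you are hitting bug 17274537, the value can be set to 120; this bug is fixed in 12.1.0.2.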
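To judge how often the heartbeat write actually hits the 15-second limit, it can help to scan the ASM alert log for the warning quoted earlier. The following is only a minimal sketch: the function name `find_pst_waits` and the sample list are made up for illustration, and the message pattern is assumed to match the warning text exactly as it appears in this alert log.

```python
import re
from datetime import datetime

# Timestamp layout used by the alert log lines shown in this post.
TS_FORMAT = "%a %b %d %H:%M:%S %Y"

# Pattern assumed from the warning quoted in this post.
PST_WAIT = re.compile(
    r"WARNING: Waited (\d+) secs for write IO to PST disk (\d+) in group (\d+)"
)

def find_pst_waits(lines, threshold_secs=15):
    """Return (timestamp, secs, disk, group) for every PST wait warning
    whose wait time is at least threshold_secs (hypothetical helper)."""
    current_ts = None
    hits = []
    for raw in lines:
        line = raw.strip()
        try:
            # A line that parses as a timestamp dates the messages after it.
            current_ts = datetime.strptime(line, TS_FORMAT)
            continue
        except ValueError:
            pass
        m = PST_WAIT.search(line)
        if m and int(m.group(1)) >= threshold_secs:
            hits.append(
                (current_ts, int(m.group(1)), int(m.group(2)), int(m.group(3)))
            )
    return hits

# Sample input built from the two log lines quoted in this post.
sample = [
    "Thu Mar 31 14:29:18 2016",
    "WARNING: Waited 15 secs for write IO to PST disk 1 in group 3",
    "Thu Mar 31 14:30:05 2016",
    "SUCCESS: diskgroup GJJ_DG was mounted",
]

for ts, secs, disk, group in find_pst_waits(sample):
    print(ts, secs, disk, group)
```

In practice the lines would come from the ASM alert log file rather than from a hard-coded list; the file location depends on the installation and is not assumed here.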
Thu Mar 31 14:30:05 2016
SQL> ALTER DISKGROUP GJJ_DG MOUNT /* asm agent *//* {0:23:23972} */
NOTE: cache registered group GJJ_DG number=3 incarn=0x46ed72a8
NOTE: cache began mount (not first) of group GJJ_DG number=3 incarn=0x46ed72a8
NOTE: Assigning number (3,0) to disk (/dev/rhdiskpower5)
NOTE: Assigning number (3,1) to disk (/dev/rhdiskpower6)
NOTE: Assigning number (3,2) to disk (/dev/rhdiskpower7)
Thu Mar 31 14:30:05 2016
GMON querying group 3 at 14 for pid 27, osid 29163580
NOTE: cache opening disk 0 of grp 3: GJJ_DG_0000 path:/dev/rhdiskpower5
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache opening disk 1 of grp 3: GJJ_DG_0001 path:/dev/rhdiskpower6
NOTE: F1X0 found on disk 1 au 2 fcn 0.0
NOTE: cache opening disk 2 of grp 3: GJJ_DG_0002 path:/dev/rhdiskpower7
NOTE: F1X0 found on disk 2 au 2 fcn 0.0
NOTE: cache mounting (not first) normal redundancy group 3/0x46ED72A8 (GJJ_DG)
Thu Mar 31 14:30:05 2016
kjbdomatt send to inst 2
Thu Mar 31 14:30:05 2016
NOTE: attached to recovery domain 3
NOTE: redo buffer size is 256 blocks (1053184 bytes)
Thu Mar 31 14:30:05 2016
NOTE: LGWR attempting to mount thread 2 for diskgroup 3 (GJJ_DG)
NOTE: LGWR found thread 2 closed at ABA 22.4306
NOTE: LGWR mounted thread 2 for diskgroup 3 (GJJ_DG)
NOTE: LGWR opening thread 2 at fcn 0.383934 ABA 23.4307
NOTE: cache mounting group 3/0x46ED72A8 (GJJ_DG) succeeded
NOTE: cache ending mount (success) of group GJJ_DG number=3 incarn=0x46ed72a8
Thu Mar 31 14:30:05 2016
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 3
SUCCESS: diskgroup GJJ_DG was mounted
SUCCESS: ALTER DISKGROUP GJJ_DG MOUNT /* asm agent *//* {0:23:23972} */
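The log above shows that at 2016-03-31 14:30:05 the ASM agent mounted the disk group (GJJ_DG). The gap between the dismount at 14:29:18 and the mount at 14:30:05 is 47 seconds, so by the time I checked, disk group GJJ_DG had already been mounted successfully. All I could do was ask the customer to check the operating system, the multipath software, and the fibre links.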
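The 47-second gap can be checked directly from the two alert log timestamps. This is a small sketch; the helper name `seconds_between` is hypothetical, and it assumes both timestamps use the alert log layout shown in this post.

```python
from datetime import datetime

# Timestamp layout used by the alert log lines shown in this post.
TS_FORMAT = "%a %b %d %H:%M:%S %Y"

def seconds_between(start, end):
    """Return the whole number of seconds from start to end (hypothetical helper)."""
    delta = datetime.strptime(end, TS_FORMAT) - datetime.strptime(start, TS_FORMAT)
    return int(delta.total_seconds())

# Dismount warning at 14:29:18, successful mount at 14:30:05.
gap = seconds_between("Thu Mar 31 14:29:18 2016", "Thu Mar 31 14:30:05 2016")
print(gap)  # prints 47
```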
Oracle 11.2: Considerations and Troubleshooting When a Single Instance Connects to ASM