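Environment: RHEL 5.5 + Oracle 11g RAC
A customer reported that after shutting down the cluster and starting it again, CRS would not come up. The error was: Cannot communicate with Cluster Ready Services.
Log in to the host and check: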
[root@rac-2 ~]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
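A check like the one above can be scripted so that any daemon not reported as online stands out. The following is a minimal sketch, assuming the message format shown in the output above; the function name crs_offline_lines is hypothetical and not part of any Oracle tool.

```shell
#!/bin/bash
# Read `crsctl check crs` output from stdin and print every CRS-nnnn line
# that does not end in "is online".
# Exit status: 0 if at least one CRS line was seen and all were online,
#              1 if any line was not online or no CRS line was seen at all.
crs_offline_lines() {
    awk '
        /^CRS-[0-9]+:/ {
            seen++
            if ($0 !~ /is online *$/) { print; bad++ }
        }
        END { exit ((seen == 0 || bad > 0) ? 1 : 0) }
    '
}

# Usage on a cluster node:
#   crsctl check crs | crs_offline_lines || echo "CRS stack is not healthy"
```

Fed the four lines above, this prints only the CRS-4535 line and returns a non-zero status.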
Check the clusterware alert log:
[grid@rac-2 rac-2]$ tail -100 alertrac-2.log | more
2016-09-23 03:16:17.396
[ohasd(3899)]CRS-2765:Resource 'ora.crsd' has failed on server 'rac-2'.
2016-09-23 03:16:18.697
[crsd(22676)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u02/11.2.0/grid/log/rac-2/crsd/crsd.log.
2016-09-23 03:16:18.704
[crsd(22676)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
ORA-15077: could not locate ASM instance serving a required diskgroup
]. Details at (:CRSD00111:) in /u02/11.2.0/grid/log/rac-2/crsd/crsd.log.
2016-09-23 03:16:19.433
[ohasd(3899)]CRS-2765:Resource 'ora.crsd' has failed on server 'rac-2'.
2016-09-23 03:16:20.737
[crsd(22685)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u02/11.2.0/grid/log/rac-2/crsd/crsd.log.
2016-09-23 03:16:20.747
[crsd(22685)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
ORA-15077: could not locate ASM instance serving a required diskgroup
]. Details at (:CRSD00111:) in /u02/11.2.0/grid/log/rac-2/crsd/crsd.log.
2016-09-23 03:16:21.473
[ohasd(3899)]CRS-2765:Resource 'ora.crsd' has failed on server 'rac-2'.
2016-09-23 03:16:21.473
[ohasd(3899)]CRS-2771:Maximum restart attempts reached for resource 'ora.crsd'; will not restart.
Check crsd.log:
2016-09-23 03:16:20.461: [ CRSMAIN][1106286912] Policy Engine is not initialized yet!
2016-09-23 03:16:20.463: [ CRSMAIN][3556262304] Initializing OCR
[ CLWAL][3556262304]clsw_Initialize: OLR initlevel [70000]
2016-09-23 03:16:20.735: [ OCRASM][3556262304]proprasmo: Error in open/create file in dg [ORC_VOTE]
[ OCRASM][3556262304]SLOS : SLOS: cat=7, opn=kgfoAl06, dep=15077, loc=kgfokge
2016-09-23 03:16:20.735: [ OCRASM][3556262304]ASM Error Stack : ORA-15077: could not locate ASM instance serving a required diskgroup
2016-09-23 03:16:20.737: [ OCRASM][3556262304]proprasmo: kgfoCheckMount returned [7]
2016-09-23 03:16:20.737: [ OCRASM][3556262304]proprasmo: The ASM instance is down
2016-09-23 03:16:20.738: [ OCRRAW][3556262304]proprioo: Failed to open [+ORC_VOTE]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
2016-09-23 03:16:20.738: [ OCRRAW][3556262304]proprioo: No OCR/OLR devices are usable
2016-09-23 03:16:20.738: [ OCRASM][3556262304]proprasmcl: asmhandle is NULL
2016-09-23 03:16:20.738: [ GIPC][3556262304] gipcCheckInitialization: possible incompatible non-threaded init from [prom.c : 690], original from [clsss.c : 5326]
2016-09-23 03:16:20.740: [ default][3556262304]clsvactversion:4: Retrieving Active Version from local storage.
2016-09-23 03:16:20.742: [ OCRRAW][3556262304]proprrepauto: The local OCR configuration matches with the configuration published by OCR Cache Writer. No repair required.
2016-09-23 03:16:20.745: [ OCRRAW][3556262304]proprinit: Could not open raw device
2016-09-23 03:16:20.745: [ OCRASM][3556262304]proprasmcl: asmhandle is NULL
2016-09-23 03:16:20.746: [ OCRAPI][3556262304]a_init:16!: Backend init unsuccessful : [26]
2016-09-23 03:16:20.747: [ CRSOCR][3556262304] OCR context init failure. Error: PROC-26: Error while accessing the physical storage
ORA-15077: could not locate ASM instance serving a required diskgroup
2016-09-23 03:16:20.748: [ CRSMAIN][3556262304] Created alert : (:CRSD00111:) : Could not init OCR, error: PROC-26: Error while accessing the physical storage
ORA-15077: could not locate ASM instance serving a required diskgroup
2016-09-23 03:16:20.748: [ CRSD][3556262304][PANIC] CRSD exiting: Could not init OCR, code: 26
2016-09-23 03:16:20.748: [ CRSD][3556262304] Done.
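In a long log like this, the same few error codes repeat many times. As a rough aid, the codes can be tallied to see which ones dominate. This is only a sketch; the function name tally_error_codes is hypothetical, and the pattern assumes codes of the form ORA-nnnn, PROC-nn or CRS-nnnn as they appear above.

```shell
#!/bin/bash
# Count each ORA-/PROC-/CRS- error code found in a log read from stdin.
# Output: one "<count> <code>" pair per line, most frequent first.
tally_error_codes() {
    grep -oE '(ORA|PROC|CRS)-[0-9]+' \
        | sort \
        | uniq -c \
        | awk '{ print $1, $2 }' \
        | sort -k1,1nr -k2,2
}

# Usage:
#   tally_error_codes < /u02/11.2.0/grid/log/rac-2/crsd/crsd.log
```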
The errors (ORA-15077, "The ASM instance is down") point to a problem with ASM, so check the ASM disks:
[root@rac-2 ~]# /etc/init.d/oracleasm listdisks
ASMDATA01
ASMDATA02
ASMDATA03
OCR_VOTE
The disks are present.
After stopping CRS, check the CRS-related processes:
[root@rac-2 ~]# ps -ef | grep d.bin
root 3899 1 0 Jan13 ? 00:18:59 /u02/11.2.0/grid/bin/ohasd.bin reboot
grid 4267 1 0 Jan13 ? 00:34:32 /u02/11.2.0/grid/bin/oraagent.bin
grid 4280 1 0 Jan13 ? 00:00:16 /u02/11.2.0/grid/bin/mdnsd.bin
grid 4293 1 0 Jan13 ? 00:06:10 /u02/11.2.0/grid/bin/gpnpd.bin
root 4304 1 0 Jan13 ? 01:31:25 /u02/11.2.0/grid/bin/orarootagent.bin
grid 4307 1 0 Jan13 ? 00:27:27 /u02/11.2.0/grid/bin/gipcd.bin
root 4322 1 0 Jan13 ? 00:45:33 /u02/11.2.0/grid/bin/osysmond.bin
root 4332 1 0 Jan13 ? 00:01:24 /u02/11.2.0/grid/bin/cssdmonitor
root 4350 1 0 Jan13 ? 00:02:39 /u02/11.2.0/grid/bin/cssdagent
grid 4362 1 0 Jan13 ? 01:45:38 /u02/11.2.0/grid/bin/ocssd.bin
root 4437 1 0 Jan13 ? 00:28:42 /u02/11.2.0/grid/bin/octssd.bin reboot
grid 4461 1 0 Jan13 ? 00:00:22 /u02/11.2.0/grid/bin/evmd.bin
grid 4843 4461 0 Jan13 ? 00:00:00 /u02/11.2.0/grid/bin/evmlogger.bin -o /u02/11.2.0/grid/evm/log/evmlogger.info -l /u02/11.2.0/grid/evm/log/evmlogger.log
root 4941 1 0 Jan13 ? 00:21:18 /u02/11.2.0/grid/bin/ologgerd -m rac-1 -r -d /u02/11.2.0/grid/crf/db/rac-2
root 23122 22979 0 03:54 pts/3 00:00:00 grep d.bin
CRS has been stopped, but many of its processes have not exited. Kill them manually:
[root@rac-2 ~]# ps -ef | grep d.bin | awk '{print $2}' | xargs kill -9
kill 23131: No such process
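The "No such process" message most likely appears because the pipeline's own grep also matched the pattern and had already exited by the time kill ran. A narrower selection avoids this by matching only commands that live under the Grid home bin directory. The following is a sketch under that assumption; the function name grid_pids is hypothetical, and it relies on the command path being the 8th column of `ps -ef` output, as in the listing above.

```shell
#!/bin/bash
# Read `ps -ef` output from stdin and print the PID (column 2) of every
# process whose command path (column 8) starts with the given bin directory.
# A `grep` process is never selected, because its command is just "grep".
grid_pids() {
    local grid_bin="$1"
    awk -v prefix="$grid_bin/" 'index($8, prefix) == 1 { print $2 }'
}

# Usage (as root, only after CRS has been stopped; -r is a GNU xargs option
# that skips running kill when no PID is selected):
#   ps -ef | grid_pids /u02/11.2.0/grid/bin | xargs -r kill -9
```

Printing the PID list first and reviewing it before piping it to kill is a cheap safeguard.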
Restart CRS; the problem is resolved.