Oracle 11g RAC Reports CRS Errors

Date: 2020-06-02  Category: Programmer's Life

Environment: AIX 6.1 + Oracle 11.2.0.4 RAC (2 nodes)

1. Symptoms

2. Locating the problem

3. Resolving the problem

1. Symptoms

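Running crsctl to check the status of the cluster resources fails immediately on either node with CRS-4535 and CRS-4000, yet the database can still be accessed normally.
The symptoms in detail: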

# Query on node 1
grid@bjdb1:/home/grid>crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.

# Query on node 2
root@bjdb2:/>crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.

Likewise, crs_stat -t fails as well, this time with error code CRS-0184:

root@bjdb1:/>crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.

Node 2 shows the same error.
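When the same check has to be repeated on several nodes, it can help to pull the error codes out of the command output automatically. The following is only a minimal sketch, assuming the output has already been captured as text; the function name extract_crs_codes is hypothetical and is not part of any Oracle tool.

```python
import re

# Matches Clusterware message codes such as CRS-4535 or CRS-0184.
CRS_CODE = re.compile(r"\bCRS-(\d{4})\b")


def extract_crs_codes(output):
    """Return the distinct CRS codes in the order they first appear."""
    seen = []
    for match in CRS_CODE.finditer(output):
        code = "CRS-" + match.group(1)
        if code not in seen:
            seen.append(code)
    return seen


# Sample text copied from the crsctl output shown in this article.
sample = (
    "CRS-4535: Cannot communicate with Cluster Ready Services\n"
    "CRS-4000: Command Status failed, or completed with errors.\n"
)
print(extract_crs_codes(sample))
```

Identical code lists from both nodes suggest a cluster-wide cause rather than a problem local to one host.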

The database is confirmed to be accessible at this point:

# On node 2, simulate a client login to the RAC cluster through the SCAN IP; the database is reachable as normal
oracle@bjdb2:/home/oracle>sqlplus jingyu/jingyu@192.168.103.31/bjdb

SQL*Plus: Release 11.2.0.4.0 Production on Mon Oct 10 14:24:47 2016

Copyright (c) 1982, 2013, Oracle. All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP, Data Mining and Real Application Testing options

SQL>

Relevant entries from /etc/hosts in the RAC environment:

#scan
192.168.103.31 scan-ip

2. Locating the problem

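Start with the cluster logs on node 1.
The Clusterware (GI) logs are stored under $GRID_HOME/log/nodename.
The key Clusterware (GI) background processes are css, crs, and evm; their logs are kept in the cssd, crsd, and evmd subdirectories respectively.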
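The directory layout described above can be written down as a small helper, which is handy when the same logs have to be checked on every node. This is only a sketch under the layout stated in this article; the function name gi_log_dirs is hypothetical, and the GRID_HOME value and node name are the ones used in this environment.

```python
import posixpath


def gi_log_dirs(grid_home, nodename):
    """Build the GI log directories for one node.

    Assumes the layout $GRID_HOME/log/<nodename>/{cssd,crsd,evmd},
    with the GI alert log sitting directly in $GRID_HOME/log/<nodename>.
    """
    base = posixpath.join(grid_home, "log", nodename)
    return {
        "alert": base,
        "cssd": posixpath.join(base, "cssd"),
        "crsd": posixpath.join(base, "crsd"),
        "evmd": posixpath.join(base, "evmd"),
    }


dirs = gi_log_dirs("/opt/u01/app/11.2.0/grid", "bjdb1")
for name, path in dirs.items():
    print(name, path)
```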

Relevant logs on node 1:

# Check the GI alert log. The latest entries only warn that the storage used by GI is nearly full. That can be cleaned up later, and some free space still remains, so it is clearly not the cause of this failure.
root@bjdb1:/opt/u01/app/11.2.0/grid/log/bjdb1>tail -f alert*.log
2016-10-10 14:18:26.125:
[crflogd(39190674)]CRS-9520:The storage of Grid Infrastructure Management Repository is 93% full. The storage location is '/opt/u01/app/11.2.0/grid/crf/db/bjdb1'.
2016-10-10 14:23:31.125:
[crflogd(39190674)]CRS-9520:The storage of Grid Infrastructure Management Repository is 93% full. The storage location is '/opt/u01/app/11.2.0/grid/crf/db/bjdb1'.
2016-10-10 14:28:36.125:
[crflogd(39190674)]CRS-9520:The storage of Grid Infrastructure Management Repository is 93% full. The storage location is '/opt/u01/app/11.2.0/grid/crf/db/bjdb1'.
2016-10-10 14:33:41.125:
[crflogd(39190674)]CRS-9520:The storage of Grid Infrastructure Management Repository is 93% full. The storage location is '/opt/u01/app/11.2.0/grid/crf/db/bjdb1'.
2016-10-10 14:38:46.125:
[crflogd(39190674)]CRS-9520:The storage of Grid Infrastructure Management Repository is 93% full. The storage location is '/opt/u01/app/11.2.0/grid/crf/db/bjdb1'.

# Because crsctl is unusable, check the crs log next. Errors have been present since the 3rd (2016-10-03): the raw device could not be opened, so the OCR could not be initialized. The messages that follow show that access to the shared storage failed at that time. A storage problem at that moment is suspected; it needs to be confirmed with the on-site staff whether any storage-related work was under way at that point in time.
root@bjdb1:/opt/u01/app/11.2.0/grid/log/bjdb1/crsd>tail -f crsd.log
2016-10-03 18:04:40.248: [ OCRRAW][1]proprinit: Could not open raw device
2016-10-03 18:04:40.248: [ OCRASM][1]proprasmcl: asmhandle is NULL
2016-10-03 18:04:40.252: [ OCRAPI][1]a_init:16!: Backend init unsuccessful : [26]
2016-10-03 18:04:40.253: [ CRSOCR][1] OCR context init failure. Error: PROC-26: Error while accessing the physical storage
2016-10-03 18:04:40.253: [ CRSD][1] Created alert : (:CRSD00111:) : Could not init OCR, error: PROC-26: Error while accessing the physical storage
2016-10-03 18:04:40.253: [ CRSD][1][PANIC] CRSD exiting: Could not init OCR, code: 26
2016-10-03 18:04:40.253: [ CRSD][1] Done.
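Since tail -f only shows the end of the file, it can be useful to scan crsd.log for the storage errors and their timestamps. The following is a rough sketch, assuming the line format seen in the crsd.log excerpt for node 1 (timestamp, bracketed component, bracketed number, message); the regular expression and the function name parse_crsd_line are illustrative assumptions, not an official log format definition.

```python
import re
from datetime import datetime

# Assumed line shape: "<timestamp>: [ COMPONENT][n]message"
LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): "
    r"\[\s*(\w+)\]\[\d+\]\s*(.*)$"
)


def parse_crsd_line(line):
    """Split one crsd.log line into (timestamp, component, message).

    Returns None for lines that do not match the assumed format.
    """
    match = LINE.match(line.strip())
    if match is None:
        return None
    stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
    return stamp, match.group(2), match.group(3)


# Sample lines copied from the node 1 crsd.log excerpt in this article.
sample_lines = [
    "2016-10-03 18:04:40.248: [ OCRRAW][1]proprinit: Could not open raw device",
    "2016-10-03 18:04:40.253: [ CRSOCR][1] OCR context init failure. "
    "Error: PROC-26: Error while accessing the physical storage",
    "2016-10-03 18:04:40.253: [ CRSD][1][PANIC] CRSD exiting: Could not init OCR, code: 26",
]

parsed = [p for p in map(parse_crsd_line, sample_lines) if p is not None]
# Keep only entries that carry a PROC-nn error code.
proc_errors = [p for p in parsed if "PROC-" in p[2]]
for stamp, component, message in proc_errors:
    print(stamp, component, message)
```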

Relevant logs on node 2:

# Check the GI alert log. ctssd on node 2 reports CRS-2409. According to MOS Doc ID 1135337.1: "This is not an error. ctssd is reporting that there is a time difference and it is not doing anything about it as it is running in observer mode." The note says it is enough to check whether the clocks on the two nodes agree; in this case the node clocks were checked and they do agree.
root@bjdb2:/opt/u01/app/11.2.0/grid/log/bjdb2>tail -f alert*.log
2016-10-10 12:29:22.145:
[ctssd(5243030)]CRS-2409:The clock on host bjdb2 is not synchronous with the mean cluster time. No action has been taken as the Cluster Time Synchronization Service is running in observer mode.
2016-10-10 12:59:38.799:
[ctssd(5243030)]CRS-2409:The clock on host bjdb2 is not synchronous with the mean cluster time. No action has been taken as the Cluster Time Synchronization Service is running in observer mode.
2016-10-10 13:34:11.402:
[ctssd(5243030)]CRS-2409:The clock on host bjdb2 is not synchronous with the mean cluster time. No action has been taken as the Cluster Time Synchronization Service is running in observer mode.
2016-10-10 14:12:44.168:
[ctssd(5243030)]CRS-2409:The clock on host bjdb2 is not synchronous with the mean cluster time. No action has been taken as the Cluster Time Synchronization Service is running in observer mode.
2016-10-10 14:44:04.824:
[ctssd(5243030)]CRS-2409:The clock on host bjdb2 is not synchronous with the mean cluster time. No action has been taken as the Cluster Time Synchronization Service is running in observer mode.

# Check the crs log on node 2. At almost the same time as on node 1, access to the shared storage also failed here, so the OCR could not be initialized.
root@bjdb2:/opt/u01/app/11.2.0/grid/log/bjdb2/crsd>tail -f crsd.log
2016-10-03 18:04:31.077: [ OCRRAW][1]proprinit: Could not open raw device
2016-10-03 18:04:31.077: [ OCRASM][1]proprasmcl: asmhandle is NULL
2016-10-03 18:04:31.081: [ OCRAPI][1]a_init:16!: Backend init unsuccessful : [26]
2016-10-03 18:04:31.081: [ CRSOCR][1] OCR context init failure. Error: PROC-26: Error while accessing the physical storage
2016-10-03 18:04:31.082: [ CRSD][1] Created alert : (:CRSD00111:) : Could not init OCR, error: PROC-26: Error while accessing the physical storage
2016-10-03 18:04:31.082: [ CRSD][1][PANIC] CRSD exiting: Could not init OCR, code: 26
2016-10-03 18:04:31.082: [ CRSD][1] Done.
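To check how close together the two nodes failed, the first "Could not open raw device" timestamps from each crsd.log can be compared directly. This is a minimal sketch using the two timestamps quoted in the logs for bjdb1 and bjdb2; the helper name gap_seconds is hypothetical.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# First "Could not open raw device" entry on each node, copied from the logs.
first_error = {
    "bjdb1": "2016-10-03 18:04:40.248",
    "bjdb2": "2016-10-03 18:04:31.077",
}


def gap_seconds(a, b):
    """Absolute difference between two log timestamps, in seconds."""
    delta = datetime.strptime(a, FMT) - datetime.strptime(b, FMT)
    return abs(delta.total_seconds())


gap = gap_seconds(first_error["bjdb1"], first_error["bjdb2"])
print("gap between nodes: %.3f s" % gap)
```

A gap of only a few seconds between the nodes fits a single event on the shared storage better than two unrelated local faults.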

When reposting, please cite the source: https://www.heiqu.com/2394d84c1d95b65a97926476506ae43f.html
