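To make related issues easier to test, I built a RAC environment on my local machine. When I opened it yesterday, however, RAC would not start. Fine, I will treat it as a hands-on drill.
Test environment: RedHat 6.3 x64 + Oracle 11gR2 RAC
II. Troubleshooting process:
Some time after the virtual machines had started, I checked the cluster status with the following commands: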
[grid@rac01 ~]$ crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.
[grid@rac01 ~]$ crsctl status res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
Check the CRS service status
[root@rac01 rac-cluster]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
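When the same check has to be repeated on several nodes, it can be scripted. The following is a minimal sketch, not an official tool: the function name `summarize_crs_check` is hypothetical, and it assumes that a healthy component reports a line containing "is online", as the CRS-4638 line above does. It reads `crsctl check crs` output on stdin and prints the message codes of the components that are not online.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the CRS message codes of components that are
# not reported as online in `crsctl check crs` output (read from stdin).
summarize_crs_check() {
    # Keep only CRS-nnnn lines that do not say "is online", print the code.
    grep -E '^CRS-[0-9]+:' | grep -v 'is online' | cut -d: -f1
}

# Sample input copied from the session above.
sample='CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager'

printf '%s\n' "$sample" | summarize_crs_check
```

On a live node the same function could be fed directly: `crsctl check crs | summarize_crs_check`.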
Start the cluster resources
[root@rac01 bin]# crsctl start cluster
CRS-2800: Cannot start resource 'ora.asm' as it is already in the INTERMEDIATE state on server 'rac01'
CRS-4000: Command Start failed, or completed with errors.
I checked the relevant logs and found the following. No other log contained anything more useful; if you have a good suggestion, please contact me:
---alert.log
[ohasd(2017)]CRS-2807:Resource 'ora.crsd' failed to start automatically.
---ocssd.log
2015-06-12 03:07:14.722: [ CLSF][2402883328]Allocated CLSF context
2015-06-12 03:07:14.723: [ SKGFD][2402883328]Handle 0x16f57d0 from lib :UFS:: for disk :/dev/asm-diskb:
2015-06-12 03:07:14.723: [ CSSD][2402883328]clssnmlalloccx:phyname rac01
2015-06-12 03:07:14.742: [ CSSD][2402883328]clssnmvDiskAvailabilityChange: voting file /dev/asm-diskb now online
2015-06-12 03:07:14.742: [ CSSD][2402883328]clssnmlgetfileslot: found expired slot 1 for host rac01 leasename rac01
2015-06-12 03:07:14.747: [ SKGFD][2381424384]NOTE: No asm libraries found in the system
2015-06-12 03:07:14.747: [ CLSF][2381424384]Allocated CLSF context
2015-06-12 03:07:14.748: [ SKGFD][2381424384]Handle 0x7f4d7008e6b0 from lib :UFS:: for disk :/dev/asm-diskb:
2015-06-12 03:07:14.748: [ SKGFD][2381424384]Lib :UFS:: closing handle 0x7f4d7008e6b0 for disk :/dev/asm-diskb:
2015-06-12 03:07:15.749: [ SKGFD][2381424384]NOTE: No asm libraries found in the system
Check the CSS voting disk information
[grid@rac01 ~]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   aaaf9f57bc9c4fc7bfb57ac937d2d149 (/dev/asm-diskb) [CRS]
Next, I checked the ASM disk group information from the ASM instance:
SQL> select NAME , STATE FROM V$ASM_DISKGROUP;
NAME                           STATE
------------------------------ -----------
DATA                           DISMOUNTED
CRS                            DISMOUNTED
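OK, try to mount the disk groups. (Later, while writing this up, I noticed something odd: the CSS query above showed the voting disk as ONLINE, yet the disk group cannot be mounted here. I did not try a forced mount; this needs further investigation.)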
SQL> alter diskgroup crs mount;
alter diskgroup crs mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "1" is missing from group number "1"
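One way to look into ORA-15042 is to ask ASM which disks it can actually discover and what their header status is. The sketch below is only a suggestion and was not part of the original session: the function name `asm_disk_query` and the `RUN_SQL` switch are hypothetical, while the `V$ASM_DISK` columns it selects are standard. By default it only prints the SQL; it assumes that `ORACLE_SID` and `ORACLE_HOME` point at the ASM instance when `RUN_SQL=1` is set.

```shell
#!/usr/bin/env bash
# Hypothetical helper: emit a diagnostic query listing every disk that
# ASM can discover, with its header and mount status.
asm_disk_query() {
    cat <<'SQL'
set linesize 200 pagesize 100
column path format a30
column name format a20
select group_number, disk_number, name, path,
       header_status, mount_status, mode_status
  from v$asm_disk
 order by group_number, disk_number;
SQL
}

# Default: just show the SQL. With RUN_SQL=1 the query is piped to sqlplus
# as the grid user (assumes the ASM environment is already set).
if [ "${RUN_SQL:-0}" = "1" ]; then
    asm_disk_query | sqlplus -S / as sysasm
else
    asm_disk_query
fi
```

A disk that belongs to the CRS disk group but does not appear in this listing would suggest a discovery problem rather than a damaged disk.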
Try to mount the DATA disk group
SQL> alter diskgroup data mount;
Diskgroup altered.
SQL> select NAME , STATE FROM V$ASM_DISKGROUP;
NAME                           STATE
------------------------------ -----------
DATA                           MOUNTED
CRS                            DISMOUNTED
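Note: this is a record of how I handled the problem at the time, without digging deeper into the root cause. I had further thoughts while writing this up, but I will not discuss them here.
Since the DATA disk group is usable, we first move the OCR and voting disk to the DATA disk group. I had never backed up the OCR manually, so the only option was to restore from an automatic backup.
Stop the CRS service; run this on both nodes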
[root@rac01 rac-cluster]# crsctl stop has -f
Start CRS again in exclusive mode without CRSD (-excl -nocrs); run this on node 1
[root@rac01 rac-cluster]# crsctl start crs -excl -nocrs
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.mdnsd' on 'rac01'
CRS-2676: Start of 'ora.mdnsd' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rac01'
CRS-2676: Start of 'ora.gpnpd' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac01'
CRS-2672: Attempting to start 'ora.gipcd' on 'rac01'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac01' succeeded
CRS-2676: Start of 'ora.gipcd' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac01'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac01'
CRS-2676: Start of 'ora.diskmon' on 'rac01' succeeded
CRS-2676: Start of 'ora.cssd' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'rac01'
CRS-2679: Attempting to clean 'ora.cluster_interconnect.haip' on 'rac01'
CRS-2672: Attempting to start 'ora.ctssd' on 'rac01'
CRS-2681: Clean of 'ora.cluster_interconnect.haip' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rac01'
CRS-2676: Start of 'ora.drivers.acfs' on 'rac01' succeeded
CRS-2676: Start of 'ora.ctssd' on 'rac01' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'rac01'
CRS-2676: Start of 'ora.asm' on 'rac01' succeeded
Edit /etc/oracle/ocr.loc and change the OCR location to the DATA disk group (+DATA); this must be done on both nodes.
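The edit can be done by hand, but a small helper that keeps a backup copy is less error-prone. This is a sketch under stated assumptions: the function name `set_ocr_location` is hypothetical, the two-line file layout (`ocrconfig_loc=` and `local_only=`) is what I would expect on 11.2 and was not shown in the original session, and `sed -i` assumes GNU sed as shipped with RedHat. The demo works on a scratch file, not on /etc/oracle/ocr.loc itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: point ocrconfig_loc in an ocr.loc-style file at a
# new location, keeping a .bak copy of the original next to it.
set_ocr_location() {
    local file="$1" location="$2"
    cp -p "$file" "$file.bak" || return 1
    # Replace the whole ocrconfig_loc line; other lines stay untouched.
    sed -i "s|^ocrconfig_loc=.*|ocrconfig_loc=${location}|" "$file"
}

# Demo on a scratch copy. The two lines written here are the assumed
# layout of the file, with +CRS as the old location.
demo="$(mktemp)"
printf 'ocrconfig_loc=+CRS\nlocal_only=FALSE\n' > "$demo"
set_ocr_location "$demo" "+DATA"
cat "$demo"
```

On the real nodes this would be run as root against /etc/oracle/ocr.loc, once per node.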
Check the available backups and pick a recent one to restore
Command to list them: ocrconfig -showbackup
[root@rac01 rac-cluster]# ocrconfig -restore /grid/crs_home/product/11.2.0/cdata/rac-cluster/week.ocr
[root@rac01 rac-cluster]# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 3088
Available space (kbytes) : 259032
ID : 471595559
Device/File Name : +DATA
Device/File integrity check succeeded
Device/File not configured
Device/File not configured
Device/File not configured
Device/File not configured
Cluster registry integrity check succeeded
Logical corruption check succeeded
Create the voting disk
The following error occurred during creation; the fix is shown after it:
[root@rac01 rac-cluster]# crsctl replace votedisk +DATA
CRS-4602: Failed 27 to add voting file 7255773670ae4fa9bf64a150a9fd5915.
Failure 27 with Cluster Synchronization Services while deleting voting disk.
Failed to replace voting disk group with +DATA.
CRS-4000: Command Replace failed, or completed with errors.
Set the ASM disk discovery path
SQL> show parameter asm_diskstring
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
asm_diskstring                       string
SQL> alter system set asm_diskstring = '/dev/asm*';
System altered.
SQL> create spfile='+DATA' from memory;
File created.
SQL> startup force mount;
Create the voting disk again
[root@rac01 rac-cluster]# crsctl replace votedisk +DATA
Successful addition of voting disk 383b8c3e4db34f72bf9dedd15e47471b.
Successful deletion of voting disk aaaf9f57bc9c4fc7bfb57ac937d2d149.
Successfully replaced voting disk group with +DATA.
CRS-4266: Voting file(s) successfully replaced
Stop the cluster services and start them again
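The original session does not show the commands for this last step, so the following is only a sketch of how it might look. The `run` and `restart_stack` names and the `DRY_RUN` switch are hypothetical; the `crsctl` and `ocrcheck` commands themselves are standard. By default the script only prints the commands; with `DRY_RUN=0` it executes them and must be run as root.

```shell
#!/usr/bin/env bash
# Print the restart sequence by default; set DRY_RUN=0 to execute it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

restart_stack() {
    run crsctl stop crs -f          # leave exclusive mode on node 1
    run crsctl start crs            # normal start; repeat on each node
    # The stack needs a few minutes to come up, so on a live system the
    # checks below may have to be repeated until they succeed.
    run crsctl check cluster -all
    run crsctl query css votedisk
    run ocrcheck
}

restart_stack
```

After a successful restart, the voting disk and the OCR should both be reported in +DATA.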