Recovering Oracle 11g RAC After Losing the CRS Disk

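To make it easy to test various problems, I built a RAC environment on my own machine. When I powered it on yesterday, RAC would not start. Fine: I treated it as a hands-on drill.
Test environment: Red Hat 6.3 x86_64 + Oracle 11gR2 RAC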

II. Troubleshooting
Some time after starting the virtual machines, I checked the cluster status and got the following:

[grid@rac01 ~]$ crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.

[grid@rac01 ~]$ crsctl status res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
 

Check the status of the CRS stack:

[root@rac01 rac-cluster]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
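The CRS-45xx codes above show which daemons cannot be reached while OHAS itself is up. When checking several nodes, it can help to classify this output mechanically. Below is a minimal Python sketch, assuming the `crsctl check crs` output has been captured as text. `summarize_crs_check` and the code-to-daemon table are my own illustration, built only from the messages shown here.

```python
import re

# Codes seen in the transcript above, mapped to the component each message names.
UNREACHABLE = {
    "CRS-4535": "Cluster Ready Services (crsd)",
    "CRS-4530": "Cluster Synchronization Services (ocssd)",
    "CRS-4534": "Event Manager (evmd)",
}


def summarize_crs_check(output: str) -> dict:
    """Classify captured `crsctl check crs` output (hypothetical helper)."""
    codes = re.findall(r"^\s*(CRS-\d{4}):", output, flags=re.MULTILINE)
    return {
        # CRS-4638 is the "Oracle High Availability Services is online" line.
        "ohas_online": "CRS-4638" in codes,
        "unreachable": [UNREACHABLE[c] for c in codes if c in UNREACHABLE],
    }


if __name__ == "__main__":
    sample = """CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager"""
    print(summarize_crs_check(sample))
```

For this node, OHAS is online but CRSD, CSSD and EVMD are all unreachable. That pattern points below the CRS layer, toward CSS and its voting file.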
 

Start the cluster resources:

[root@rac01 bin]# crsctl start cluster
CRS-2800: Cannot start resource 'ora.asm' as it is already in the INTERMEDIATE state on server 'rac01'
CRS-4000: Command Start failed, or completed with errors.
 

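I went through the relevant logs and found the entries below. The other logs contained nothing more useful. If you have a good suggestion, please get in touch: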

---alert.log
[ohasd(2017)]CRS-2807:Resource 'ora.crsd' failed to start automatically.

---ocssd.log
2015-06-12 03:07:14.722: [    CLSF][2402883328]Allocated CLSF context
2015-06-12 03:07:14.723: [  SKGFD][2402883328]Handle 0x16f57d0 from lib :UFS:: for disk :/dev/asm-diskb:
2015-06-12 03:07:14.723: [    CSSD][2402883328]clssnmlalloccx:phyname rac01
2015-06-12 03:07:14.742: [    CSSD][2402883328]clssnmvDiskAvailabilityChange: voting file /dev/asm-diskb now online
2015-06-12 03:07:14.742: [    CSSD][2402883328]clssnmlgetfileslot: found expired slot 1 for host rac01 leasename rac01
2015-06-12 03:07:14.747: [  SKGFD][2381424384]NOTE: No asm libraries found in the system
2015-06-12 03:07:14.747: [    CLSF][2381424384]Allocated CLSF context
2015-06-12 03:07:14.748: [  SKGFD][2381424384]Handle 0x7f4d7008e6b0 from lib :UFS:: for disk :/dev/asm-diskb:
2015-06-12 03:07:14.748: [  SKGFD][2381424384]Lib :UFS:: closing handle 0x7f4d7008e6b0 for disk :/dev/asm-diskb:
2015-06-12 03:07:15.749: [  SKGFD][2381424384]NOTE: No asm libraries found in the system
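In this ocssd.log excerpt, CSSD brings /dev/asm-diskb online as a voting file, and SKGFD opens and closes handles on that disk. On a real node this log is long, so a filter helps. Below is a minimal sketch, assuming the line layout shown above. `parse_cssd_line` and `disk_events` are hypothetical helpers, not Oracle tools.

```python
import re
from datetime import datetime

# Layout assumed from the excerpt above: "<timestamp>: [<COMPONENT>][<thread>]<message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+): "
    r"\[\s*(?P<comp>\w+)\]\[(?P<thread>\d+)\](?P<msg>.*)$"
)


def parse_cssd_line(line):
    """Split one ocssd.log line into fields, or return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "ts": datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f"),
        "comp": m.group("comp"),
        "thread": int(m.group("thread")),
        "msg": m.group("msg"),
    }


def disk_events(lines):
    """Yield (timestamp, component, message) for lines that mention a disk or voting file."""
    for line in lines:
        rec = parse_cssd_line(line)
        if rec and ("voting file" in rec["msg"] or "for disk" in rec["msg"]):
            yield rec["ts"], rec["comp"], rec["msg"]
```

Feeding it the whole log (for example `disk_events(open(path))`) leaves only the disk-related lines, in time order.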
 

Check the CSS (voting disk) information:

[grid@rac01 ~]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   aaaf9f57bc9c4fc7bfb57ac937d2d149 (/dev/asm-diskb) [CRS]
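This row says CSS sees one voting file, on /dev/asm-diskb in disk group CRS, and considers it ONLINE. To compare this with the ASM view that follows, the rows can be parsed. Below is a minimal sketch, assuming the column layout shown above. `parse_votedisks` is a hypothetical helper.

```python
import re

# Assumed row layout: " 1. ONLINE   <32-hex FUID> (<path>) [<disk group>]"
VOTE_RE = re.compile(
    r"^\s*(?P<num>\d+)\.\s+(?P<state>\w+)\s+(?P<fuid>[0-9a-f]{32})\s+"
    r"\((?P<path>[^)]+)\)\s+\[(?P<diskgroup>[^\]]+)\]"
)


def parse_votedisks(output):
    """Return one dict per voting-file row in `crsctl query css votedisk` output."""
    disks = []
    for line in output.splitlines():
        m = VOTE_RE.match(line)
        if m:  # header and dashed underline lines do not match
            row = m.groupdict()
            row["num"] = int(row["num"])
            disks.append(row)
    return disks
```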
 

Next, I queried the ASM disk groups from the ASM instance:

SQL> select NAME , STATE FROM V$ASM_DISKGROUP;

NAME                           STATE
------------------------------ -----------
DATA                           DISMOUNTED
CRS                            DISMOUNTED
 

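OK, let's try to mount the disk group. (While writing this up later, I noticed something odd. The CSS query above reported the voting disk as ONLINE, yet the disk group holding it could not be mounted. I did not try a forced mount; this needs further investigation.)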

SQL> alter diskgroup crs mount;
alter diskgroup crs mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "1" is missing from group number "1"
 

Try mounting the DATA disk group:

SQL> alter diskgroup data mount;

Diskgroup altered.

SQL> select NAME , STATE FROM V$ASM_DISKGROUP;

NAME                           STATE
------------------------------ -----------
DATA                           MOUNTED
CRS                            DISMOUNTED
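At this point DATA is mounted and CRS is still dismounted, which is the state the rest of the procedure works from. When scripting checks like this, SQL*Plus column output can be turned into a dictionary. Below is a minimal sketch, assuming disk group names and states contain no spaces (true for the values shown). The helper names are hypothetical.

```python
def parse_diskgroup_states(output):
    """Map disk group name -> state from SQL*Plus output of
    `select name, state from v$asm_diskgroup` (hypothetical helper)."""
    states = {}
    past_separator = False
    for line in output.splitlines():
        text = line.strip()
        if not text or text.startswith("SQL>"):
            continue
        if set(text.replace(" ", "")) == {"-"}:
            past_separator = True  # data rows start after the dashed underline
            continue
        if text.endswith("selected."):
            break  # trailing "N rows selected." footer, if present
        if past_separator:
            parts = text.split(None, 1)
            if len(parts) == 2:
                states[parts[0]] = parts[1].strip()
    return states


def not_mounted(states):
    """Names of disk groups whose state is anything other than MOUNTED."""
    return sorted(name for name, state in states.items() if state != "MOUNTED")
```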
 

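Note: I am recording the steps as I performed them at the time, without digging into the root cause. Preparing this write-up raised further questions, which I will leave aside for now.
Since the DATA disk group is usable, the plan is to move the clusterware files (OCR and voting disk) into DATA. I had never backed up the OCR manually, so the only option was to restore from the automatic backups.
Stop the CRS stack; run this on both nodes: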

[root@rac01 rac-cluster]# crsctl stop has -f
 

Start CRS again on node 1 only, in exclusive mode and without CRSD (-excl -nocrs):

[root@rac01 rac-cluster]# crsctl start crs -excl -nocrs
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.mdnsd' on 'rac01'
CRS-2676: Start of 'ora.mdnsd' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rac01'
CRS-2676: Start of 'ora.gpnpd' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac01'
CRS-2672: Attempting to start 'ora.gipcd' on 'rac01'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac01' succeeded
CRS-2676: Start of 'ora.gipcd' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac01'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac01'
CRS-2676: Start of 'ora.diskmon' on 'rac01' succeeded
CRS-2676: Start of 'ora.cssd' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'rac01'
CRS-2679: Attempting to clean 'ora.cluster_interconnect.haip' on 'rac01'
CRS-2672: Attempting to start 'ora.ctssd' on 'rac01'
CRS-2681: Clean of 'ora.cluster_interconnect.haip' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rac01'
CRS-2676: Start of 'ora.drivers.acfs' on 'rac01' succeeded
CRS-2676: Start of 'ora.ctssd' on 'rac01' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'rac01'
CRS-2676: Start of 'ora.asm' on 'rac01' succeeded
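In the output above, every CRS-2672 "Attempting to start" line is eventually matched by a CRS-2676 "succeeded" line, ending with ora.asm. Because -nocrs was used, ora.crsd is not started. Below is a minimal sketch that checks this pairing, assuming the output was captured as text. `unfinished_starts` is a hypothetical helper. It only looks at CRS-2672/CRS-2676 lines and ignores clean operations such as CRS-2679/CRS-2681.

```python
import re


def unfinished_starts(output):
    """Resources with a CRS-2672 'Attempting to start' line but no CRS-2676 success line."""
    attempted = re.findall(r"CRS-2672: Attempting to start '([^']+)'", output)
    started = set(
        re.findall(r"CRS-2676: Start of '([^']+)' on '[^']+' succeeded", output)
    )
    return [res for res in attempted if res not in started]
```

An empty list means every resource that was attempted also reported success.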
 

Edit /etc/oracle/ocr.loc so that the OCR location points to DATA. This must be done on both nodes.
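On 11gR2, /etc/oracle/ocr.loc is a small key=value file, and ocrconfig_loc holds the OCR location. The exact contents on this cluster are not shown above. The sketch assumes lines such as ocrconfig_loc=+CRS and local_only=FALSE. Below is a minimal Python sketch of the edit. `set_ocr_location` is a hypothetical helper that only transforms text and does not touch the file itself.

```python
def set_ocr_location(ocr_loc_text, new_location):
    """Return ocr.loc contents with ocrconfig_loc set to new_location.

    Other lines (for example local_only=FALSE) are kept unchanged; if no
    ocrconfig_loc line exists, one is appended at the end.
    """
    lines, replaced = [], False
    for line in ocr_loc_text.splitlines():
        if line.strip().startswith("ocrconfig_loc="):
            lines.append(f"ocrconfig_loc={new_location}")
            replaced = True
        else:
            lines.append(line)
    if not replaced:
        lines.append(f"ocrconfig_loc={new_location}")
    return "\n".join(lines) + "\n"
```

To use it, keep a copy of the original file, read it, pass its text with "+DATA", and write the result back as root on each node.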
Then check which automatic backups exist and restore from a recent one.

List the backups with: ocrconfig -showbackup
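Choosing a restore point means reading the timestamps in the -showbackup listing. Below is a minimal sketch, assuming each backup line holds the node name, date, time and backup file path separated by whitespace. The actual listing from this cluster is not shown, and `list_ocr_backups` is a hypothetical helper.

```python
from datetime import datetime


def list_ocr_backups(showbackup_output):
    """Parse `ocrconfig -showbackup` lines into (timestamp, node, path), newest first.

    Assumed line layout: <node> <YYYY/MM/DD> <HH:MM:SS> <backup path>
    Lines that do not fit this layout are skipped.
    """
    backups = []
    for line in showbackup_output.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue
        node, day, clock, path = parts
        try:
            ts = datetime.strptime(f"{day} {clock}", "%Y/%m/%d %H:%M:%S")
        except ValueError:
            continue
        backups.append((ts, node, path))
    return sorted(backups, reverse=True)
```

The first entry is the newest candidate for ocrconfig -restore. In this case, the restore used week.ocr.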
[root@rac01 rac-cluster]# ocrconfig -restore /grid/crs_home/product/11.2.0/cdata/rac-cluster/week.ocr

[root@rac01 rac-cluster]# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version                  :          3
Total space (kbytes)     :     262120
Used space (kbytes)      :       3088
Available space (kbytes) :     259032
ID                       :  471595559
Device/File Name         :      +DATA
Device/File integrity check succeeded
Device/File not configured
Device/File not configured
Device/File not configured
Device/File not configured
Cluster registry integrity check succeeded
Logical corruption check succeeded
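After the restore, the lines that matter are the OCR location (+DATA here) and the final integrity checks. Below is a minimal sketch that extracts them, assuming the output format shown above. `summarize_ocrcheck` is a hypothetical helper.

```python
import re


def summarize_ocrcheck(output):
    """Pull the OCR location(s) and the overall check results out of `ocrcheck` output."""
    return {
        "locations": re.findall(r"Device/File Name\s*:\s*(\S+)", output),
        "registry_ok": "Cluster registry integrity check succeeded" in output,
        "logical_ok": "Logical corruption check succeeded" in output,
    }
```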
 

Create the voting disk

The first attempt failed with the error below; the fix follows:

[root@rac01 rac-cluster]# crsctl replace votedisk +DATA
CRS-4602: Failed 27 to add voting file 7255773670ae4fa9bf64a150a9fd5915.
Failure 27 with Cluster Synchronization Services while deleting voting disk.
Failed to replace voting disk group with +DATA.
CRS-4000: Command Replace failed, or completed with errors.
 

Set the ASM disk discovery string (asm_diskstring):

SQL> show parameter asm_diskstring

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
asm_diskstring                       string

SQL> alter system set asm_diskstring = '/dev/asm*';

System altered.

SQL> create spfile='+DATA' from memory;

File created.

SQL> startup force mount;
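Before the change, asm_diskstring was empty. Afterwards, ASM looks for disks whose paths match /dev/asm*, which covers the /dev/asm-diskb device named in ocssd.log. Below is a rough illustration of the wildcard matching using Python's fnmatch, assuming a comma-separated string and hypothetical candidate device names. This only approximates ASM discovery, which also depends on device permissions and disk headers.

```python
from fnmatch import fnmatchcase


def matched_by_diskstring(candidates, diskstring):
    """Return the candidate paths matched by any pattern in a comma-separated discovery string.

    Approximation only: real ASM discovery also checks permissions and
    disk headers, which are ignored here.
    """
    patterns = [p.strip().strip("'") for p in diskstring.split(",") if p.strip()]
    return [c for c in candidates if any(fnmatchcase(c, p) for p in patterns)]
```

On a real host the candidates could come from glob.glob("/dev/*").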
 

Create the voting disk again:

[root@rac01 rac-cluster]# crsctl replace votedisk +DATA
Successful addition of voting disk 383b8c3e4db34f72bf9dedd15e47471b.
Successful deletion of voting disk aaaf9f57bc9c4fc7bfb57ac937d2d149.
Successfully replaced voting disk group with +DATA.
CRS-4266: Voting file(s) successfully replaced
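The failed and successful attempts differ in their final code: CRS-4000 versus CRS-4266. Below is a minimal sketch that checks for those codes and collects the voting file IDs added and deleted, assuming the output was captured as text. `votedisk_replace_result` is a hypothetical helper.

```python
import re


def votedisk_replace_result(output):
    """Summarize `crsctl replace votedisk` output (hypothetical helper)."""
    return {
        # In the transcripts above, CRS-4266 reports success and CRS-4000 a failed command.
        "ok": "CRS-4266" in output and "CRS-4000" not in output,
        "added": re.findall(r"Successful addition of voting disk ([0-9a-f]{32})", output),
        "deleted": re.findall(r"Successful deletion of voting disk ([0-9a-f]{32})", output),
    }
```

For the successful run, the deleted ID is the voting file that crsctl query css votedisk had listed on /dev/asm-diskb.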
 

Stop the cluster stack and start it again.
