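Once again I installed an Oracle Database RAC 11.2.0.3 cluster on AIX 6.1 TL7, and once again I ran into a pile of problems. This post records them.
Problem 1: The Grid OUI needs bash to set up SSH trust (user equivalence) between users.
The error was as follows:
Solution: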
ln -s /usr/bin/ksh /bin/bash
mkdir -p /usr/local/bin
ln -s /usr/bin/ssh-keygen /usr/local/bin/ssh-keygen
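The three commands above fail if run a second time, because the links already exist. A minimal sketch of a re-runnable variant is shown below; the helper name ensure_link is hypothetical and not part of any Oracle or AIX tooling, and it assumes a POSIX shell such as ksh.

```shell
# ensure_link TARGET LINK: create the symlink LINK -> TARGET unless LINK
# already exists. ensure_link is a hypothetical helper name.
ensure_link() {
    target=$1
    link=$2
    if [ -e "$link" ] || [ -L "$link" ]; then
        echo "skip: $link already exists"
        return 0
    fi
    if [ ! -e "$target" ]; then
        echo "error: $target not found" >&2
        return 1
    fi
    # Create the parent directory first (covers the mkdir -p step above).
    mkdir -p "$(dirname "$link")" && ln -s "$target" "$link"
}

# As root on each node, the calls would mirror the commands above:
#   ensure_link /usr/bin/ksh /bin/bash
#   ensure_link /usr/bin/ssh-keygen /usr/local/bin/ssh-keygen
```

The helper never overwrites an existing /bin/bash, so it is safe on a node where a real bash is already installed.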
Reference article:
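Problem 2: The configured network parameters do not take effect.
The problem looked like this:
As the screenshot above shows, the three network parameters rfc1323, sb_max, and tcp_sendspace failed the check. Clicking "more details" shows that it is the parameters on the NIC used for the private network that failed. Yet the global network parameters had already been set and were in effect: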
root@mtdb2:/etc# no -a | grep rfc1323
rfc1323 = 1
root@mtdb2:/etc# no -a | grep sb_max
sb_max = 1310720
root@mtdb2:/etc# no -a | grep tcp_sendspace
tcp_sendspace = 65536
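So I used smitty inet to set rfc1323 and tcp_sendspace on the private NIC ent7 (interface en7):
However, sb_max is a global setting and cannot be set per NIC. Clicking "more details" on the sb_max warning shows:
As the screenshot makes clear, Oracle requires sb_max to be 4194304, while the current value is 1310720, which is far too low. Run the following command to reset sb_max (the -p flag applies the change both immediately and at the next boot):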
/usr/sbin/no -p -o sb_max=4194304
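Before and after changing the value, it can help to compare the current setting against the required one from the command line instead of waiting for the installer check. The following is only a sketch: the function names check_tunable and no_value are hypothetical, and the only required value used is the 4194304 for sb_max reported by the installer above.

```shell
# check_tunable NAME CURRENT REQUIRED: report whether CURRENT >= REQUIRED.
# Returns 1 when the current value is too low.
check_tunable() {
    if [ "$2" -ge "$3" ]; then
        echo "$1: ok ($2 >= $3)"
    else
        echo "$1: too low ($2 < $3)"
        return 1
    fi
}

# no_value NAME: read "name = value" lines (the `no -a` format shown
# earlier) from stdin and print the value of the tunable NAME.
no_value() {
    awk -v name="$1" '$1 == name && $2 == "=" { print $3 }'
}

# Example on an AIX node:
#   check_tunable sb_max "$(no -a | no_value sb_max)" 4194304
```

With the values from this post, the check reports "sb_max: too low (1310720 < 4194304)" before the change.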
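Once the change was in effect, I ran the Grid installer again. The prerequisite check results were:
As the results above show, all of the network-related parameters now pass.
The "Swap Size" value does not have to match Oracle's requirement exactly; this space is rarely used, and the 10 GB allocated is enough. The system is at 6.1 TL7, so the OS Patch:IZ97457 warning can be ignored.
As for the "Device Checks for ASM" warning, Oracle requires the shared disk devices (for example /dev/rhdisk2, /dev/rhdisk3, ...) to be owned by the grid user and the asmadmin group, with file mode 0660. Change them accordingly and the check passes.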
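The ownership and mode requirement can be verified per device before rerunning the installer. The sketch below is an assumption-laden illustration: asm_dev_ok is a hypothetical function name, and it relies on the usual `ls -ld` column layout (mode, link count, owner, group).

```shell
# asm_dev_ok PATH OWNER GROUP: succeed only if PATH is owned by
# OWNER:GROUP and its permission bits are rw-rw---- (0660).
asm_dev_ok() {
    # substr() drops the file-type character and any trailing ACL marker.
    info=$(ls -ld "$1" | awk '{ print substr($1, 2, 9), $3, $4 }')
    if [ "$info" = "rw-rw---- $2 $3" ]; then
        echo "$1: ok"
    else
        echo "$1: found '$info', expected 'rw-rw---- $2 $3'"
        return 1
    fi
}

# Example on each RAC node (device names taken from the text above):
#   for d in /dev/rhdisk2 /dev/rhdisk3; do
#       asm_dev_ok "$d" grid asmadmin || {
#           chown grid:asmadmin "$d"
#           chmod 660 "$d"
#       }
#   done
```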
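An NTP failure also appeared earlier. Oracle now has its own time synchronization service, CTSS, and by default expects the NTP service to be disabled (renaming /etc/ntp.conf is enough to pass the check). If you want to keep using the NTP service in your environment, simply ignore the failure.
Meanwhile, ifconfig -a shows the rfc1323 and tcp_sendspace values on each interface: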
root@mtdb2:/etc# ifconfig -a
en5: flags=1e080863,c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),LARGESEND,CHAIN>
inet 192.168.128.101 netmask 0xffffff00 broadcast 192.168.128.255
tcp_sendspace 131072 tcp_recvspace 65536 rfc1323 0
en7: flags=1e080863,c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),LARGESEND,CHAIN>
inet 192.168.3.2 netmask 0xffffff00 broadcast 192.168.3.255
tcp_sendspace 65536 tcp_recvspace 65536 rfc1323 1
en8: flags=1e080863,c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),LARGESEND,CHAIN>
inet 10.0.12.15 netmask 0xffffff00 broadcast 10.0.12.255
tcp_sendspace 131072 tcp_recvspace 65536 rfc1323 0
lo0: flags=e08084b,c0<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,LARGESEND,CHAIN>
inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
inet6 ::1%1/0
tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1
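As the output above shows, apart from en7 (adapter ent7), which was set manually, the other NICs (en5 and en8) did not pick up these two parameters. I ignored the problem and continued the installation, but it remains unresolved!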
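To compare the per-interface values across nodes without reading the full ifconfig output, the two parameters can be extracted with a small filter. This is a sketch only; isno_summary is a hypothetical name, and the parsing assumes the ifconfig -a layout shown above (an "enN: flags=..." header line followed by a line containing tcp_sendspace and rfc1323).

```shell
# isno_summary: read `ifconfig -a` output on stdin and print one line per
# interface with its tcp_sendspace and rfc1323 values.
isno_summary() {
    awk '
        # Header line such as "en7: flags=...": remember the interface name.
        /^[a-z]+[0-9]+: / { ifname = $1; sub(/:$/, "", ifname); next }
        # Parameter line: pick the value that follows each keyword.
        /tcp_sendspace/ {
            send = ""; rfc = ""
            for (i = 1; i < NF; i++) {
                if ($i == "tcp_sendspace") send = $(i + 1)
                if ($i == "rfc1323") rfc = $(i + 1)
            }
            print ifname, "tcp_sendspace=" send, "rfc1323=" rfc
        }
    '
}

# Example on an AIX node:
#   ifconfig -a | isno_summary
```

For the output in this post, the en7 line would read "en7 tcp_sendspace=65536 rfc1323=1".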
Problem 3: The root.sh script fails.
While installing Grid, running root.sh produced the following error:
root@mtdb1:/u01/app/11.2.0/grid# ./root.sh
Performing root user operation for Oracle 11g
The following environment variables are set as:
ORACLE_OWNER= grid
ORACLE_HOME= /u01/app/11.2.0/grid
Enter the full pathname of the local bin directory: [/usr/local/bin]:
Copying dbhome to /usr/local/bin ...
Copying oraenv to /usr/local/bin ...
Copying coraenv to /usr/local/bin ...
Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
Creating trace directory
User ignored Prerequisites during installation
Failed to write the checkpoint:'' with status:FAIL.Error code is 256
Undefined subroutine &crsconfig_lib::dieformat called at /u01/app/11.2.0/grid/crs/install/crsconfig_lib.pm line 6135.
/u01/app/11.2.0/grid/perl/bin/perl -I/u01/app/11.2.0/grid/perl/lib -I/u01/app/11.2.0/grid/crs/install /u01/app/11.2.0/grid/crs/install/rootcrs.pl execution failed
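Solution:
After some searching, I found that the error occurred because a previous engineer had very "kindly" installed the IBM HACMP software for me, and Oracle Database 11gR2 Grid Infrastructure is not compatible with it. The steps to fix it are: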
Step-1) cd /usr/sbin/cluster/utilities
mv cldomain cldomain_orig
Step-2) Remove "hagsuser" group using smit security command
Step-3) cd /var/ha/soc
rm -rf *clients*
Step-4) Modify rootpre.sh file by removing HACMP related part from this file and run rootpre.sh again.
Now we can re-install CRS/DB again.
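All of the steps above must be completed on every node where RAC is being installed, after which the Grid installation directory can simply be removed with rm -rf. IBM HACMP and Grid Infrastructure may have compatibility problems, so do not install HACMP unless you actually use it.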
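Since the steps have to be repeated on every node, a quick check for leftovers can confirm that none was missed. The sketch below is illustrative only: hacmp_leftovers is a hypothetical function, the ROOT argument exists purely so the check can be tried against a scratch directory, and it assumes the hagsuser group is defined locally in /etc/group.

```shell
# hacmp_leftovers ROOT: list HACMP remnants under ROOT (use / on a real
# node). Returns 0 when nothing from Step-1 to Step-3 is left behind.
hacmp_leftovers() {
    root=${1%/}
    found=0
    if [ -e "$root/usr/sbin/cluster/utilities/cldomain" ]; then
        echo "cldomain still present (Step-1)"
        found=1
    fi
    if grep -q '^hagsuser:' "$root/etc/group" 2>/dev/null; then
        echo "hagsuser group still present (Step-2)"
        found=1
    fi
    for f in "$root"/var/ha/soc/*clients*; do
        if [ -e "$f" ]; then
            echo "*clients* entries still present (Step-3)"
            found=1
            break
        fi
    done
    [ "$found" -eq 0 ]
}

# Example as root on each node:
#   hacmp_leftovers / && echo "node is clean"
```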
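Reference article:
Problem 4: Modifying the grid user's attributes.
With Problem 3 solved, I reinstalled the Grid software, and running root.sh reported the following error: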
root@mtdb1:/u01/app/11.2.0/grid# ./root.sh
Performing root user operation for Oracle 11g
The following environment variables are set as:
ORACLE_OWNER= grid
ORACLE_HOME= /u01/app/11.2.0/grid
Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
User ignored Prerequisites during installation
User grid is missing the following capabilities required to run CSSD in realtime: CAP_NUMA_ATTACH,CAP_BYPASS_RAC_VMM,CAP_PROPAGATE
To add the required capabilities, please run:
/usr/bin/chuser capabilities=CAP_NUMA_ATTACH,CAP_BYPASS_RAC_VMM,CAP_PROPAGATE grid
CSS cannot be run in realtime mode at /u01/app/11.2.0/grid/crs/install/crsconfig_lib.pm line 11423.
/u01/app/11.2.0/grid/perl/bin/perl -I/u01/app/11.2.0/grid/perl/lib -I/u01/app/11.2.0/grid/crs/install /u01/app/11.2.0/grid/crs/install/rootcrs.pl execution failed
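It turns out the grid user also needs the three capabilities CAP_NUMA_ATTACH, CAP_BYPASS_RAC_VMM, and CAP_PROPAGATE.
Solution:
Run the following command on every node where RAC is being installed: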
/usr/bin/chuser capabilities=CAP_NUMA_ATTACH,CAP_BYPASS_RAC_VMM,CAP_PROPAGATE grid
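To confirm the change on each node before rerunning root.sh, the user's capabilities can be compared against the three names from the error message. The following is a minimal sketch: missing_caps is a hypothetical helper, and it assumes that `lsuser -a capabilities grid` prints a single line of the form "grid capabilities=CAP_A,CAP_B".

```shell
# Capabilities named in the root.sh error message above.
REQUIRED_CAPS="CAP_NUMA_ATTACH CAP_BYPASS_RAC_VMM CAP_PROPAGATE"

# missing_caps LINE: LINE is lsuser-style output such as
# "grid capabilities=CAP_NUMA_ATTACH,CAP_PROPAGATE". Prints the required
# capabilities that are absent from it, separated by spaces.
missing_caps() {
    # Wrap the list in commas so each name can be matched as ",NAME,".
    have=",${1#*capabilities=},"
    missing=""
    for cap in $REQUIRED_CAPS; do
        case "$have" in
            *",$cap,"*) ;;
            *) missing="${missing:+$missing }$cap" ;;
        esac
    done
    echo "$missing"
}

# Example on an AIX node:
#   missing_caps "$(lsuser -a capabilities grid)"
```

An empty result means all three capabilities are present for the grid user.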