初判应该是stream引起的。节点二也有相关stream的cdmp文件产生
Tue Oct 24 04:56:50 EAT 2017
Trace dumping is performing id=[cdmp_20171024015310]
Streams process APPLY_xxx altered (1) on instance 1
Tue Oct 24 05:11:34 EAT 2017
Trace dumping is performing id=[cdmp_20171024015317]
Streams process APPLY_xxx altered (1) on instance 1
Streams process APPLY_xxx altered (1) on instance 1
Tue Oct 24 05:26:18 EAT 2017
Trace dumping is performing id=[cdmp_20171024015321]
随即给厂家人员打了电话,让他们看看stream是否正常,反应状态异常。kill其进程p000、p001,立马就停止apply进程否则报错!!!
SQL> show user
USER is "STRMADMIN"
SQL> execute dbms_apply_adm.stop_apply (apply_name => 'apply_esshlj');
PL/SQL procedure successfully completed.
SQL> select apply_name,queue_name,status from dba_apply;
APPLY_NAME QUEUE_NAME STATUS
------------------------------ ------------------------------ --------
APPLY_ESSHLJ ESSHLJ_QUEUE DISABLED ###这会儿状态为DISABLE###启动后状态为ENABLE。
随即又清理了一下缓存,该在 strmadmin下执行
declare
options dbms_aqadm.aq$_purge_options_t;
begin
options.delivery_mode := dbms_aqadm.buffered;
dbms_aqadm.purge_queue_table('ESSHLJ_QUEUE_TABLE', null, options);
end;
/
报错了
ERROR at line 1:
ORA-03113: end-of-file on communication channel
1节点alert的trc文件一直产生,一秒钟可生成5个左右文件,一个2M。oracle目录只有20G。编写脚本自动删除trc文件。
1,尝试了flush buffer_cache;
2,尝试了flush share_pool;
因为该报错的只是节点一,随即打算将stream切换至节点二,观察是否报错依旧,在目标端中心库上面执行了如下命令;
exec DBMS_AQADM.ALTER_QUEUE_TABLE@esshlj(queue_table => 'ESSHLJ_QUEUE_TABLE', primary_instance => 2, secondary_instance => 1);
观察p000和p001进程在1节点已经不在了。2节点stream状态正常,ORA-600报错没了,但是ORA-7445报错依旧。因为报错中一直有小工进程q00这样的进程,暂时修改aq_tm_processes参数为0,看看报错还有没有。
SQL> !ps -ef |grep q00
oracle 8955 8883 1 15:22:05 pts/5 0:00 /sbin/sh -c ps -ef |grep q00
oracle 8957 8955 0 15:22:05 pts/5 0:00 grep q00
oracle 27097 1 0 14:37:32 ? 0:00 ora_q006_esshlj1
oracle 8820 1 51 15:21:52 ? 0:00 ora_q003_esshlj1
oracle 8018 1 0 15:21:41 ? 0:00 ora_q008_esshlj1
oracle 6752 1 0 15:21:12 ? 0:00 ora_q007_esshlj1
oracle 7641 1 0 15:21:27 ? 0:00 ora_q004_esshlj1
oracle 8863 1 1 15:21:58 ? 0:00 ora_q000_esshlj1
oracle 8839 1 0 15:21:55 ? 0:00 ora_q009_esshlj1
oracle 8931 1 4 15:22:01 ? 0:00 ora_q001_esshlj1
oracle 8954 1 6 15:22:04 ? 0:00 ora_q002_esshlj1
SQL> alter system set aq_tm_processes = 0; #####修改参数为0#######
System altered.
SQL> !ps -ef |grep q00
oracle 11224 8883 0 15:23:08 pts/5 0:00 /sbin/sh -c ps -ef |grep q00
oracle 11226 11224 0 15:23:08 pts/5 0:00 grep q00
小工进程已经没有,但是报错依旧~头疼,在改回去吧。
SQL> alter system set aq_tm_processes = 1;
System altered.
SQL> !ps -ef |grep q00
oracle 12198 8883 0 15:23:35 pts/5 0:00 /sbin/sh -c ps -ef |grep q00
oracle 12203 12198 0 15:23:35 pts/5 0:00 grep q00
oracle 12135 1 20 15:23:35 ? 0:00 ora_q000_esshlj1
oracle 12199 1 3 15:23:35 ? 0:00 ora_q002_esshlj1
oracle 12167 1 7 15:23:35 ? 0:00 ora_q001_esshlj1
有同事大胆猜想既然不在1节点跑,是否可以将1节点的aq_tm_processes参数设置为0, 将2节点的aq_tm_processes参数设置为1,并且指定SID。