一、问题描述:
2013年4月14日中午12点左右生产环境执行数据库版本升级期间根据需要停止XX1库和XX2库OGG 同步抽取进程时遇长事务,无法用正常命令停止,执行 forcestop 后重启进程报 OGG-00446 错误,无法启动。
错误如下:
2013-04-14 19:30:28 ERROR OGG-00446 Opening ASM file+FRA/bjschxsb/1_7125_796652962.dbf in DBLOGREADER mode: (308) ORA-00308: ca
nnot open archived log'+FRA/bjschxsb/1_7125_796652962.dbf'
ORA-17503: ksfdopn:2 Failed toopen file +FRA/bjschxsb/1_7125_796652962.dbf
ORA-15173: entry'1_7125_796652962.dbf' does not exist in directory 'bjschxsb'
Not able to establish initialposition for sequence 7125, rba 339291664.
2013-04-14 19:30:28 ERROR OGG-01668 PROCESS ABENDING.
二、问题原因:
XX1库和XX2库的 Extract 进程在执行 forcestop 停止前(瞬间)正在处理既未提交也未回滚的长时间运行事务,重新启动 Extract 进程后需要执行 Extract 进程恢复。
1、XX1库
停止XX1库Extract 进程时,正在处理的长事务为:
2013-04-1411:51:46 WARNING OGG-01027 Oracle GoldenGate Capture for Oracle,esb_cx7.prm: Long Running Transaction:XID 561.10.312
84,Items 0, Extract ESB_CX7, Redo Thread 1, SCN 3098.1568409621 (13307377092629),Redo Seq #7125, Redo RBA 339291664.
截止目前该事务在数据库中仍在进行:
SQL> select t.addr,t.XIDUSN,t.XIDSLOT,t.XIDSQN,t.START_DATEfrom gv$transaction t;
ADDR XIDUSN XIDSLOT XIDSQN START_DATE
---------------- ---------- ---------- ----------------------------
070000084724BB90 561 10 31284 09-APR-13
SQL> select t.PREV_SQL_ID,t.SQL_ID from gv$session t wheretaddr='070000084724BB90';
PREV_SQL_ID SQL_ID
------------- -------------
9m7787camwh4m
SQL> select sql_text from gv$sqltext t where t.SQL_ID = '9m7787camwh4m';
SQL_TEXT
----------------------------------------------------------------
begin :id := sys.dbms_transaction.local_transaction_id; end;
begin :id := sys.dbms_transaction.local_transaction_id; end;
begin :id := sys.dbms_transaction.local_transaction_id; end;
begin :id := sys.dbms_transaction.local_transaction_id; end;
该事务是由一台 IP 地址为 162.16.220.70 的机器,通过 PL/SQL dev 工具于 2013 年 4 月 9 日发起,使用数据库用户是HX_SJZ,该事务至今未提交也未回滚。需要注意的是 HX_SJZ 用户权限已于 2013 年
4 月 1日开始收回,该主机使用该用户从 3 月 28 日用该用户建立的 session 至今未断开。
selectt.SID,t.SERIAL#,t.SCHEMA#,t.SCHEMANAME,t.OSUSER,t.MACHINE,t.PORT,t.TERMINAL,t.PROGRAMfrom gv$session t where taddr='070000084724BB90';
SID SERIAL# SCHEMA# SCHEMANAMEOSUSER MACHINE PORT TERMINAL PROGRAM
---------- -------------------- ---------- ---------- -------------------- ---------- ----------------------------------------
3351 107 77 HX_SJZ css5 WORKGROUP\CSS-PC 53796CSS-PC plsqldev.exe
bjschxdbsb01:/u01/app/grid/diag/tnslsnr/bjschxdbsb01/listener/trace$catlistener.log | grep css5 | grep162.16.220.70 > /home/grid/listener.bak_20130414
qldev.exe)(HOST=CSS-PC)(USER=css5)))* (ADDRESS=(PROTOCOL=tcp)(HOST=162.16.220.70)(PORT=53313)) * establish *bjschxsb * 0
28-MAR-2013 23:26:06 *(CONNECT_DATA=(SERVICE_NAME=bjschxsb)(GLOBAL_NAME=bjschxsb)(CID=(PROGRAM=D:\Program?Files\PLSQL?Developer\plsqldev.exe)(HOST=CSS-PC)(USER=css5)))* (ADDRESS=(PROTOCOL=tcp)(HOST=162.16.220.70)(PORT=53796)) * establish *bjschxsb * 0
28-MAR-2013 23:34:51 *(CONNECT_DATA=(SERVICE_NAME=bjschxsb)(GLOBAL_NAME=bjschxsb)(CID=(PROGRAM=D:\Program?Files\PLSQL?Developer\plsqldev.exe)(HOST=CSS-PC)(USER=css5)))* (ADDRESS=(PROTOCOL=tcp)(HOST=162.16.220.70)(PORT=53920)) * establish *bjschxsb * 0
28-MAR-2013 23:36:37 *(CONNECT_DATA=(SERVICE_NAME=bjschxsb)(GLOBAL_NAME=bjschxsb)(CID=(PROGRAM=D:\Program?Files\PLSQL?Developer\plsqldev.exe)(HOST=CSS-PC)(USER=css5)))* (ADDRESS=(PROTOCOL=tcp)(HOST=162.16.220.70)(PORT=53944)) * establish *bjschxsb * 0
重启XX1库上的 Extract进程后,需要使用thread 1, seq 7125-7152 归档日志,而 seq 7125 归档正好已被删除,这就是报错的原因。
Recovery Checkpoint (position of oldestunprocessed transaction in the data source):
Thread #: 1
Sequence #: 7125
RBA: 339291664
Timestamp: 2013-04-09 15:18:13.000000
SCN: 3098.1568409621 (13307377092629)
Redo File: Not Available
Current Checkpoint (position of last recordread in the data source):
Thread #: 1
Sequence #: 7152
RBA: 253142788
Timestamp: 2013-04-14 11:59:29.000000
SCN: 3099.705644312 (13310809294616)
Redo File: +DATA/bjschxsb/onlinelog/redo01b