ORA-00600: internal error code, arguments: [kdsgrp1]: a resolution case

Date: 2020-06-02  Category: Programmer's Life

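One day, a bad block appeared in an AWR-related table in the SYSAUX tablespace of a customer's database. We truncated the table (its data was not important), restored the datafile from backup, and ran recovery, after which the bad block was repaired.

After the database was opened, the customer's application started reporting errors. The alert log showed: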

Errors in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_51465.trc (incident=279339):
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/test/test/incident/incdir_279339/test_ora_51465_i279339.trc

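The trace shows that a particular SQL statement triggered the error. This error essentially means that the ROWID stored in an index entry points to a row that cannot be found in the table, which indicates a consistency problem between the index and the table data. I extracted the offending SQL from the trace file and ran it by hand.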
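A mismatch of this kind between an index and its table can usually be confirmed independently of the failing statement. The following is a minimal sketch, not the procedure used in this case; APP_OWNER.ORDERS is a hypothetical name standing in for the affected table.

```sql
-- Hypothetical object name: APP_OWNER.ORDERS stands in for the affected table.
-- CASCADE cross-checks each index entry against the table rows.
-- If they disagree, Oracle raises ORA-01499
-- (table/index cross reference failure) and writes details to a trace file.
ANALYZE TABLE app_owner.orders VALIDATE STRUCTURE CASCADE;
```

Without the ONLINE keyword this statement locks the table while it runs, so on a busy system `VALIDATE STRUCTURE CASCADE ONLINE` may be the safer variant.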

Reading further in the trace file, the following records pinpoint the problem:

*** 2019-03-30 22:00:06.323
*** SESSION ID:(1802.759) 2019-03-30 22:00:06.323
*** CLIENT ID:() 2019-03-30 22:00:06.323
*** SERVICE NAME:(ysnc) 2019-03-30 22:00:06.323
*** MODULE NAME:(sqlservr.exe) 2019-03-30 22:00:06.323
*** ACTION NAME:() 2019-03-30 22:00:06.323
* kdsgrp1-1: *************************************************
row 0x030b33a7.0 continuation at
0x030b33a7.0 file# 12 block# 734119 slot 0 not found
KDSTABN_GET: 0 ..... ntab: 0
curSlot: 0 ..... nrows: 0
kdsgrp - dump CR block dba=0x030b33a7
Block header dump: 0x030b33a7
Object id on Block? Y
seg/obj: 0x29761 csc: 0x00.53475f8c itc: 2 flg: E typ: 1 - DATA
brn: 0 bdba: 0x30b3300 ver: 0x01 opc: 0
inc: 0 exflg: 0

From this we get datafile number 12 and block number 734119, so the affected object can be located with the following SQL:

select owner, segment_name, segment_type from dba_extents where file_id = 12 and 734119 between block_id and block_id + blocks - 1;
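The file and block numbers do not have to be read off the "file# 12 block# 734119" line; they are also encoded in the data block address (DBA) 0x030b33a7 that appears in the trace. As a sketch, assuming a smallfile tablespace, DBMS_UTILITY can split the DBA, and the hexadecimal seg/obj value from the block header can be matched against DBA_OBJECTS as a cross-check:

```sql
-- 0x030B33A7 is the DBA printed in the trace (dba=0x030b33a7).
-- The upper bits hold the relative file number, the lower 22 bits the block number.
SELECT dbms_utility.data_block_address_file(
         TO_NUMBER('030B33A7', 'XXXXXXXX'))  AS file_no,
       dbms_utility.data_block_address_block(
         TO_NUMBER('030B33A7', 'XXXXXXXX'))  AS block_no
  FROM dual;
-- file_no = 12, block_no = 734119

-- "seg/obj: 0x29761" in the block header is the data object id in hex.
SELECT owner, object_name, object_type
  FROM dba_objects
 WHERE data_object_id = TO_NUMBER('29761', 'XXXXX');
```

If both lookups name the same segment, that is good evidence the block really belongs to the object that DBA_EXTENTS reports.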

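Once the object was identified, I tried to rebuild its indexes.

The rebuild failed with ORA-00600 [13004].

The only option left was to drop the index and then create it again.

After the index had been recreated, the SQL statement ran again without error.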
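The drop-and-create step can be sketched as follows. All names here (APP_OWNER, ORDERS, ORDERS_FLOW_IDX, FLOW_ID) are hypothetical, since the original post does not give the real ones. Saving the DDL first is a precaution, not something the original describes. A plain rebuild may read from the existing, already inconsistent index, whereas creating the index from scratch builds it from the table data, which is one plausible reason the second approach worked here.

```sql
-- Hypothetical names: APP_OWNER, ORDERS, ORDERS_FLOW_IDX, FLOW_ID.

-- 1. Save the current index definition before dropping it (SQL*Plus).
SET LONG 100000
SELECT dbms_metadata.get_ddl('INDEX', 'ORDERS_FLOW_IDX', 'APP_OWNER')
  FROM dual;

-- 2. Drop the inconsistent index and build a new one from the table rows.
DROP INDEX app_owner.orders_flow_idx;

CREATE INDEX app_owner.orders_flow_idx
    ON app_owner.orders (flow_id);
```

If the index enforces a primary key or unique constraint, DROP INDEX is rejected with ORA-02429, and the constraint has to be disabled or dropped first.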

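In addition, this customer's database later hit ORA-08103: object no longer exists.

The error was raised by a plain query against the table, so at this point some data loss was unavoidable.

The following script, found on MOS (My Oracle Support), was used to salvage the data: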

REM Create a new table based on the table that is producing errors with no rows:

create table <owner>.<table_name>_20180331
as
select *
from <owner>.<table_name>
where 1=2;

REM Create the table to keep track of ROWIDs pointing to affected rows:

create table <owner>.bad_rows (row_id rowid, oracle_error_code number);

set serveroutput on

DECLARE
  TYPE RowIDTab IS TABLE OF ROWID INDEX BY BINARY_INTEGER;

  -- Collect ROWIDs from the index only, so damaged table blocks are not read here.
  CURSOR c1 IS select /*+ index_ffs(tab1 <index_name>) parallel(tab1) */ rowid
               from <owner>.<table_name> tab1
               where pk_flow is NOT NULL
               order by rowid;

  r          RowIDTab;
  rows       NATURAL := 20000;   -- batch size per fetch
  bad_rows   number := 0;
  errors     number;
  error_code number;
  myrowid    rowid;
BEGIN
  OPEN c1;
  LOOP
    FETCH c1 BULK COLLECT INTO r LIMIT rows;
    EXIT WHEN r.count=0;
    BEGIN
      -- Copy each row by ROWID; SAVE EXCEPTIONS lets the batch continue past bad rows.
      FORALL i IN r.FIRST..r.LAST SAVE EXCEPTIONS
        insert into <owner>.<table_name>_20180331
        select /*+ ROWID(A) */ a.*
        from <owner>.<table_name> A where rowid = r(i);
    EXCEPTION
      when OTHERS then
        BEGIN
          errors := SQL%BULK_EXCEPTIONS.COUNT;
          FOR err1 IN 1..errors LOOP
            error_code := SQL%BULK_EXCEPTIONS(err1).ERROR_CODE;
            -- Record only corruption-related errors: ORA-01410, ORA-08103, ORA-01578.
            if error_code in (1410, 8103, 1578) then
              myrowid := r(SQL%BULK_EXCEPTIONS(err1).ERROR_INDEX);
              bad_rows := bad_rows + 1;
              insert into <owner>.bad_rows values(myrowid, error_code);
            else
              raise;
            end if;
          END LOOP;
        END;
    END;
    commit;
  END LOOP;
  commit;
  CLOSE c1;
  dbms_output.put_line('Total Bad Rows: '||bad_rows);
END;
/

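Fortunately, out of more than 400,000 rows only 6 were lost in the end, affecting two business documents. The business side recovered the data by re-entering those documents.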
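To see where the lost rows were located, the ROWIDs recorded in the bad_rows table can be decoded with DBMS_ROWID. This is a sketch rather than part of the MOS script; replace the schema placeholder with the owner used when bad_rows was created.

```sql
-- <owner> is a placeholder for the schema that holds bad_rows.
-- Decodes each recorded ROWID into relative file, block, and row slot,
-- so the lost rows can be grouped by the damaged block they lived in.
SELECT oracle_error_code,
       dbms_rowid.rowid_relative_fno(row_id) AS file_no,
       dbms_rowid.rowid_block_number(row_id) AS block_no,
       dbms_rowid.rowid_row_number(row_id)   AS row_no
  FROM <owner>.bad_rows
 ORDER BY file_no, block_no, row_no;
```

Rows that share a file and block number were lost to the same damaged block, which helps when deciding which business documents need to be re-entered.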


Please credit the source when reposting: https://www.heiqu.com/dcc492dcadc4a91d6aa72ded1c4397a9.html
