INST_ID OPERA STAT POWER SOFAR EST_WORK EST_RATE EST_MINUTES
---------- ----- ---- ---------- ---------- ---------- ---------- -----------
1 REBAL RUN 10 92407 97874 8716 0
At this point we observe the following in the ASM alert log:
SQL> alter diskgroup testdg add disk '/dev/raw/raw7' rebalance power 2
NOTE: GroupBlock outside rolling migration privileged region
NOTE: Assigning number (5,0) to disk (/dev/raw/raw7)
NOTE: requesting all-instance membership refresh for group=5
NOTE: initializing header on grp 5 disk TESTDG_0000
NOTE: requesting all-instance disk validation for group=5
Tue Jan 10 16:07:12 2017
NOTE: skipping rediscovery for group 5/0x97f863e8 (TESTDG) on local instance.
NOTE: requesting all-instance disk validation for group=5
NOTE: skipping rediscovery for group 5/0x97f863e8 (TESTDG) on local instance.
Tue Jan 10 16:07:12 2017
GMON updating for reconfiguration, group 5 at 230 for pid 42, osid 6197
NOTE: group 5 PST updated.
NOTE: initiating PST update: grp = 5
GMON updating group 5 at 231 for pid 42, osid 6197
NOTE: PST update grp = 5 completed successfully
NOTE: membership refresh pending for group 5/0x97f863e8 (TESTDG)
GMON querying group 5 at 232 for pid 18, osid 5012
NOTE: cache opening disk 0 of grp 5: TESTDG_0000 path:/dev/raw/raw7
GMON querying group 5 at 233 for pid 18, osid 5012
SUCCESS: refreshed membership for 5/0x97f863e8 (TESTDG)
NOTE: starting rebalance of group 5/0x97f863e8 (TESTDG) at power 1
SUCCESS: alter diskgroup testdg add disk '/dev/raw/raw7'
Starting background process ARB0
Tue Jan 10 16:07:14 2017
ARB0 started with pid=27, OS id=982
NOTE: assigning ARB0 to group 5/0x97f863e8 (TESTDG) with 1 parallel I/O
cellip.ora not found.
Tue Jan 10 16:07:23 2017
NOTE: Attempting voting file refresh on diskgroup TESTDG
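The output above means that ASM has finished the second phase of the rebalance and has started the third phase, compacting. If that is right, pstack on the ARB0 process should show the kfdCompact() function in the call stack, and the output below confirms it: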
# pstack 982
#0 0x0000003957ccb6ef in poll () from /lib64/libc.so.6
...
#9 0x0000000003d711e0 in kfk_reap_oss_async_io ()
#10 0x0000000003d70c17 in kfk_reap_ios_from_subsys ()
#11 0x0000000000aea50e in kfk_reap_ios ()
#12 0x0000000003d702ae in kfk_io1 ()
#13 0x0000000003d6fe54 in kfkRequest ()
#14 0x0000000003d76540 in kfk_transitIO ()
#15 0x0000000003cd482b in kffRelocateWait ()
#16 0x0000000003cfa190 in kffRelocate ()
#17 0x0000000003c7ba16 in kfdaExecute ()
#18 0x0000000003c4b737 in kfdCompact ()
#19 0x0000000003c4c6d0 in kfdExecute ()
#20 0x0000000003d4bf0e in kfgbRebalExecute ()
#21 0x0000000003d39627 in kfgbDriver ()
#22 0x00000000020e8d23 in ksbabs ()
#23 0x0000000003d4faae in kfgbRun ()
#24 0x00000000020ed95d in ksbrdp ()
#25 0x0000000002322343 in opirip ()
#26 0x0000000001618571 in opidrv ()
#27 0x0000000001c13be7 in sou2o ()
#28 0x000000000083ceba in opimai_real ()
#29 0x0000000001c19b58 in ssthrdmain ()
#30 0x000000000083cda1 in main ()
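The stack check above can be automated. The following is a minimal Python sketch, assuming the pstack output has been captured as text; the helper names `stack_functions` and `is_compacting` are hypothetical and exist only for this illustration.

```python
import re

# A pstack frame line looks like: "#18 0x0000000003c4b737 in kfdCompact ()"
FRAME_RE = re.compile(r"^#(\d+)\s+0x[0-9a-fA-F]+\s+in\s+(\w+)\s*\(")

def stack_functions(pstack_text):
    """Return the function names found in the stack, innermost frame first."""
    names = []
    for line in pstack_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            names.append(match.group(2))
    return names

def is_compacting(pstack_text):
    """True if kfdCompact appears anywhere in the captured call stack."""
    return "kfdCompact" in stack_functions(pstack_text)

# Sample frames copied from the pstack output in this article.
sample = """\
#15 0x0000000003cd482b in kffRelocateWait ()
#16 0x0000000003cfa190 in kffRelocate ()
#17 0x0000000003c7ba16 in kfdaExecute ()
#18 0x0000000003c4b737 in kfdCompact ()
#19 0x0000000003c4c6d0 in kfdExecute ()
"""
print(is_compacting(sample))
```

A single pstack call is only a snapshot, so a check like this is best repeated a few times before drawing a conclusion.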
Tailing the ARB0 trace file shows that relocation is still in progress, but only one entry at a time (another important clue that the rebalance has reached the compacting phase):
$ tail -f +ASM1_arb0_25416.trc
ARB0 relocating file +DATA1.321.788357323 (1 entries)
ARB0 relocating file +DATA1.321.788357323 (1 entries)
ARB0 relocating file +DATA1.321.788357323 (1 entries)
...
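The same clue can be read from the trace file with a script. This is a rough Python sketch, assuming the trace lines follow the "ARBn relocating file ... (N entries)" layout shown above; the function names and the `window` threshold of 3 are assumptions for illustration, not part of any Oracle tooling.

```python
import re

# Matches lines such as: "ARB0 relocating file +DATA1.321.788357323 (1 entries)"
RELOC_RE = re.compile(r"^ARB\d+ relocating file (\S+) \((\d+) entries\)")

def relocation_entry_counts(trace_lines):
    """Return the entry count of every relocation line, in file order."""
    counts = []
    for line in trace_lines:
        match = RELOC_RE.match(line.strip())
        if match:
            counts.append(int(match.group(2)))
    return counts

def looks_like_compacting(trace_lines, window=3):
    """Heuristic: the last `window` relocations each moved exactly one entry."""
    recent = relocation_entry_counts(trace_lines)[-window:]
    return len(recent) == window and all(n == 1 for n in recent)

# Sample lines copied from the trace output in this article.
trace = [
    "ARB0 relocating file +DATA1.321.788357323 (1 entries)",
    "ARB0 relocating file +DATA1.321.788357323 (1 entries)",
    "ARB0 relocating file +DATA1.321.788357323 (1 entries)",
]
print(looks_like_compacting(trace))
```

This is only a heuristic: a run of single-entry relocations is a hint, and it should be combined with the other clues in this article.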
During compacting, the EST_MINUTES column of V$ASM_OPERATION shows 0 (also an important clue):
16:08:56 SQL> /
INST_ID OPERA STAT POWER SOFAR EST_WORK EST_RATE EST_MINUTES
---------- ----- ---- ---------- ---------- ---------- ---------- -----------
2 REBAL RUN 10 98271 98305 7919 0
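To make the reading of this row explicit, here is a small Python sketch that parses one output line and applies the EST_MINUTES clue. It assumes the whitespace-separated column layout shown above; `parse_row` and `compacting_hint` are hypothetical helper names.

```python
HEADER = "INST_ID OPERA STAT POWER SOFAR EST_WORK EST_RATE EST_MINUTES"

def parse_row(row, header=HEADER):
    """Turn one whitespace-separated output row into a dict keyed by column."""
    columns = header.split()
    values = row.split()
    if len(columns) != len(values):
        raise ValueError("row does not match header: %r" % row)
    return {c: int(v) if v.isdigit() else v for c, v in zip(columns, values)}

def compacting_hint(op):
    """A running rebalance whose estimate has dropped to 0 minutes."""
    return op["OPERA"] == "REBAL" and op["STAT"] == "RUN" and op["EST_MINUTES"] == 0

# Row copied from the query output in this article.
op = parse_row("2 REBAL RUN 10 98271 98305 7919 0")
print(compacting_hint(op), op["EST_WORK"] - op["SOFAR"])
```

The second printed value is the difference between EST_WORK and SOFAR, which for this row shows how little of the estimated work remains while the operation is still in the RUN state.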
The REBALST_KFGMG column of the fixed table X$KFGMG shows 2, which means that compacting is in progress.
16:09:12 SQL> select NUMBER_KFGMG, OP_KFGMG, ACTUAL_KFGMG, REBALST_KFGMG from X$KFGMG;
NUMBER_KFGMG OP_KFGMG ACTUAL_KFGMG REBALST_KFGMG
------------ ---------- ------------ -------------
1 1 10 2
Once the compacting phase finishes, the ASM alert log shows "stopping process ARB0" and "rebalance completed":
Tue Jan 10 16:10:19 2017
NOTE: stopping process ARB0
SUCCESS: rebalance completed for group 5/0x97f863e8 (TESTDG)
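The alert log timestamps also make it possible to work out how long the whole rebalance took. The sketch below is one possible way to do that in Python, assuming timestamp lines in the "Tue Jan 10 16:07:12 2017" format and an English locale; `parse_ts` and `rebalance_duration` are hypothetical names, and the sample is condensed from the alert log excerpts in this article.

```python
from datetime import datetime

TS_FORMAT = "%a %b %d %H:%M:%S %Y"  # e.g. "Tue Jan 10 16:07:12 2017"

def parse_ts(line):
    """Return a datetime if the line is an alert log timestamp, else None."""
    try:
        return datetime.strptime(line.strip(), TS_FORMAT)
    except ValueError:
        return None

def rebalance_duration(alert_lines):
    """Time between 'starting rebalance' and 'rebalance completed', or None."""
    current = start = None
    for line in alert_lines:
        ts = parse_ts(line)
        if ts is not None:
            current = ts  # messages that follow belong to this timestamp
            continue
        if "starting rebalance of group" in line and start is None:
            start = current
        elif "rebalance completed for group" in line and start is not None:
            return current - start
    return None

alert = [
    "Tue Jan 10 16:07:12 2017",
    "NOTE: starting rebalance of group 5/0x97f863e8 (TESTDG) at power 1",
    "Tue Jan 10 16:10:19 2017",
    "NOTE: stopping process ARB0",
    "SUCCESS: rebalance completed for group 5/0x97f863e8 (TESTDG)",
]
print(rebalance_duration(alert))
```

Because the alert log only prints a timestamp when the time changes, the result is accurate to roughly one second.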
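Once extent relocation has completed, all data again meets its redundancy requirement, so there is no longer any danger that the failure of a partner disk of the already failed disk will cause a serious outage.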