(4)> f 12
pvthread+000C00 STACK:
[002A23F4].backt+000000 ()
[kdb_get_memory] no real storage @ FFFFFFFF4017EA0
(4)> f 14
pvthread+000E00 STACK:
Use current context [0187BE00] of cpu 0
[002B0794]slock+000444 (00000000000034E0, F100070F10000C00 [??])
[00009558].simple_lock+000058 ()
[0020D7C4]v_prelru_addlist+000060 (??, ??, ??, ??)
[002A25FC]begfst+0000A0 ()
____ Exception (F000000030017780) ____
iar : 00000000000093FC msr : 8000000000009032 cr : 82008042
lr : 00000000000BB908 ctr : 00000000000CD990 xer : 20000000
mq : 00000000 asr : 00000000FC65A001
r0 : 0000000082008042 r1 : 0FFFFFFFF4017BF0 r2 : 0000000001491C28
r3 : 00000000000093FC r4 : 8000000000001032 r5 : 0000000082008042
r6 : 0FFFFFFFF4017B40 r7 : 0000000000000000 r8 : 0000000000000000
r9 : 000000000101A9C0 r10 : 000000000000E01D r11 : 000000000101A9C0
r12 : 0000000000297EBC r13 : F10001002CB82C00 r14 : 00000000DEADBEEF
r15 : 00000000DEADBEEF r16 : 00000000DEADBEEF r17 : 0000000000000010
r18 : 000000000000FFFF r19 : 00000000000669D0 r20 : 0000000000000000
r21 : 00000000000003C0 r22 : 0000000003B90000 r23 : 000000000109E6C8
r24 : 000000000109E848 r25 : 0000000000000000 r26 : 0000000000000001
r27 : 0000000000000001 r28 : 0000000000000013 r29 : 0000000000000000
r30 : 0000000000000000 r31 : 000000000000000B
prev 0000000000000000 stackfix 0000000000000000 int_ticks 00
kjmpbuf 0000000000000000 excbranch 0000000000000000 no_pfault 00
intpri 0B backt 00 flags 00
fpscr 0000000000000000 fpscrx 00000000 fpowner 00
fpeu 00 fpinfo 00 alloc F000
o_iar 0000000000000000 o_toc 0000000000000000
o_arg1 0000000000000000 o_vaddr 0000000000000000
krlockp 0000000000000000
Except :
csr 0000000000000000 dsisr 0000000040000000 bit set: DSISR_PFT
esid 0000000019003400 dar F100010030C2C000 dsirr 0000000000000106
[000093FC].unlock_enable_mem+0000F0 ()
[000BB904]vm_lru_addlist_87_23+0000B4 (??, ??, ??, ??, ??, ??)
[000CF1D8]vm_psmd_flush_pending+00017C (??, ??, ??)
[000CFAA0]vm_psmd_promote+0001E8 (??, ??, ??, ??)
[000CFEB4]psmd_kthread+0000A4 (??)
[0013DAC0]threadentry+000014 (??, ??, ??, ??)
(4)> f 296
pvthread+012800 STACK:
Use current context [F00000002FF47600] of cpu 2
WARNING: bad IAR: 1001E3E0, display stack from LR: 1001F6CC
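Based on the analysis above, the P550 was busy paging and running an NBU backup job at the time of the crash. A check of the backup server showed that one backup policy is scheduled for exactly 22:00 every Saturday night.
3. Check the memory settings: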
p550a:/tmp/ibmsupt#vmo -a |grep perm
maxperm = 1562847
maxperm% = 80
minperm = 390711
minperm% = 20
strict_maxperm = 0
p550a:/tmp/ibmsupt#vmo -a |grep client
maxclient% = 80
strict_maxclient = 1
p550a:/tmp/ibmsupt#vmo -a |grep lru_file_repage
lru_file_repage = 1
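As a rough cross-check of the values above, the page counts and percentages can be related arithmetically: dividing maxperm by maxperm% (or minperm by minperm%) gives the page total that the percentages are applied to. The sketch below is illustrative only; `base_pages` is a hypothetical helper, not an AIX command.

```shell
#!/bin/sh
# base_pages PAGES PCT: page total implied by a page count and the
# percentage it represents (hypothetical helper, integer result).
base_pages() {
    awk -v p="$1" -v pct="$2" 'BEGIN { printf "%d\n", p * 100 / pct }'
}

base_pages 1562847 80   # from maxperm and maxperm%
base_pages 390711 20    # from minperm and minperm%
```

Both calls come out near 1.95 million pages, so the two limits are consistent with each other.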
vmstat 2 5
kthr  memory        page                    faults        cpu
----- ------------- ----------------------- ------------- -----------
 r  b    avm    fre  re  pi  po  fr  sr  cy  in    sy  cs us sy id wa
 0  0 882318  24052   0   0   0   0   0   0  21  1730 124  0  0 99  0
 0  0 882319  24051   0   0   0   0   0   0  16  1959 139  0  0 99  0
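This is the memory usage during normal operation, with no backup running. Paging space usage is also fairly high, and fre (free pages) is low. The following parameters were set: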
vmo -p -o minperm%=5
vmo -p -o maxclient%=20
vmo -p -o maxperm%=20
vmo -p -o lru_file_repage=0
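After changing tunables it can help to confirm the values that are actually in effect. The sketch below reads `name = value` lines in the format that `vmo -a` prints earlier in this article and compares one tunable with the expected value. `check_tunable` is a hypothetical helper, not an AIX command, and the sample input is made up for illustration.

```shell
#!/bin/sh
# check_tunable NAME EXPECTED: read "name = value" lines on stdin
# (the format printed by `vmo -a`) and return success only if NAME
# is present and its value equals EXPECTED.
check_tunable() {
    awk -v name="$1" -v want="$2" '
        $1 == name && $2 == "=" { found = 1; ok = ($3 == want) }
        END { exit (found && ok) ? 0 : 1 }
    '
}

# Illustrative sample input, not captured from the real system.
printf '%s\n' 'minperm% = 5' 'maxperm% = 20' |
    check_tunable 'maxperm%' 20 && echo "maxperm% ok"
```

On the live system the same function could be fed directly, for example `vmo -a | check_tunable lru_file_repage 0`.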
Check memory again:
vmstat 2 5
kthr  memory        page                    faults        cpu
----- ------------- ----------------------- ------------- -----------
 r  b    avm    fre  re  pi  po  fr  sr  cy  in    sy  cs us sy id wa
 0  1 882332 767703   0   0   0   0   0   0  22  1537 456  0  0 99  1
 1  0 882333 767698   0   0   0   0   0   0  12  1334  99  0  0 99  0
 0  0 882333 767698   0   0   0   0   0   0  14  1474 126  0  0 99  0
Free memory was released almost immediately.
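To put the change in perspective, the fre column can be converted from pages to megabytes. The sketch below assumes 4 KB pages, which the vmstat output does not state, and uses a hypothetical helper `fre_mb` that reads the fourth field of each vmstat data row.

```shell
#!/bin/sh
# fre_mb: read vmstat output on stdin and print free memory in whole MB
# for every data row. ASSUMES 4 KB pages; fre is the 4th column.
# Header and separator lines are skipped because their first field
# is not a number.
fre_mb() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 4 { printf "%d\n", $4 * 4 / 1024 }'
}

# One row from before the change and one from after it.
printf '%s\n' \
    ' r b avm fre re pi po fr sr cy in sy cs us sy id wa' \
    ' 0 0 882318 24052 0 0 0 0 0 0 21 1730 124 0 0 99 0' \
    ' 0 1 882332 767703 0 0 0 0 0 0 22 1537 456 0 0 99 1' | fre_mb
```

Under that assumption, free memory went from roughly 93 MB before the change to about 2998 MB after it. On a live system the helper could be used as `vmstat 2 5 | fre_mb`.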
In more than a year of observation since this change, the crash has not recurred.