一个朋友遇到ORA-04031问题。虽然这个错误是非常常见的,然而这里的Case 也有点让人为之震惊!
Tue Aug 26 11:51:13 2014
Errors in file /Oracle/app/oracle/diag/rdbms/xx/xx1/trace/xx1_p485_28873.trc (incident=1589637)
ORA-04031: 无法分配 32792 字节的共享内存 ("shared pool","unknown object","sga heap(2,0)","PX msg pool")
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Tue Aug 26 11:51:15 2014
。。。。。。。。
DDE: Problem Key 'ORA 4031' was completely flood controlled (0x6)
Further messages for this problem key will be suppressed for up to 10 minutes
Tue Aug 26 11:57:12 2014
PMON failed to acquire latch, see PMON dump
Tue Aug 26 11:57:15 2014
DIAG (ospid: 14345) has not called a wait for 49 secs.
Tue Aug 26 11:57:58 2014
Errors in file /oracle/app/oracle/diag/rdbms/xx/xx1/trace/xx1_qmnc_15686.trc (incident=1608420)
ORA-00445: background process "q00T" did not start after 120 seconds
Tue Aug 26 11:58:17 2014
。。。。。。。。
Tue Aug 26 12:07:31 2014
WARNING: inbound connection timed out (ORA-3136)
Tue Aug 26 12:07:31 2014
WARNING: inbound connection timed out (ORA-3136)
Tue Aug 26 12:07:46 2014
PMON failed to acquire latch, see PMON dump
上述信息为Rac 节点1的信息,对于节点2,信息也基本类似,这里就不贴了。从日志来看,是p621出异常,根据这一点
我们可以判断为是并行进程出现了异常,接着从下面的ORA-04031错误也可以证实这一点。
ORA-04031: 无法分配 32792 字节的共享内存 (“shared pool”,”unknown object”,”sga heap(2,0)”,”PX msg pool”)
从这个错误来看,可以判断P621进程无法分别32792 byte的内存了。这部分内存是需要为Px msg pool进程分配。
这里首先说说PX msg pool是干什么的?
Px msg pool其实是用于Parallel 操作的。通常情况下这部分内存区域应该是从large pool中分配。然而很多业务系统中
由于设置不到,导致parallel 操作仍然使用shared pool的memory。这一点,我们稍后讨论。
接着我们看后面的日子发现PMON进程都无法获得latch了,然后这个时候系统已经不行了。 很明显这个问题的根源就是前面
的ORA-04031错误,这里我们需要分析下trace,来确认下,为什么进程会报错。
该trace文件(xx1_p485_28873.trc)的内容如下:
Allocation Request Summary Informaton
=====================================
Current information setting: 04014fff
SGA Heap Dump Interval=3600 seconds
Dump Interval=300 seconds
Last Dump Time=08/26/2014 11:51:12
Dump Count=1
Allocation request for: PX msg pool
Heap: c00000012fe3ae08, size: 32792
******************************************************
HEAP DUMP heap desc=c00000012fe3ae08
extent sz=0xfe0 alt=248 het=32767 rec=9 flg=-126 opc=0
parent=0000000000000000 owner=0000000000000000 nex=0000000000000000 xsz=0x10000000 heap=0000000000000000
fl2=0x20, nex=0000000000000000, dsxvers=1, dsxflg=0x0
dsx first ext=0x30000000
latch set 2 of 2
durations disabled for this heap
reserved granules for root 0 (granule size 268435456)
可以看到,PX msg pool请求获得32792 byte的内存大小,从什么地方申请呢? heap 2,也就是shared pool的第2个subpool。
但是却无法分配这么多,以至于报错,下面我们来看下subpool 2的实际情况:
Memory Utilization of Subpool 2
================================
Allocation Name Size
___________________________ ____________
"free memory " 1431685456 ++++整个subpool 2 还有1.4G free memory左右
"miscellaneous " 1680
"vem_user_actlog1 " 0
"SWEEPERR " 0
"File Space Usage " 112144
"trace_knlasg " 0
"ges big msg buffers " 56801208
"dsktab_kfgsg " 57128
"hot latch diagnostics " 0
"gcs res hash bucket " 4194304
"Wait event pointers " 0
"gcs res latch table " 0
"file state object " 4698096
"vips_package_size " 0
"SEQ S.O. " 1057920
。。。。
"gcs resources " 223317648
"gc lock state " 0
"kglsim alloc latch area " 0
"gcs affinity " 0
"gcs opaque in " 49248
"gcs pnp histo " 0
"ges resource " 605005184
"knlp free small rcv buf " 0
"SQLA " 280160
"PRTI " 0
"SQLP " 53608
"SQLK " 28336
"gcs dynamic r " 2556288
"work area tab " 236640
"plwpil:wa " 0
"kebm run-once actions " 0
"grptab_kfgsg " 0
"AW SGA " 0
"dbktb: trace buffer " 53673984
"v_swperrcount " 0
"ksunfy : SSO free list " 27834240
"ksxp IPC state object " 346448
"ksdhng: cache history " 0
"DFW_CONFIG_CAPTURE " 0
"enqueue resources " 862232
"kzekm heap descriptor " 0
"kelt translation table " 0
"IPS_PACKAGE_UNPACK_HISTOR" 0
"kglsim hash table bkts " 2097152
"KSXR global channels " 0
"ges enqueues " 344980144
"trace buf hdr xtend " 2080
"KSXR receive buffers " 1061888
"kglsim Java timestamps " 131072
"VM OSD context " 0
"KCB Table Scan Buffer " 4194816
"knlp receive buffers " 0
"PLS chunk " 0
"knlu_init_btree:init " 0
"KEWS aggregation objs " 0
"DDE_USER_ACTION_DEF " 0
"keswxNotify:tabPlans " 131072
"EM_USER_ACTIVITY " 0
"KSXR large reply queue " 168296
"event-class map " 0
"KSQ event description " 0
"KSXP CONN Stats buffer " 728432
"KSXP CTX Stats buffer " 578128
"KSXP PORT Stats buffer " 142768
"kglsim alloc latches " 0
"kglsim used list arr " 0
"kglsim free obj list " 0
"KTCN global operators " 0
"KESTB existence bitvec " 131072
"KFD extent enqueue obj " 42792
"ksdhng: el wtr cache " 401280
"ksb cic process list " 0
"gcs shadows " 130592800
"value block f " 43832
"ksdhng: blkers cache " 72960
"FileOpenBlock " 255011680
"qmtb_init_xxta " 0
"sort segment handle " 54936
"latchnum to latch map " 0
"LISTEN ADDRESS ENTRY " 0
"vips_pkg_file " 0
"PX msg pool " 47109312 +++++ 当前PX msg pool已经分配了47M左右
。。。。。。
"Checkpoint queue " 116765184
"sim segment hits " 72960
"sim lru segments " 72960
"pesom.c:Global " 0
"peshm.c:Global " 0
"paralltp txn r " 22476896
"ASH buffers " 119537664
"ASM KFFD SO " 0
"statement queuing " 0
"v_actinc " 0
"message pool freequeue " 523240
"ADR_CONTROL " 0
"vshowincb " 0
"vproblem2 " 0
"PMON blockers " 0
"procs: ksunfy " 6408000
"sga listelement " 0
"primem_kfmdsg " 0
"agent name " 0
"SYSTEM PARAMETERS " 232944
"Online xxtafile Move sess" 0
"sched job queue " 0
"vproblem " 0
"object queue hash buckets" 8404992
"object queue hash table d" 229824
"obj stats allocation chun" 1290240
"qmnsf_heap " 0
"sim cache nbufs " 0
"qmu subheap " 53976
"MTTR advisory context " 0
"LGWR-network Server info " 34032
"LGWR debug module memory " 0
"logout storm management " 0
"log_simultaneous_copies " 49248
"sys event stats " 552960
"parameter handle " 4273688
"Parameter Handle " 0
"name-service recovery " 80776
"channel handle " 48360
"parameter blocks " 22016
"qmrcInitSGA:qmrcSGA " 0
"PARAMETER ENTRY " 0
"ASM map operations " 30552
"Security Class " 33280
"GCR " 0
"post/wait queues " 55872
"bloom filter " 0
"Auto tune undo info " 0
"pesldT03_Instance " 33848
"cp pool array " 0
"cp cmon array " 0
"UNDO INFO " 0
"Log Archive Config Contex" 0
"state objects " 107440104
"ArchLog Dest Array " 22784
"parallel kcbibr " 0
"archive_lag_target " 0
"poisoned memory " 32768
"name-service request " 0
"latch recovery structures" 0
"latches for protecting fs" 0
"latches for protecting ls" 0
"Label Cache Heap descrptr" 0
"cp connection " 0
"sched job slv " 0
"bt_subentry " 0
"Saved job ids " 0
"object level stat table " 0
"pso tbs: ksunfy " 780000
"dummy " 0
"qm_init_sga:qmdpsg " 0
"analytic workspace " 0
"sim trace buf " 0
"qm_init_sga:oidctx " 0
"Sort Segment " 16852160
"PRESENTATION ENTRY " 0
"cp srv array " 0
"recov_kgqbtctx " 26512
"dbkea sga " 0
"pesom.c:latch array " 0
"Banner Storage " 0
"qmps connections " 65280
"quiescing session " 0
"qm_init_uga:qmsg " 0
"enqueue " 0
"buffer handles " 5472000
"process group array " 69776
"DML lock " 7066304
"property service SO " 0
"broker globals " 0
"sga node map " 0
"prmtzdini tz region " 801416
"VIEW " 0
"savepoints " 0
"Monitor Modification Hash" 0
"RESUMABLE NAME " 0
"obj stats hash table " 0
"max allowable # log files" 506400
"OS proc request holder " 293424
"message pool context area" 22400
"media recovery state obje" 0
"db_files " 1783968
"base-kglhdusr " 1218016
"constraints " 0
"ASM scan context " 0
"PX server msg stats " 0
"PX server deq stats " 0
"cinfo_kfnsg " 0
"cco_subnode " 0
"cco_pubnode " 0
"KQR X PO " 50414728
"KQR L PO " 5977688
"parameter table block " 31158248
"KQR M SO " 192808
"KQR L SO " 128104
"KQR S SO " 90928
"KSFV SGA " 0
"KSIR SGA " 0
"ASM rollback operations " 0
"kccsgehtt " 0
"kglsgivl2 " 28160
"plugin xxtafile array " 36016
"plwxx:PLW_STR_NEW_RVAL " 0
"plwspv:PLW_STR_NEW_VAL " 0
"lm heartbeat " 0
"KGSK sga " 0
"BRANCH TABLE SEGMENTED AR" 105288
"ksv slave " 0
"RVWR post requested array" 0
"parameter value memory " 0
"RTA Bounxxry Info thread " 71808
"RTA read control file ind" 0
"k2q_lock " 4606648
"ASM extent pointer array " 318197080
。。。。。。。
"kglsim object batch " 338733696
"ktlbk state objects " 5417280
。。。。。
"kglsim heap " 197796864
"ksz parent " 204960
"kgnfs authp " 16464
"event statistics per sess" 29840640
"event statistics ptr arra" 36480
"event list to post commit" 0
"KFG SO child " 0
"KGKP randnum " 40000
Library Cache Dump