ORA-12170: TNS:Connect timeout occurred

An Oracle job could no longer run automatically and reported a large number of errors like the following:
 
Errors in file /oracle/admin/orcl/bdump/orcl_j000_2158798.trc:
ORA-12012: error on auto execute of job 482
ORA-12008: error in materialized view refresh path
ORA-12170: TNS:Connect timeout occurred
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 1883
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2089
ORA-06512: at "SYS.DBMS_IREFRESH", line 683
ORA-06512: at "SYS.DBMS_REFRESH", line 195
ORA-06512: at line 1
Sat Apr 26 21:57:53 2014
Errors in file /oracle/admin/orcl/bdump/orcl_j000_2158798.trc:
ORA-12012: error on auto execute of job 425
ORA-12170: TNS:Connect timeout occurred
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 1883
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2089
ORA-06512: at "SYS.DBMS_IREFRESH", line 683
ORA-06512: at "SYS.DBMS_REFRESH", line 195
 
 
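Background: materialized views are refreshed between two databases. This normally worked, but one day the automatic jobs on the primary database started to fail, and they kept failing.
 
The direct cause is ORA-12170: TNS:Connect timeout occurred, that is, a TNS timeout, so the investigation can start with the network.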
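
When the log contains many such entries, it can help to group the errors by job number before digging further. The following is a minimal sketch, assuming the log text has the same layout as the excerpt above; the function name `summarize_job_errors` and the sample text are illustrative, not part of any Oracle tool. ORA-06512 lines are skipped because they are only PL/SQL call-stack frames.

```python
import re
from collections import defaultdict

# Marks the start of a failed job entry and captures the job number.
JOB_RE = re.compile(r"ORA-12012: error on auto execute of job (\d+)")
# Any ORA error code at the start of a line.
ORA_RE = re.compile(r"(ORA-\d{5})")
# ORA-06512 only reports call-stack positions, not a cause.
STACK_FRAME = "ORA-06512"


def summarize_job_errors(log_text):
    """Map each failing job number to the set of ORA codes logged with it."""
    errors = defaultdict(set)
    current_job = None
    for line in log_text.splitlines():
        job_match = JOB_RE.search(line)
        if job_match:
            current_job = int(job_match.group(1))
            continue
        ora_match = ORA_RE.match(line.strip())
        if ora_match and current_job is not None:
            code = ora_match.group(1)
            if code != STACK_FRAME:
                errors[current_job].add(code)
    return dict(errors)


# Shortened copy of the excerpt above, used only for illustration.
SAMPLE_LOG = """\
ORA-12012: error on auto execute of job 482
ORA-12008: error in materialized view refresh path
ORA-12170: TNS:Connect timeout occurred
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 1883
Sat Apr 26 21:57:53 2014
Errors in file /oracle/admin/orcl/bdump/orcl_j000_2158798.trc:
ORA-12012: error on auto execute of job 425
ORA-12170: TNS:Connect timeout occurred
"""

for job, codes in sorted(summarize_job_errors(SAMPLE_LOG).items()):
    print(job, sorted(codes))
```

In this sample both jobs share ORA-12170, which matches the conclusion that the timeout is the common cause.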
 
 
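1. Network check
 
From each of the two hosts, ping the other host's IP address and gateway and check for packet loss.
 
At first, no packet loss was observed.
 
2. Check that port 1521 is open on the firewall
 
The network engineer confirmed that port 1521 is open on the firewall, and telnet to port 1521 also succeeds.
 
3. tnsping each service name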
$tnsping orcl
TNS Ping Utility for 64-bit Aix: Version 10.2.0.4.0 - Production on 05-6月 -2014 11:47:24
Copyright (c) 1997, 2006, Oracle Corporation. All rights reserved.
Attempting to contact (DESCRIPTION= (ADDRESS_LIST = (ADDRESS=(PROTOCOL=TCP)(HOST =192.168.10.8)(PORT = 1521))) (CONNECT_DATA= (SERVICE_NAME=orcl)))
OK (0 msec)
OK (10 msec)
 
 
$tnsping yyzf
TNS Ping Utility for 64-bit Aix: Version 10.2.0.4.0 - Production on 05-6月 -2014 11:48:24
Copyright (c) 1997, 2006, Oracle Corporation. All rights reserved.
Attempting to contact (DESCRIPTION= (ADDRESS_LIST = (ADDRESS=(PROTOCOL=TCP)(HOST =192.168.11.9)(PORT = 1521))) (CONNECT_DATA= (SERVICE_NAME=yyzf)))
OK (0 msec)
OK (10 msec)
 
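tnsping to both service names is OK as well.
 
4. Up to this point everything on the network looked normal and the network did not seem to be the problem, yet the jobs kept failing. The two hosts are not on the same network segment. With ping left running continuously, packets were suddenly lost at certain moments, and telnet to port 1521 also failed a few times. So the network did have a problem after all. The customer finally confirmed that someone had re-planned the network, so I checked the host's IP addresses and routes.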
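
A single ping, telnet or tnsping only proves that the path worked at that moment. To catch intermittent failures like the ones described here, the TCP connection to the listener port can be retried over a longer period. The following is a minimal sketch using only the Python standard library; the function names `probe_once` and `probe_repeatedly` and the interval and timeout values are assumptions chosen for illustration.

```python
import socket
import time
from datetime import datetime


def probe_once(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


def probe_repeatedly(host, port, attempts, interval=1.0, timeout=3.0):
    """Probe host:port `attempts` times and return the times of failed attempts."""
    failures = []
    for i in range(attempts):
        if not probe_once(host, port, timeout):
            failures.append(datetime.now())
        if i < attempts - 1:
            time.sleep(interval)
    return failures


# Example call (host and port as shown in the tnsping output for yyzf):
# failed = probe_repeatedly("192.168.11.9", 1521, attempts=600, interval=1.0)
# print(len(failed), "of 600 attempts failed")
```

A non-empty result while one-off checks succeed points to an intermittent path problem rather than a closed port, which is what led to inspecting the routing table that follows.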
 
 
Command: OK stdout: yes stderr: no
 
Before command completion, additional instructions may appear below.
 
Routing tables
Destination Gateway Flags Refs Use If Exp Groups
 
Route tree for Protocol Family 2 (Internet):
default 192.168.10.1 UG 60 442255197 en0 - - =>
default 192.168.10.254 UG 40 432853407 en0 - -
10.0.0.0 10.0.0.1 UHSb 0 0 en1 - - =>
10/27 10.0.0.1 U 1 11339108 en1 - -
10.0.0.1 127.0.0.1 UGHS 0 1840821 lo0 - -
10.0.0.31 10.0.0.1 UHSb 0 10 en1 - -
10.0.1.0 10.0.1.1 UHSb 0 0 en0 - - =>
10.0.1/27 10.0.1.1 U 2 12382572 en0 - -
10.0.1.1 127.0.0.1 UGHS 2 3268795 lo0 - -
10.0.1.31 10.0.1.1 UHSb 0 10 en0 - -
127/8 127.0.0.1 U 31 6493884 lo0 - -
192.168.10.0 192.168.10.9 UHSb 0 0 en0 - - =>
192.168.10.0 192.168.10.10 UHSb 0 0 en1 - - =>
192.168.10/27 192.168.10.9 U 0 193565 en0 - - =>
192.168.10/27 192.168.10.10 U 1 209927 en1 - -
192.168.10.9 127.0.0.1 UGHS 23 13738093 lo0 - -
192.168.10.10 127.0.0.1 UGHS 0 151330 lo0 - -
192.168.10.31 192.168.10.9 UHSb 0 0 en0 - - =>
192.168.10.31 192.168.10.10 UHSb 0 0 en1 - -
 
Route tree for Protocol Family 24 (Internet v6):
::1 ::1 UH 0 0 lo0 - -
 
 
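So the host's routing does have a problem: there are two default routes.
 
On AIX, configuring more than one default gateway can cause intermittent packet loss. The routing table indeed contains two default routes, 192.168.10.1 and 192.168.10.254, both on the same interface en0, and the Use counters of both are high, which suggests that traffic was going out through both gateways. After checking with the network engineer, the .254 route turned out to be invalid. It was probably left behind when the routes were changed earlier and the old route was not deleted.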
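
The same check can be automated against saved `netstat -rn` style output. The following is a minimal sketch, assuming the column order shown in the table above (Destination, Gateway, Flags, Refs, Use, If); the function names `default_gateways` and `has_multiple_defaults` are hypothetical helpers, not part of any AIX tool.

```python
def default_gateways(routing_table_text):
    """Return (gateway, interface) pairs for every 'default' route in the text."""
    gateways = []
    for line in routing_table_text.splitlines():
        fields = line.split()
        # Expected columns: Destination Gateway Flags Refs Use If ...
        if len(fields) >= 6 and fields[0] == "default":
            gateways.append((fields[1], fields[5]))
    return gateways


def has_multiple_defaults(routing_table_text):
    """True if more than one default route is present."""
    return len(default_gateways(routing_table_text)) > 1


# A few lines copied from the routing table above, for illustration.
SAMPLE_ROUTES = """\
default 192.168.10.1 UG 60 442255197 en0 - - =>
default 192.168.10.254 UG 40 432853407 en0 - -
127/8 127.0.0.1 U 31 6493884 lo0 - -
192.168.10/27 192.168.10.9 U 0 193565 en0 - - =>
"""

for gateway, interface in default_gateways(SAMPLE_ROUTES):
    print(gateway, interface)
print("multiple default routes:", has_multiple_defaults(SAMPLE_ROUTES))
```

More than one default route is not automatically wrong on every system, so a hit from this check is a prompt to confirm each gateway with the network team, as was done here.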
 
 
 
So I deleted the obsolete route:
 
>#route delete -if en0 default 192.168.10.254
192.168.10.254 net default: gateway 192.168.10.254
 
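After the route was deleted, I monitored for an hour: all jobs ran normally and the TNS timeout did not recur.
 
The lesson: any parameter change on a production system must follow strict standards and a review process. Stability comes first.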
