Monitoring Multiple Tomcat Services on One Web Application Server over HTTP (Part 3)

3 Monitoring and alerting on multiple Tomcat ports

The Tomcat on port 9300 has already been added; now add another Tomcat on port 8300.

3.1 Add the configuration to nrpe.cfg on the client

[root@webserver root]# vim /etc/nagios/nrpe.cfg

command[check_tomcat_8300_status]=/usr/lib/nagios/plugins/check_http -I 10.xx.xx.10 -p 8300 -u /xx_xx_xx/index.html -e 200 -w 5 -c 10

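In this entry, -e takes a comma-separated list of strings, at least one of which must appear in the first (status) line of the response, and -w/-c are response-time thresholds in seconds. The Python sketch below models that grading in simplified form so you can predict which state a given response produces. It is an approximation of check_http's documented behavior, not its code, and the function name evaluate_http is hypothetical.

```python
OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios plugin exit codes


def evaluate_http(status_line, elapsed, expect=None, warn=None, crit=None):
    """Simplified model of how check_http grades a single HTTP response.

    status_line: first line of the response, e.g. "HTTP/1.1 200 OK"
    elapsed:     response time in seconds
    expect:      value of -e, a comma-separated list of strings; at least
                 one must appear in the status line
    warn, crit:  values of -w / -c, response-time thresholds in seconds
    """
    if expect is not None:
        wanted = [s for s in expect.split(",") if s]
        if not any(s in status_line for s in wanted):
            # check_http stops here with "Invalid HTTP response received ..."
            return CRITICAL
        state = OK  # with -e, the default status-code rules are skipped
    else:
        code = int(status_line.split()[1])
        if code >= 500:
            state = CRITICAL
        elif code >= 400:
            state = WARNING
        else:
            state = OK  # 2xx, and 3xx under the default redirect handling
    # Response-time thresholds can only make the result worse
    if crit is not None and elapsed > crit:
        state = max(state, CRITICAL)
    elif warn is not None and elapsed > warn:
        state = max(state, WARNING)
    return state


if __name__ == "__main__":
    print(evaluate_http("HTTP/1.1 302 Found", 0.003, expect="200", warn=5, crit=10))  # 2
```

For example, a 302 response checked with -e 200 comes out CRITICAL however fast it is, because "200" does not appear in its status line.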

3.2 On the Nagios server
Add the command definition

[root@cache-2 etc]# vim ./objects/commands.cfg

define command{
    command_name    check_tomcat_8300_status
    command_line    $USER1$/check_http -I $HOSTADDRESS$ -p $PORT$ -u $URL$ -e $N200$ -w $Warning$ -c $Cri$
}


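Add the service definition. Its check_command runs check_nrpe, so the actual check is the check_tomcat_8300_status entry in the client's nrpe.cfg; the command defined above is not called by this service.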

define service{
    host_name               webserver
    service_description     Tomcat_8300_Status
    check_command           check_nrpe!check_tomcat_8300_status
    max_check_attempts      5
    normal_check_interval   3
    retry_check_interval    2
    check_period            24x7
    notification_interval   10
    notification_period     24x7
    notification_options    w,u,c,r
    contact_groups          opsweb
}

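In the service definition, check_command check_nrpe!check_tomcat_8300_status means: run the check_nrpe command and pass check_tomcat_8300_status as its first argument ($ARG1$). The article does not show the check_nrpe command definition, so the Python sketch below assumes the common form $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$, with $USER1$ set to /usr/local/nagios/libexec (the path used in section 3.3). It mimics Nagios's macro expansion in simplified form; the COMMANDS and MACROS tables and expand_check_command are hypothetical.

```python
import re

# Assumed definitions: the article does not show the check_nrpe command or
# resource.cfg, so these values are hypothetical but typical.
COMMANDS = {
    "check_nrpe": "$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$",
}
MACROS = {
    "USER1": "/usr/local/nagios/libexec",  # usually set in resource.cfg
    "HOSTADDRESS": "10.xx.xx.10",          # address of host "webserver"
}


def expand_check_command(check_command, commands=COMMANDS, macros=MACROS):
    """Expand a service's check_command roughly the way Nagios does.

    Text before the first "!" names the command; each later "!"-separated
    field becomes $ARG1$, $ARG2$, ... in that command's command_line.
    """
    name, *args = check_command.split("!")
    values = dict(macros)
    for i, arg in enumerate(args, start=1):
        values[f"ARG{i}"] = arg

    def substitute(match):
        # Unknown macros are left untouched in this sketch
        return values.get(match.group(1), match.group(0))

    return re.sub(r"\$([A-Za-z0-9_]+)\$", substitute, commands[name])


if __name__ == "__main__":
    print(expand_check_command("check_nrpe!check_tomcat_8300_status"))
```

Under these assumptions the result is the same check_nrpe command line that is run by hand in section 3.3.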

3.3 On the Nagios server, check whether the newly added command works

[root@cache-2 etc]# /usr/local/nagios/libexec/check_nrpe -H 10.xx.xx.10 -c check_tomcat_8300_status

HTTP OK HTTP/1.1 200 OK - 611 bytes in 0.003 seconds |time=0.003152s;5.000000;10.000000;0.000000 size=611B;;;0

[root@cache-2 etc]#


The command works.
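Everything after the | in the check_nrpe output above is performance data, which Nagios can store and graph. Per the Nagios plugin guidelines each item has the form label=value[UOM];[warn];[crit];[min];[max], so time=0.003152s;5.000000;10.000000;0.000000 carries the -w 5 -c 10 thresholds from nrpe.cfg. The Python sketch below (parse_plugin_output is a hypothetical name) splits such a line; it keeps the threshold fields as strings because they may be ranges, and it does not handle quoted labels containing spaces.

```python
import re

# value followed by an optional unit of measurement, e.g. "0.003152s", "611B"
PERF_RE = re.compile(r"^(?P<value>-?[0-9.]+)(?P<uom>[a-zA-Z%]*)$")


def parse_plugin_output(line):
    """Split one line of plugin output into status text and perfdata dict."""
    text, _, perf = line.partition("|")
    metrics = {}
    for item in perf.split():
        label, _, rest = item.partition("=")
        fields = rest.split(";")
        match = PERF_RE.match(fields[0])
        if match is None:
            continue  # skip malformed entries
        fields += [""] * (5 - len(fields))  # pad missing warn/crit/min/max
        metrics[label] = {
            "value": float(match.group("value")),
            "uom": match.group("uom"),
            "warn": fields[1] or None,
            "crit": fields[2] or None,
            "min": fields[3] or None,
            "max": fields[4] or None,
        }
    return text.strip(), metrics


if __name__ == "__main__":
    sample = ("HTTP OK HTTP/1.1 200 OK - 611 bytes in 0.003 seconds "
              "|time=0.003152s;5.000000;10.000000;0.000000 size=611B;;;0")
    print(parse_plugin_output(sample))
```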

3.4 Reload the Nagios server and check the result

[root@cache-2 etc]# service nagios reload

Running configuration check...

Reloading nagios configuration...

done

[root@cache-2 etc]#


After the reload, about 3 minutes later, the new Tomcat on port 8300 is being monitored, as shown below:

[Figure: Nagios web page showing the new Tomcat 8300 service being monitored]

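To verify the Tomcat monitoring, stop the Tomcat instance on port 9300 on the web server (the client). After a short while an alert email arrives, and the Nagios web page shows a red alert, as shown below: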

[Figure: Nagios web page showing the red alert for the stopped Tomcat]

This shows that the two Nagios services monitor two different ports, 9300 and 8300.
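How long "a short while" is before the alert email arrives follows from the service definition. With the default interval_length of 60 seconds, the service is checked every 3 minutes (normal_check_interval 3). After the first failure Nagios retries every 2 minutes (retry_check_interval 2) until max_check_attempts 5 is reached; only then (the hard state) is a notification sent, and it is repeated every 10 minutes (notification_interval 10) while the problem persists. The Python sketch below works out the resulting delay range; alert_delay_minutes is a hypothetical helper, and it ignores check execution time and assumes no first_notification_delay.

```python
def alert_delay_minutes(normal_interval, retry_interval, max_attempts,
                        interval_length=60):
    """Rough best/worst-case minutes from a failure to the first notification.

    Assumes interval_length=60 (the nagios.cfg default), so intervals are in
    minutes, no first_notification_delay, and instant check execution.
    """
    unit = interval_length / 60  # minutes per interval unit
    # The first failed check is attempt 1; the remaining attempts are retries
    retries = (max_attempts - 1) * retry_interval * unit
    best = retries                             # failure just before a check
    worst = normal_interval * unit + retries   # failure just after a check
    return best, worst


if __name__ == "__main__":
    print(alert_delay_minutes(3, 2, 5))  # (8.0, 11.0)
```

So with these settings the first email is expected roughly 8 to 11 minutes after Tomcat stops.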

4 Fixing the -e 200 error when adding a check for the new port 8200

[root@webserver OCC_MANAGER_Web]# /usr/lib/nagios/plugins/check_http -I 10.xx.xx.10 -p 8200 -u /OCC_REPORT_Web/index.html -e 200 -w 5 -c 10

HTTP CRITICAL - Invalid HTTP response received from host on port 8200

[root@webserver OCC_MANAGER_Web]#


4.1 Access the Tomcat service and index.html directly

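:8200/OCC_REPORT_Web/index.html is reachable, but it redirects to the page ?redirect=http%3A%2F%2F10.xx.xx.10%3A8200%2FOCC_REPORT_Web%2Findex.html. This shows that the web application works normally; the request is simply redirected to a page on another domain.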

4.2 Detailed analysis with -v

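At this point the Tomcat server is running normally and the web application responds normally, yet the check above fails. The error roughly means that an invalid HTTP response was received from port 8200. The main purpose of this command is to fetch /OCC_REPORT_Web/index.html and use -e 200 to confirm a normal HTTP 200 (OK) response, so remove the -w 5 -c 10 alert thresholds and the -e 200 string match, and look at what the check returns: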

[root@webserver OCC_MANAGER_Web]# /usr/lib/nagios/plugins/check_http -I 10.xx.xx.10 -p 8200 -u /OCC_REPORT_Web/index.html

HTTP OK - HTTP/1.1 302 Found - 0.003 second response time |time=0.003367s;;;0.000000 size=317B;;;0


The response is HTTP/1.1 302 Found. According to the HTTP status code definitions, this means the server answered with a new URL:

……

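301 Moved Permanently: the requested document has moved elsewhere; the new URL is given in the Location header, and the browser should automatically request the new URL.
302 Found: similar to 301, but the new URL should be treated as a temporary replacement rather than a permanent one. Note that in HTTP 1.0 the corresponding status message is "Moved Temporarily".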

……
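By default check_http does not follow redirects (its --onredirect default is "ok"), so it reports the 302 itself rather than the page behind it. Python's http.client behaves the same way, which makes it handy for illustrating this. In the sketch below, fetch_without_redirect returns the raw status and Location header, and demo() runs it against a throwaway local server that imitates the 8200 application by always answering 302. All names, and the /login redirect target, are hypothetical.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def fetch_without_redirect(host, port, path, timeout=10):
    """GET one URL and return (status, reason, Location) without following
    redirects; http.client never follows them on its own."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()  # drain the (usually empty) body
        return resp.status, resp.reason, resp.getheader("Location")
    finally:
        conn.close()


class _RedirectHandler(BaseHTTPRequestHandler):
    """Stand-in for the 8200 application: always answers 302 Found."""

    def do_GET(self):
        self.send_response(302)
        # Hypothetical target; the real redirect URL is site-specific
        self.send_header("Location", "/login?redirect=" + self.path)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


def demo():
    # Port 0 lets the OS pick any free port
    server = HTTPServer(("127.0.0.1", 0), _RedirectHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_without_redirect("127.0.0.1", server.server_port,
                                      "/OCC_REPORT_Web/index.html")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())  # (302, 'Found', '/login?redirect=/OCC_REPORT_Web/index.html')
```

Because the checker stops at the 302, the result reflects this Tomcat instance alone rather than whatever server the redirect target would reach.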

Finally, add -v to debug and see the detailed response:

[root@webserver OCC_MANAGER_Web]# /usr/lib/nagios/plugins/check_http -H -I 10.xx.xx.10 -p 8200 -u /OCC_REPORT_Web/index.html -v

GET /OCC_REPORT_Web/index.html HTTP/1.0
User-Agent: check_http/v1861 (nagios-plugins 1.4.11)
Connection: close
Host:

:8200/OCC_REPORT_Web/index.html is 323 characters
STATUS: HTTP/1.1 302 Found
**** HEADER ****
Server: Apache-Coyote/1.1
Set-Cookie: ploccSessionId=45CD9C9921A5B89C59FCB2E34FE52734; Path=/
Location: ?redirect=http%3A%2F%2F%2FOCC_REPORT_Web%2Findex.html
Content-Length: 0
Date: Thu, 12 Jun 2014 02:52:45 GMT
Connection: close
**** CONTENT ****

HTTP OK - HTTP/1.1 302 Found - 0.003 second response time |time=0.003268s;;;0.000000 size=323B;;;0


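The page is redirected to the site behind the public domain name, while Tomcat itself is running normally, so a 302 Found response also shows that Tomcat is working. The architecture uses LVS load balancing, so checking the public domain name after the redirect would not tell you whether the Tomcat on this particular host is up: each request to the public domain name is served by only one of the Tomcat instances. Since the goal here is to monitor the Tomcat on one specific web server, monitoring the 302 status is a good approach. Change the port 8200 check command in the client's nrpe.cfg to expect Tomcat's 302 status: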

vim /etc/nagios/nrpe.cfg

/usr/lib/nagios/plugins/check_http -I 10.xx.xx.10 -p 8200 -u /OCC_REPORT_Web/index.html -e 302 -w 3 -c 10

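Each Tomcat port needs an nrpe.cfg entry on the client and a matching service on the Nagios server, so the two can be generated together from one list. The Python sketch below follows the article's check_tomcat_<port>_status and Tomcat_<port>_Status naming and copies the timing fields of the 8300 service; the 8200 names it produces (check_tomcat_8200_status, Tomcat_8200_Status) are hypothetical, since the article does not show that entry's name, and the helper names are illustrative only.

```python
PLUGIN = "/usr/lib/nagios/plugins/check_http"  # plugin path on the client
HOST_IP = "10.xx.xx.10"                          # masked address from the article


def nrpe_command(port, path, expect, warn, crit):
    """One nrpe.cfg line for the client, named check_tomcat_<port>_status."""
    cmdline = (f"{PLUGIN} -I {HOST_IP} -p {port} -u {path} "
               f"-e {expect} -w {warn} -c {crit}")
    return f"command[check_tomcat_{port}_status]={cmdline}"


def nagios_service(port, host_name="webserver", contact_groups="opsweb"):
    """Matching service definition for the Nagios server (8300's timings)."""
    fields = [
        ("host_name", host_name),
        ("service_description", f"Tomcat_{port}_Status"),
        ("check_command", f"check_nrpe!check_tomcat_{port}_status"),
        ("max_check_attempts", "5"),
        ("normal_check_interval", "3"),
        ("retry_check_interval", "2"),
        ("check_period", "24x7"),
        ("notification_interval", "10"),
        ("notification_period", "24x7"),
        ("notification_options", "w,u,c,r"),
        ("contact_groups", contact_groups),
    ]
    body = "\n".join(f"    {key:<22}{value}" for key, value in fields)
    return "define service{\n" + body + "\n}"


# (port, URL path, expected status, warn s, crit s) as used in this article
TOMCATS = [
    (8300, "/xx_xx_xx/index.html", "200", 5, 10),
    (8200, "/OCC_REPORT_Web/index.html", "302", 3, 10),
]

if __name__ == "__main__":
    for port, path, expect, warn, crit in TOMCATS:
        print(nrpe_command(port, path, expect, warn, crit))
    print()
    for port, *_ in TOMCATS:
        print(nagios_service(port))
```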


Error record 1: NRPE: Unable to read output

[1402557345] SERVICE ALERT: webserver;Tomcat_6100_OCC_SSO_Service_Status;UNKNOWN;SOFT;3;NRPE: Unable to read output

Fix: this is usually caused by a wrong path in the NRPE configuration.
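A quick way to look for a wrong path is to confirm that every command in nrpe.cfg points at an executable file. The Python sketch below does that for command[...]=... lines; check_nrpe_paths is a hypothetical helper, it does not follow include/include_dir directives, and os.access reflects the user running the script, so run it as the NRPE user for a meaningful result (for example, check_nrpe_paths("/etc/nagios/nrpe.cfg")).

```python
import os
import re

# command[<name>]=<command line>; whitespace after "=" is tolerated
COMMAND_RE = re.compile(r"^command\[(?P<name>[^\]]+)\]=\s*(?P<cmdline>\S.*)$")


def check_nrpe_paths(cfg_path):
    """List (command name, executable) pairs in nrpe.cfg whose executable is
    missing or not executable by the current user."""
    problems = []
    with open(cfg_path, encoding="utf-8") as fh:
        for raw in fh:
            match = COMMAND_RE.match(raw.strip())
            if match is None:
                continue  # comments, blank lines and other settings
            executable = match.group("cmdline").split()[0]
            if not (os.path.isfile(executable) and os.access(executable, os.X_OK)):
                problems.append((match.group("name"), executable))
    return problems
```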

Error record 2: CHECK_NRPE: Error - Could not complete SSL handshake.

[root@cache-2 etc]# /usr/local/nagios/libexec/check_http -I 10.xx.3.xx -p 8100 -u /tradeAdmin/index.html

HTTP OK: HTTP/1.1 302 Found - 319 bytes in 0.064 second response time |time=0.064033s;;;0.000000 size=319B;;;0

[root@cache-2 etc]#

[root@cache-2 etc]# /usr/local/nagios/libexec/check_nrpe -H 10.xx.3.xx -c check_load

CHECK_NRPE: Error - Could not complete SSL handshake.

[root@cache-2 etc]#

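Fix: the Nagios server's IP address had not been added to /etc/nagios/nrpe.cfg on the client. check_http from the Nagios server reaches the client's port 8100 directly, so the network path is fine; only check_nrpe fails. Add the Nagios server to allowed_hosts: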

vim /etc/nagios/nrpe.cfg

allowed_hosts=127.0.0.1,10.xx.xxx.xx1

Then restart NRPE (service nrpe restart) and verify again from the Nagios server; the check now returns OK:

[root@cache-2 etc]# /usr/local/nagios/libexec/check_nrpe -H 10.xxx.3.xx -c check_load

OK - load average: 0.43, 0.17, 0.06|load1=0.430;15.000;30.000;0; load5=0.170;10.000;25.000;0; load15=0.060;5.000;20.000;0;

[root@cache-2 etc]#
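The allowed_hosts line is a comma-separated list of addresses the NRPE daemon accepts connections from; when the Nagios server is not in it, the daemon typically closes the connection and check_nrpe reports the SSL handshake error seen above. The Python sketch below checks whether a given client IP appears in that list. The function names and the example addresses in the usage are hypothetical, and it compares literal IP addresses only, ignoring hostnames and any subnet notation your NRPE version might accept.

```python
import ipaddress


def parse_allowed_hosts(cfg_text):
    """Return the entries of the allowed_hosts line in nrpe.cfg text.

    If the setting appears more than once, this sketch keeps the last one.
    """
    hosts = []
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if line.startswith("allowed_hosts="):
            value = line.split("=", 1)[1]
            hosts = [h.strip() for h in value.split(",") if h.strip()]
    return hosts


def is_allowed(client_ip, hosts):
    """True if client_ip literally matches one of the IP entries in hosts."""
    ip = ipaddress.ip_address(client_ip)
    for entry in hosts:
        try:
            if ipaddress.ip_address(entry) == ip:
                return True
        except ValueError:
            continue  # hostname or other non-IP entry: ignored here
    return False


if __name__ == "__main__":
    hosts = parse_allowed_hosts("allowed_hosts=127.0.0.1,10.0.0.5\n")  # hypothetical
    print(is_allowed("10.0.0.5", hosts))  # True
```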


Copyright notice: unless otherwise noted, all content is original to this site.

When reprinting, please credit the source: https://www.heiqu.com/7194ea3df7c47f0c8f44de4f838c273b.html