如何在Nagios监控Tomcat,是一个比较简单又复杂的事情,简单是因为如果只监控web应用服务器的一个tomcat服务是否正常运行,那么比较简单;如果要监控tomcat的其他比如连接数比如jvm内存使用率等就比较复杂,google没有适合的监控脚本;如果要监控web应用上面的多个tomcat服务器,而且很多tomcat服务都是跳转式的,那就需要多做很多事情。
一般通常都使用tcp tomcat端口的方式,不过这有一个bug就是tomcat假死的情况下,tcp 端口是OK的,但是tomcat里面部署的web应用其实已经不能正常访问,这个时候需要使用http方式来监控tomcat的状态。
所以本文就记录了如何采用http方式来监控一台web服务器上多个tomcat应用服务器。
1在tomcat web服务器上安装nrpe客户端:
Rpm包下载地址为:
具体下载目录在 /2014年资料/6月/17日/Nagios通过check_http监控一台Web应用服务器上多个Tomcat服务
------------------------------------分割线------------------------------------
在RHEL5.3上配置基本的Nagios系统(使用Nagios-3.1.2)
CentOS 5.5+Nginx+Nagios监控端和被控端安装配置指南
Ubuntu 13.10 Server 安装 Nagios Core 网络监控运用
1.1,rpm方式安装nrpe客户端
[root@localhost nagios]# ll
总计 768
-rw-r--r-- 1 root root 713389 12-16 12:08 nagios-plugins-1.4.11-1.x86_64.rpm
-rw-r--r-- 1 root root 32706 12-16 12:09 nrpe-2.12-1.x86_64.rpm
-rw-r--r-- 1 root root 18997 12-16 12:08 nrpe-plugin-2.12-1.x86_64.rpm
[root@localhost nagios]# rpm -ivh *.rpm --nodeps --force
Preparing... ########################################### [100%]
1:nagios-plugins ########################################### [ 33%]
id: nagios:无此用户
2:nrpe ########################################### [ 67%]
3:nrpe-plugin ########################################### [100%]
[root@cache-1 ~]#
[root@localhost nagios]# ll 总计 768 -rw-r--r-- 1 root root 713389 12-16 12:08 nagios-plugins-1.4.11-1.x86_64.rpm -rw-r--r-- 1 root root 32706 12-16 12:09 nrpe-2.12-1.x86_64.rpm -rw-r--r-- 1 root root 18997 12-16 12:08 nrpe-plugin-2.12-1.x86_64.rpm [root@localhost nagios]# rpm -ivh *.rpm --nodeps --force Preparing... ########################################### [100%] 1:nagios-plugins ########################################### [ 33%] id: nagios:无此用户 2:nrpe ########################################### [ 67%] 3:nrpe-plugin ########################################### [100%] [root@cache-1 ~]#1.2 在配置文件最末尾,添加配置信息以及监控主机服务器ip地址
[root@ localhost nagios]# vim /etc/nagios/nrpe.cfg
# addby tim on 2014-06-11
command[check_users]=/usr/local/nagios/libexec/check_users -w 8 -c 15
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_sda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
#command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 50 -c 80
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 750 -c 800
command[check-host-alive]=/usr/local/nagios/libexec/check_ping -H 10.xx.xx.10 -w 3000.0,80% -c 5000.0,100% -p 5
allowed_hosts = 127.0.0.1,10.xx.xxx.xx1
[root@ localhost nagios]# vim /etc/nagios/nrpe.cfg # add by tim on 2014-06-11 command[check_users]=/usr/local/nagios/libexec/check_users -w 8 -c 15 command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20 command[check_sda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z #command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 50 -c 80 command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 750 -c 800 command[check-host-alive]=/usr/local/nagios/libexec/check_ping -H 10.xx.xx.10 -w 3000.0,80% -c 5000.0,100% -p 5 allowed_hosts = 127.0.0.1,10.xx.xxx.xx1check下命令是否生效:
[root@webserver nrpe-2.15]# /usr/local/nagios/libexec/check_users -w 8 -c 15
USERS OK - 2 users currently logged in |users=2;8;15;0
[root@webserver nrpe-2.15]#
[root@webserver nrpe-2.15]# /usr/local/nagios/libexec/check_users -w 8 -c 15 USERS OK - 2 users currently logged in |users=2;8;15;0 [root@webserver nrpe-2.15]#看到已经USERS OK -….命令已经生效。
1.3 启动nrpe报错如下:
[root@webserver ~]# service nrpe restart
Shutting down nrpe: [失败]
Starting nrpe: /usr/sbin/nrpe: error while loading shared libraries: libssl.so.6: cannot open shared object file: No such file or directory
[失败]
[root@webserver ~]#
[root@db-m2-slave-1 nagios_client]# service nrpe start
Starting nrpe: /usr/sbin/nrpe: error while loading shared libraries: libssl.so.6: cannot open shared object file: No such file or directory
[失败]
[root@db-m2-slave-1 nagios_client]#
[root@webserver ~]# service nrpe restart Shutting down nrpe: [失败] Starting nrpe: /usr/sbin/nrpe: error while loading shared libraries: libssl.so.6: cannot open shared object file: No such file or directory [失败] [root@webserver ~]# [root@db-m2-slave-1 nagios_client]# service nrpe start Starting nrpe: /usr/sbin/nrpe: error while loading shared libraries: libssl.so.6: cannot open shared object file: No such file or directory [失败] [root@db-m2-slave-1 nagios_client]#建立软连接
[root@db-m2-slave-1 nagios_client]# ln -s /usr/lib64/libssl.so /usr/lib64/libssl.so.6
(如果没有libssl.so,就采用别的libssl.so.10来做软连接,ln -s /usr/lib64/libssl.so.10 /usr/lib64/libssl.so.6)
[root@db-m2-slave-1 nagios_client]#
再重新启动如下:
[root@webserver nagios_client]# service nrpe start
Starting nrpe: /usr/sbin/nrpe: error while loading shared libraries: libcrypto.so.6: cannot open shared object file: No such file or directory
[失败]
[root@web-10 ~]# ll /usr/lib64/libcrypto.so
lrwxrwxrwx. 1 root root 18 10月 13 2013 /usr/lib64/libcrypto.so -> libcrypto.so.1.0.0
[root@webserver nagios_client]#
[root@webserver nagios_client]# service nrpe start Starting nrpe: /usr/sbin/nrpe: error while loading shared libraries: libcrypto.so.6: cannot open shared object file: No such file or directory [失败] [root@web-10 ~]# ll /usr/lib64/libcrypto.so lrwxrwxrwx. 1 root root 18 10月 13 2013 /usr/lib64/libcrypto.so -> libcrypto.so.1.0.0 [root@webserver nagios_client]#再建软链接:
[root@webserver nagios_client]# ln -s /usr/lib64/libcrypto.so /usr/lib64/libcrypto.so.6
(或者如果没有libcrypto.so,就采用libcrypto.so.10做软连接, ln -s /usr/lib64/libcrypto.so.10 /usr/lib64/libcrypto.so.6)
[root@webserver nagios_client]# service nrpe start
Starting nrpe: [确定]
[root@webserver nagios_client]#
[root@webserver nagios_client]# ln -s /usr/lib64/libcrypto.so /usr/lib64/libcrypto.so.6 (或者如果没有libcrypto.so,就采用libcrypto.so.10做软连接, ln -s /usr/lib64/libcrypto.so.10 /usr/lib64/libcrypto.so.6) [root@webserver nagios_client]# service nrpe start Starting nrpe: [确定] [root@webserver nagios_client]#1.4 检测下nrpe是否正常运行:
去nagios服务器端check下
[root@cache-2 ~]# /usr/local/nagios/libexec/check_nrpe -H 10.xx.xx.10
NRPE v2.12
[root@cache-2 ~]#
[root@cache-2 ~]# /usr/local/nagios/libexec/check_nrpe -H 10.xx.xx.10 NRPE v2.12 [root@cache-2 ~]#看到返回NRPE v2.15表示已经连接成功。
1.5 在web应用下添加检测jsp文件
(1) 建立测试文件
vim ./webapps/nagios_test_0611/nagios_test_0611.jsp
<%@ page language="java" contentType="text/html; charset=gb2312"
pageEncoding="gb2312"%>
<!DOCTYPE html PUBLIC"-//W3C//DTD HTML 4.01 Transitional//EN""http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=gb2312">
<title>nagios test here</title>
</head>
<body>
<center>Now timeis: <%=new java.util.Date()%></center>
</body>
</html>
vim ./webapps/nagios_test_0611/nagios_test_0611.jsp <%@ page language="java" contentType="text/html; charset=gb2312" pageEncoding="gb2312"%> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> <html> <head> <meta http-equiv="Content-Type" content="text/html; charset=gb2312"> <title>nagios test here</title> </head> <body> <center>Now time is: <%=new java.util.Date()%></center> </body> </html>(2) 去check下check_http命令
[root@webserver~]# /usr/local/nagios/libexec/check_http -I 10.xx.xx.10 -p 8300 -u /nagios_test_0611/nagios_test_0611.jsp -e 200
HTTP CRITICAL - Invalid HTTP response received from host on port 8300: HTTP/1.1 404 Not Found
[root@webserver~]# /usr/local/nagios/libexec/check_http -I 10.xx.xx.10 -p 8300 -u /nagios_test_0611/nagios_test_0611.jsp -e 200 HTTP CRITICAL - Invalid HTTP response received from host on port 8300: HTTP/1.1 404 Not Found需要重启一下tomcat,使新添加的jsp生效能打开,执行如下stop start命令:
/usr/local/app/apache-tomcat-6.0.37_8300/bin/shutdown.sh
/usr/local/app/apache-tomcat-6.0.37_8300/bin/startup.sh
再执行check_http命令
[root@webserver~]# /usr/local/nagios/libexec/check_http -I 10.xx.xx.10 -p 8300 -u /nagios_test_0611/nagios_test_0611.jsp -e 200
HTTP OK: Status line output matched "200" - 571 bytes in 0.882 second response time |time=0.882479s;;;0.000000 size=571B;;;0
[root@ webserver ~]#
[root@webserver~]# /usr/local/nagios/libexec/check_http -I 10.xx.xx.10 -p 8300 -u /nagios_test_0611/nagios_test_0611.jsp -e 200 HTTP OK: Status line output matched "200" - 571 bytes in 0.882 second response time |time=0.882479s;;;0.000000 size=571B;;;0 [root@ webserver ~]#1.6查看NRPE的监控命令
[root@webserver nrpe-2.15]# cat /etc/nagios/nrpe.cfg |grep -v "^#"|grep -v "^$"
log_facility=daemon
pid_file=/var/run/nrpe.pid
server_port=5666
nrpe_user=nagios
nrpe_group=nagios
dont_blame_nrpe=0
debug=0
command_timeout=60
connection_timeout=300
command[check_users]=/usr/local/nagios/libexec/check_users -w 8 -c 15
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_sda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 750 -c 800
command[check-host-alive]=/usr/local/nagios/libexec/check_ping -H 10.xx.xx.10 -w 3000.0,80% -c 5000.0,100% -p 5
allowed_hosts=127.0.0.1,10.xx.xxx.xx1
[root@webserver nrpe-2.15]#