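Monitoring the Nginx Service with Nagios: A Detailed Walkthrough (2)

3 Writing a script to monitor the nginx service

3.1 The debugging process in detail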

[root@lb-net-2 run]# find / -name nginx.pid

/usr/local/nginx/logs/nginx.pid

[root@lb-net-2 run]# /usr/lib/nagios/plugins/check_nginxstatus -H localhost -P 80 -p /usr/local/nginx/logs/ -n /usr/local/nginx/logs/nginx.pid -s nginx_status -o /tmp/ -w 1500 -c 2000

expr: 参数数目错误

expr: 语法错误

(standard_in) 1: syntax error

/usr/lib/nagios/plugins/check_nginxstatus: line 258: [: : integer expression expected

/usr/lib/nagios/plugins/check_nginxstatus: line 262: [: : integer expression expected

OK - nginx is running. requests per second, connections per second ( requests per connection) | 'reqpsec'= 'conpsec'= 'conpreq'= ]

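Open the script at the failing test and change the logical operator "-a" to "&&". (The article cites line 262, but in the next run it is the message for line 258 that changes, so the edit evidently landed there.)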

[root@lb-net-2 run]# vim /usr/lib/nagios/plugins/check_nginxstatus

[root@lb-net-2 run]# /usr/lib/nagios/plugins/check_nginxstatus -H localhost -P 80 -p /usr/local/nginx/logs/ -n /usr/local/nginx/logs/nginx.pid -s nginx_status -o /tmp/ -w 1500 -c 2000

expr: 参数数目错误

expr: 语法错误

(standard_in) 1: syntax error

/usr/lib/nagios/plugins/check_nginxstatus: line 258: [: missing `]'

/usr/lib/nagios/plugins/check_nginxstatus: line 262: [: : integer expression expected

OK - nginx is running. requests per second, connections per second ( requests per connection) | 'reqpsec'= 'conpsec'= 'conpreq'= ]

[root@lb-net-2 run]#

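The plugin still exits with OK, but the edit has only traded one error for another: inside "[ ]", "&&" ends the [ command before its closing "]", hence "missing `]'". Edit the file again.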

[root@lb-net-2 run]# vim /usr/lib/nagios/plugins/check_nginxstatus

[root@lb-net-2 run]#

[root@lb-net-2 run]# /usr/lib/nagios/plugins/check_nginxstatus -H localhost -P 80 -p /usr/local/nginx/logs/ -n /usr/local/nginx/logs/nginx.pid -s nginx_status -o /tmp/ -w 1500 -c 2000

expr: 参数数目错误

expr: 语法错误

(standard_in) 1: syntax error

/usr/lib/nagios/plugins/check_nginxstatus: line 258: [: missing `]'

OK - nginx is running. requests per second, connections per second ( requests per connection) | 'reqpsec'= 'conpsec'= 'conpreq'= ]

[root@lb-net-2 run]#

Replacing "[ ]" with "[[ ]]", which accepts "&&" inside the brackets, fixes it:

[root@lb-net-2 run]# vim /usr/lib/nagios/plugins/check_nginxstatus

[root@lb-net-2 run]#

[root@lb-net-2 run]#

[root@lb-net-2 run]# /usr/lib/nagios/plugins/check_nginxstatus -H localhost -P 80 -p /usr/local/nginx/logs/ -n /usr/local/nginx/logs/nginx.pid -s nginx_status -o /tmp/ -w 1500 -c 2000

expr: 参数数目错误

expr: 语法错误

(standard_in) 1: syntax error

OK - nginx is running. requests per second, connections per second ( requests per connection) | 'reqpsec'= 'conpsec'= 'conpreq'= ]

[root@lb-net-2 run]#
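The bracket errors in the runs above can be reproduced outside the plugin. The sketch below is not the plugin's code; the function names between and between_posix are hypothetical. It shows two ways to write a combined range test without "-a": "[[ ]]" with "&&" inside (bash only), and two separate "[ ]" commands joined by "&&". Treating an empty argument as 0 is an assumption made for the illustration.

```bash
#!/usr/bin/env bash

# between VALUE LOW HIGH: succeeds when LOW <= VALUE < HIGH.
# [[ ]] is shell syntax rather than a command, so && is legal inside it.
between() {
    local value=${1:-0} low=${2:-0} high=${3:-0}
    [[ $value -ge $low && $value -lt $high ]]
}

# Same test with the portable [ command: each [ ... ] must be complete,
# and && goes between the two commands, never inside one.
between_posix() {
    [ "${1:-0}" -ge "${2:-0}" ] && [ "${1:-0}" -lt "${3:-0}" ]
}

# Example with the thresholds used in the article (-w 1500 -c 2000):
if between 1600 1500 2000; then
    echo "1600 falls in the warning range"
fi
```

Note that inside "[[ ]]" bash evaluates an empty operand of -ge or -lt as 0 instead of printing "integer expression expected", so switching to "[[ ]]" silences the message without supplying the missing values.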

After commenting out the line #reqpcon=`echo "scale=2; $reqpsec / $conpsec" | bc -l`, the "(standard_in) 1: syntax error" message from bc no longer appears, as shown below:

[root@lb-net-2 run]# /usr/lib/nagios/plugins/check_nginxstatus -H localhost -P 80 -p /usr/local/nginx/logs/ -s nginx_status -n nginx.pid -w 15000 -c 20000

expr: 参数数目错误

expr: 语法错误

OK - nginx is running. requests per second, connections per second ( requests per connection) | 'reqpsec'= 'conpsec'= 'conpreq'= ]

[root@lb-net-2 run]#

After commenting out the line # reqpsec=`expr $tmp2_reqpsec - $tmp1_reqpsec`, the "expr: 参数数目错误" (wrong number of arguments) message no longer appears. One error remains:

[root@lb-net-2 run]# /usr/lib/nagios/plugins/check_nginxstatus -H localhost -P 80 -p /usr/local/nginx/logs/ -s nginx_status -n nginx.pid -w 15000 -c 20000

expr: 语法错误

OK - nginx is running. requests per second, connections per second ( requests per connection) | 'reqpsec'= 'conpsec'= 'conpreq'= ]

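Commenting out one more line (the article again quotes #reqpcon=`echo "scale=2; $reqpsec / $conpsec" | bc -l`; since the message comes from expr, this was presumably the remaining expr assignment) removes "expr: 语法错误" (syntax error) as well: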

[root@lb-net-2 run]# /usr/lib/nagios/plugins/check_nginxstatus -H localhost -P 80 -p /usr/local/nginx/logs/ -s nginx_status -n nginx.pid -w 15000 -c 20000

OK - nginx is running. requests per second, connections per second ( requests per connection) | 'reqpsec'= 'conpsec'= 'conpreq'= ]

[root@lb-net-2 run]#
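The expr and bc messages above are consistent with the counters being empty when the arithmetic runs. Instead of commenting the calculations out, they can be guarded. The following is a minimal sketch, not the plugin's code: is_uint, delta and ratio are hypothetical helper names, and awk is swapped in for bc, so the ratio prints as 0.50 rather than .50.

```bash
# is_uint VALUE: succeeds only for a non-negative integer without leading zeros.
is_uint() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty or contains a non-digit
        0[0-9]*)     return 1 ;;   # leading zero would be read as octal
        *)           return 0 ;;
    esac
}

# delta NEW OLD: prints NEW - OLD, or fails without output on bad input.
delta() {
    is_uint "$1" && is_uint "$2" || return 1
    echo $(( $1 - $2 ))
}

# ratio A B: prints A / B with two decimals, or fails when B is 0 or input is bad.
ratio() {
    is_uint "$1" && is_uint "$2" || return 1
    [ "$2" -ne 0 ] || return 1
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Example: one request and two connections in the sampling interval.
reqpsec=$(delta 32 31) || reqpsec=0
conpsec=$(delta 18 16) || conpsec=0
conpreq=$(ratio "$reqpsec" "$conpsec") || conpreq=0
echo "reqpsec=$reqpsec conpsec=$conpsec conpreq=$conpreq"
```

With guards like these an empty sample yields a failure status that the caller can handle, rather than error text mixed into the plugin output.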

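At this point 'reqpsec'=, 'conpsec'= and 'conpreq'= are all empty even though nginx is running, so commenting out the calculations only hid the symptom. What is actually wrong? It turned out that the nginx_status page was not enabled. The following needs to be added to /usr/local/nginx/conf/nginx.conf:

# add the pid directive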

pid logs/nginx.pid;

#charset koi8-r;

access_log logs/host.access.log main;

location /nginx_status {

stub_status on;

access_log off;

deny all;

}

After reloading nginx, a new nginx-status file is created, but it is empty:

[root@lb-net-2 logs]# ll /tmp/nginx*

-rw-r--r--. 1 root root 0 7 3 15:06 /tmp/nginx-status.1

[root@lb-net-2 logs]#

Check the nginx error log:

[root@lb-net-2 logs]# cd /usr/local/nginx/logs/

[root@lb-net-2 logs]# tail -n 300 error.log

……

2014/07/03 15:05:47 [error] 4285#0: *1851293 access forbidden by rule, client: 127.0.0.1, server: localhost, request: "GET /nginx_status HTTP/1.0", host: "localhost"

2014/07/03 15:05:48 [error] 4285#0: *1851294 access forbidden by rule, client: 127.0.0.1, server: localhost, request: "GET /nginx_status HTTP/1.0", host: "localhost"

2014/07/03 15:06:12 [error] 4282#0: *1851362 access forbidden by rule, client: 127.0.0.1, server: localhost, request: "GET /nginx_status HTTP/1.0", host: "localhost"

2014/07/03 15:06:13 [error] 4282#0: *1851363 access forbidden by rule, client: 127.0.0.1, server: localhost, request: "GET /nginx_status HTTP/1.0", host: "localhost"

2014/07/03 15:06:55 [error] 4285#0: *1851509 access forbidden by rule, client: 127.0.0.1, server: localhost, request: "GET /nginx_status HTTP/1.0", host: "localhost"

2014/07/03 15:06:56 [error] 4285#0: *1851519 access forbidden by rule, client: 127.0.0.1, server: localhost, request: "GET /nginx_status HTTP/1.0", host: "localhost"
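To see at a glance how often the status page is being rejected, the log can be filtered rather than read by eye. A small sketch, assuming the error-log format shown above; count_forbidden is a hypothetical helper name.

```bash
# count_forbidden PATH: reads nginx error-log lines on stdin and prints how many
# "access forbidden by rule" entries were GET requests for PATH.
count_forbidden() {
    grep -F 'access forbidden by rule' | grep -cF "\"GET $1 "
}

# Usage (path taken from the article):
#   count_forbidden /nginx_status < /usr/local/nginx/logs/error.log
```

A non-zero count for /nginx_status with client 127.0.0.1 points at an access rule in the location block rather than at the plugin.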

Check the options nginx was compiled with:

[root@lb-net-2 logs]# /usr/local/nginx/sbin/nginx -V

nginx version: nginx/1.4.2

built by gcc 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC)

configure arguments: --prefix=/usr/local/nginx --with-http_stub_status_module --with-http_realip_module
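The same check can be scripted. A minimal sketch, where has_stub_status is a hypothetical helper name; it relies on nginx -V writing its report to standard error, which is why the usage line redirects 2>&1.

```bash
# has_stub_status: succeeds when the "nginx -V" text on stdin lists the
# stub_status module among the configure arguments.
has_stub_status() {
    grep -q -e '--with-http_stub_status_module'
}

# Usage (binary path taken from the article):
#   /usr/local/nginx/sbin/nginx -V 2>&1 | has_stub_status && echo "stub_status compiled in"
```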

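This confirms that the stub_status module is compiled in, so the "access forbidden by rule" errors come from the deny all; rule, which rejects every client, including 127.0.0.1. Edit the configuration file, comment out deny all; and reload nginx. (To keep the page private, placing allow 127.0.0.1; before deny all; would admit only local requests.)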

[root@lb-net-2 logs]# vim /usr/local/nginx/conf/nginx.conf

#deny all;

[root@lb-net-2 logs]# service nginx reload

reload nginx

[root@lb-net-2 logs]#

[root@lb-net-2 logs]# ll /tmp/nginx*

ls: 无法访问/tmp/nginx*: 没有那个文件或目录

[root@lb-net-2 logs]#

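The /tmp/nginx-status.1 status file is still missing. The assumption so far was that the Nagios nginx plugin reads its data from nginx-status.1, so without the file there would be no data.

Searching Google for "nginx stub_status does not create nginx-status.1" turned up a claim that the file does not matter as long as stub_status is configured correctly, so I ran the script directly to see whether it works.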

[root@lb-net-2 logs]# ll /tmp/nginx*

ls: 无法访问/tmp/nginx*: 没有那个文件或目录

[root@lb-net-2 logs]# /root/check_nginx2.sh -H localhost -P 80 -p /usr/local/nginx/logs/ -n nginx.pid -s nginx_status -w 15000 -c 20000

OK - nginx is running. 1 requests per second, 2 connections per second (.50 requests per connection) | 'reqpsec'=1 'conpsec'=2 'conpreq'=.50 ]

[root@lb-net-2 logs]#

'reqpsec'=1 'conpsec'=2 'conpreq'=.50 now carry values. Check again whether the file was created:

[root@lb-net-2 logs]# ll /tmp/nginx*

ls: 无法访问/tmp/nginx*: 没有那个文件或目录

[root@lb-net-2 logs]#

There is still no file, yet the check returns data, so the presence of nginx-status.1 under /tmp/ is not what matters. The script shows why:

[root@lb-net-2 logs]# vim /usr/lib/nagios/plugins/check_nginxstatus

180 get_status() {

181 if [ "$secure" = 1 ]

182 then

183 wget_opts="-O- -q -t 3 -T 3 --no-check-certificate"

184 out1=`wget ${wget_opts} ${hostname}:${port}/${status_page}`

185 sleep 1

186 out2=`wget ${wget_opts} ${hostname}:${port}/${status_page}`

187 else

188 wget_opts="-O- -q -t 3 -T 3"

189 out1=`wget ${wget_opts} ${hostname}:${port}/${status_page}`

190 sleep 1

191 out2=`wget ${wget_opts} ${hostname}:${port}/${status_page}`

192 fi

193

194 if [ -z "$out1" -o -z "$out2" ]

195 then

196 echo "UNKNOWN - Local copy/copies of $status_page is empty."

197 exit $ST_UK

198 fi

199 }

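The plugin gets its status data by fetching the status page with `wget -O- -q -t 3 -T 3 --no-check-certificate :80/nginx_status` (the --no-check-certificate option is only added when $secure is 1), not by reading /tmp/nginx-status.1. Requesting :80/nginx_status directly returns nginx's runtime counters, as the figure shows:

[Figure: output of the nginx_status page]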
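To make the two-sample logic of get_status concrete, the sketch below parses two stub_status responses and derives per-second rates from them. It is an illustration, not the plugin's code: parse_stub_status and rates are hypothetical names, the sample numbers are invented, and using the "accepts" counter for connections per second is an assumption about what the plugin reports.

```bash
# parse_stub_status: reads a stub_status page on stdin and prints the three
# counters from the line after the header: "accepts handled requests".
parse_stub_status() {
    awk '/^server accepts handled requests/ { getline; print $1, $2, $3; exit }'
}

# rates SAMPLE1 SAMPLE2: given two pages fetched one second apart, prints the
# differences of the accepts and requests counters.
rates() {
    local s1 s2
    s1=$(printf '%s\n' "$1" | parse_stub_status)
    s2=$(printf '%s\n' "$2" | parse_stub_status)
    [ -n "$s1" ] && [ -n "$s2" ] || return 1
    set -- $s1 $s2                      # word-split into six counters
    echo "conpsec=$(( $4 - $1 )) reqpsec=$(( $6 - $3 ))"
}

# Invented sample pages in the stub_status layout.
out1=$(printf 'Active connections: 1 \nserver accepts handled requests\n 16 16 31 \nReading: 0 Writing: 1 Waiting: 0 \n')
out2=$(printf 'Active connections: 1 \nserver accepts handled requests\n 18 18 32 \nReading: 0 Writing: 1 Waiting: 0 \n')
rates "$out1" "$out2"
```

Because both samples come straight from the HTTP response, nothing in this flow depends on a file under /tmp/.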


Running the check from the Nagios server reports an error:

[root@cache-2 ~]# /usr/local/nagios/libexec/check_nrpe -H10.xx.xx.xx -c check_nginx_status

UNKNOWN - Local copy/copies of nginx_status is empty.

[root@cache-2 ~]#

Searching the plugin for 'Local copy/copies of nginx_status is empty.' leads to line 197, in the following code:

195 if [ -z "$out1" -o -z "$out2" ]

196 then

197 echo "UNKNOWN - Local copy/copies of $status_page is empty."

198 exit $ST_UK

199 fi

The test if [ -z "$out1" -o -z "$out2" ] succeeds, so the plugin exits right there. Further debugging shows that when the script is invoked from the Nagios server, lines 190 to 192

out1=`/usr/bin/wget ${wget_opts} ${hostname}:${port}/${status_page}`

sleep 1

out2=`/usr/bin/wget ${wget_opts} ${hostname}:${port}/${status_page}`

leave both out1 and out2 empty, so the subsequent if [ -z "$out1" -o -z "$out2" ] test passes, prints "UNKNOWN - Local copy/copies of $status_page is empty." and exits.
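Because wget runs with -q, an empty result does not say why the fetch failed. A wrapper along the following lines would separate "wget not found", "wget failed" and "empty page". This is a sketch under assumptions: fetch_status is a hypothetical name, and the messages are illustrative, not the plugin's. Exit status 3 is the Nagios UNKNOWN code.

```bash
# fetch_status URL: prints the page body, or an UNKNOWN message explaining
# what went wrong, and returns 3 on any failure.
fetch_status() {
    local url=$1 out rc=0
    if ! command -v wget >/dev/null 2>&1; then
        echo "UNKNOWN - wget not found (PATH=$PATH)"
        return 3
    fi
    # Same wget options as the plugin's non-secure branch.
    out=$(wget -O- -q -t 3 -T 3 "$url" 2>/dev/null) || rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "UNKNOWN - wget exited with status $rc"
        return 3
    fi
    if [ -z "$out" ]; then
        echo "UNKNOWN - $url returned an empty body"
        return 3
    fi
    printf '%s\n' "$out"
}
```

Running the plugin by hand as the monitoring user, for example with sudo -u nagios followed by the plugin command line, is another way to reproduce what NRPE sees before changing any permissions.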

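Note: the plugin relies on wget to fetch the nginx_status page. The author's diagnosis was that wget could only be run as root on this host (this is specific to the environment; wget does not normally require root), so the nagios user is allowed to become root through sudo without a password, and the check is run as sudo /usr/lib/nagios/plugins/check_nginxstatus. On CentOS, sudo refuses to run without a terminal when requiretty is set, so /etc/sudoers has to be edited: find #Defaults requiretty, uncomment it, and add one more line stating that the nagios user may call commands without a login terminal, as follows: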

Defaults requiretty

Defaults:nagios !requiretty

# Let nagios use sudo without a password; the rule can also be limited to specific commands (optionally with arguments).

nagios ALL=(ALL) NOPASSWD: ALL
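NOPASSWD: ALL gives the nagios account unrestricted root access. Following the comment above, the rule can be limited to the one plugin. The sketch below writes such a fragment to a file of your choice; write_nagios_sudoers is a hypothetical helper, and placing the result under /etc/sudoers.d/ assumes that the main sudoers file includes that directory.

```bash
# write_nagios_sudoers FILE: writes a sudoers fragment that lets the nagios
# user run only the nginx plugin as root, without a password or a tty.
# A command listed without arguments may be invoked with any arguments.
write_nagios_sudoers() {
    cat > "$1" <<'EOF'
Defaults:nagios !requiretty
nagios ALL=(root) NOPASSWD: /usr/lib/nagios/plugins/check_nginxstatus
EOF
    chmod 0440 "$1"
}

# Usage (as root; validate before relying on the file):
#   write_nagios_sudoers /etc/sudoers.d/nagios
#   visudo -c -f /etc/sudoers.d/nagios
```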

After this change the check returns data:

[root@cache-2 ~]# /usr/local/nagios/libexec/check_nrpe -H10.xx.xx.xx -c check_nginx_status

OK - nginx is running. 1 requests per second, 1 connections per second (1.00 requests per connection) | 'reqpsec'=1 'conpsec'=1 'conpreq'=1.00 ]

[root@cache-2 ~]#


Copyright notice: unless otherwise stated, all articles on this site are original.

When reprinting, please credit the source: https://www.heiqu.com/857766101985cfa31f7b7e2d5d24d0a2.html