Building Professional Business-Status Monitoring with Nagios (2)

4. Create the username and password for Nagios web access

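    # vi /usr/local/bin/htpasswd.pl
    #!/usr/bin/perl
    use strict;
    use warnings;

    if ( @ARGV != 2 ) {
        print "usage: /usr/local/bin/htpasswd.pl <username> <password>\n";
        exit 1;
    }
    # Use a random two-character salt; salting with the password itself
    # would expose its first two characters in the stored hash.
    my @chars = ( 'a' .. 'z', 'A' .. 'Z', '0' .. '9', '.', '/' );
    my $salt  = join '', map { $chars[ rand @chars ] } 1 .. 2;
    print $ARGV[0] . ":" . crypt( $ARGV[1], $salt ) . "\n";

    # chmod +x /usr/local/bin/htpasswd.pl

    #Use the Perl script to write the account and password to the htpasswd.users file
    # /usr/local/bin/htpasswd.pl nagiosadmin nagios@ops-coffee > /usr/local/nagios/htpasswd.users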

Nagios enables account authentication by default. The related settings live in /usr/local/nagios/etc/cgi.cfg.

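If the httpd service is installed, you can generate the password directly with its htpasswd command. We have no httpd here, so we write a small Perl script to generate the password instead.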
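As an alternative to crypt(), the same kind of htpasswd line can be produced with a salted SHA-1 hash. The following is a minimal sketch, assuming your nginx version accepts the `{SSHA}` scheme that the ngx_http_auth_basic_module documentation lists; the function name `ssha_htpasswd_line` is made up for this illustration and is not part of Nagios or nginx.

```python
import base64
import hashlib
import os
from typing import Optional


def ssha_htpasswd_line(username: str, password: str,
                       salt: Optional[bytes] = None) -> str:
    """Return a 'user:{SSHA}...' line for an auth_basic_user_file.

    Hypothetical helper: the hash is base64(sha1(password + salt) + salt),
    which is the RFC 2307 style salted SHA-1 layout.
    """
    if salt is None:
        # A fresh random salt for every entry.
        salt = os.urandom(8)
    digest = hashlib.sha1(password.encode("utf-8") + salt).digest()
    encoded = base64.b64encode(digest + salt).decode("ascii")
    return f"{username}:{{SSHA}}{encoded}"


if __name__ == "__main__":
    # Redirect the output into /usr/local/nagios/htpasswd.users if desired.
    print(ssha_htpasswd_line("nagiosadmin", "nagios@ops-coffee"))
```

Salted SHA-1 is not a strong password hash by current standards, so treat this as a convenience for an internal monitoring page rather than a hardening measure.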

5. Add an nginx server block so the browser can reach Nagios

    server {
        listen 80;
        server_name ngs.domain.com;
        access_log /var/log/nginx/nagios.access.log;
        error_log /var/log/nginx/nagios.error.log;

        auth_basic "Private";
        auth_basic_user_file /usr/local/nagios/htpasswd.users;

        root /usr/local/nagios/share;
        index index.php index.html;

        location / {
            try_files $uri $uri/ index.php /nagios;
        }

        location /nagios {
            alias /usr/local/nagios/share;
        }

        location ~ \.php$ {
            include /etc/nginx/fastcgi_params;
            fastcgi_pass unix:/var/run/php5-fpm.sock;
            fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        }

        location ~ ^/nagios/(.*\.php)$ {
            alias /usr/local/nagios/share/$1;
            include /etc/nginx/fastcgi_params;
            fastcgi_pass unix:/var/run/php5-fpm.sock;
        }

        location ~ \.cgi$ {
            root /usr/local/nagios/sbin/;
            rewrite ^/nagios/cgi-bin/(.*)\.cgi /$1.cgi break;
            fastcgi_param AUTH_USER $remote_user;
            fastcgi_param REMOTE_USER $remote_user;
            include /etc/nginx/fastcgi_params;
            fastcgi_pass unix:/var/run/fcgiwrap.socket;
        }
    }
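The `.cgi` location is the least obvious part of this server block: the rewrite strips the /nagios/cgi-bin prefix so that the script is looked up under /usr/local/nagios/sbin. The sketch below imitates that mapping in Python purely as an illustration. It is a simplified model, not how nginx is implemented, and the helper names are invented for this example.

```python
import re

# Same pattern as the rewrite directive in the .cgi location.
CGI_REWRITE = re.compile(r"^/nagios/cgi-bin/(.*)\.cgi")
# The root of the .cgi location, without the trailing slash.
CGI_ROOT = "/usr/local/nagios/sbin"


def rewrite_cgi_uri(uri: str) -> str:
    """Model 'rewrite ^/nagios/cgi-bin/(.*)\\.cgi /$1.cgi break;'."""
    match = CGI_REWRITE.search(uri)
    if match is None:
        # No match: the URI is left unchanged.
        return uri
    # On a match the whole URI is replaced by the replacement string.
    return "/" + match.group(1) + ".cgi"


def cgi_script_path(uri: str) -> str:
    """File path suggested by the location's root plus the rewritten URI."""
    return CGI_ROOT + rewrite_cgi_uri(uri)


if __name__ == "__main__":
    print(cgi_script_path("/nagios/cgi-bin/status.cgi"))
```

Under this model, a request for /nagios/cgi-bin/status.cgi ends up at the status.cgi binary in the sbin directory, which fcgiwrap then executes with REMOTE_USER set to the authenticated user.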

6. Check the configuration file and start the services

    #Check the configuration file for syntax errors
    # /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

    #Start the Nagios service
    # /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

    #Start the fcgiwrap and php5-fpm services
    # service fcgiwrap restart
    # service php5-fpm restart

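7. Open the server's IP address or domain name in a browser and the Nagios page appears. By default it shows monitoring data for the local machine; if you don't need that, remove it from the localhost.cfg configuration file.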

Nagios configuration

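The main Nagios configuration file is /usr/local/nagios/etc/nagios.cfg. It already lists the paths of several configuration files by default: each cfg_file= entry names a configuration file that Nagios reads. We can add a new file dedicated to monitoring HTTP APIs: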

    cfg_file=/usr/local/nagios/etc/objects/check_api.cfg

The contents of check_api.cfg are as follows:

    define service{
        use                     generic-service
        host_name               localhost
        service_description     web_project_01
        check_command           check_http!ops-coffee.cn -S
    }

    define service{
        use                     generic-service
        host_name               localhost
        service_description     web_project_02
        check_command           check_http!ops-coffee.cn -S -u / -e 200
    }

    define service{
        use                     generic-service
        host_name               localhost
        service_description     web_project_03
        check_command           check_http!ops-coffee.cn -S -u /action/health -k "sign:e5dhn"
    }

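define service: defines a service. Each page or API counts as one service.

use: the template the service inherits from. The template configuration file is /usr/local/nagios/etc/objects/templates.cfg.

host_name: the host the service belongs to. Distinguishing hosts matters little here, so we simply assign everything to localhost.

service_description: the service description. This value ends up in the Service column of the web page, so keep it short and meaningful.

check_command: the command used to check the service. The command configuration file is /usr/local/nagios/etc/objects/commands.cfg.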
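When there are many pages or APIs to watch, writing these blocks by hand gets repetitive. The sketch below generates service definitions of the same shape from a small list. It is only an illustration: the function `render_service` and the `SERVICES` list are hypothetical names, and the template and host defaults are taken from the examples in this article.

```python
from typing import List, Tuple


def render_service(description: str, check_args: str,
                   template: str = "generic-service",
                   host_name: str = "localhost") -> str:
    """Render one 'define service' block for a check_http based check."""
    directives = [
        ("use", template),
        ("host_name", host_name),
        ("service_description", description),
        ("check_command", "check_http!" + check_args),
    ]
    lines = ["define service{"]
    for key, value in directives:
        # Pad the directive name so the values line up in a column.
        lines.append("    " + key.ljust(24) + value)
    lines.append("}")
    return "\n".join(lines)


# (service_description, arguments passed to check_http), as in check_api.cfg.
SERVICES: List[Tuple[str, str]] = [
    ("web_project_01", "ops-coffee.cn -S"),
    ("web_project_02", "ops-coffee.cn -S -u / -e 200"),
    ("web_project_03", 'ops-coffee.cn -S -u /action/health -k "sign:e5dhn"'),
]

if __name__ == "__main__":
    print("\n\n".join(render_service(name, args) for name, args in SERVICES))
```

Redirecting the output into check_api.cfg and running the `nagios -v` syntax check afterwards keeps generated definitions from breaking the running configuration.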

When checking an HTTPS endpoint, check_http can use the -S option. If it reports "SSL is not available", install the libssl-dev package first, then recompile (./configure --with-openssl=/usr/bin/openssl) and redeploy nagios-plugin to add SSL support.

For check_command we configured check_http, so the default check_http definition in commands.cfg needs to be changed as follows:

    define command {
        command_name    check_http
        command_line    $USER1$/check_http -H $ARG1$
    }

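define command: defines a command.

command_name: the name of the command, which host or service configuration files can reference.

command_line: the path of the command and how it is executed. This check_http is the binary produced by installing nagios-plugin and lives under /usr/local/nagios/libexec/. Run check_http -h to see its detailed usage; it supports a wide range of options.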
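It helps to see how a service's check_command turns into the command line that actually runs: the text before the first `!` selects the command, and each following `!`-separated piece fills `$ARG1$`, `$ARG2$`, and so on. The sketch below is a simplified model of that substitution, not Nagios's real macro processor (which handles many more macros and escaping). The function name is hypothetical, and the `$USER1$` value is assumed from the libexec path mentioned above.

```python
import re
from typing import Dict


def expand_check_command(check_command: str, commands: Dict[str, str],
                         user1: str = "/usr/local/nagios/libexec") -> str:
    """Expand 'name!arg1!arg2' against a command_line template."""
    # The first field is the command name, the rest are $ARGn$ values.
    name, *args = check_command.split("!")
    line = commands[name].replace("$USER1$", user1)
    for index, value in enumerate(args, start=1):
        line = line.replace(f"$ARG{index}$", value)
    # Argument macros that received no value expand to nothing.
    return re.sub(r"\$ARG\d+\$", "", line).rstrip()


if __name__ == "__main__":
    commands = {"check_http": "$USER1$/check_http -H $ARG1$"}
    print(expand_check_command("check_http!ops-coffee.cn -S -u / -e 200", commands))
```

Because everything after the single `!` lands in `$ARG1$`, extra options such as `-S` or `-u /action/health` can be appended per service without defining a new command for each one.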

For use we configured generic-service. A service template lets you define many default settings, as follows:

    define service {
        name                            generic-service  ; The 'name' of this service template
        active_checks_enabled           1                ; Active service checks are enabled
        passive_checks_enabled          1                ; Passive service checks are enabled/accepted
        parallelize_check               1                ; Active service checks should be parallelized (disabling this can lead to major performance problems)
        obsess_over_service             1                ; We should obsess over this service (if necessary)
        check_freshness                 0                ; Default is to NOT check service 'freshness'
        notifications_enabled           1                ; Service notifications are enabled
        event_handler_enabled           1                ; Service event handler is enabled
        flap_detection_enabled          1                ; Flap detection is enabled
        process_perf_data               1                ; Process performance data
        retain_status_information       1                ; Retain status information across program restarts
        retain_nonstatus_information    1                ; Retain non-status information across program restarts
        is_volatile                     0                ; The service is not volatile
        check_period                    24x7             ; The service can be checked at any time of the day
        max_check_attempts              2                ; Check the service up to 2 times in order to determine its final (hard) state
        check_interval                  1                ; Check the service every minute under normal conditions
        retry_interval                  1                ; Re-check the service every minute until a hard state can be determined
        contact_groups                  admins           ; Notifications get sent out to everyone in the 'admins' group
        notification_options            w,u,c,r          ; Send notifications about warning, unknown, critical, and recovery events
        notification_interval           60               ; Re-notify about service problems every hour
        notification_period             24x7             ; Notifications can be sent out at any time
        register                        0                ; DON'T REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
    }

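There are too many settings to explain one by one, and the English comments should make most of them clear, so here are just a few important ones.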
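The timing values in the template (max_check_attempts 2, check_interval 1, retry_interval 1, notification_interval 60) decide how quickly a failure becomes an alert. The sketch below works through that arithmetic, assuming the default `interval_length` of 60 seconds from nagios.cfg and ignoring scheduling latency and check run time. The function names are illustrative only.

```python
def seconds_to_hard_state(max_check_attempts: int, retry_interval: float,
                          interval_length: int = 60) -> float:
    """Approximate delay between the first failed check and the hard state.

    The first failure counts as attempt 1; each further attempt waits one
    retry_interval, so there are (max_check_attempts - 1) retries.
    """
    retries = max(max_check_attempts - 1, 0)
    return retries * retry_interval * interval_length


def seconds_between_notifications(notification_interval: float,
                                  interval_length: int = 60) -> float:
    """Delay before an unresolved problem is notified again."""
    return notification_interval * interval_length


if __name__ == "__main__":
    # Values from the generic-service template shown in this article.
    print(seconds_to_hard_state(max_check_attempts=2, retry_interval=1))
    print(seconds_between_notifications(notification_interval=60))
```

With the template's values, a failing API is confirmed after one retry, roughly a minute after the first failed check, and the admins group is reminded hourly until it recovers.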
