$ tar -zxvf nrpe-2.14.tar.gz
$ cd nrpe-2.14
$ ./configure
$ make all
$ make install-plugin && make install-daemon && make install-daemon-config && make install-xinetd
5. Configuration
(1) In /etc/xinetd.d/nrpe, add <Horizon ip | Nagios core ip> to the only_from variable
(2) Add the following line to /etc/services:
nrpe 5666/tcp # NRPE
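(3) $ service xinetd restart
(4) Open the firewall:
$ sudo iptables -A INPUT -p tcp -m tcp --dport 5666 -j ACCEPT
$ sudo sh -c 'iptables-save > /etc/iptables.up.rules'
$ vim /etc/network/interfaces
In the NIC configuration, add the following line so the rules are restored at boot:
pre-up iptables-restore < /etc/iptables.up.rules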
(5) Verify the configuration locally:
$ netstat -at | grep nrpe
> tcp 0 0 *:nrpe *:* LISTEN
$ /usr/local/nagios/libexec/check_nrpe -H localhost
> NRPE v2.14
(6) Verify the configuration from the Horizon | Nagios core node:
$ /usr/local/nagios/libexec/check_nrpe -H 10.0.1.14 -c check_users
> USERS OK - 1 users currently logged in |users=1;5;10;0
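If check_nrpe cannot connect, a quick first check is whether step (2) was actually applied. The sketch below is only an illustration: has_nrpe_entry is a hypothetical helper name, and it assumes the entry has the "nrpe 5666/tcp" form shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of NRPE): succeeds if the given services
# file registers nrpe on 5666/tcp, as step (2) requires.
has_nrpe_entry() {
    services_file=${1:-/etc/services}
    # Match "nrpe", some whitespace, then "5666/tcp" at the start of a line.
    grep -Eq '^nrpe[[:space:]]+5666/tcp' "$services_file"
}

# Example use:
#   has_nrpe_entry || echo "nrpe entry missing from /etc/services"
```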
4.4 Compute node
Same as the Controller node.
5. Monitoring OpenStack with Nagios
How to start the services:
NDOUtils: /usr/local/nagios/bin/ndo2db-3x -c /usr/local/nagios/etc/ndo2db.cfg
Nagios core: /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
NRPE: service xinetd start
The following resources are monitored:
1. Hardware resources of the controller and compute nodes:
CPU, Mem, Disk, Network
2. Controller and compute services
keystone, glance-api, glance-registry, nova-api, nova-compute, nova-network, nova-scheduler, nova-volume, nova-objectstore, mysql, dnsmasq, rabbitmq, etc.
5.1 Hardware resources of the controller and compute nodes
5.1.1 CPU
Plugin name: check_cpu.sh
Plugin description: periodically samples CPU statistics from /proc/stat and returns a WARNING or CRITICAL state (W|C)
Plugin parameters:
check_cpu.sh [-i/--interval] [-w/--warning] [-c/--critical]
Options:
--interval|-i)
    Defines the pause between the two times /proc/stat is being
    parsed. Higher values could lead to more accurate result.
    Default is: 1 second
--warning|-w)
    Sets a warning level for CPU user. Default is: off
--critical|-c)
    Sets a critical level for CPU user. Default is: off
Example:
[Local environment]
$ /usr/local/nagios/libexec/check_cpu.sh -i 3 -w 60 -c 80
> OK - user: 0.83, nice: 0.50, sys: 0.83, iowait: 0.50, irq: 0.50, softirq: 0.50 idle: 99.83, cpu_usage=3 | 'user'=0.83 'nice'=0.50 'sys'=0.83 'softirq'=0.50 'iowait'=0.50 'irq'=0.50 'idle'=99.83
Here cpu_usage is the current CPU usage.
PS: Because this plugin is a shell script, its logic can be customized.
[Remote environment]
Add the following to /usr/local/nagios/etc/nrpe.cfg:
command[check_cpu]=/usr/local/nagios/libexec/check_cpu.sh -i 3 -w 60 -c 80
$ /usr/local/nagios/libexec/check_nrpe -H 10.0.1.14 -c check_cpu
> Same as above
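To make the /proc/stat approach concrete, here is a minimal sketch of how a busy percentage can be derived from two samples of the aggregate "cpu" line. It is not the code of check_cpu.sh: the function name cpu_busy_pct is made up for illustration, and it assumes the usual field order (user, nice, system, idle, iowait, irq, softirq, steal).

```shell
#!/bin/sh
# Hypothetical sketch: busy CPU percentage between two "cpu ..." lines
# taken from /proc/stat some seconds apart.
cpu_busy_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            # Sum user..steal (fields 2-9); guest time is already
            # included in user, so later fields are skipped.
            total = 0
            for (i = 2; i <= NF && i <= 9; i++) total += $i
            if (NR == 1) { t0 = total; idle0 = $5 }
            else         { t1 = total; idle1 = $5 }
        }
        END {
            dt = t1 - t0
            if (dt <= 0) { print "0.00"; exit }
            printf "%.2f\n", 100 * (dt - (idle1 - idle0)) / dt
        }'
}

# Example use on a Linux host (3-second interval, like -i 3):
#   a=$(grep '^cpu ' /proc/stat); sleep 3; b=$(grep '^cpu ' /proc/stat)
#   cpu_busy_pct "$a" "$b"
```

A longer interval averages over more ticks, which is why the plugin's help text says higher values can give a more accurate result.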
5.1.2 Mem
Plugin name: check_mem.sh
Plugin description: queries memory usage using free
Plugin parameters:
check_mem.sh -w <warnlevel> -c <critlevel>
The warn and crit levels are compared against (memused/memtotal) as a percentage
Example:
[Local environment]
$ /usr/local/nagios/check_mem.sh -w 4 -c 10
> Memory: WARNING Total: 2003 MB - Used: 166 MB - 8% used!|TOTAL=2101026816;;;; USED=173740032;;;; CACHE=856936448;;;; BUFFER=58998784;;;;
PS: Because this plugin is a shell script, its logic can be customized.
[Remote environment]
Add the following to /usr/local/nagios/etc/nrpe.cfg:
command[check_mem]=/usr/local/nagios/libexec/check_mem.sh -w 80 -c 90
$ /usr/local/nagios/libexec/check_nrpe -H 10.0.1.14 -c check_mem
> Same as above
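The threshold comparison can be sketched as follows. This is an illustration rather than the plugin's actual code: mem_status is a hypothetical name, and the use of integer percentages with "greater than or equal" comparisons is an assumption. The return values follow the standard Nagios plugin convention (0 = OK, 1 = WARNING, 2 = CRITICAL).

```shell
#!/bin/sh
# Hypothetical sketch: compare used/total (as a percentage) with the
# warning and critical levels and return a Nagios-style exit code.
mem_status() {
    used=$1 total=$2 warn=$3 crit=$4
    pct=$(( used * 100 / total ))      # integer percentage
    if [ "$pct" -ge "$crit" ]; then
        echo "CRITICAL - ${pct}% used"; return 2
    elif [ "$pct" -ge "$warn" ]; then
        echo "WARNING - ${pct}% used"; return 1
    fi
    echo "OK - ${pct}% used"; return 0
}

# Example use with the figures from the local example (MB):
#   mem_status 166 2003 4 10
```

With the figures from the local example (166 MB used of 2003 MB, -w 4 -c 10), the percentage is 8, which lies between the two levels and therefore yields WARNING, matching the plugin output shown above.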
5.1.3 Network
Plugin name: check_net.pl
Plugin description: not documented
Plugin parameters: not documented; there is no -h | --help option
Example:
[Local environment]
$ /usr/local/nagios/check_net.pl
> NET OK - (Rx/Tx) eth0=(65.1B/7.1B), lo=(5.6B/5.6B)|eth0_in=68215167c; eth0_out=7394459c; lo_in=5905765c; lo_out=5905765c;
[Remote environment]
Add the following to /usr/local/nagios/etc/nrpe.cfg:
command[check_net]=/usr/local/nagios/libexec/check_net.pl
$ /usr/local/nagios/libexec/check_nrpe -H 10.0.1.14 -c check_net
> Same as above
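Since the plugin is undocumented, the counters after the "|" are the most useful part of its output. The sketch below pulls a single value out of that part. It is only an illustration: perf_value is a hypothetical helper, and it assumes unquoted labels and the "label=value" layout seen in the sample output above.

```shell
#!/bin/sh
# Hypothetical helper: print the numeric value of one performance-data
# label from a plugin output line read on stdin.
perf_value() {
    awk -v label="$1" -F'[|]' '{
        # $2 is everything after the "|"; split it into label=value items.
        n = split($2, items, /[ ;]+/)
        for (k = 1; k <= n; k++) {
            if (index(items[k], label "=") == 1) {
                v = substr(items[k], length(label) + 2)
                sub(/[^0-9.].*$/, "", v)   # drop the unit suffix such as "c"
                print v
            }
        }
    }'
}

# Example use:
#   /usr/local/nagios/libexec/check_nrpe -H 10.0.1.14 -c check_net | perf_value eth0_in
```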
5.1.4 Disk & LVM
Plugin name: check_diskstat.sh
Plugin description: not documented
Plugin parameters:
Usage:
./check_diskstat.sh -d DEVICE -w tps,read,write -c tps,read,write | -h
  -d DEVICE            DEVICE must be without /dev (ex: -d sda)
  -w/c TPS,READ,WRITE  TPS means transfer per seconds (aka IO/s)
                       READ and WRITE are in sectors per seconds
Example:
[Local environment]
$ sudo /usr/local/nagios/check_diskstat.sh -d vda -w 200,100000,100000 -c 300,200000,200000
> summary: 0 io/s, read 8 sectors (0kB/s), write 56 sectors (4kB/s) in 6 seconds | tps=0io/s;;; read=682b/s;;; write=4778b/s;;;
[Remote environment]
Add the following to /usr/local/nagios/etc/nrpe.cfg:
command[check_diskstat]=/usr/local/nagios/libexec/check_diskstat.sh -d vda -w 200,100000,100000 -c 300,200000,200000
$ /usr/local/nagios/libexec/check_nrpe -H 10.0.1.14 -c check_diskstat
> Same as above
---------------------------------------------------------------------------------------------------------------------------
Plugin name: check_disk
Plugin description: written around the df command; -d should be set to a value from the "Mounted on" column printed by df
Plugin parameters:
This plugin shows the % of used space of a mounted partition, using the 'df' utility
./check_disk:
  -c <integer>  If the % of used space is above <integer>, returns CRITICAL state
  -w <integer>  If the % of used space is below CRITICAL and above <integer>, returns WARNING state
  -d <device>   The partition or mountpoint to be checked. eg. /dev/sda1, /home, /
Example:
[Local environment]
$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1       9.9G  1.7G  7.8G  18% /
udev            998M   12K  998M   1% /dev
tmpfs           401M  224K  401M   1% /run
none            5.0M     0  5.0M   0% /run/lock
none           1002M     0 1002M   0% /run/shm
/dev/vdb         20G  173M   19G   1% /mnt
$ /usr/local/nagios/check_disk -d /mnt -c 80 -w 10
> OK - /mnt space used=1% | '/mnt usage'=1%;10;80;
[Remote environment]
Add the following to /usr/local/nagios/etc/nrpe.cfg:
command[check_disk]=/usr/local/nagios/libexec/check_disk -d /mnt -c 80 -w 10
$ /usr/local/nagios/libexec/check_nrpe -H 10.0.1.14 -c check_disk
> Same as above
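The df-based lookup that this plugin relies on can be sketched like this. It is not the plugin's own code: mount_used_pct is a hypothetical name, and it assumes df output in which each filesystem occupies a single line (as `df -P` guarantees) and the mount point contains no spaces.

```shell
#!/bin/sh
# Hypothetical sketch: read df output on stdin and print the Use% value
# (without the % sign) for the mount point given as $1.
mount_used_pct() {
    awk -v mp="$1" 'NR > 1 && $NF == mp {
        sub(/%/, "", $(NF - 1))   # strip the trailing percent sign
        print $(NF - 1)
    }'
}

# Example use:
#   df -P | mount_used_pct /mnt
```

The printed number is what would then be compared against the -w and -c levels.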
---------------------------------------------------------------------------------------------------------------------------
Plugin name: check_lvm
Plugin description: only works when a volume group (VG) exists
Plugin parameters:
NOTE - This script only works on _mounted_ volumes!
Usage: ./check_lvm -w -c
Description:
This plugin finds all LVM logical volumes, checks their used space, and compares against the supplied thresholds.
Example:
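5.2 Controller and compute services
Plugin name: check_procs
Plugin description: based on ps; can be used to check whether the processes of the relevant services exist.
Plugin parameters: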
check_procs -w <range> -c <range> [-m metric] [-s state] [-p ppid]
 [-u user] [-r rss] [-z vsz] [-P %cpu] [-a argument-array]
 [-C command] [-t timeout] [-v]
Options:
 -h, --help
    Print detailed help screen
 -V, --version
    Print version information
 -w, --warning=RANGE
    Generate warning state if metric is outside this range
 -c, --critical=RANGE
    Generate critical state if metric is outside this range
 -m, --metric=TYPE
    Check thresholds against metric. Valid types:
    PROCS   - number of processes (default)
    VSZ     - virtual memory size
    RSS     - resident set memory size
    CPU     - percentage CPU
    ELAPSED - time elapsed in seconds
 -t, --timeout=INTEGER
    Seconds before connection times out (default: 10)
 -v, --verbose
    Extra information. Up to 3 verbosity levels
Filters:
 -s, --state=STATUSFLAGS
    Only scan for processes that have, in the output of `ps`, one or
    more of the status flags you specify (for example R, Z, S, RS,
    RSZDT, plus others based on the output of your 'ps' command).
 -p, --ppid=PPID
    Only scan for children of the parent process ID indicated.
 -z, --vsz=VSZ
    Only scan for processes with VSZ higher than indicated.
 -r, --rss=RSS
    Only scan for processes with RSS higher than indicated.
 -P, --pcpu=PCPU
    Only scan for processes with PCPU higher than indicated.
 -u, --user=USER
    Only scan for processes with user name or ID indicated.
 -a, --argument-array=STRING
    Only scan for processes with args that contain STRING.
 --ereg-argument-array=STRING
    Only scan for processes with args that contain the regex STRING.
 -C, --command=COMMAND
    Only scan for exact matches of COMMAND (without path).
Example:
$ /usr/local/nagios/check_procs -w 3 -c 5 -a nagios
> PROCS OK: 2 processes with args 'nagios'
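To monitor the services listed earlier over NRPE, one command definition per service is needed in nrpe.cfg. The sketch below generates such lines. It is an assumption-laden illustration: nrpe_proc_commands and the check_<service> command names are invented here, the libexec path is taken from the other nrpe.cfg entries in this post, and the range "1:" (critical when fewer than one matching process exists) is one possible threshold choice.

```shell
#!/bin/sh
# Hypothetical generator: print one nrpe.cfg command definition per
# service name, each checking that at least one matching process exists.
nrpe_proc_commands() {
    for svc in "$@"; do
        # Use underscores in the command name (nova-api -> nova_api).
        name=$(printf '%s' "$svc" | tr '-' '_')
        printf 'command[check_%s]=/usr/local/nagios/libexec/check_procs -c 1: -a %s\n' \
            "$name" "$svc"
    done
}

# Example use (review the output before appending it to nrpe.cfg):
#   nrpe_proc_commands nova-api nova-compute glance-api
```

Each generated command can then be queried from the Nagios core node with check_nrpe -H <node ip> -c check_<service>, in the same way as the hardware checks above.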
5.3 Other optional monitoring plugins
[LOG]
[DNS]
[DHCP]
[AMQP]
[MYSQL]
[ROUTE]
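Notes
Nagios has its own web interface. The web interface obtains information by interacting with the Nagios core process, while Nagios core gathers information through plugins and stores the data in a MySQL database.
Because the current environment only needs node monitoring information obtained through Nagios plugins, Nagios core, NDOUtils, and the Nagios web interface are not described in depth here. For details, see the References.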