Nagios Installation, Configuration, and Usage (3)

$ tar -zxvf nrpe-2.14.tar.gz

$ cd nrpe-2.14

$ ./configure

$ make all

$ make install-plugin && make install-daemon && make install-daemon-config && make install-xinetd

5. Configuration

(1) Add <Horizon ip | Nagios core ip> to the only_from variable in /etc/xinetd.d/nrpe

(2) Add the following line to /etc/services:

nrpe 5666/tcp # NRPE

(3) $ service xinetd restart

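(4) Open the NRPE port in the firewall:

$ sudo iptables -A INPUT -p tcp -m tcp --dport 5666 -j ACCEPT

$ sudo sh -c 'iptables-save > /etc/iptables.up.rules'

To restore the rules at boot, edit /etc/network/interfaces ($ vim /etc/network/interfaces) and add the following line to the network interface configuration:

pre-up iptables-restore < /etc/iptables.up.rules

(5) Check the configuration locally:

$ netstat -at | grep nrpe

> tcp 0 0 *:nrpe *:* LISTEN

$ /usr/local/nagios/libexec/check_nrpe -H localhost

> NRPE v2.14

(6) Check the configuration from the Horizon | Nagios core node:

$ /usr/local/nagios/libexec/check_nrpe -H 10.0.1.14 -c check_users

> USERS OK - 1 users currently logged in |users=1;5;10;0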
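check_nrpe returns the exit status of the remote plugin, and Nagios plugins conventionally report their state through that status (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). As a rough sketch, a small wrapper can make the state explicit when testing by hand. The names state_name and run_check are hypothetical helpers, not part of NRPE, and the sample command at the end only stands in for a real check_nrpe call.

```shell
#!/bin/sh
# Hypothetical helpers for manual testing; not part of NRPE or Nagios.

# Map a plugin exit status to the conventional Nagios state name.
state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Run any check command, print "<STATE>: <plugin output>",
# and return the exit status of the check itself.
run_check() {
    output=$("$@") && rc=0 || rc=$?
    printf '%s: %s\n' "$(state_name "$rc")" "$output"
    return "$rc"
}

# Stand-in for: run_check /usr/local/nagios/libexec/check_nrpe -H 10.0.1.14 -c check_users
run_check sh -c 'echo "USERS OK - 1 users currently logged in"; exit 0'
```

Replacing the stand-in with the check_nrpe command line from step (6) would label the remote result in the same way.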

4.4 Compute node
Same as the Controller node.

5. Monitoring OpenStack with Nagios
How to start the services:

NDOUtils: /usr/local/nagios/bin/ndo2db-3x -c /usr/local/nagios/etc/ndo2db.cfg

Nagios core: /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

NRPE: service xinetd start

The following resources are monitored:

1. Hardware resources of the controller and compute nodes

CPU, Mem, Disk, Network

2. Controller and compute services

keystone, glance-api, glance-registry, nova-api, nova-compute, nova-network, nova-scheduler, nova-volume, nova-objectstore, mysql, dnsmasq, rabbitmq, etc.

5.1 Hardware resources of the controller and compute nodes
5.1.1 CPU
Plugin name: check_cpu.sh

%28matejunkie%29/details

Plugin description: periodically samples CPU data from /proc/stat and returns a WARNING or CRITICAL state (W|C) when a threshold is exceeded

Plugin parameters:

check_cpu.sh [-i/--interval] [-w/--warning] [-c/--critical]

Options:

--interval|-i)

Defines the pause between the two times /proc/stat is being

parsed. Higher values could lead to more accurate result.

Default is: 1 second

--warning|-w)

Sets a warning level for CPU user. Default is: off

--critical|-c)

Sets a critical level for CPU user. Default is: off

Example:

[Local environment]

$ /usr/local/nagios/libexec/check_cpu.sh -i 3 -w 60 -c 80

> OK - user: 0.83, nice: 0.50, sys: 0.83, iowait: 0.50, irq: 0.50, softirq: 0.50 idle: 99.83, cpu_usage=3 | 'user'=0.83 'nice'=0.50 'sys'=0.83 'softirq'=0.50 'iowait'=0.50 'irq'=0.50 'idle'=99.83

Here cpu_usage is the current CPU usage.

PS: Since this plugin is a shell script, its logic can be customized.

[Remote environment]

Add the following to /usr/local/nagios/etc/nrpe.cfg:
command[check_cpu]=/usr/local/nagios/libexec/check_cpu.sh -i 3 -w 60 -c 80

$ /usr/local/nagios/libexec/check_nrpe -H 10.0.1.14 -c check_cpu

> Same as above
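The two-sample approach described for check_cpu.sh can be sketched roughly as follows. This is not the plugin's actual code: cpu_busy_pct is a hypothetical helper that takes two snapshots of the aggregate "cpu" line of /proc/stat and reports overall non-idle time, whereas the plugin applies its thresholds to CPU user. The sample values are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper, not taken from check_cpu.sh.
# Usage: cpu_busy_pct "FIRST_SAMPLE" "SECOND_SAMPLE"
# Each sample is the aggregate "cpu" line of /proc/stat:
#   cpu user nice system idle iowait irq softirq steal ...
cpu_busy_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            # Sum user..steal (fields 2-9); guest time is already part of user.
            total = 0
            for (i = 2; i <= NF && i <= 9; i++) total += $i
            idle = $5
        }
        NR == 1 { total1 = total; idle1 = idle }
        NR == 2 { total2 = total; idle2 = idle }
        END {
            elapsed = total2 - total1
            if (elapsed <= 0) { print 0; exit }
            # Busy share = everything that was not idle time.
            printf "%d\n", (elapsed - (idle2 - idle1)) * 100 / elapsed
        }'
}

# Fixed sample values for illustration: 100 ticks elapsed, 50 of them idle.
cpu_busy_pct "cpu 100 0 50 850 0 0 0 0" "cpu 130 0 70 900 0 0 0 0"
```

On a live node the two samples could be taken with `head -n 1 /proc/stat`, separated by `sleep 3` to mirror the `-i 3` interval used above.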

5.1.2 Mem
Plugin name: check_mem.sh

Plugin description: queries memory usage with the free command

Plugin parameters:

check_mem.sh -w <warnlevel> -c <critlevel>

where warn and crit are compared against the percentage of memory used (memused/memtotal)

Example:

[Local environment]

$ /usr/local/nagios/libexec/check_mem.sh -w 4 -c 10

> Memory: WARNING Total: 2003 MB - Used: 166 MB - 8% used!|TOTAL=2101026816;;;; USED=173740032;;;; CACHE=856936448;;;; BUFFER=58998784;;;;

PS: Since this plugin is a shell script, its logic can be customized.

[Remote environment]

Add the following to /usr/local/nagios/etc/nrpe.cfg:
command[check_mem]=/usr/local/nagios/libexec/check_mem.sh -w 80 -c 90

$ /usr/local/nagios/libexec/check_nrpe -H 10.0.1.14 -c check_mem

> Same as above
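The threshold comparison against memused/memtotal can be illustrated with a short sketch. mem_check is a hypothetical function, not the code of check_mem.sh, and the use of "greater than or equal" for both thresholds is an assumption. The byte counts are the TOTAL and USED values from the local example output above.

```shell
#!/bin/sh
# Hypothetical helper, not taken from check_mem.sh.
# Usage: mem_check TOTAL_BYTES USED_BYTES WARN_PCT CRIT_PCT
# Prints a one-line status and returns the matching Nagios exit code.
mem_check() {
    total=$1 used=$2 warn=$3 crit=$4
    pct=$(( used * 100 / total ))    # integer percentage of memory in use
    if [ "$pct" -ge "$crit" ]; then
        state=CRITICAL rc=2
    elif [ "$pct" -ge "$warn" ]; then
        state=WARNING rc=1
    else
        state=OK rc=0
    fi
    echo "Memory: $state - ${pct}% used"
    return "$rc"
}

# TOTAL=2101026816 and USED=173740032 with -w 4 -c 10, as in the local example.
mem_check 2101026816 173740032 4 10 || true   # prints "Memory: WARNING - 8% used"
```

With the remote thresholds (-w 80 -c 90) the same numbers would fall below the warning level and the function would report OK.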

5.1.3 Network
Plugin name: check_net.pl

Plugin description: not documented

Plugin parameters: not documented; the plugin has no -h | --help option

Example:

[Local environment]

$ /usr/local/nagios/libexec/check_net.pl

> NET OK - (Rx/Tx) eth0=(65.1B/7.1B), lo=(5.6B/5.6B)|eth0_in=68215167c;eth0_out=7394459c; lo_in=5905765c; lo_out=5905765c;

[Remote environment]

Add the following to /usr/local/nagios/etc/nrpe.cfg:
command[check_net]=/usr/local/nagios/libexec/check_net.pl

$ /usr/local/nagios/libexec/check_nrpe -H 10.0.1.14 -c check_net

> Same as above

5.1.4 Disk & LVM
Plugin name: check_diskstat.sh

Plugin description: not documented

Plugin parameters:

Usage:

./check_diskstat.sh -d DEVICE -w tps,read,write -c tps,read,write | -h

-d DEVICE DEVICE must be without /dev (ex: -d sda)

-w/c TPS,READ,WRITE TPS means transfer per seconds (aka IO/s)

READ and WRITE are in sectors per seconds

Example:

[Local environment]

$ sudo /usr/local/nagios/libexec/check_diskstat.sh -d vda -w 200,100000,100000 -c 300,200000,200000

> summary: 0 io/s, read 8 sectors (0kB/s), write 56 sectors (4kB/s) in 6 seconds | tps=0io/s;;; read=682b/s;;; write=4778b/s;;;

[Remote environment]

Add the following to /usr/local/nagios/etc/nrpe.cfg:
command[check_diskstat]=/usr/local/nagios/libexec/check_diskstat.sh -d vda -w 200,100000,100000 -c 300,200000,200000

$ /usr/local/nagios/libexec/check_nrpe -H 10.0.1.14 -c check_diskstat

> Same as above

---------------------------------------------------------------------------------------------------------------------------

Plugin name: check_disk

%25-used-space/details

Plugin description: built on the df command; -d must be set to a value from the "Mounted on" column of the df output

Plugin parameters:

This plugin shows the % of used space of a mounted partition, using the 'df' utility

./check_disk:

-c <integer> If the % of used space is above <integer>, returns CRITICAL state

-w <integer> If the % of used space is below CRITICAL and above <integer>, returns WARNING state

-d <device> The partition or mountpoint to be checked. eg. /dev/sda1, /home, /

Example:

[Local environment]

$ df -h

Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1       9.9G  1.7G  7.8G  18% /
udev            998M   12K  998M   1% /dev
tmpfs           401M  224K  401M   1% /run
none            5.0M     0  5.0M   0% /run/lock
none           1002M     0 1002M   0% /run/shm
/dev/vdb         20G  173M   19G   1% /mnt

$ /usr/local/nagios/libexec/check_disk -d /mnt -c 80 -w 10

> OK - /mnt space used=1% | '/mnt usage'=1%;10;80;

[Remote environment]

Add the following to /usr/local/nagios/etc/nrpe.cfg:
command[check_disk]=/usr/local/nagios/libexec/check_disk -d /mnt -c 80 -w 10

$ /usr/local/nagios/libexec/check_nrpe -H 10.0.1.14 -c check_disk

> Same as above
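Since check_disk is built on df, the core lookup can be sketched as a filter over df output. disk_used_pct is a hypothetical helper, not the plugin's code, and it assumes the six-column layout shown above with no spaces in mount point names. The sample rows are copied from the df output in the local example.

```shell
#!/bin/sh
# Hypothetical helper, not taken from check_disk.
# Reads df-style output on stdin and prints the Use% value (without the
# % sign) for the mount point given as $1. Prints nothing if not found.
disk_used_pct() {
    awk -v mp="$1" 'NR > 1 && $6 == mp { sub(/%/, "", $5); print $5 }'
}

# Sample rows from the local example; on a real node: df -P | disk_used_pct /mnt
printf '%s\n' \
    'Filesystem Size Used Avail Use% Mounted on' \
    '/dev/vda1 9.9G 1.7G 7.8G 18% /' \
    '/dev/vdb 20G 173M 19G 1% /mnt' |
    disk_used_pct /mnt
```

The resulting number could then be compared against the -w and -c values in the same way the plugin help text describes.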

---------------------------------------------------------------------------------------------------------------------------

Plugin name: check_lvm

Plugin description: only works when a volume group (vg) exists

Plugin parameters:

NOTE - This script only works on _mounted_ volumes!

Usage: ./check_lvm -w -c

Description:

This plugin finds all LVM logical volumes, checks their used space, and compares against the supplied thresholds.

Example:

5.2 Controller and compute services
Plugin name: check_procs

Plugin description: based on ps; can be used to check whether the processes of the relevant services exist.

Plugin parameters:

check_procs -w <range> -c <range> [-m metric] [-s state] [-p ppid]

[-u user] [-r rss] [-z vsz] [-P %cpu] [-a argument-array]

[-C command] [-t timeout] [-v]

Options:

-h, --help

Print detailed help screen

-V, --version

Print version information

-w, --warning=RANGE

Generate warning state if metric is outside this range

-c, --critical=RANGE

Generate critical state if metric is outside this range

-m, --metric=TYPE

Check thresholds against metric. Valid types:

PROCS - number of processes (default)

VSZ - virtual memory size

RSS - resident set memory size

CPU - percentage CPU

ELAPSED - time elapsed in seconds

-t, --timeout=INTEGER

Seconds before connection times out (default: 10)

-v, --verbose

Extra information. Up to 3 verbosity levels

Filters:

-s, --state=STATUSFLAGS

Only scan for processes that have, in the output of `ps`, one or

more of the status flags you specify (for example R, Z, S, RS,

RSZDT, plus others based on the output of your 'ps' command).

-p, --ppid=PPID

Only scan for children of the parent process ID indicated.

-z, --vsz=VSZ

Only scan for processes with VSZ higher than indicated.

-r, --rss=RSS

Only scan for processes with RSS higher than indicated.

-P, --pcpu=PCPU

Only scan for processes with PCPU higher than indicated.

-u, --user=USER

Only scan for processes with user name or ID indicated.

-a, --argument-array=STRING

Only scan for processes with args that contain STRING.

--ereg-argument-array=STRING

Only scan for processes with args that contain the regex STRING.

-C, --command=COMMAND

Only scan for exact matches of COMMAND (without path).

Example:

$ /usr/local/nagios/libexec/check_procs -w 3 -c 5 -a nagios

> PROCS OK: 2 processes with args 'nagios'
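To cover the services listed at the start of section 5, one NRPE command per service could be defined on top of check_procs. The sketch below only prints candidate nrpe.cfg lines. The function name gen_proc_checks, the command naming scheme (check_proc_<service>), the -c 1: threshold (CRITICAL when fewer than one matching process is found), and matching by -a are all assumptions, not settings taken from the original setup.

```shell
#!/bin/sh
# Hypothetical generator for nrpe.cfg entries; names and thresholds are assumptions.
# Assumes check_procs is installed in /usr/local/nagios/libexec.
PLUGIN_DIR=/usr/local/nagios/libexec

gen_proc_checks() {
    for svc in "$@"; do
        # Command name: replace "-" with "_" (naming convention is an assumption).
        name=$(printf '%s' "$svc" | sed 's/-/_/g')
        # -c 1: means CRITICAL when fewer than one matching process exists.
        printf 'command[check_proc_%s]=%s/check_procs -c 1: -a %s\n' \
            "$name" "$PLUGIN_DIR" "$svc"
    done
}

gen_proc_checks keystone glance-api nova-api nova-compute
```

The printed lines would be reviewed and appended to /usr/local/nagios/etc/nrpe.cfg, then queried from the Nagios core node with check_nrpe -c check_proc_<service>, in the same way as the hardware checks in section 5.1.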

5.3 Other optional monitoring plugins
[LOG]

[DNS]

[DHCP]

[AMQP]

[MYSQL]

[ROUTE]

%2A-Routing

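Notes
Nagios has its own web interface, which gets its information by interacting with the Nagios core process. Nagios core in turn collects information through plugins and stores the data in a MySQL database.

Because the current environment only needs node monitoring information gathered through Nagios plugins, Nagios core, NDOUtils, and the Nagios web interface are not described in depth here. For details, see the References.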


Copyright notice: unless otherwise stated, all articles on this site are original.

When reposting, please cite the source: http://www.heiqu.com/d780b4c8f6b1fad3df97d74fd9e85434.html