Monitoring Linux with Nagios

Date: 2019-04-29 03:18:07  Source: igfitidea

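What is Nagios?

Nagios is a computer monitoring system that can watch your network and server infrastructure. It lets administrators notify support teams automatically through service alerts. It can monitor system load, processes, disk usage, and system logs, on both the local host and remote hosts. Nagios can monitor platforms such as Linux, Solaris, and Microsoft Windows.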

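What is Nagios NRPE?

NRPE (Nagios Remote Plugin Executor) is the Nagios agent that allows remote systems to be monitored using scripts placed on the remote host. Nagios polls these remote servers with the check_nrpe plugin. The example installation below covers the configuration for remote monitoring.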

Installing and configuring Nagios

In the example below, we install the Nagios monitoring software on an openSUSE 12.3 server and then use that server to monitor an Ubuntu 12.04 server.

Nagios monitoring server

Operating system: openSUSE 12.3
IP address: 192.168.0.19

Remote server to be monitored

Operating system: Ubuntu 12.04
IP address: 192.168.0.14

Installing Nagios Core

To install Nagios Core on the openSUSE 12.3 server, run the following command:

linux-j2w3:~ # zypper in nagios

Loading repository data...
Reading installed packages...

This installs Nagios along with a set of plugins.
If the Apache web server is not already installed, it is installed automatically.

Nagios directory:

linux-j2w3:/etc/nagios # cd /etc/nagios

linux-j2w3:/etc/nagios # ls -l
total 68
-rw-rw-r-- 1 root   root   11653 Jun 28 16:05 cgi.cfg
-rw-r----- 1 root   nagcmd    26 Jun 28 16:05 htpasswd.users
-rw-r--r-- 1 root   root   44489 Jun 28 16:05 nagios.cfg
drwxrwxr-x 2 nagios nagcmd  4096 Jul 30 20:38 objects
-rw-rw---- 1 root   root    1336 Jun 28 16:05 resource.cfg

Apache directory:

linux-j2w3:/etc/apache2/conf.d # cd /etc/apache2/conf.d

linux-j2w3:/etc/apache2/conf.d # ls -l
total 12
-rw-r--r-- 1 root root 1052 Jun 28 16:05 nagios.conf
-rw-r--r-- 1 root root  354 Jan 27  2013 php5.conf
-rw-r--r-- 1 root root  975 Feb 14 20:56 phpMyAdmin.conf

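Account passwords are stored in /etc/nagios/htpasswd.users, the file named by the AuthUserFile directive in nagios.conf.

Creating a Nagios account

Accounts are created with the htpasswd2 command. The example below uses the -c option, which creates the password file and overwrites it if it already exists, removing any existing accounts. To add further accounts, omit the -c flag: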

linux-j2w3:/etc/nagios # htpasswd2 -c /etc/nagios/htpasswd.users nagiosadmin
New password:
Re-type new password:
Adding password for user nagiosadmin
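
Because -c overwrites an existing password file, a small wrapper can decide whether to pass it. The following is a minimal sketch only; the function names htpasswd_create_flag and add_nagios_user are illustrative assumptions and are not part of Nagios or Apache.

```shell
# Print "-c" only when the password file does not exist yet,
# so existing accounts are never overwritten by accident.
htpasswd_create_flag() {
    if [ -e "$1" ]; then
        printf '%s' ""
    else
        printf '%s' "-c"
    fi
}

# Add a user to the given htpasswd file (htpasswd2 prompts for the password).
add_nagios_user() {
    pw_file=$1
    pw_user=$2
    # The substitution is left unquoted on purpose: an empty flag must expand to nothing.
    htpasswd2 $(htpasswd_create_flag "$pw_file") "$pw_file" "$pw_user"
}
```

A call such as add_nagios_user /etc/nagios/htpasswd.users operator1 (operator1 being a hypothetical user name) would then add a second account without touching nagiosadmin.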

Example nagios.conf file

ScriptAlias /nagios/cgi-bin "/usr/lib/nagios/cgi"

<Directory "/usr/lib/nagios/cgi">
#  SSLRequireSSL
   Options ExecCGI
   AllowOverride None
   Order allow,deny
   Allow from all
#  Order deny,allow
#  Deny from all
#  Allow from 127.0.0.1
   AuthName "Nagios Access"
   AuthType Basic
   AuthUserFile /etc/nagios/htpasswd.users
   Require valid-user
</Directory>

Alias /nagios "/usr/share/nagios"

<Directory "/usr/share/nagios">
#  SSLRequireSSL
   Options None
   AllowOverride None
   Order allow,deny
   Allow from all
#  Order deny,allow
#  Deny from all
#  Allow from 127.0.0.1
   AuthName "Nagios Access"
   AuthType Basic
   AuthUserFile /etc/nagios/htpasswd.users
   Require valid-user
    <IfDefine KOHANA2>
      DirectoryIndex index.html index.php
    </IfDefine>
</Directory>

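Testing the Nagios Core installation

A quick test confirms that Nagios has been installed correctly and that the web server is running.

Nagios start/stop commands

Start, stop, or restart a service, or check its status: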

service nagios start
service nagios stop
service nagios restart
service nagios status

service apache2 start
service apache2 stop
service apache2 restart
service apache2 status

Alternatively, use the systemctl command:

systemctl start nagios.service 
systemctl restart nagios.service 
systemctl stop apache2.service

Enabling the services at boot

chkconfig nagios on
chkconfig apache2 on

chkconfig -l | grep nagios
chkconfig -l | grep apache2

apache2 0:off 1:off 2:off 3:on 4:off 5:on 6:off

Runlevels 3 and 5 should be on.
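
The runlevel check can also be scripted instead of read by eye. This is a sketch that assumes the status line has the same shape as the sample output above; runlevels_3_5_on is a hypothetical helper name.

```shell
# Succeed if a chkconfig-style status line shows runlevels 3 and 5 as "on".
runlevels_3_5_on() {
    case "$1" in
        *3:on*5:on*) return 0 ;;
        *) return 1 ;;
    esac
}

line="apache2 0:off 1:off 2:off 3:on 4:off 5:on 6:off"
if runlevels_3_5_on "$line"; then
    echo "enabled at boot"
else
    echo "not enabled for runlevels 3 and 5"
fi
```

On a live system the line could be taken from chkconfig -l | grep apache2 rather than hard-coded.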

Testing Nagios through the web server

In a web browser on the Nagios openSUSE server, enter the following address:

http://192.168.0.19/nagios (use ip a s to find the server's IP address)

linux-j2w3:/etc/nagios # ip a s
eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:21:9a:25 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.19/24 brd 192.168.0.255 scope global eth0

Enter the user name and password created earlier to open the Nagios home page.

Nagios Core home screen

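Adding a remote host to monitor

On the remote host to be monitored (192.168.0.14), the NRPE daemon and the plugins need to be installed.
This example uses an Ubuntu 12.04 server, where the installation command is sudo apt-get install nagios-nrpe-server.

List the available packages: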

apt-cache search nrpe | more
nagios-nrpe-server - Nagios Remote Plugin Executor Server

apt-cache search nagios-plugins | more 
nagios-plugins - Plugins for nagios compatible monitoring systems (metapackage)

Example installation:

john@john-desktop:~$ sudo apt-get install nagios-nrpe-server
[sudo] password for john: 
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages were automatically installed and are no longer required:
  mc-data libcogl9 gir1.2-json-1.0 libunity6 libbabl-0.0-0
  gir1.2-gtkclutter-1.0 libgegl-0.0-0 gir1.2-clutter-1.0 libclutter-gtk-1.0-0
  libcogl-common gir1.2-champlain-0.12 libclutter-1.0-0 libchamplain-0.12-0

Adding the Nagios monitoring server's IP address to nrpe.cfg

On the remote server to be monitored (192.168.0.14), edit the file /etc/nagios/nrpe.cfg.

Add the IP address of the monitoring server:

allowed_hosts=127.0.0.1,192.168.0.19

This allows the monitoring server to connect.
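
The same change can be applied without an editor. The sketch below assumes that nrpe.cfg already contains a line starting with allowed_hosts= and that the script runs with permission to write the file; set_allowed_hosts is a hypothetical helper name.

```shell
# Rewrite the allowed_hosts line of an NRPE config file in place.
# $1 = path to nrpe.cfg, $2 = comma-separated list of permitted addresses.
set_allowed_hosts() {
    ah_cfg=$1
    ah_hosts=$2
    ah_tmp=$(mktemp) || return 1
    # "|" is the sed delimiter so that "/" in an address would not clash with it.
    # Writing back with cat keeps the original file's owner and mode.
    sed "s|^allowed_hosts=.*|allowed_hosts=$ah_hosts|" "$ah_cfg" > "$ah_tmp" &&
        cat "$ah_tmp" > "$ah_cfg"
    ah_status=$?
    rm -f "$ah_tmp"
    return $ah_status
}
```

For the setup in this article the call would be set_allowed_hosts /etc/nagios/nrpe.cfg 127.0.0.1,192.168.0.19, run as root on the Ubuntu server.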

Also in nrpe.cfg, make sure the following command entries are present and not commented out:

command[check_users]=/usr/lib/nagios/plugins/check_users -w 5 -c 10
command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
command[check_hda1]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /dev/sda1
command[check_zombie_procs]=/usr/lib/nagios/plugins/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/lib/nagios/plugins/check_procs -w 150 -c 200 	

The check_hda1 command checks the disk /dev/sda1. Run df -h on the remote server (192.168.0.14) to confirm which device to monitor.
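
To look up the device name for a mount point directly, the df output can be filtered. This is a small sketch using the POSIX output format of df; device_for_mount is a hypothetical helper name.

```shell
# Print the device that backs a given mount point, using POSIX df output.
device_for_mount() {
    df -P "$1" | awk 'NR == 2 { print $1 }'
}

# Show the device behind the root file system.
device_for_mount /
```

The printed device is the value to pass to check_disk with -p in the check_hda1 entry.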

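Find the line dont_blame_nrpe=0 and change its value to 1, which lets the NRPE daemon accept command arguments sent by the monitoring server: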

dont_blame_nrpe=1 

Restarting NRPE on the Ubuntu server

sudo /etc/init.d/nagios-nrpe-server restart

Editing /etc/nagios/nagios.cfg

On the Nagios monitoring server (192.168.0.19), edit /etc/nagios/nagios.cfg.

Add the following:

#############################################################

# Definition For Monitoring Remote Linux Server
cfg_file=/etc/nagios/objects/remotehosts.cfg

#############################################################

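Checking nagios.cfg for errors

Run nagios -v nagios.cfg from /etc/nagios to validate the configuration. Because nagios.cfg now references /etc/nagios/objects/remotehosts.cfg, that file (created in the next step) must exist before the check can pass: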

linux-j2w3:/etc/nagios # nagios -v nagios.cfg

Nagios Core 3.5.0
Copyright (c) 2009-2011 Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 03-15-2013
License: GPL

Website: http://www.nagios.org
Reading configuration data...
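
The configuration check can gate a restart so that a broken configuration is never loaded. This sketch assumes that nagios -v exits with a non-zero status when it finds errors; run_if_config_ok and the NAGIOS_BIN override are hypothetical names introduced for illustration.

```shell
# Run a command only if the Nagios configuration check succeeds.
# NAGIOS_BIN is a hypothetical override for the binary name; it defaults to "nagios".
run_if_config_ok() {
    rc_cfg=$1
    shift
    if "${NAGIOS_BIN:-nagios}" -v "$rc_cfg" > /dev/null 2>&1; then
        "$@"
    else
        echo "configuration check failed: $rc_cfg" >&2
        return 1
    fi
}
```

For example, run_if_config_ok /etc/nagios/nagios.cfg systemctl restart nagios.service restarts Nagios only after a clean check.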

Creating remotehosts.cfg

This file (/etc/nagios/objects/remotehosts.cfg) contains the remote host definition and its services:

define host{
          name                  linux-box-remote      ; Name of Template
          use                   generic-host          ; Inherit Default Values
          check_period          24x7
          check_interval        5
          retry_interval        1
          max_check_attempts    10
          check_command         check-host-alive
          notification_period   24x7
          notification_interval 30
          notification_options  d,r
          contact_groups        admins
          register              0          
          }

define host{
          use       linux-box-remote     ; Inherit default values from a template
          host_name ubunt01    ; Identification name of server
          alias     ubunt01    ; A longer name for the server..
          address   192.168.0.14  ; IP address of the server
          }

define service{
          use                 generic-service
          host_name           ubunt01
          service_description CPU Load
          check_command       check_nrpe!check_load
          }
define service{
          use                 generic-service
          host_name           ubunt01
          service_description Current Users
          check_command       check_nrpe!check_users
          }
define service{
          use                 generic-service ; Name of service template to use
          host_name           ubunt01
          service_description Remote check disk
          check_command       check_nrpe!check_hda1 ; thresholds and device are set in nrpe.cfg on the remote host
          }
define service{
          use                 generic-service
          host_name           ubunt01
          service_description Total Processes
          check_command       check_nrpe!check_total_procs
          }
define service{
          use                 generic-service
          host_name           ubunt01
          service_description Zombie Processes
          check_command       check_nrpe!check_zombie_procs
}
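
The service definitions above all follow one pattern, so further ones can be generated rather than typed. The following is a sketch only; nrpe_service is a hypothetical helper, and the third argument must name a command that exists in nrpe.cfg on the remote host.

```shell
# Print a Nagios service definition that runs one NRPE command on a host.
# $1 = host_name, $2 = service_description, $3 = command name defined in nrpe.cfg
nrpe_service() {
    cat <<EOF
define service{
          use                 generic-service
          host_name           $1
          service_description $2
          check_command       check_nrpe!$3
          }
EOF
}

# Reproduce the "CPU Load" definition for ubunt01.
nrpe_service ubunt01 "CPU Load" check_load
```

The output can be appended to remotehosts.cfg, followed by another configuration check.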

On the monitoring server (192.168.0.19), add the following lines to the bottom of /etc/nagios/objects/commands.cfg:

###############################################################################
# NRPE CHECK COMMAND
#
# Command to use NRPE to check remote host systems....
###############################################################################

define command{
        command_name check_nrpe
        command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }
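
To see what this definition produces, the macro substitution can be imitated by hand. The sketch below assumes $USER1$ points to /usr/lib/nagios/plugins (the plugin directory used elsewhere in this article; the real value comes from resource.cfg). expand_check_nrpe is a hypothetical helper.

```shell
# Show the command line Nagios would build from the check_nrpe definition.
# $1 = value of $USER1$ (plugin directory), $2 = $HOSTADDRESS$, $3 = $ARG1$
expand_check_nrpe() {
    printf '%s/check_nrpe -H %s -c %s\n' "$1" "$2" "$3"
}

# Expansion for the "CPU Load" service on ubunt01.
expand_check_nrpe /usr/lib/nagios/plugins 192.168.0.14 check_load
```

Only $ARG1$ appears in command_line, so anything after the first ! argument of a check_command is not passed on to check_nrpe.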

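Testing the connection from the Nagios server to the remote server

A few quick checks verify that the Nagios monitoring server can communicate with the NRPE daemon on the remote server.

Testing the connection from Nagios with telnet

Run on the monitoring server: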

telnet 192.168.0.14 5666

Escape character is '^]'

Test the connection from the monitoring server to the remote server with check_nrpe:

linux-j2w3:/usr/lib/nagios/plugins # ./check_nrpe -H 192.168.0.14
NRPE v2.12
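
When check_nrpe is given a command with -c, its exit code follows the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The sketch below turns that code into a state name; plugin_state and report_check are hypothetical helper names.

```shell
# Translate a Nagios plugin exit code into its state name.
# Any code other than 0, 1, or 2 is reported as UNKNOWN.
plugin_state() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Run any plugin command line, then print the state implied by its exit code.
report_check() {
    "$@"
    plugin_state "$?"
}
```

For example, report_check ./check_nrpe -H 192.168.0.14 -c check_load prints the plugin output followed by the state name.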

Restarting all services on the monitoring server

systemctl restart nagios.service 
systemctl restart apache2.service 

systemctl status nagios.service 
systemctl status apache2.service

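Logging in to the Nagios web interface again

Open http://192.168.0.19/nagios in a browser.

To view the monitored servers, click the Hosts link in the left-hand frame of the browser.

To view the defined services, click the Services link in the left-hand frame of the browser.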