Nagios and NRPE Monitoring

Monitoring Linux with Nagios

What is Nagios?


Nagios is a computer monitoring system that can monitor your network and server infrastructure. It gives the administrator the ability to notify support teams automatically with service alerts. System load, processes, disk usage and system logs can all be monitored, on both local and remote hosts. Nagios can be used to monitor platforms such as Linux, Solaris and Microsoft Windows.



What is NRPE for Nagios?


NRPE (Nagios Remote Plugin Executor) is a Nagios agent that allows remote monitoring of systems using scripts that are placed on the remote host. Nagios polls these remote servers using the "check_nrpe" plugin. We will cover the configuration for remote monitoring in our example installation below.


Nagios Installation and Configuration


In the example that follows we will be installing the Nagios monitoring software on an openSUSE 12.3 server. We will then use this server to monitor an Ubuntu 12.04 server.


Nagios Monitoring Server


openSUSE 12.3
IP Address: 192.168.0.19


Remotely Monitored Server


Ubuntu Server: Ubuntu 12.04 (remote server)
IP Address: 192.168.0.14


Installing Nagios Core


To install the Nagios Core onto our openSUSE 12.3 server we can issue the following command:



linux-j2w3:~ # zypper in nagios

Loading repository data...
Reading installed packages...
Resolving package dependencies...

The following NEW packages are going to be installed:
  apparmor-parser mrtg nagios nagios-plugins nagios-plugins-bgpstate
  nagios-plugins-breeze nagios-plugins-by_ssh nagios-plugins-cluster
  nagios-plugins-common nagios-plugins-dhcp nagios-plugins-dig
  nagios-plugins-disk nagios-plugins-disk_smb nagios-plugins-dns
  nagios-plugins-dummy nagios-plugins-file_age nagios-plugins-flexlm
  nagios-plugins-http nagios-plugins-icmp nagios-plugins-ide_smart
  nagios-plugins-ifoperstatus nagios-plugins-ifstatus nagios-plugins-ircd
  nagios-plugins-linux_raid nagios-plugins-load nagios-plugins-log
  nagios-plugins-mailq nagios-plugins-mrtg nagios-plugins-mrtgtraf
  nagios-plugins-netapp nagios-plugins-nt nagios-plugins-ntp_peer
  nagios-plugins-ntp_time nagios-plugins-nwstat nagios-plugins-oracle
  nagios-plugins-overcr nagios-plugins-ping nagios-plugins-procs
  nagios-plugins-real nagios-plugins-rpc nagios-plugins-sensors
  nagios-plugins-smtp nagios-plugins-ssh nagios-plugins-swap nagios-plugins-tcp
  nagios-plugins-time nagios-plugins-ups nagios-plugins-users
  nagios-plugins-wave nagios-www net-snmp perl-Crypt-DES perl-Crypt-Rijndael
  perl-Net-SNMP perl-SNMP perl-Socket6 perl-Term-ReadKey sensors whois

The following recommended packages were automatically selected:
  apparmor-parser mrtg nagios-plugins nagios-plugins-bgpstate
  nagios-plugins-breeze nagios-plugins-by_ssh nagios-plugins-cluster
  nagios-plugins-dhcp nagios-plugins-dig nagios-plugins-disk
  nagios-plugins-disk_smb nagios-plugins-dns nagios-plugins-dummy
  nagios-plugins-file_age nagios-plugins-flexlm nagios-plugins-http
  nagios-plugins-icmp nagios-plugins-ide_smart nagios-plugins-ifoperstatus
  nagios-plugins-ifstatus nagios-plugins-ircd nagios-plugins-linux_raid
  nagios-plugins-load nagios-plugins-log nagios-plugins-mailq
  nagios-plugins-mrtg nagios-plugins-mrtgtraf nagios-plugins-netapp
  nagios-plugins-nt nagios-plugins-ntp_peer nagios-plugins-ntp_time
  nagios-plugins-nwstat nagios-plugins-oracle nagios-plugins-overcr
  nagios-plugins-ping nagios-plugins-procs nagios-plugins-real
  nagios-plugins-rpc nagios-plugins-sensors nagios-plugins-smtp
  nagios-plugins-ssh nagios-plugins-swap nagios-plugins-tcp nagios-plugins-time
  nagios-plugins-ups nagios-plugins-users nagios-plugins-wave nagios-www

59 new packages to install.
Overall download size: 4.2 MiB. After the operation, additional 13.4 MiB will
be used.
Continue? [y/n/?] (y): 

If we reply "y" (yes) to the above prompt, Nagios will be installed along with various plugins.

In this example I had already installed the Apache webserver. If you haven’t already installed this component, it will be automatically installed.

The following directory structures should now be available:

Nagios Area



linux-j2w3:/etc/nagios # pwd

/etc/nagios

linux-j2w3:/etc/nagios # ls -l
total 68
-rw-rw-r-- 1 root   root   11653 Jun 28 16:05 cgi.cfg
-rw-r----- 1 root   nagcmd    26 Jun 28 16:05 htpasswd.users
-rw-r--r-- 1 root   root   44489 Jun 28 16:05 nagios.cfg
drwxrwxr-x 2 nagios nagcmd  4096 Jul 30 20:38 objects
-rw-rw---- 1 root   root    1336 Jun 28 16:05 resource.cfg

Apache2 area



linux-j2w3:/etc/apache2/conf.d # pwd

/etc/apache2/conf.d

linux-j2w3:/etc/apache2/conf.d # ls -l
total 12
-rw-r--r-- 1 root root 1052 Jun 28 16:05 nagios.conf
-rw-r--r-- 1 root root  354 Jan 27  2013 php5.conf
-rw-r--r-- 1 root root  975 Feb 14 20:56 phpMyAdmin.conf

Permissions can be set in the nagios.conf file. In this file you can specify access by host name. Account passwords are stored in the following location: "/etc/nagios/htpasswd.users" as specified within the "nagios.conf" file.


Create Nagios Accounts


The command used to create an account is "htpasswd2". In the example below we have used the "-c" option, which creates the password file and overwrites any existing one, clearing out any existing accounts. Additional accounts can be created by omitting the "-c" flag:



linux-j2w3:/etc/nagios # htpasswd2 -c /etc/nagios/htpasswd.users nagiosadmin
New password:
Re-type new password:
Adding password for user nagiosadmin
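
As a minimal sketch of the rule above, the helper below passes "-c" only when the password file does not yet exist, so later accounts are appended instead of replacing the file. The function name "add_nagios_user" and the account name "jsmith" are illustrative assumptions and are not part of the original setup.

```shell
# Sketch: create the password file on first use, append afterwards.
# add_nagios_user is a hypothetical helper name.
add_nagios_user() {
    anu_file="$1"   # e.g. /etc/nagios/htpasswd.users
    anu_user="$2"   # e.g. nagiosadmin
    if [ -f "$anu_file" ]; then
        # File exists: omit -c so existing accounts are kept.
        htpasswd2 "$anu_file" "$anu_user"
    else
        # File missing: -c creates it.
        htpasswd2 -c "$anu_file" "$anu_user"
    fi
}

# Usage (htpasswd2 prompts for the new password):
#   add_nagios_user /etc/nagios/htpasswd.users jsmith
```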

Example nagios.conf file



ScriptAlias /nagios/cgi-bin "/usr/lib/nagios/cgi"

<Directory "/usr/lib/nagios/cgi">
#  SSLRequireSSL
   Options ExecCGI
   AllowOverride None
   Order allow,deny
   Allow from all
#  Order deny,allow
#  Deny from all
#  Allow from 127.0.0.1
   AuthName "Nagios Access"
   AuthType Basic
   AuthUserFile /etc/nagios/htpasswd.users
   Require valid-user
</Directory>

Alias /nagios "/usr/share/nagios"

<Directory "/usr/share/nagios">
#  SSLRequireSSL
   Options None
   AllowOverride None
   Order allow,deny
   Allow from all
#  Order deny,allow
#  Deny from all
#  Allow from 127.0.0.1
   AuthName "Nagios Access"
   AuthType Basic
   AuthUserFile /etc/nagios/htpasswd.users
   Require valid-user
    <IfDefine KOHANA2>
      DirectoryIndex index.html index.php
    </IfDefine>
</Directory>

Test Nagios Core installation


At this point we can quickly test that Nagios has installed correctly and that our webserver is also functioning correctly.



Start/Stop/Status Commands for Nagios


Various commands can be used to start, stop and restart our installation.

Traditional "service" commands can be used:

service nagios [start/stop/restart/status]

service apache2 [start/stop/restart/status]

You may also use the newer "systemctl" commands to start/stop/restart:

systemctl restart nagios.service
systemctl restart apache2.service

To check the status of your running services:

systemctl status nagios.service
systemctl status apache2.service


linux-j2w3:/etc/nagios # service nagios start
linux-j2w3:/etc/nagios # service apache2 start

linux-j2w3:/etc/nagios # service nagios status
nagios.service - LSB: Network monitor Nagios
          Loaded: loaded (/etc/init.d/nagios)
          Active: active (running) since Tue, 2013-07-30 21:08:14 BST; 21s ago
         Process: 3511 ExecStart=/etc/init.d/nagios start (code=exited, status=0/SUCCESS)
          CGroup: name=systemd:/system/nagios.service
                  └ 3590 /usr/sbin/nagios -d /etc/nagios/nagios.cfg

Jul 30 21:08:14 linux-j2w3.site nagios[3589]: Nagios 3.5.0 starting... (PID=...)
Jul 30 21:08:14 linux-j2w3.site nagios[3589]: Local time is Tue Jul 30 21:08...3
Jul 30 21:08:14 linux-j2w3.site nagios[3589]: LOG VERSION: 2.0
Jul 30 21:08:14 linux-j2w3.site nagios[3590]: Finished daemonizing... (New P...)
Jul 30 21:08:14 linux-j2w3.site nagios[3511]: Starting Nagios ..done
Jul 30 21:08:14 linux-j2w3.site systemd[1]: Started LSB: Network monitor Nagios.

linux-j2w3:/etc/nagios # service apache2 status
apache2.service - apache
          Loaded: loaded (/lib/systemd/system/apache2.service; disabled)
          Active: active (running) since Tue, 2013-07-30 21:08:27 BST; 23s ago
         Process: 3600 ExecStart=/usr/sbin/start_apache2 -D SYSTEMD -k start (code=exited, status=0/SUCCESS)
        Main PID: 3619 (httpd2-prefork)
          CGroup: name=systemd:/system/apache2.service
                  ├ 3619 /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.con...
                  ├ 3620 /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.con...
                  ├ 3621 /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.con...
                  ├ 3622 /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.con...
                  ├ 3623 /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.con...
                  └ 3624 /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.con...

Jul 30 21:08:26 linux-j2w3.site start_apache2[3600]: httpd2-prefork: apr_sock...
Jul 30 21:08:26 linux-j2w3.site start_apache2[3600]: httpd2-prefork: Could no...
Jul 30 21:08:27 linux-j2w3.site systemd[1]: Started apache.

Set Services to Start Automatically at boot



linux-j2w3:/etc/nagios # chkconfig nagios on

linux-j2w3:/etc/nagios # chkconfig -l | grep nagios

nagios                    0:off  1:off  2:off  3:on   4:off  5:on   6:off

linux-j2w3:/etc/nagios # chkconfig apache2 on

linux-j2w3:/etc/nagios # chkconfig -l | grep apache2

apache2                   0:off  1:off  2:off  3:on   4:off  5:on   6:off

As you can see from the output, both Nagios and our webserver will start automatically in runlevels "3" and "5".


Testing Nagios with Webserver


Once you are happy that the services are running (see section above), we can now try out Nagios Core with our webserver. In the following example, I am going to type the following address into my local web browser (Browser on our Nagios openSUSE server):

192.168.0.19/nagios

If you are unsure of your IP address, then you can use the "ip a s" command to display this information:



linux-j2w3:/etc/nagios # ip a s
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:21:9a:25 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.19/24 brd 192.168.0.255 scope global eth0
    inet6 fe80::a00:27ff:fe21:9a25/64 scope link
       valid_lft forever preferred_lft forever

You will need to enter your "nagiosadmin" credentials to login. If all has gone well, you will see a screen similar to the one below:





Nagios Core Home Screen
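
The same page can also be checked from a shell, which is handy when no browser is available. The sketch below assumes "curl" is installed; the function name "nagios_http_status" is hypothetical. A status of 200 would suggest the page was served, while 401 would suggest the credentials were rejected.

```shell
# Sketch: report the HTTP status code returned by the Nagios web interface.
# nagios_http_status is a hypothetical helper name.
nagios_http_status() {
    nhs_host="$1"   # e.g. 192.168.0.19
    nhs_user="$2"   # e.g. nagiosadmin (curl prompts for the password)
    # -s silences progress output, -o discards the page body,
    # -w prints only the numeric status code.
    curl -s -o /dev/null -w '%{http_code}' -u "$nhs_user" "http://$nhs_host/nagios/"
}

# Usage:
#   nagios_http_status 192.168.0.19 nagiosadmin
```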


Adding Remote hosts to be monitored


On our remote host that we wish to monitor, we will need to install our NRPE and plugin components. In our example here I am using an Ubuntu 12.04 server. First I issued the "apt-cache" command to see what packages were available. Once I had then identified the packages that were needed, I ran the install command "sudo apt-get install nagios-nrpe-server". This installed our NRPE component and the necessary plugins:



apt-cache search nrpe | more
nagios-nrpe-server - Nagios Remote Plugin Executor Server

apt-cache search nagios-plugins | more 
nagios-plugins - Plugins for nagios compatible monitoring systems (metapackage)

Example of install command:



john@john-desktop:~$ sudo apt-get install nagios-nrpe-server
[sudo] password for john: 
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages were automatically installed and are no longer required:
  mc-data libcogl9 gir1.2-json-1.0 libunity6 libbabl-0.0-0
  gir1.2-gtkclutter-1.0 libgegl-0.0-0 gir1.2-clutter-1.0 libclutter-gtk-1.0-0
  libcogl-common gir1.2-champlain-0.12 libclutter-1.0-0 libchamplain-0.12-0
  libcogl-pango0 libchamplain-gtk-0.12-0 libclutter-1.0-common gir1.2-cogl-1.0
  gir1.2-gtkchamplain-0.12 gir1.2-coglpango-1.0 libllvm3.0
Use 'apt-get autoremove' to remove them.
The following extra packages will be installed:
  libnet-snmp-perl libpq5 libradius1 nagios-plugins nagios-plugins-basic
  nagios-plugins-standard snmp
Suggested packages:
  libcrypt-des-perl nagios3 postfix sendmail-bin exim4-daemon-heavy
  exim4-daemon-light
The following NEW packages will be installed
  libnet-snmp-perl libpq5 libradius1 nagios-nrpe-server nagios-plugins
  nagios-plugins-basic nagios-plugins-standard snmp
0 upgraded, 8 newly installed, 0 to remove and 0 not upgraded.
Need to get 1,353 kB of archives.
After this operation, 4,516 kB of additional disk space will be used.
Do you want to continue [Y/n]? 

To install the necessary packages, reply "Y" to the prompt.


Add Nagios Monitoring Server IP address to nrpe.cfg


On our remote server to be monitored, we need to modify the following file: "/etc/nagios/nrpe.cfg"



john@john-desktop:/etc/nagios$ pwd
/etc/nagios
john@john-desktop:/etc/nagios$ ls -l
total 16
-rw-r--r-- 1 root root 7465 Jul 31 21:36 nrpe.cfg
drwxr-xr-x 2 root root 4096 May 30 23:42 nrpe.d
-rw-r--r-- 1 root root  117 May 30 23:42 nrpe_local.cfg

In the file nrpe.cfg, add the IP address of our Monitoring Server to the "allowed_hosts" line, so that it reads as follows:



allowed_hosts=127.0.0.1,192.168.0.19

Remember to change the IP address to match your Nagios Monitoring server.
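
If you prefer not to edit the file by hand, the change can be scripted. This is only a sketch: "set_allowed_hosts" is a hypothetical helper name, and it assumes GNU sed and a host list that contains no "/" characters.

```shell
# Sketch: rewrite the allowed_hosts line in an NRPE config file.
# set_allowed_hosts is a hypothetical helper name.
set_allowed_hosts() {
    sah_cfg="$1"     # e.g. /etc/nagios/nrpe.cfg
    sah_hosts="$2"   # e.g. 127.0.0.1,192.168.0.19 (must not contain "/")
    # Keep a backup copy (.bak) and replace the whole allowed_hosts line.
    sed -i.bak "s/^allowed_hosts=.*/allowed_hosts=$sah_hosts/" "$sah_cfg"
}

# Usage (run as root on the monitored server):
#   set_allowed_hosts /etc/nagios/nrpe.cfg 127.0.0.1,192.168.0.19
```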

Uncomment the following entries in the nrpe.cfg file:



command[check_users]=/usr/lib/nagios/plugins/check_users -w 5 -c 10
command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
command[check_hda1]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /dev/sda1
command[check_zombie_procs]=/usr/lib/nagios/plugins/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/lib/nagios/plugins/check_procs -w 150 -c 200 	

Also note the "check_hda1" entry needs to match the drive you are checking. In this example it is "/dev/sda1". If you are unsure of which drive you need to specify, then you can issue the "df -h" command to provide this information.
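
To read the device name straight from "df" rather than picking it out by eye, something like the following sketch can be used. The function name "device_for_mount" is an assumption made for illustration.

```shell
# Sketch: print the device that backs a given mount point.
# device_for_mount is a hypothetical helper name.
device_for_mount() {
    # -P forces the one-line-per-filesystem POSIX format, so the
    # device name is always the first field of the second line.
    df -P "$1" | awk 'NR == 2 { print $1 }'
}

# Usage: the printed device is the value to give to "-p" in check_hda1.
#   device_for_mount /
```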

Next modify the line "dont_blame_nrpe=0" and change the value to "1". This setting allows the Nagios server to pass arguments to NRPE commands:



dont_blame_nrpe=1 

Restart NRPE on Ubuntu server


To reinitialise our NRPE component so that it picks up the additional IP address, we need to restart it with the following command:



sudo /etc/init.d/nagios-nrpe-server restart

Edit /etc/nagios/nagios.cfg (Nagios Monitoring Server - openSUSE)


Now we need to make the following change to the /etc/nagios/nagios.cfg file on our openSUSE server.

Add the following section to the file:



#############################################################

# Definition For Monitoring Remote Linux Server
cfg_file=/etc/nagios/objects/remotehosts.cfg

#############################################################
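
When this step is scripted, it helps to make it safe to run more than once. The sketch below appends the "cfg_file" entry only if the exact line is not already present; "add_cfg_file" is a hypothetical helper name.

```shell
# Sketch: append a cfg_file entry only if it is not already present.
# add_cfg_file is a hypothetical helper name.
add_cfg_file() {
    acf_main="$1"            # e.g. /etc/nagios/nagios.cfg
    acf_line="cfg_file=$2"   # e.g. /etc/nagios/objects/remotehosts.cfg
    # -x matches the whole line, -F treats the text literally.
    if ! grep -qxF "$acf_line" "$acf_main"; then
        printf '%s\n' "$acf_line" >> "$acf_main"
    fi
}

# Usage (run as root on the Nagios server):
#   add_cfg_file /etc/nagios/nagios.cfg /etc/nagios/objects/remotehosts.cfg
```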

Check for any errors in nagios.cfg


To confirm that your "nagios.cfg" file contains no errors, you can run the command "nagios -v nagios.cfg":



linux-j2w3:/etc/nagios # nagios -v nagios.cfg

Nagios Core 3.5.0
Copyright (c) 2009-2011 Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 03-15-2013
License: GPL

Website: http://www.nagios.org
Reading configuration data...
   Read main config file okay...
Processing object config file '/etc/nagios/objects/commands.cfg'...
Processing object config file '/etc/nagios/objects/contacts.cfg'...
Processing object config file '/etc/nagios/objects/timeperiods.cfg'...
Processing object config file '/etc/nagios/objects/templates.cfg'...
Processing object config file '/etc/nagios/objects/remotehosts.cfg'...
Processing object config file '/etc/nagios/objects/localhost.cfg'...
Processing object config directory '/etc/nagios/servers'...
   Read object config files okay...

Running pre-flight check on configuration data...
.....
.....
.....
Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check

If your results look similar to the above, then we are good to continue with our configuration.
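
Since "nagios -v" exits with a non-zero status when it finds errors, the check can be used to guard a restart. The following is a sketch only; "safe_restart_nagios" is a hypothetical helper name, and the default path is the one used in this example installation.

```shell
# Sketch: restart Nagios only when the pre-flight check succeeds.
# safe_restart_nagios is a hypothetical helper name.
safe_restart_nagios() {
    srn_cfg="${1:-/etc/nagios/nagios.cfg}"
    # nagios -v exits non-zero when the configuration has errors.
    if nagios -v "$srn_cfg" > /dev/null; then
        systemctl restart nagios.service
    else
        echo "Configuration check failed; Nagios was not restarted." >&2
        return 1
    fi
}

# Usage (run as root on the Nagios server):
#   safe_restart_nagios
```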


Create remotehosts.cfg


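Once the above section has been added, we need to create the file that it references, "remotehosts.cfg". The pre-flight check above reports an error for as long as this file is missing, so run the check again once the file is in place.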

This file will contain our remote host definition and service information:



define host{
          name                  linux-box-remote      ; Name of Template
          use                   generic-host          ; Inherit Default Values
          check_period          24x7
          check_interval        5
          retry_interval        1
          max_check_attempts    10
          check_command         check-host-alive
          notification_period   24x7
          notification_interval 30
          notification_options  d,r
          contact_groups        admins
          register              0          
          }

define host{
          use       linux-box-remote     ; Inherit default values from a template
          host_name ubunt01    ; Identification name of server
          alias     ubunt01    ; A longer name for the server..
          address   192.168.0.14  ; IP address of the server
          }

define service{
          use                 generic-service
          host_name           ubunt01
          service_description CPU Load
          check_command       check_nrpe!check_load
          }
define service{
          use                 generic-service
          host_name           ubunt01
          service_description Current Users
          check_command       check_nrpe!check_users
          }
define service{
          use                 generic-service      ; Name of service template to use
          host_name           ubunt01
          service_description Remote check disk
          check_command       check_nrpe!check_hda1!20%!10%!/
          }
define service{
          use                 generic-service
          host_name           ubunt01
          service_description Total Processes
          check_command       check_nrpe!check_total_procs
          }
define service{
          use                 generic-service
          host_name           ubunt01
          service_description Zombie Processes
          check_command       check_nrpe!check_zombie_procs
          }


In the above file you will need to modify the "host_name" and "alias" entries to match the names you wish to give your server. The IP address will also need to be changed to match that of your remote server.
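
Because the service definitions all follow the same pattern, they can be generated rather than typed. The sketch below prints one definition per call; "nrpe_service" is a hypothetical helper name, and the third argument must be a command that already exists in nrpe.cfg on the remote server.

```shell
# Sketch: print a Nagios service definition that runs an NRPE command.
# nrpe_service is a hypothetical helper name.
# Arguments: host_name, service_description, NRPE command name.
nrpe_service() {
    cat <<EOF
define service{
          use                 generic-service
          host_name           $1
          service_description $2
          check_command       check_nrpe!$3
          }
EOF
}

# Usage: review the output before appending it to remotehosts.cfg.
#   nrpe_service ubunt01 "Current Users" check_users
```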

We also need to add the following command definition to the bottom of the file "/etc/nagios/objects/commands.cfg" (on the Nagios monitoring server, openSUSE):



###############################################################################
# NRPE CHECK COMMAND
#
# Command to use NRPE to check remote host systems....
###############################################################################

define command{
        command_name check_nrpe
        command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }
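
To see how a "check_command" value maps onto this definition, the sketch below mimics the macro substitution in shell. It is an illustration and not how Nagios itself is implemented: "expand_check_nrpe" is a hypothetical helper, and the plugin directory stands in for $USER1$ on the assumption that it matches the openSUSE path used in this guide.

```shell
# Sketch: show the command line that results from a check_command value.
# expand_check_nrpe is a hypothetical helper; /usr/lib/nagios/plugins is
# assumed to be the value of $USER1$ on the Nagios server.
expand_check_nrpe() {
    ecn_address="$1"               # value of $HOSTADDRESS$
    ecn_check="$2"                 # e.g. check_nrpe!check_load
    ecn_rest="${ecn_check#*!}"     # drop the command name before the first "!"
    ecn_arg1="${ecn_rest%%!*}"     # $ARG1$ is the text up to the next "!"
    echo "/usr/lib/nagios/plugins/check_nrpe -H $ecn_address -c $ecn_arg1"
}

# Usage:
#   expand_check_nrpe 192.168.0.14 'check_nrpe!check_load'
```

Note that this definition only references $ARG1$, so any further "!"-separated fields in a service's check_command (such as "20%!10%!/") are not forwarded. The thresholds that apply are the ones set in nrpe.cfg on the remote server.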

Test Connectivity from Nagios Server to Remote Server


There are various quick checks that can be carried out to verify that the Nagios monitoring server can communicate with the NRPE component on the remote server:


Test connection from Nagios Server using telnet (openSUSE)


If you have "telnet" installed you can issue the command below. Here we are looking for a quick reply of "Escape character is '^]'". You will need to change the IP address to match your remote server. The number after the IP address is the port number that NRPE is listening on.



telnet 192.168.0.14 5666

Escape character is '^]'
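
If "telnet" is not installed, a similar port test can be sketched with bash alone. This assumes that bash (with its /dev/tcp redirection) and the coreutils "timeout" command are available; "port_open" is a hypothetical helper name.

```shell
# Sketch: succeed if a TCP connection to host:port can be opened.
# port_open is a hypothetical helper name; it relies on bash's /dev/tcp
# redirection and gives up after 3 seconds.
port_open() {
    # Inside the bash -c script, $0 is the host and $1 is the port.
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Usage:
#   port_open 192.168.0.14 5666 && echo "NRPE port reachable"
```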

A better way of testing connectivity from our Nagios Monitoring server to our remote server is to use the "check_nrpe" command as per the example below:



linux-j2w3:/usr/lib/nagios/plugins # ./check_nrpe -H 192.168.0.14
NRPE v2.12

If you see the version returned, then the connection is working. In the example above we received "NRPE v2.12".
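
Once the version check works, each command enabled in nrpe.cfg can be exercised with the "-c" option. The sketch below loops over the five commands from this guide and translates the plugin exit code into a state name (0 OK, 1 WARNING, 2 CRITICAL, anything else UNKNOWN). The helper names and the CHECK_NRPE variable are assumptions made for illustration.

```shell
# Sketch: run each NRPE command defined earlier and report its state.
# CHECK_NRPE, state_name and run_remote_checks are hypothetical names.
CHECK_NRPE="${CHECK_NRPE:-/usr/lib/nagios/plugins/check_nrpe}"

# Map a plugin exit code to its Nagios state name.
state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

run_remote_checks() {
    rrc_host="$1"   # e.g. 192.168.0.14
    for rrc_cmd in check_users check_load check_hda1 check_zombie_procs check_total_procs; do
        # Capture the plugin output and remember its exit code.
        if rrc_out=$("$CHECK_NRPE" -H "$rrc_host" -c "$rrc_cmd"); then
            rrc_rc=0
        else
            rrc_rc=$?
        fi
        echo "$rrc_cmd: $(state_name "$rrc_rc") - $rrc_out"
    done
}

# Usage (on the Nagios server):
#   run_remote_checks 192.168.0.14
```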

If you are having connectivity problems, then you may need to amend your firewall settings locally and remotely. The remote server must accept connections from the Nagios server on the NRPE port (5666 in this example). You will probably also want to open port 80 on the Nagios server, as this allows a browser on another system to view the monitoring information. If you are using openSUSE, then you can use the "yast firewall" command to view and change settings. If you are using Ubuntu, then you will probably need to use the "UFW" tool to change your rules. Only change rules if you are confident you fully understand the changes you are making!

A basic overview of UFW can be found here: UFW Firewall

A basic overview of iptables can be found here: iptables
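
In keeping with the caution above, the sketch below only prints a UFW rule for review and does not apply it. The rule admits a single address (the Nagios server) to the NRPE port on the Ubuntu server. The function name "nrpe_ufw_rule" is hypothetical.

```shell
# Sketch: print (not apply) a ufw rule that admits only the Nagios server
# to the NRPE port. nrpe_ufw_rule is a hypothetical helper name.
nrpe_ufw_rule() {
    nur_server="$1"          # e.g. 192.168.0.19
    nur_port="${2:-5666}"    # NRPE port used in this guide
    echo "ufw allow from $nur_server to any port $nur_port proto tcp"
}

# Usage: review the output, then run it with sudo on the Ubuntu server.
#   nrpe_ufw_rule 192.168.0.19
```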


Restart all services on Nagios Monitoring Server (openSUSE)


systemctl restart nagios.service
systemctl restart apache2.service

systemctl status nagios.service
systemctl status apache2.service


Nagios Core Home Screen


As mentioned earlier, we can type the following address into our browser to view the main Nagios Core Home screen: "192.168.0.19/nagios"

You will need to enter your "nagiosadmin" credentials into the box:

Nagios Login Screen
Nagios Core Home Screen

To view monitored servers, you can click on the "hosts" link in the left hand frame of your browser. Here you will see your local server and any remote servers that are monitored:

Nagios Core Hosts Screen

To view services that have been defined, click on the "services" link from the left hand frame of your browser:

Nagios Core Services Screen

To display the scheduling queue, click on the "Scheduling Queue" link:

Nagios Core Scheduling Screen


Nagios Documentation and Help


Some great documentation can be found on the official Nagios website: Documentation

Other sources of documentation can be found on the Nagios homepage: library.nagios.com