centostricks


Troubleshooting Nagios/Nrpe Issues


Troubleshooting NRPE (Nagios Remote Plugin Executor) Client

The Nagios server communicates with the NRPE daemon over SSL, so all communication between them is encrypted.

Common Errors while configuring NRPE

1. CHECK_NRPE: Error - Could not complete SSL handshake

Solution: 

This error message could be due to several problems:

1. SSL is disabled. Make sure both the NRPE daemon and the check_nrpe plugin were compiled with SSL support (./configure --enable-ssl).

2. Incorrect file permissions. Make sure the NRPE config file (nrpe.cfg) is readable by the user (i.e. nagios) that executes the NRPE binary from inetd/xinetd.

3. The command that the NRPE daemon was asked to run took longer than 10 seconds to execute. This is the most likely cause if the error message was “CHECK_NRPE: Socket timeout after 10 seconds”. Use the -t command line option to specify a longer timeout for the check_nrpe plugin. The following example increases the timeout to 30 seconds:
/usr/local/nagios/libexec/check_nrpe -H localhost -c somecommand -t 30

4. The NRPE daemon is not installed or not running on the remote host. Verify that the NRPE daemon is running as a standalone daemon or under inetd/xinetd with one of the following commands:

# ps -ef | grep nrpe
# netstat -at | grep nrpe

5. A firewall is blocking communication between the monitoring host (which runs the check_nrpe plugin) and the remote host (which runs the NRPE daemon). Verify that the firewall rules (e.g. iptables) on the remote host allow the traffic, and make sure there isn’t a physical firewall between the monitoring host and the remote host.

6. There could be a network issue. Ping the remote IP address you are trying to connect to.

2. The check_nrpe plugin returns “CHECK_NRPE: Received 0 bytes from daemon”

Solution :

First thing you should do is check the remote server logs for an error message. Seriously. 🙂 This error could be due to the following problem:

1.  The check_nrpe plugin was unable to complete an SSL handshake with the NRPE daemon. An error message in the logs should indicate whether or not this was the case. Check the versions of OpenSSL that are installed on the monitoring host and remote host. If you’re running a commercial version of SSL on the remote host, there might be some compatibility problems.

3. The check_nrpe plugin returns “NRPE: Unable to read output”

Solution :

This error indicates that the command that was run by the NRPE daemon did not return any character output.  This could be an indication of the following problems:

1. An incorrectly defined command line in the command definition. Verify that the command definition in your NRPE configuration file is correct.

2. The plugin that is specified in the command line is malfunctioning. Run the command line manually to make sure the plugin returns some kind of text output.

3. There could be a file permission issue. Grant read and execute privileges on the plugins to the user that runs the NRPE daemon (this can be found in your NRPE config file).

For example, if your plugins are located under /usr/local/nagios/libexec/check_*, you can do this with:

# chmod ug+rx /usr/local/nagios/libexec/check_*

# chown  nagios:nagios /usr/local/nagios

# chown -R nagios:nagios /usr/local/nagios/libexec

4. Check /var/log/messages for errors related to the hosts.allow/hosts.deny files. A permission problem with these files also results in the above error.
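The messages covered so far each point to a different first thing to look at. As a rough memory aid, here is a minimal sketch of a helper that maps a check_nrpe message to that first step. The function name nrpe_hint and the hint texts are my own shorthand for the causes listed above; they are not something NRPE itself prints.

```shell
#!/bin/sh
# nrpe_hint MESSAGE
# Prints a first thing to check for a given check_nrpe error message.
# The hints paraphrase the causes listed in this post (hypothetical helper).
nrpe_hint() {
    case $1 in
        *"Could not complete SSL handshake"*)
            echo "check SSL support, nrpe.cfg permissions, firewall and that nrpe is running" ;;
        *"Socket timeout"*)
            echo "raise the check_nrpe timeout with -t" ;;
        *"Received 0 bytes from daemon"*)
            echo "read the remote host logs and compare OpenSSL versions" ;;
        *"Unable to read output"*)
            echo "run the plugin by hand and check its permissions" ;;
        *)
            echo "no hint for this message" ;;
    esac
}

# Example (not run here): feed it the real plugin output
#   nrpe_hint "$(/usr/local/nagios/libexec/check_nrpe -H localhost -c somecommand 2>&1)"
```

The patterns match on substrings only, so the helper keeps working if the message carries extra text before or after the part shown in this post.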

4. Unable to read output due to sudo issues on CentOS when configuring an NRPE plugin with sudo

[root@system ~]# /usr/lib/nagios/plugins/check_nrpe -H 3.3.3.3 -c check_dns

NRPE: Unable to read output

Given that check_dns is defined as follows in nrpe.cfg:

command[check_dns]=sudo /usr/local/nagios/libexec/check_dns

Solution :

Add the corresponding line to /etc/sudoers as follows:

nagios ALL=(ALL) NOPASSWD:/usr/local/nagios/libexec/check_dns

If the check still fails, the problem is the requiretty option in /etc/sudoers, which is enabled by default on CentOS. Simply comment it out as follows:

#Defaults requiretty

Now the plugin should work as expected:

[root@system ~]# /usr/lib/nagios/plugins/check_nrpe -H 3.3.3.3 -c check_dns

DNS Ok
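To confirm whether requiretty is still in effect before blaming something else, a quick grep is enough. The sketch below is a hypothetical helper (the name requiretty_active is mine) that only looks for an uncommented global "Defaults requiretty" line in the file you give it; it does not follow #include directives or files under /etc/sudoers.d.

```shell
#!/bin/sh
# requiretty_active FILE
# Succeeds (exit 0) if FILE contains an uncommented "Defaults requiretty" line.
# Hypothetical helper: checks a single file only.
requiretty_active() {
    grep -Eq '^[[:space:]]*Defaults[[:space:]]+requiretty' "$1"
}

# Example (not run here, needs root to read the file):
#   requiretty_active /etc/sudoers && echo "requiretty is still enabled"
```

If you would rather not switch the option off for every user, sudoers also accepts a per-user override such as "Defaults:nagios !requiretty", which leaves the global default untouched.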

5. NRPE daemon not shown when checked with netstat -at

Solution :

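netstat shows a port by name only if that port is listed in /etc/services, so without an entry, grep nrpe finds nothing even when the daemon is listening. Add a line to your /etc/services file as follows (modify the port number as you see fit):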

nrpe 5666/tcp # NRPE
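If you script this step, it is worth adding the entry only when it is missing, so repeated runs do not pile up duplicate lines. A minimal sketch, assuming the function names has_nrpe_service and ensure_nrpe_service (both mine) and the default port 5666:

```shell
#!/bin/sh
# has_nrpe_service FILE
# Succeeds if FILE already maps the service name "nrpe" to a TCP port.
has_nrpe_service() {
    grep -Eq '^nrpe[[:space:]]+[0-9]+/tcp' "$1"
}

# ensure_nrpe_service FILE [PORT]
# Appends the nrpe entry to FILE only when no such entry exists yet.
ensure_nrpe_service() {
    file=$1
    port=${2:-5666}
    if ! has_nrpe_service "$file"; then
        printf 'nrpe\t\t%s/tcp\t\t\t# NRPE\n' "$port" >> "$file"
    fi
}

# Example (not run here, needs root):
#   ensure_nrpe_service /etc/services
```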

6. ERROR: Could not fetch information from server

The most logical first step is to re-verify the Nagios server config file.  Check to make sure DNS resolution is correct.  Second, take a look at the NSC.log on the client system.  In my case, I saw:

2009-03-30 10:52:23: error:.\NSClientListener.cpp:307: Unauthorized access from: 192.168.1.25

Well, that could definitely be a problem. The allowed_hosts line in nsc.ini did not include the Nagios server's address.

Edit the nsc.ini file and add the line below:

allowed_hosts=192.168.1.25/32

Sometimes you have already added the server IP address to the allowed_hosts directive and the local firewall allows the traffic, but the connection still fails. This can be caused by a firewall somewhere in between, or by the Nagios server reaching your client network through a load balancer, in which case the client sees the load balancer's IP, which is not listed in allowed_hosts in nsc.ini. Check the nsclient.log or nsc.log file to see which address is being rejected and, once you have verified that it is a trusted IP address, add it to allowed_hosts. You should be all set 🙂
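When the log is long, pulling out just the rejected addresses makes it obvious whether the client is seeing the Nagios server or something in front of it. Here is a small sketch that assumes you have a copy of the log available in a Unix shell and that the lines look like the "Unauthorized access from:" example above; the function name unauthorized_ips is made up for this post.

```shell
#!/bin/sh
# unauthorized_ips
# Reads NSClient++ log lines on stdin and prints each rejected source IP once.
# Assumes the "Unauthorized access from: <ip>" wording shown in this post.
unauthorized_ips() {
    sed -n 's/.*Unauthorized access from: \([0-9.][0-9.]*\).*/\1/p' | sort -u
}

# Example (not run here):
#   unauthorized_ips < NSC.log
```

Any address it prints that is not your Nagios server is a candidate for the load balancer or NAT case described above. Verify it before adding it to allowed_hosts.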


Nagios Client Nsclient++ Installation (Applies for Windows Clients)


1.0 Nagios Client Nsclient++

NSClient++ is an open source Windows service that allows Nagios to gather performance metrics from Windows hosts.

 1.1. Overview

At a very high level, the following three steps happen when Nagios (installed on the nagios-server) monitors a service (e.g. disk space usage) on the remote Windows host.

  •       Nagios executes the check_nt command on the nagios-server and asks it to monitor disk usage on the remote Windows host.
  •       check_nt on the nagios-server contacts the NSClient++ service on the remote Windows host and asks it to execute USEDDISKSPACE there.
  •       The NSClient++ service returns the results of the USEDDISKSPACE command to check_nt on the nagios-server.

The following flow summarizes the above explanation:

Nagios Server (check_nt) ----> Remote host (NSClient++) ----> USEDDISKSPACE

Nagios Server (check_nt) <---- Remote host (NSClient++) <---- USEDDISKSPACE (returns disk space usage)

1.2. Set up Nagios on the remote Windows host

1.2.1. Install NSClient++ on the remote Windows server

Download NSClient++ from the NSClient++ project.

Once you have downloaded NSClient++, click on the MSI file to start the installation.

Go through the following 6 NSClient++ installation steps to complete the installation.

(1) NSClient++ Welcome Screen

(2) License Agreement Screen


(3) Select the installation option and location. Use the default option and click Next.

(4) Specify the allowed IP list. This is the Nagios server IP from which connections should be allowed.

(5) Ready to Install Screen. Click on Install to get it started.

(6) Installation completed Screen. 

 1.2.2 Modify the NSClient++ Service

Go to Control Panel -> Administrative Tools -> Services. Double-click the “NSClient++” service and select the check box labeled “Allow service to interact with desktop”.

1.2.3 Start the NSClient++ Service

Start the NSClient++ service either from Control Panel -> Administrative Tools -> Services (select “NSClient++” and click Start) or from Start -> All Programs -> NSClient++ -> Start NSClient++. Note that this starts NSClient++ as a Windows service.

If you later modify anything in the NSC.ini file, restart the “NSClient++” service from the Windows Services console.

1.2.4 Whitelist the Nsclient++ Service in the Windows Firewall

1.3. Configuration steps on Nagios Server

1.3.1. Verify check_nt command and windows-server template

Verify that the check_nt is enabled under

/usr/local/nagios/etc/objects/commands.cfg

# ‘check_nt’ command definition

define command{

command_name    check_nt

command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v $ARG1$ $ARG2$

}

Verify that the windows-server template is enabled under /usr/local/nagios/etc/objects/templates.cfg

# Windows host definition template – This is NOT a real host, just a template!

define host{

name                    windows-server  ; The name of this host template

use                     generic-host    ; Inherit default values from the generic-host template

check_period            24x7            ; By default, Windows servers are monitored round the clock

check_interval          5               ; Actively check the server every 5 minutes

retry_interval          1               ; Schedule host check retries at 1 minute intervals

max_check_attempts      10              ; Check each server 10 times (max)

check_command           check-host-alive        ; Default command to check if servers are “alive”

notification_period     24x7            ; Send notification out at any time - day or night

notification_interval   30              ; Resend notifications every 30 minutes

notification_options    d,r             ; Only send notifications for specific host states

contact_groups          admins          ; Notifications get sent to the admins by default

hostgroups              windows-servers ; Host groups that Windows servers should be a member of

register                0               ; DONT REGISTER THIS – ITS JUST A TEMPLATE

}

1.3.2. Uncomment windows.cfg in /usr/local/nagios/etc/nagios.cfg

# Definitions for monitoring a Windows machine

cfg_file=/usr/local/nagios/etc/objects/windows.cfg

1.3.3. Modify /usr/local/nagios/etc/objects/windows.cfg

By default, windows.cfg contains a sample host definition for a Windows server. Modify it to reflect the Windows server that needs to be monitored through Nagios.

# Define a host for the Windows machine we’ll be monitoring

# Change the host_name, alias, and address to fit your situation

define host{

use             windows-server              ; Inherit default values from a template

host_name   remote-windows-host      ; The name we’re giving to this host

alias            Remote Windows Host     ; A longer name associated with the host

address       192.168.1.4                   ; IP address of the remote windows host

}

1.3.4. Define the Windows services that should be monitored.

The following Windows services are already enabled by default in the sample windows.cfg. Make sure to update the host_name on these services to match the host_name defined in the step above.

define service{

use                     generic-service

host_name               remote-windows-host

service_description     NSClient++ Version

check_command           check_nt!CLIENTVERSION

}

define service{

use                     generic-service

host_name               remote-windows-host

service_description     Uptime

check_command           check_nt!UPTIME

}

define service{

use                     generic-service

host_name               remote-windows-host

service_description     CPU Load

check_command           check_nt!CPULOAD!-l 5,80,90

}

define service{

use                     generic-service

host_name               remote-windows-host

service_description     Memory Usage

check_command           check_nt!MEMUSE!-w 80 -c 90

}

define service{

use                     generic-service

host_name               remote-windows-host

service_description     C:\ Drive Space

check_command           check_nt!USEDDISKSPACE!-l c -w 80 -c 90

}
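It can help to see what Nagios actually runs for one of these services. The parts of check_command separated by "!" become $ARG1$ and $ARG2$ in the check_nt command_line from commands.cfg. The sketch below mimics that substitution for the two-argument case only. The function name expand_check_nt is mine, and the plugin directory standing in for $USER1$ is an assumption (the usual source-install default).

```shell
#!/bin/sh
# expand_check_nt HOST CHECK_COMMAND [PLUGIN_DIR]
# Prints the command line that the check_nt definition above would produce.
# Sketch only: handles "check_nt!ARG1" and "check_nt!ARG1!ARG2".
expand_check_nt() {
    host=$1
    check=$2
    plugin_dir=${3:-/usr/local/nagios/libexec}   # assumed value of $USER1$
    args=${check#*!}        # drop the command name before the first '!'
    arg1=${args%%!*}        # text up to the next '!' becomes $ARG1$
    case $args in
        *!*) arg2=${args#*!} ;;   # the rest becomes $ARG2$
        *)   arg2= ;;
    esac
    printf '%s\n' "$plugin_dir/check_nt -H $host -p 12489 -v $arg1${arg2:+ $arg2}"
}

# Example (not run here):
#   expand_check_nt 192.168.1.4 'check_nt!USEDDISKSPACE!-l c -w 80 -c 90'
```

Running the printed command by hand from the Nagios server is a quick way to test a single check without waiting for the scheduler.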

1.3.5. Verify Configuration and Restart Nagios.

Verify the nagios configuration files as shown below.

[nagios-server]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Total Warnings: 0

Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check

Restart nagios as shown below.

[nagios-server]# /etc/rc.d/init.d/nagios stop

Stopping nagios: .done.

[nagios-server]# /etc/rc.d/init.d/nagios start

Starting nagios: done.

Verify the status of the various services running on the remote Windows host from the Nagios web interface.

Nagios Client Installation


3.0 Nagios Client Configuration (Linux Clients)

3.1 Now configure NRPE for clients:

Log in to the Linux box that should be added to monitoring and start installing NRPE:

# groupadd nagios

# useradd -g nagios nagios

# mkdir /home/nagios/downloads

# cd /home/nagios/downloads

3.2 Download nagios-plugins and NRPE for clients

#  wget  http://prdownloads.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.14.tar.gz

# wget http://prdownloads.sourceforge.net/sourceforge/nagios/nrpe-2.12.tar.gz

Before installing the Nagios plugins, check that the build prerequisites are present:

# rpm -qa gcc

# rpm -qa openssl-devel

If both packages are installed, we are good to continue; otherwise install them:

# yum install gcc

# yum install openssl-devel

3.3 Install and configure nagios plugin

# tar zxvf  nagios-plugins-1.4.14.tar.gz

#  cd nagios-plugins-1.4.14

#  ./configure --with-nagios-user=nagios --with-nagios-group=nagios

#  make

#  make install

# chown  nagios:nagios /usr/local/nagios

# chown -R nagios:nagios /usr/local/nagios/libexec

# yum install xinetd

3.4 Install and configure NRPE daemon

# cd /home/nagios/downloads

# tar zxvf nrpe-2.12.tar.gz

# cd nrpe-2.12

#  ./configure --enable-ssl

# make all

# make install-plugin

# make install-daemon

# make install-daemon-config

#  make install-xinetd

3.5 Edit the nrpe entry under xinetd and add the following

# vi /etc/xinetd.d/nrpe

only_from = 127.0.0.1 <Nagios Server IP>

The same IP should also be defined in /usr/local/nagios/etc/nrpe.cfg:

allowed_hosts=127.0.0.1,<Nagios Server IP>

3.5.1 Add entry for nrpe in /etc/services

# vi /etc/services

nrpe                 5666/tcp                      #nrpe

3.5.2 Start or Reload the xinetd Daemon

# service xinetd start     (or: service xinetd reload, if it is already running)

# chkconfig --level 345 xinetd on

3.5.3 Test NRPE Daemon Install

# /usr/local/nagios/libexec/check_nrpe -H localhost  (From Client)

NRPE v2.12

# /usr/local/nagios/libexec/check_nrpe -H <ip address of monitored box> (from server to client IP)

NRPE v2.12
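If you run this test from a script rather than by eye, the exit status of check_nrpe is more useful than the text. Nagios plugins follow the convention 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN. Here is a tiny sketch that turns the status into a word; the function name nagios_state is just something I made up for this post.

```shell
#!/bin/sh
# nagios_state CODE
# Translates a Nagios plugin exit status into its state name.
nagios_state() {
    case $1 in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        3) echo UNKNOWN ;;
        *) echo "out of range" ;;   # anything else is outside the plugin convention
    esac
}

# Example (not run here):
#   /usr/local/nagios/libexec/check_nrpe -H localhost; nagios_state $?
```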

3.5.4 Adding a Rule to iptables to Open Port 5666/tcp on the Client

# iptables -A INPUT -p tcp -m state --state NEW --dport 5666 -j ACCEPT

# /etc/init.d/iptables save     (makes the change permanent)

Now communication has been established between Server and Client. 🙂

3.6 Configuration on the Nagios Server for the Client to be Monitored

Here I have created a template file called linux-box-remote.cfg. The relevant files on the server are:

  1. /usr/local/nagios/etc/nagios.cfg is the main configuration file.
  2. /usr/local/nagios/etc/cgi.cfg is the configuration file for the web interface (CGIs).
  3. /usr/local/nagios/etc/objects is the directory that holds the object definition files.
  4. Add a cfg_file line for linux-box-remote.cfg to nagios.cfg once the file contains the entries below.

3.6.1 linux-box-remote.cfg contains

define host{

name                  linux-box-remote             ; Name of this template

use                     generic-host          ; Inherit default values

check_period          24x7

check_interval        5

retry_interval        1

max_check_attempts    10

check_command         check-host-alive

notification_period   24x7

notification_interval 30

notification_options  d,r

contact_groups        admins

register              0          ; DONT REGISTER THIS – ITS A TEMPLATE

}

define host{

use       linux-box-remote     ; Inherit default values from a template

host_name <hostname>    ; The name we’re giving to this server

alias     Centos5 ; A longer name for the server

address   <ip address> ; IP address of the server

}

define service{

use                 generic-service

host_name           <hostname>

service_description CPU Load

check_command       check_nrpe!check_load

}

define service{

use                 generic-service

host_name           <hostname>

service_description Current Users

check_command       check_nrpe!check_users

}

define service{

use                 generic-service

host_name            <hostname>

service_description /dev/hda1 Free Space

check_command       check_nrpe!check_hda1

}

define service{

use                 generic-service

host_name            <hostname>

service_description Total Processes

check_command       check_nrpe!check_total_procs

}

define service{

use                 generic-service

host_name            <hostname>

service_description Zombie Processes

check_command       check_nrpe!check_zombie_procs

}
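The five service blocks above differ only in the description and the NRPE command, so for more than a couple of hosts it may be easier to generate them. A minimal sketch follows, assuming a helper named nrpe_service (my own name) whose output you redirect into your own .cfg file. The NRPE command you pass must already be defined in the client's nrpe.cfg.

```shell
#!/bin/sh
# nrpe_service HOST DESCRIPTION NRPE_COMMAND
# Prints one service definition in the same shape as the blocks above.
nrpe_service() {
    cat <<EOF
define service{
        use                 generic-service
        host_name           $1
        service_description $2
        check_command       check_nrpe!$3
        }
EOF
}

# Example (not run here; "web01" is a made-up host name):
#   nrpe_service web01 "CPU Load" check_load >> linux-box-remote.cfg
```

Run the pre-flight check on nagios.cfg after appending generated blocks, the same as for hand-written ones.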

3.6.2 Save the template and add the line into nagios.cfg

# vi /usr/local/nagios/etc/nagios.cfg

cfg_file=/usr/local/nagios/etc/objects/linux-box-remote.cfg

3.6.3 Now verify the configuration file

# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

If the configuration is good, the Nagios page will display the configured host once Nagios has been restarted.

Nagios Server Installation


1.0 Installing and Configuring Nagios Server

1.1 Nagios Requirements

  •       Apache
  •       GCC compiler
  •       GD development libraries
  •       User and group nagios
  •       nagcmd group
  •       openssl-devel
  •       xinetd

1.2 User and Group

# useradd  nagios

# groupadd nagcmd

# usermod -a -G nagcmd nagios

# usermod -a -G nagcmd apache

# chown -R nagios:nagios /home/nagios

1.3 Installing Apache

# yum install httpd

# yum install php

# yum install mod_ssl

1.4 Installing GCC

# yum install gcc     (this also installs glibc and glibc-common)

1.5 Installing GD Tools

# yum install gd gd-devel

1.6 Installing Openssl

# yum install openssl-devel

1.7 Installing xinetd if not already installed

# rpm -qa xinetd     (verifies whether xinetd is already installed; if the command returns nothing, install it)

# yum install xinetd

2.0 Download  Nagios  and plugins on Nagios Server

# mkdir /home/nagios/downloads

# cd /home/nagios/downloads

# wget http://prdownloads.sourceforge.net/sourceforge/nagios/nagios-3.2.0.tar.gz

# wget  http://prdownloads.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.14.tar.gz

2.1 Installing the Nagios Package

# tar zxvf nagios-3.2.0.tar.gz

# cd nagios-3.2.0

# ./configure --with-command-group=nagcmd

# make all

#  make install

# make install-init

# make install-config

# make install-commandmode

# make install-webconf

2.2 Now create nagiosadmin account for logging into nagios through web.

# htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin

2.3 Compile and Install the Nagios Plugins

# cd /home/nagios/downloads

# tar zxvf nagios-plugins-1.4.14.tar.gz

# cd nagios-plugins-1.4.14

# ./configure --with-nagios-user=nagios --with-nagios-group=nagios

# make

# make install

2.4 Configuring Nagios to Start at Bootup

# chkconfig --add nagios

# chkconfig --level 345 nagios on

2.5 Editing Contacts in contacts.cfg

# vi /usr/local/nagios/etc/objects/contacts.cfg     (change the e-mail address)

define contact{

contact_name       nagiosadmin             ; Short name of user

use                             generic-contact         ; Inherit default values from generic-contact template (defined above)

alias                           Nagios Admin            ; Full name of user

email                          <Email ID>      ; <<***** CHANGE THIS TO YOUR EMAIL ADDRESS ******

}

2.6 Customizing the Subject Line in Nagios Alerts

# vi /usr/local/nagios/etc/objects/commands.cfg     (change the subject passed to /bin/mail -s in the two notification commands)

notify-host-by-email /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /bin/mail -s "NAG_$HOSTALIAS$ is $HOSTSTATE$" $CONTACTEMAIL$
notify-service-by-email /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$" | /bin/mail -s "NAG_$HOSTALIAS$_$SERVICEDESC$ is $SERVICESTATE$" $CONTACTEMAIL$

2.7 Verify nagios configuration for any errors

# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

If there are no errors, start the Nagios daemon:

# service nagios start     (or: /etc/init.d/nagios start)

2.8 Accessing the Login Page

Open the Nagios URL in a web browser:

http://servername/nagios (Example: http://nagiosserver/nagios)

Log in as the nagiosadmin user with the password set in step 2.2.

2.9 Installing NRPE on Nagios server

NRPE is a client-side agent that communicates with the Nagios server over TCP port 5666. The Nagios server also needs the NRPE plugin (check_nrpe).

# cd /home/nagios/downloads

# wget  http://prdownloads.sourceforge.net/sourceforge/nagios/nrpe-2.12.tar.gz

# tar zxvf nrpe-2.12.tar.gz

# cd nrpe-2.12

# ./configure –enable-ssl

# make all

# make install-plugin

# make install-daemon

# make install-daemon-config

# make install-xinetd

2.9.1 Edit /etc/services and add the following

nrpe                       5666/tcp                               # NRPE

2.9.2 Edit  /etc/xinetd.d/nrpe and add nagios server IP or name

only_from = 127.0.0.1 <nagios_ip_address>

2.9.3 Restart xinetd and set to start at boot

# chkconfig --level 345 xinetd on

# service xinetd restart

2.9.4 Add the following in /usr/local/nagios/etc/objects/commands.cfg


##################################################################
# NRPE CHECK COMMAND
#
# Command to use NRPE to check remote host systems
##################################################################

define command{
        command_name check_nrpe
        command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }

2.9.5 Test NRPE daemon install and function:

# netstat -at | grep nrpe

tcp    0    0 *:nrpe    *.*    LISTEN

# /usr/local/nagios/libexec/check_nrpe -H localhost

NRPE v2.12

2.9.6 Now check for localhost in the Nagios server URL

You should be all set with your new Nagios Server 🙂