Monitoring a cluster of Raspberry Pi with Nagios

In this post, I’ll describe how I set up Nagios monitoring for my micro-cluster of Raspberry Pi boards.

Infrastructure

On the monitor host

Install and configure Nagios: see this article

Install the NRPE plugin:

sudo apt-get install nagios-nrpe-plugin

Define the services: edit /etc/nagios3/conf.d/services_nagios2.cfg

# NRPE Services
define service {
        hostgroup_name                  rpi-cluster
        service_description             Current-Users-N$
        check_command                   check_nrpe_1arg$
        use                             generic-service
        notification_interval           0
}

define service {
    hostgroup_name rpi-cluster
    service_description Current Load NRPE
    check_command check_nrpe_1arg!check_load
    use generic-service
    notification_interval 0
}

define service {
    hostgroup_name rpi-cluster
    service_description Disk Space NRPE
    check_command check_nrpe_1arg!check_all_disks
    use generic-service
    notification_interval 0
}

define service {
    hostgroup_name rpi-cluster
    service_description Zombie Processes NRPE
    check_command check_nrpe_1arg!check_zombie_procs
    use generic-service
    notification_interval 0
}

define service {
    hostgroup_name rpi-cluster
    service_description Total Processes NRPE
    check_command check_nrpe_1arg!check_total_procs
    use generic-service
    notification_interval 0
}

define service {
    hostgroup_name rpi-cluster
    service_description Swap NRPE
    check_command check_nrpe_1arg!check_swap
    use generic-service
    notification_interval 0
}

Define the new hostgroup in /etc/nagios3/conf.d/hostgroups_nagios2.cfg:

define hostgroup {
        hostgroup_name  rpi-cluster
        alias           Raspberry PI Cluster
        members         rpi0,rpi1,rpi2
}

Create a host file for each slave, /etc/nagios3/conf.d/rpi-cluster-xxx.cfg, where xxx is the slave's number (0, 1 and 2 for the hostgroup members above). The address directive holds the slave's IP address.

define host {
        use                     generic-host
        host_name               rpixxx
        alias                   rpi-cluster-xxx
        hostgroups              rpi-cluster
        address                 192.168.0.xxx
}
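Writing one file per slave by hand gets tedious as the cluster grows. The following is a minimal sketch of a generator script, not something from the original setup: the number/IP pairs in HOSTS are placeholders, make_host is a hypothetical helper, and the files go to a scratch directory by default so nothing under /etc is touched until you have checked the output.

```shell
#!/bin/sh
# Sketch: generate one Nagios host definition per slave.
# CONF_DIR defaults to a scratch directory; copy the files to
# /etc/nagios3/conf.d (with sudo) once the output looks right.
CONF_DIR="${CONF_DIR:-./nagios-hosts}"

# Placeholder "number:IP" pairs, replace them with your own addresses.
HOSTS="0:192.168.0.10 1:192.168.0.11 2:192.168.0.12"

# Print a host definition. $1 = slave number, $2 = IP address.
make_host() {
    cat <<EOF
define host {
        use                     generic-host
        host_name               rpi$1
        alias                   rpi-cluster-$1
        hostgroups              rpi-cluster
        address                 $2
}
EOF
}

mkdir -p "$CONF_DIR"
for entry in $HOSTS; do
    num="${entry%%:*}"   # text before the colon
    ip="${entry#*:}"     # text after the colon
    make_host "$num" "$ip" > "$CONF_DIR/rpi-cluster-$num.cfg"
done
```

Remember that the host names produced here (rpi0, rpi1, rpi2) have to match the members list of the hostgroup.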

Reload Nagios:

sudo service nagios3 reload
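A typo in any of the files above can leave Nagios with a configuration it refuses to load. As a hedged sketch, the reload can be guarded by Nagios' own verification mode (the -v option of the nagios3 binary). The function name reload_if_valid and the two variables are my own, and the paths assume the Debian nagios3 layout used in this post.

```shell
# Commands are kept in variables so they can be adapted to another layout.
VERIFY_CMD="${VERIFY_CMD:-sudo nagios3 -v /etc/nagios3/nagios.cfg}"
RELOAD_CMD="${RELOAD_CMD:-sudo service nagios3 reload}"

# Reload Nagios only if the configuration check succeeds.
reload_if_valid() {
    # Unquoted on purpose: each variable holds a command plus its arguments.
    if $VERIFY_CMD; then
        $RELOAD_CMD
    else
        echo "Configuration check failed, Nagios was not reloaded" >&2
        return 1
    fi
}
```

After sourcing the snippet, call reload_if_valid instead of the plain reload command. When the check fails, the verification output printed above the error message shows which file and line to fix.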

On the slave hosts

Install the NRPE server:

sudo apt-get install nagios-nrpe-server

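Edit /etc/nagios/nrpe_local.cfg. The allowed_hosts directive must list the IP address of the monitor host, and each command name must match the one passed to check_nrpe_1arg in the service definitions on the monitor: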

######################################
# Do any local nrpe configuration here
######################################

allowed_hosts=127.0.0.1,192.168.0.xxx

command[check_users]=/usr/lib/nagios/plugins/check_users -w 5 -c 10
command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
command[check_all_disks]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10%
command[check_zombie_procs]=/usr/lib/nagios/plugins/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/lib/nagios/plugins/check_procs -w 150 -c 200
command[check_swap]=/usr/lib/nagios/plugins/check_swap -w 50% -c 25%
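In each command, -w sets the warning threshold and -c the critical one, and the plugin reports the result through its exit status: 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN. To try a command line on the slave before handing it to NRPE, something like the sketch below can help. The two function names are hypothetical helpers, not part of Nagios.

```shell
# Translate a plugin exit status into the Nagios state name.
# Any status other than 0, 1 or 2 is reported as UNKNOWN here.
nagios_state() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Run a plugin command line, print the state it maps to,
# and return the plugin's own exit status.
run_plugin() {
    "$@"
    status=$?
    echo "state: $(nagios_state "$status")"
    return "$status"
}
```

For example, run_plugin /usr/lib/nagios/plugins/check_swap -w 50% -c 25% prints the plugin's usual output followed by the state Nagios would record for it.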

Restart the service:

sudo service nagios-nrpe-server restart
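Before looking at the web interface, it can be worth checking from the monitor host that each slave answers. Here is a small sketch, assuming check_nrpe sits at the path where the nagios-nrpe-plugin package installs it on Debian-based systems; probe_slave is a hypothetical helper name.

```shell
# Location of check_nrpe on the monitor host (assumed Debian/Raspbian path).
CHECK_NRPE="${CHECK_NRPE:-/usr/lib/nagios/plugins/check_nrpe}"

# Run every command defined in nrpe_local.cfg against one slave.
# $1 = IP address or host name of the slave.
probe_slave() {
    host="$1"
    for cmd in check_users check_load check_all_disks \
               check_zombie_procs check_total_procs check_swap; do
        printf '%s: ' "$cmd"
        "$CHECK_NRPE" -H "$host" -c "$cmd"
    done
}
```

Run it on the monitor as probe_slave 192.168.0.xxx, with the slave's address. A connection error at this point usually means the monitor's IP is missing from allowed_hosts on the slave, or that the NRPE server was not restarted.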

Monitor

You can now monitor the slaves on Nagios:

[Screenshot: Nagios monitoring]


Inspired by: LowEndBox