Tuesday, 5 July 2011

Creating a host device in Nagios


Installing Nagios 3.0 and Nagios-plugins (Host and Service Monitoring)
Nagios-plugins is a set of plugins used to check the system, memory usage, CPU utilization, and so on.
If you previously installed Nagios from YaST, uninstall the following packages first:
- nagios
- nagios-nsca
- nagios-nsca-client
- nagios-plugins
- nagios-plugins-extras
- nagios-www
If you have not installed any of them, you can ignore the list above.
Check that the following packages are installed:
- gd-devel
- libpng-devel
1. Download the software
- nagios-3.0b6.tar.gz    download: http://www.nagios.org/download
- nagios-plugins-1.4.10.tar.gz download: http://sourceforge.net
$ cd <folder where you saved the downloads>
$ tar -zxvf nagios-3.0b6.tar.gz
$ tar -zxvf nagios-plugins-1.4.10.tar.gz

2. Create the user and groups
$ useradd -m nagios
$ groupadd nagios
$ groupadd nagcmd
$ usermod -G nagios,nagcmd nagios
$ usermod -G nagcmd wwwrun

3. Install Nagios 3.0
$ cd nagios-3.0b6
$ ./configure --prefix=/opt/nagios --with-cgiurl=/nagios/cgi-bin --with-htmurl=/nagios --with-nagios-user=nagios \
--with-nagios-group=nagios --with-command-group=nagcmd
$ make all
$ make install
$ make install-init
$ make install-commandmode
$ make install-config
$ make install-webconf

4. Install Nagios Plugins 1.4.10
$ cd nagios-plugins-1.4.10
$ ./configure --prefix=/opt/nagios --with-nagios-user=nagios --with-nagios-group=nagios
$ make
$ make install

5. Configure Nagios 3.0
$ vi /opt/nagios/etc/nagios.cfg
log_file=/var/opt/nagios/nagios.log
object_cache_file=/var/opt/nagios/objects.cache
precached_object_file=/var/opt/nagios/objects.precache
command_file=/var/opt/nagios/rw/nagios.cmd
lock_file=/var/opt/nagios/nagios.tmp
log_archive_path=/var/opt/nagios/archives
check_result_path=/var/opt/nagios/spool/checkresults
state_retention_file=/var/opt/nagios/retention.dat
debug_file=/var/opt/nagios/nagios.debug

6. Create the directories
$ mkdir -p /var/opt/nagios/rw
$ mkdir -p /var/opt/nagios/spool/checkresults
$ mkdir -p /var/opt/nagios/archives
$ chown -R nagios.nagios /var/opt/nagios
$ chown -R nagios.nagcmd /var/opt/nagios/rw
$ chmod 2775 /var/opt/nagios/rw
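
The directories created here have to match the paths set in nagios.cfg in step 5, otherwise Nagios cannot write its files. As a rough sketch only (the function name check_cfg_paths is made up for this example, and it inspects just a handful of path directives), the following shell function prints every directive whose parent directory does not exist yet:

```shell
# Hypothetical helper: list path directives in a nagios.cfg-style file
# whose parent directory does not exist yet.
check_cfg_paths() {
    cfg=$1
    # Only a few file-path directives are inspected; this list is an
    # assumption made for brevity, not a complete one.
    grep -E '^(log_file|object_cache_file|command_file|lock_file|state_retention_file|debug_file)=' "$cfg" |
    while IFS='=' read -r key path; do
        dir=$(dirname "$path")
        if [ ! -d "$dir" ]; then
            echo "MISSING $key -> $dir"
        fi
    done
}
```

It could be called as "check_cfg_paths /opt/nagios/etc/nagios.cfg"; no output means every inspected path has an existing parent directory.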

7. Apache Security
$ htpasswd2 -c /opt/nagios/etc/htpasswd.users sysadmin
Password : Your_password

8. Apache and Nagios startup
$ rcapache2 restart
$ /etc/rc.d/init.d/nagios start

9. Automatic startup at system boot time
$ insserv nagios

10. Test the installation
URL: http://<IP Address Server>/nagios
Username is "sysadmin"
Password is "Your_password"

Nagios Error: Could not open command file '/var/nagios/rw/nagios.cmd' for update!

Solution: change group from "nagios" to "www"
$ id nagios
$ cd /opt/nagios/var/rw/
$ chgrp www nagios.cmd


Adding remote Linux/Unix hosts:
Example (default)
$ vi /opt/nagios/etc/objects/localhost.cfg

##Added by Sontaya
define host {
use                     linux-server
host_name               hostname
alias                   hostname.mydomain
address                 Public IP Address / Private IP Address
}

define service {
use                     local-service
host_name               hostname
service_description     PING
check_command           check_ping!100.0,20%!500.0,60%
}

define service {
use                     local-service
host_name               hostname
service_description     SSH
check_command           check_ssh
}

# Define a service to check the disk space of the root partition
# on the local machine.  Warning if < 20% free, critical if
# < 10% free space on partition.

define service{
use                             local-service
host_name                       hostname
service_description             Root Partition
check_command                   check_local_disk!20%!10%!/
}

# Define a service to check the number of currently logged in
# users on the local machine.  Warning if > 15 users, critical
# if > 20 users.

define service{
use                             local-service
host_name                       hostname
service_description             Current Users
check_command                   check_local_users!15!20
}

define service{
use                             local-service
host_name                       hostname
service_description             Total Processes
check_command                   check_local_procs!250!400!RSZDT
}

define service{
use                             local-service
host_name                       hostname
service_description             Current Load
check_command                   check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
}

# Define a service to check the swap usage of the local machine.
# Critical if less than 10% of swap is free, warning if less than 20% is free

define service{
use                             local-service         ; Name of service template to use
host_name                       hostname
service_description             Swap Usage
check_command                   check_local_swap!20!10
}


Adding email contacts:

$ vi /opt/nagios/etc/objects/contacts.cfg
email                     nagios@hostname, admin@mydomain.com

Restart nagios:
$ /etc/rc.d/init.d/nagios restart

Ref: 
http://nagios.sourceforge.net/docs/3_0/plugins.html
http://www.novell.com/coolsolutions/feature/16723.html
http://code.google.com/p/onebusaway/wiki/NagiosConfiguration#Nagios_User_and_Group

Installing NRPE on the remote hosts (client): daemon and plugin for executing plugins on remote hosts

Normally, monitoring works by running plugins on the Nagios server against the machine to be monitored, mostly by sending a message
and getting a response back, as with the check_ping and check_http plugins. Some checks cannot be done this way,
for example check_load and check_disk, because these plugins only work on the local machine.

Important files:
check_nrpe is the plugin, run on the Nagios server, that talks to nrpe on the remote host.
nrpe       is the agent that runs on the remote host and communicates with the plugin.
nrpe.cfg   is the configuration file on the remote host.

1. Create the user account and group


$ useradd nagios
$ passwd nagios
Set the password to "nagios"
$ groupadd nagios

2. Download Nagios Plugins:

$ mkdir -p /opt/nagios/
Download the file and save it in "/opt/nagios/". Download from
http://www.nagios.org/download/download.php
(nagios-plugins-1.4.13.tar.gz)

$ tar zxvf nagios-plugins-1.4.13.tar.gz

3. Install Nagios Plugins

*** Check whether openssl-devel is installed. If it is not, install it from YaST first, because the plugins support SSL.

$ cd nagios-plugins-1.4.13
$ ./configure --with-nagios-user=nagios --with-nagios-group=nagios
$ make
$ make install

4. Set permissions on the plugin folder:

$ chown nagios.nagios /usr/local/nagios
$ chown -R nagios.nagios /usr/local/nagios/libexec

5. Install the NRPE daemon

Download the file and save it in "/opt/nagios/". Download from
http://www.nagios.org/download/download.php
(nrpe-2.12.tar.gz)

$ tar zxvf nrpe-2.12.tar.gz
$ cd nrpe-2.12
$ ./configure
$ make all
$ make install-plugin
$ make install-daemon
$ make install-daemon-config
$ make install-xinetd

6. Configure the NRPE port

$ vi /etc/xinetd.d/nrpe

Add the IP address of the Nagios server to the line
only_from       = 127.0.0.1 192.168.1.13

Then save the file.

$ vi /etc/services

Add port 5666 to the services file
nrpe            5666/tcp        # NRPE
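
Editing /etc/services by hand is fine for one host. For several clients, a small idempotent helper avoids duplicate entries. This is only a sketch: the function name ensure_nrpe_service is invented here, and it simply looks for a line starting with "nrpe" followed by "5666/tcp".

```shell
# Hypothetical helper: append the NRPE entry to a services file only if
# no nrpe line for port 5666/tcp is present yet.
ensure_nrpe_service() {
    services_file=${1:-/etc/services}
    if grep -Eq '^nrpe[[:space:]]+5666/tcp' "$services_file"; then
        echo "already present"
    else
        printf 'nrpe            5666/tcp        # NRPE\n' >> "$services_file"
        echo "added"
    fi
}
```

Running it a second time on the same file leaves the file unchanged.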

Restart xinetd
$ rcxinetd restart

Check the NRPE daemon

$ netstat -at | grep nrpe
tcp        0      0 *:nrpe                  *:*                     LISTEN

Test the NRPE version
$ /usr/local/nagios/libexec/check_nrpe -H localhost
NRPE v2.12

7. Configure the firewall (iptables)

Open port 5666
$ iptables -A INPUT -m state --state NEW -m tcp -p tcp --dport 5666 -j ACCEPT
Verify
$ netstat -ntlp | grep 5666
tcp        0      0 0.0.0.0:5666            0.0.0.0:*               LISTEN      -

Tip:
$ vi /etc/sysconfig/scripts/SuSEfirewall2-custom
#NRPE
iptables -A INPUT -m state --state NEW -m tcp -p tcp --dport 5666 -j ACCEPT

$ rcSuSEfirewall2 reload

8. Edit the nrpe.cfg file

$ vi /usr/local/nagios/etc/nrpe.cfg
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_disk]=/usr/local/nagios/libexec/check_disk -w 20% -c 10%
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200

*** A command can be defined in two ways: with its argument values fixed in nrpe.cfg, or without fixed values
(the commands above fix their threshold values in nrpe.cfg; the alternative is to receive the thresholds as arguments from the Nagios server)
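
The name inside the square brackets of each command[...] line is what the Nagios server later asks for with check_nrpe -c. To see which names a client offers, the file can be parsed with sed. This is a minimal sketch; the function name list_nrpe_commands is made up for this example.

```shell
# Hypothetical helper: print the names of all commands defined in an
# nrpe.cfg-style file, one per line. Commented-out lines are skipped
# because the pattern is anchored at the start of the line.
list_nrpe_commands() {
    sed -n 's/^command\[\([^]]*\)]=.*/\1/p' "$1"
}
```

For the file above, "list_nrpe_commands /usr/local/nagios/etc/nrpe.cfg" would list check_users, check_load, check_disk, check_zombie_procs and check_total_procs.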

===================================================================================
Configuring the Nagios Server
===================================================================================
1. Test with Telnet

Test a telnet connection to the remote host (client)

$ telnet 192.168.11.3 5666
Trying 192.168.11.3...
Connected to 192.168.11.3.
Escape character is '^]'.
^]
telnet> quit
Connection closed.


2. Configure the commands.cfg file


$ vi /opt/nagios/etc/objects/commands.cfg

# NRPE CHECK COMMAND
# Command to use NRPE to check remote host systems
#
###############################################################################
#
define command{
command_name   check_nrpe
command_line   $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}

*** You can define two commands, one for calls that pass arguments and one for calls that do not.


3. Configure the host-linux.cfg file

Format: check_nrpe!command plugins!argument threshold1 arg2 arg3
check_nrpe!check_procs!5 !10 !Z

Tip: separate the arguments with spaces
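
In a check_command value, the "!" characters separate the command name from its arguments, which Nagios makes available to the command definition as $ARG1$, $ARG2$, and so on. The following shell function is only an illustration of that splitting (the name split_check_command is invented, and this is not how Nagios itself is implemented):

```shell
# Hypothetical helper: show how a check_command value breaks down into
# the command name and the $ARGn$ macros.
split_check_command() {
    old_ifs=$IFS
    IFS='!'
    set -f              # do not expand wildcards while splitting
    set -- $1           # split the value on "!"
    set +f
    IFS=$old_ifs
    echo "command=$1"
    shift
    n=1
    for arg in "$@"; do
        echo "ARG$n=$arg"
        n=$((n + 1))
    done
}

split_check_command 'check_ping!200.0,20%!600.0,60%'
```

For the PING service used in this post, the first field is the command name check_ping, and the two threshold pairs become $ARG1$ and $ARG2$.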

$ vi /opt/nagios/etc/objects/host-linux.cfg

#----------------------------------------------------------------------------------------------------------------
#192.168.11.3
#----------------------------------------------------------------------------------------------------------------

define host {
use                     linux-server
host_name               bclinux3
alias                   bclinux3.mydomain
address                 192.168.11.3
hostgroups              linux-servers
}

#CRITICAL if the round trip average (RTA) is greater than 600 milliseconds
#or the packet loss is 60% or more
#WARNING if the RTA is greater than 200 ms or the packet loss is 20% or more
#OK if the RTA is less than 200 ms and the packet loss is less than 20%
define service {
use                     generic-service
host_name               bclinux3
service_description     PING
check_command           check_ping!200.0,20%!600.0,60%
}

# Define a service to check the disk space of the root partition
# on the local machine.  Warning if < 20% free, critical if
# < 10% free space on partition.

define service{
use                 generic-service
host_name           bclinux3
service_description Free Root Partition
check_command       check_nrpe!check_disk     ; thresholds (-w 20% -c 10%) are set in nrpe.cfg on the remote host
}

# Define a service to check the number of currently logged in
# users on the local machine.  Warning if > 5 users, critical
# if > 10 users.

define service{
use                             generic-service
host_name                       bclinux3
service_description             Current Users
check_command                   check_nrpe!check_users     ; thresholds (-w 5 -c 10) are set in nrpe.cfg on the remote host
}

define service{
use                             generic-service
host_name                       bclinux3
service_description             Total Processes
check_command                   check_nrpe!check_total_procs
}

define service{
use                             generic-service
host_name                       bclinux3
service_description             Current Load
check_command                   check_nrpe!check_load     ; thresholds (-w 15,10,5 -c 30,25,20) are set in nrpe.cfg on the remote host
}

define service{
use                           generic-service
host_name                     bclinux3
service_description           Zombie Processes
check_command                 check_nrpe!check_zombie_procs
}

Save the file

Tip:
Usage:check_users -w <users> -c <users>
-w, --warning=INTEGER
Set WARNING status if more than INTEGER users are logged in
-c, --critical=INTEGER
Set CRITICAL status if more than INTEGER users are logged in

Check the configuration file:
$ /opt/nagios/bin/nagios -v /opt/nagios/etc/nagios.cfg

If there are no errors, restart Nagios

Restart Nagios:
$ /etc/rc.d/init.d/nagios reload
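
The check and the reload can be chained so that a broken configuration is never loaded. The wrapper below is a sketch under one assumption: that the nagios binary exits with a non-zero status when the -v check finds errors. The function name verify_then_reload is made up, and the three paths are passed in as arguments.

```shell
# Hypothetical wrapper: reload only when the configuration check passes.
# Arguments: <nagios binary> <main config file> <init script>
verify_then_reload() {
    if "$1" -v "$2"; then
        "$3" reload
    else
        echo "configuration check failed, not reloading" >&2
        return 1
    fi
}
```

With the paths used in this post it would be called as "verify_then_reload /opt/nagios/bin/nagios /opt/nagios/etc/nagios.cfg /etc/rc.d/init.d/nagios".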

=========================================================================
Note: Object configuration files:
=========================================================================

Timeperiods:
$ vi /opt/nagios/etc/objects/timeperiods.cfg

Contacts/Contacts groups:
$ vi /opt/nagios/etc/objects/contacts.cfg
#Adding email contacts:
email                     nagios@hostname, sontaya@mydomain.com

Adding remote Linux/Unix hosts:
$ vi /opt/nagios/etc/objects/host-linux.cfg

Templates Services: (CONTACT, HOST, SERVICE)
$ vi /opt/nagios/etc/objects/templates.cfg

Commands:
$ vi /opt/nagios/etc/objects/commands.cfg


Restart nagios:
$ /etc/rc.d/init.d/nagios restart

Path plugins:

$ vi /opt/nagios/etc/resource.cfg
$USER1$=/opt/nagios/libexec

Path NRPE config
$ vi /usr/local/nagios/etc/nrpe.cfg

Usage:check_ping -H <host_address> -w <wrta>,<wpl>% -c <crta>,<cpl>%
check_command           check_ping!100.0,20%!500.0,60%
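
The two check_ping arguments are the warning and critical thresholds, each a pair of round-trip average (ms) and packet loss (%). To make the decision rule concrete, here is a small sketch in shell and awk. The function name ping_state is invented, and the boundary handling (strictly greater for RTA, "or more" for packet loss) follows the comments on the PING service in this post rather than the plugin source, so treat it as an assumption.

```shell
# Hypothetical helper: classify a ping result against
# check_ping-style thresholds "<rta>,<loss>%".
# Usage: ping_state <rta_ms> <loss_pct> <warn> <crit>
ping_state() {
    awk -v rta="$1" -v pl="$2" -v warn="$3" -v crit="$4" 'BEGIN {
        split(warn, w, ","); split(crit, c, ",")
        # adding 0 converts to a number and drops the trailing "%"
        wrta = w[1] + 0; wpl = w[2] + 0
        crta = c[1] + 0; cpl = c[2] + 0
        if (rta + 0 > crta || pl + 0 >= cpl)      print "CRITICAL"
        else if (rta + 0 > wrta || pl + 0 >= wpl) print "WARNING"
        else                                      print "OK"
    }'
}

ping_state 250 0 '100.0,20%' '500.0,60%'
```

With the thresholds from the line above, an RTA of 250 ms with no packet loss falls between the warning and critical limits.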

===========================================================================
Tip: How to check a host, that for security reasons has ping disabled
===========================================================================

1. Copy "check_nrpe" file to path plugin.
$ cp /usr/local/nagios/libexec/check_nrpe /opt/nagios/libexec

2. Define the service and attributes within the default services file.
(place check_nrpe! in front of the check-host-alive)

$ vi /opt/nagios/etc/objects/templates.cfg
## SERVICE TEMPLATES
check_command                   check_nrpe!check-host-alive

3. Add the command to every client's nrpe.cfg file
$ vi /usr/local/nagios/etc/nrpe.cfg

4. Reload Nagios (Finished)
$ /etc/rc.d/init.d/nagios reload

===========================================================================
Error messages:
===========================================================================
Could not open command file '/var/nagios/rw/nagios.cmd' for update!

Solution: change group "nagios" to "www"
$ id nagios
$ cd /opt/nagios/var/rw/
$ chgrp www nagios.cmd


============================================================================
Tip & Install nagiosgraph
============================================================================




To see which pre-requisites are installed:

    install.pl --check-prereq

To install pre-requisites:

  Debian/Ubuntu
    sudo apt-get install libcgi-pm-perl librrds-perl libgd-gd2-perl
  Redhat/Fedora/CentOS
    sudo yum install perl-rrdtool perl-GD


Easy Install
------------

    install.pl

To see a list of options:

    install.pl --help


Recipe for Manual Installation
------------------------------

These instructions assume an overlay layout, with nagios at /usr/local/nagios

 - Extract nagiosgraph into a temporary location:
     cd /tmp
     tar xzvf nagiosgraph-x.y.z.tgz

 - Copy the contents of etc into your preferred configuration location:
     mkdir /etc/nagiosgraph
     cp etc/* /etc/nagiosgraph

 - Edit the perl scripts in the cgi and lib directories, modifying the
   "use lib" line to point to the directory from the previous step.
     vi cgi/*.cgi lib/insert.pl

 - Copy insert.pl to a location from which it can be executed:
     cp lib/insert.pl /usr/local/nagios/libexec

 - Copy CGI scripts to a script directory served by the web server:
     cp cgi/*.cgi /usr/local/nagios/sbin

 - Copy CSS and JavaScript files to a directory served by the web server:
     cp share/nagiosgraph.css /usr/local/nagios/share
     cp share/nagiosgraph.js /usr/local/nagios/share

 - Edit /etc/nagiosgraph/nagiosgraph.conf.  Set at least the following:
     logfile           = /var/log/nagiosgraph.log
     cgilogfile        = /var/log/nagiosgraph-cgi.log
     perflog           = /var/nagios/perfdata.log
     rrddir            = /var/nagios/rrd
     mapfile           = /etc/nagiosgraph/map
     nagiosgraphcgiurl = /nagios/cgi-bin
     javascript        = /nagios/nagiosgraph.js
     stylesheet        = /nagios/nagiosgraph.css

 - Set permissions of "rrddir" (as defined in nagiosgraph.conf) so that
   the *nagios* user can write to it and the *www* user can read it:
     mkdir /var/nagios/rrd
     chown nagios /var/nagios/rrd
     chmod 755 /var/nagios/rrd

 - Set permissions of "logfile" so that the *nagios* user can write to it:
     touch /var/log/nagiosgraph.log
     chown nagios /var/log/nagiosgraph.log
     chmod 644 /var/log/nagiosgraph.log

 - Set permissions of "cgilogfile" so that the *www* user can write to it:
     touch /var/log/nagiosgraph-cgi.log
     chown www /var/log/nagiosgraph-cgi.log
     chmod 644 /var/log/nagiosgraph-cgi.log

 - Ensure that the *nagios* user can create and delete perfdata files:
     chown nagios /var/nagios
     chmod 755 /var/nagios

 - In the Nagios configuration file (nagios.cfg) add this:

     process_performance_data=1
     service_perfdata_file=/var/nagios/perfdata.log
     service_perfdata_file_template=$LASTSERVICECHECK$||$HOSTNAME$||$SERVICEDESC$||$SERVICEOUTPUT$||$SERVICEPERFDATA$
     service_perfdata_file_mode=a
     service_perfdata_file_processing_interval=30
     service_perfdata_file_processing_command=process-service-perfdata

 - In the Nagios commands file (commands.cfg) add this:

     define command {
       command_name  process-service-perfdata
       command_line  /usr/local/nagios/libexec/insert.pl
     }

 - Check the nagios configuration

     /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

 - Restart nagios

     /etc/init.d/nagios restart

 - Verify that nagiosgraph is working by running showconfig.cgi

     http://server/nagios/cgi-bin/showconfig.cgi

 - Try graphing some data by running show.cgi

     http://server/nagios/cgi-bin/show.cgi

 - In the Nagios configuration, add a template for graphed services:

     define service {
       name graphed-service
       action_url /nagiosgraph/cgi-bin/show.cgi?host=$HOSTNAME$&service=$SERVICEDESC$' onMouseOver='showGraphPopup(this)' onMouseOut='hideGraphPopup()' rel='/nagiosgraph/cgi-bin/showgraph.cgi?host=$HOSTNAME$&service=$SERVICEDESC$&period=week&rrdopts=-w+450+-j
       register 0
     }

 - Enable graph links for services by appending the graphed-service to existing
   service definitions in the Nagios configuration:

     define service {
       use local-service,graphed-service
       ...
     }

 - Replace the Nagios action icon with the nagiosgraph graph icon:
     mv /usr/local/nagios/share/images/action.gif /usr/local/nagios/share/images/action.gif-orig
     cp share/graph.gif /usr/local/nagios/share/images/action.gif

 - In the nagiosgraph SSI file, set the URL for nagiosgraph.js:
     vi share/nagiosgraph.ssi
     src="/nagiosgraph/nagiosgraph.js"   ->    src="/nagios/nagiosgraph.js"

 - Install the nagiosgraph SSI file:
     cp share/nagiosgraph.ssi /usr/local/nagios/share/ssi/common-header.ssi

 - Add links to graphs in the Nagios sidebar (side.php or side.html):

<ul>
<li><a href="/nagios/cgi-bin/show.cgi" target="main">Graphs</a></li>
<li><a href="/nagios/cgi-bin/showhost.cgi" target="main">Graphs by Host</a></li>
<li><a href="/nagios/cgi-bin/showservice.cgi" target="main">Graphs by Service</a></li>
<li><a href="/nagios/cgi-bin/showgroup.cgi" target="main">Graphs by Group</a></li>
</ul>

 - Check the nagios configuration

     /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

 - Restart nagios

     /etc/init.d/nagios restart
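
The service_perfdata_file_template set in nagios.cfg above writes five fields per line, separated by "||". When graphs stay empty, it can help to look at what actually lands in perfdata.log. The following sketch splits such lines with awk; the function name perfdata_fields and the sample line are invented for illustration.

```shell
# Hypothetical helper: read perfdata lines on stdin and print the host,
# service and performance data fields (fields 2, 3 and 5 of the template).
perfdata_fields() {
    awk -F'[|][|]' '{ printf "host=%s service=%s perfdata=%s\n", $2, $3, $5 }'
}

# Made-up sample line in the template's format
printf '%s\n' '1309859000||bclinux3||PING||PING OK||rta=0.5ms;200;600;0' | perfdata_fields
```

On a live system the input would come from the perflog file instead, for example "tail -n 5 /var/nagios/perfdata.log | perfdata_fields".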

Creating a host template


# This is my template
define host{
name                    hostTemplate
check_command           check-host-alive
max_check_attempts      5
contact_groups          admins
notification_interval   30
notification_period     24x7
notification_options    d,u,r
register                0
}
# myHost is shorter now that it inherits from hostTemplate
define host{
host_name               myHost
alias                   My Favorite Host
address                 192.168.1.254
parents                 myotherhost
use                     hostTemplate
}

Verify the configuration
$ /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg