tech.gate.io blog

Sysadmin Weblog


After migrating from Xymon 4.2.0 to 4.3.0 Beta 2, our RRD files built with SPLITNCV were no longer working.

SPLITNCV writes each value into its own RRD file, which makes it easy to generate graphs for an unknown number of output values using a regexp,
for example for network interface performance monitoring, where the number of interfaces may differ from host to host.

 

Existing RRDs got no updates, and no new ones were created.

We spent a lot of time searching for a configuration problem that might have caused this,

but then we found this on the Xymon mailing list:

http://www.hswn.dk/hobbiton/2009/07/msg00242.html

 
in short:

go to xymon-4.3.0-beta2 directory

change to hobbitd/rrd

and edit the source file do_ncv.c

change line 180:

if (split_ncv && (paridx > 1)) {

 
to

if (split_ncv && (paridx > 0)) {

 
Recompile, replace the hobbitd_rrd in server/bin with the newly compiled one, and restart hobbit.

should be working now
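
The same one-character change can be scripted, which helps when the sources have to be unpacked and patched again. This is only a sketch: the function name patch_split_ncv is made up for illustration, and it matches the statement by its text rather than by line number, on the assumption that the line reads exactly as quoted above.

```shell
# Hypothetical helper, not part of Xymon: relax the SPLITNCV check in do_ncv.c.
# Usage: patch_split_ncv path/to/hobbitd/rrd/do_ncv.c
patch_split_ncv() {
    src="$1"
    # Keep the untouched source as do_ncv.c.orig so the change can be reverted.
    cp "$src" "$src.orig" || return 1
    # Rewrite only the comparison quoted in the post; "\&" is a literal "&"
    # in the replacement part of a sed substitution.
    sed 's/if (split_ncv && (paridx > 1)) {/if (split_ncv \&\& (paridx > 0)) {/' \
        "$src.orig" > "$src"
}
```

If the statement is not found, the file is left unchanged, so it is worth checking the result with grep before recompiling.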

For Web designers, it's important to know which screen and browser window sizes the Web site's visitors use, because a new design should be optimised in two ways:
1. the existing space should be used as well as possible
2. horizontal scrolling should be avoided because it's very annoying for users

To get an idea of your users' settings, you can use the AWStats log analysis tool. It's free and generates graphical access statistics for your Web site. It comes with a JavaScript file (awstats_misc_tracker.js) that you can embed on your site; when a user visits the site, some properties of their browser are reported to the logs in a form like this:

www.gimpusers.com xxx.xxx.xxx.xxx - - [14/Feb/2010:06:38:20 +0100] "GET /js/awstats_misc_tracker.js?screen=1280x800&win=1263x616&cdi=24&java=true&shk=n&svg=n&fla=y&rp=n&mov=y&wma=y&pdf=y&uid=awsuser_id1266125900481r8004&sid=awssession_id1266125900481r8004 HTTP/1.1" 200 2676 "http://www.gimpusers.com/forums/gimp-docs/10734-print-version-of-Gimp-Manual-pdf.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6" (35)

 
As you can see, this user had a 1280x800 screen resolution, and the usable browser window size was 1263x616. Sadly, the graphical AWStats report only shows the screen resolutions and not the window sizes, although the window sizes are more interesting (we don't expect our users to resize their browser window for our Web site).

To extract information about the window size probability distribution, we'll use R (http://www.r-project.org, or for Debian/Ubuntu, there's a package "r-recommended"), a free software environment for statistical computing and graphics. First, we have to prepare our log files and extract the AWStats lines:

awk -- '/&win=[[:digit:]]+x[[:digit:]]+/ { print $0; }' /var/log/apache2/access-gimpusers.log >raw

 
All lines containing &win=AAxBB will be printed to "raw".

Then we create two files with the window and screen widths:

awk -- '{ match($0, /\?screen=([[:digit:]]+)x[[:digit:]]+/, a); print a[1]; }' raw >screen
awk -- '{ match($0, /\&win=([[:digit:]]+)x[[:digit:]]+/, a); print a[1]; }' raw >win
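
The three-argument form of match() used above is a GNU awk extension, so these two commands may fail with other awk implementations. A portable sketch with sed, assuming the same log format; the function name extract_width is only illustrative:

```shell
# Print the width part of "<name>=WIDTHxHEIGHT" for every input line that
# carries the parameter. $1 is the parameter name, e.g. "screen" or "win".
extract_width() {
    sed -n "s/.*[?&]$1=\([0-9][0-9]*\)x[0-9][0-9]*.*/\1/p"
}

# Usage, with the file names from the post:
#   extract_width screen < raw > screen
#   extract_width win    < raw > win
```

Lines without the parameter are skipped, so the output files contain one number per matching request.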

 
Now we can start R and load the values:

$ R
> screen <- scan('screen')
Read 20914 items
> win <- scan('win')
Read 20914 items

 
Let's get a summary:

> summary(win)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      1    1027    1263    1258    1358    2545

 
So the mean window width is about 1258 px, and 50% of our users have a window width between 1 and 1263 px (the median).

For a better view, we can plot a histogram:

> hist(win)

or the distribution function F:

> plot(ecdf(win),do.points=FALSE,verticals=TRUE)

or the density function:

> plot(density(win))

 
[Figure: density plot of the window widths]

We can see peaks at the default resolutions.

If we want to know which resolution we can choose so that a part x of our users will have to scroll, we have to look
at the x-quantile of the win distribution:

> quantile(win)
  0%  25%  50%  75% 100% 
   1 1027 1263 1358 2545 
> quantile(win,.1)
 10% 
1000 
> quantile(win,.2)
 20% 
1007 
> quantile(win,.3)
 30% 
1127 
> quantile(win,.4)
 40% 
1255

 
So 10% will have to scroll horizontally if we choose a site width of 1000, 20% for 1007, 30% for 1127 and 40% for 1255. If we design
for a 1024 px screen size, i.e. about 990 px Web site width, only about 10% of our users will have to scroll. However, if we design for a 1127 px
site width, 30% of the users will have to scroll.
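
The reverse question, namely what share of visitors would have to scroll at a given site width, can also be answered straight from the win file without starting R. A rough sketch, assuming one width per line as produced above; scroll_share is a made-up name:

```shell
# Percentage of recorded window widths that are narrower than the given
# site width, i.e. the visitors who would have to scroll horizontally.
# $1 = site width in px, $2 = file with one window width per line
scroll_share() {
    awk -v w="$1" '
        NF    { total++; if ($1 + 0 < w + 0) narrow++ }
        END   { if (total) printf "%.1f\n", 100 * narrow / total }
    ' "$2"
}
```

For example, scroll_share 990 win prints the percentage of logged visitors whose window is narrower than 990 px.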

However, this can't be the only aspect taken into consideration, because it may be more important to pack content onto the page
or to optimise it for larger screens (and window sizes). This is up to you, but the statistical analysis gives you a hint.

Finally, we can determine the (Pearson) correlation between window and screen widths:

> cor(win,screen)
[1] 0.1900401

=> The linear (!) correlation coefficient between window and screen width is 0.19, which means there is no good linear relationship (of
the form window size = A * screen size + B). So I think that most people don't work in full-screen mode, because in full-screen mode,
window size = screen size + B (so it would be linear).

Problem:

My company needs to monitor servers, services, switches and UPSs. The goal is to set up a monitoring system that can check the devices and services and send alarms to several people. It should distinguish between critical and non-critical services and devices.

Preface:

Everything you do here happens at your own risk!
I'm using FreeBSD 7.2 for this task, or more precisely a jailed instance of it, so you should know how to install and update FreeBSD. Please update your ports tree before starting, to be sure that you get the newest version of Nagios. I'll describe how to update your system in another post.

Tip: Back up, because you will break something!

 

Solution:

As you can see in the title I decided to use Nagios. You can find a lot of resources at:
http://www.nagios.org/
http://www.monitoringexchange.org/
http://nagios.manubulon.com/

Installing

Okay, let's start the actual work! We need something to display the results, and the standard Apache will do that for us.

>cd /usr/ports/www/apache22 && make install clean

Just use the default settings for the Apache server; you don't need to change the options for this task.

Now we need Nagios:

>cd /usr/ports/net-mgmt/nagios && make install clean

Enable the embedded Perl option and hit OK. None of the X11 options are needed as long as you don't use X11. When the installer asks which options should be compiled for PHP, you have to check Apache, or the mod_php module won't be compiled.

For the Nagios plugins, enable everything; it only takes a few megabytes of disk space. FreeBSD will fetch them from SourceForge.

Now wait a while (how long depends on the power of your CPU cores) until the installer asks whether you want to create a group "nagios". Answer Yes. After that you'll be asked to create a user called "nagios". Answer Yes. A few moments later the Nagios installer is finished and gives you some advice, which we'll follow now.

Configuration

First of all, we'll edit Apache's httpd.conf. This is needed so that the Nagios GUI is displayed properly.

>vi /usr/local/etc/apache22/httpd.conf

Check that the PHP module is loaded:

LoadModule php5_module        libexec/apache22/libphp5.so

 
To enable CGI, delete the # in front of the following line. You can add the .pl extension if you want to run Perl scripts:

AddHandler cgi-script .cgi .pl

Now search for the section and add:

ScriptAlias /nagios/cgi-bin/ /usr/local/www/nagios/cgi-bin/
Alias /nagios/ /usr/local/www/nagios/

As I don't cover security here, we'll make Nagios visible to everyone. If you want to restrict access, please read the Apache manual: http://httpd.apache.org/docs/2.0/howto/auth.html
With that in mind, we add the lines for the static Nagios page:

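<Directory /usr/local/www/nagios>
     Order deny,allow
     Allow from all
     php_flag engine on
     php_admin_value open_basedir /usr/local/www/nagios/:/var/spool/nagios/
</Directory>

and for the CGI application:

<Directory /usr/local/www/nagios/cgi-bin>
     Options +ExecCGI
</Directory>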

Okay now it should be possible to start the web server:

>apachectl start

If you're working in a jail, ps -ax doesn't work properly, so just type http://[IP of your server]/nagios/ into the address bar of your browser. Apache should do the rest, and you should see something like this:

[Screenshot: Nagios start page]

 

If the web server doesn't start in the jail, you may have forgotten to load the kernel module accf_http. You can check whether it's loaded with:

>kldstat | grep accf

You should see something like:
5 1 0xc6c22000 2000 accf_http.ko

Kernel modules can't be loaded inside a jail; you have to do that on the jail host:

>kldload accf_http
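
A module loaded with kldload is gone after the next reboot of the jail host. To load accf_http at boot, the line accf_http_load="YES" can be added to /boot/loader.conf on the host. A small sketch that adds the line only if it is missing; the function name is made up, and the file path is a parameter so the function can be tried on a copy first:

```shell
# Append a line to a loader.conf-style file unless it is already present.
# $1 = exact line to ensure, $2 = file (defaults to /boot/loader.conf)
ensure_loader_line() {
    conf="${2:-/boot/loader.conf}"
    grep -qxF -- "$1" "$conf" 2>/dev/null || printf '%s\n' "$1" >> "$conf"
}

# On the jail host, as root:
#   ensure_loader_line 'accf_http_load="YES"'
```

Running it twice is harmless, because the second call finds the line and does nothing.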

Congratulations, you've installed Nagios, but you can't monitor anything yet. So far only the static website is installed; now we have to get the Nagios service up.

 

Nagios service

Preparation:

I'll try to give you a crash course in using Nagios. But before starting, please make sure you know what SNMP is, how to use it and how to run snmpwalk.

You should find your config files under /usr/local/etc/nagios
There you'll find nagios.cfg-sample, cgi.cfg-sample and resource.cfg-sample. Copy these files to a backup location (in my case a folder called sample), then rename the originals to nagios.cfg, cgi.cfg and so on.

>mkdir sample
>cp *.cfg-sample sample/
>mv nagios.cfg-sample nagios.cfg

or use my rename script. Now you should have 3 files and 2 folders: sample and objects.
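
The rename script mentioned above is not included in the post. A minimal sketch of such a loop could look like this, assuming every sample file ends in -sample and the backup copies go into a sample subfolder; the function name activate_samples is hypothetical:

```shell
# Back up every *-sample file into <dir>/sample and strip the suffix
# from the original, e.g. nagios.cfg-sample -> nagios.cfg.
# $1 = directory to process (e.g. /usr/local/etc/nagios or its objects folder)
activate_samples() {
    dir="$1"
    mkdir -p "$dir/sample" || return 1
    for f in "$dir"/*-sample; do
        [ -f "$f" ] || continue            # nothing matched, or not a regular file
        cp "$f" "$dir/sample/" && mv "$f" "${f%-sample}"
    done
}
```

Called once for /usr/local/etc/nagios and once for the objects folder, it does the copy and rename steps for all files in one go.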

Let's go to the objects folder and do exactly the same:

>cd objects
>mkdir sample
>cp *.cfg-sample sample/
>mv commands.cfg-sample commands.cfg

and so on...
In the end you should have a fileset like this:
commands.cfg
contacts.cfg
localhost.cfg
printer.cfg
sample
switch.cfg
templates.cfg
timeperiods.cfg

Actual work

The time of mindless copy and paste is over; now you have to start thinking.
The whole point of Nagios is understanding inheritance. The Nagios team has done a lot for you, so let's have a look. To be able to view all hosts and services, edit cgi.cfg and set the parameter

use_authentication=0

from 1 to 0, or you'll get an error message. (But as the comments in this file say, this is a bad idea on a production system.)

 
You can find the Nagios configuration under /usr/local/etc/nagios/nagios.cfg
Here you just need to define which object files should be used. For example:

# You can specify individual object config files as shown below:
cfg_file=/usr/local/etc/nagios/objects/commands.cfg
cfg_file=/usr/local/etc/nagios/objects/contacts.cfg
cfg_file=/usr/local/etc/nagios/objects/timeperiods.cfg
cfg_file=/usr/local/etc/nagios/objects/templates.cfg

# Definitions for monitoring the local (FreeBSD) host
cfg_file=/usr/local/etc/nagios/objects/localhost.cfg

These files define the behavior of Nagios. We'll add some object files of our own a little later to monitor a Windows server, but for now use this file set. Let's see what these files are for.

commands.cfg
Defines the commands Nagios uses, for example how to check hosts and how to send mails.

templates.cfg
This is where the inheritance starts. This file is very important because the "skeletons" of the things to monitor are defined here.

contacts.cfg
If a host switches to a warning or critical state, somebody has to be contacted. These contacts are defined here.

printer.cfg
Ready-to-use definitions for monitoring printers.

switch.cfg
Ready-to-use definitions for monitoring switches.

timeperiods.cfg
Defines when a staff member should be notified.

Let's monitor the localhost!

If you've done everything as I told you,

>/usr/local/bin/nagios /usr/local/etc/nagios/nagios.cfg

will show something like:

Nagios Core 3.2.0
Copyright (c) 2009 Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 08-12-2009
License: GPL

Website: http://www.nagios.org
Nagios 3.2.0 starting... (PID=66618)
Local time is Wed Jan 27 12:05:58 UTC 2010

and you should be glad!

Have a look at your web server and click on host groups:
[Screenshot: host groups view]

Let's monitor a printer!

Start with the easy stuff.

>vi /usr/local/etc/nagios/nagios.cfg

and delete the hash mark in front of:

cfg_file=/usr/local/etc/nagios/objects/printer.cfg

Now you've told Nagios to read printer.cfg on startup.

>vi /usr/local/etc/nagios/objects/templates.cfg

Here you find the section about a generic host:

define host{
        name                            generic-host    ; The name of this host template
        notifications_enabled           1               ; Host notifications are enabled
        event_handler_enabled           1               ; Host event handler is enabled
        flap_detection_enabled          1               ; Flap detection is enabled
        failure_prediction_enabled      1               ; Failure prediction is enabled
        process_perf_data               1               ; Process performance data
        retain_status_information       1               ; Retain status information across program restarts
        retain_nonstatus_information    1               ; Retain non-status information across program restarts
        notification_period             24x7            ; Send host notifications at any time
        register                        0               ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
        }

and a few lines below, the printer definition:

define host{
        name                    generic-printer ; The name of this host template
        use                     generic-host    ; Inherit default values from the generic-host template
        check_period            24x7            ; By default, printers are monitored round the clock
        check_interval          5               ; Actively check the printer every 5 minutes
        retry_interval          1               ; Schedule host check retries at 1 minute intervals
        max_check_attempts      10              ; Check each printer 10 times (max)
        check_command           check-host-alive        ; Default command to check if printers are "alive"
        notification_period     workhours               ; Printers are only used during the workday
        notification_interval   30              ; Resend notifications every 30 minutes
        notification_options    d,r             ; Only send notifications for specific host states
        contact_groups          admins          ; Notifications get sent to the admins by default
        register                0               ; DONT REGISTER THIS - ITS JUST A TEMPLATE
        }

As you can see, generic-printer inherits from generic-host. So if you make changes in generic-host, the generic-printer skeleton will change too! You can override an attribute simply by setting it one level deeper. So if you add
retain_status_information 0
to generic-printer, it will override the 1 inherited from generic-host, and so on.

I want to monitor a HP 3600n Laserjet, so I'll:

>vi /usr/local/etc/nagios/objects/printer.cfg

and add a new host:

define host{
        use             	generic-printer  
        host_name       	NikosPrinter
        alias           	HP3600n @ ITroom
        address         	100.100.100.101
        hostgroups      	network-printers
        notes_url       	http://100.100.100.66/wiki/index.php/Drucker
        action_url      	http://100.100.100.185
        }

The host gets its default settings from generic-printer, which in turn gets its default settings from generic-host.

use: the template the settings are inherited from
host_name: the name of the host in Nagios
alias: more information for the web interface
address: IP or FQDN (but prefer the IP)
hostgroups: used to group hosts, if you have more printers
notes_url: a link to our internal wiki, where you can find more information
action_url: a link to the web interface of the printer

 
Okay host is defined, now the services:

define service{
        use                     		generic-service       
        host_name          	NikosPrinter    
        service_description     	Printer Status          
        check_command           	check_hpjd!-C public    
        normal_check_interval  10      ; Check the service every 10 minutes under normal conditions
        retry_check_interval    	1       ; Re-check the service every minute until its final/hard state is determined
        notification_interval   	0
        }

service_description: name of the service in the web interface
normal_check_interval: check the service every 10 minutes under normal conditions
retry_check_interval: re-check the service every minute until its final/hard state is determined
notification_interval: how often you receive a mail; 0 means only one notification is sent, which I think is enough because I don't want to get spammed by printers

define service{
        use                     		generic-service
        hostgroup_name          	network-printers
        service_description     	PING
        check_command           	check_ping!3000.0,80%!5000.0,100%
        normal_check_interval   10
        retry_check_interval    	1
        notification_interval   	0
        }

This is the same as above, but here I use a hostgroup to ping instead of a single host name.

Let's have a look at the commands:
check_command check_hpjd!-C public

Arguments are separated by “!”
How to use the standard macros is described in the Nagios documentation, so I won't go into more detail here.

and from commands.cfg

define command{
        command_name    check_hpjd
        command_line    $USER1$/check_hpjd -H $HOSTADDRESS$ $ARG1$
        }
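
To see how the pieces fit together: Nagios replaces $HOSTADDRESS$ with the address of the host and $ARG1$ with the text after the first "!" in check_command. The sketch below only prints the resulting command line, which is handy for running the plugin by hand. It assumes $USER1$ points to /usr/local/libexec/nagios (check resource.cfg for the real value); hpjd_cmdline is an illustrative name, not a Nagios tool.

```shell
# Assumption: the plugin directory that $USER1$ is set to in resource.cfg.
PLUGIN_DIR="${PLUGIN_DIR:-/usr/local/libexec/nagios}"

# Print the command line for check_hpjd after macro substitution.
# $1 = host address ($HOSTADDRESS$), $2 = first argument ($ARG1$)
hpjd_cmdline() {
    printf '%s/check_hpjd -H %s %s\n' "$PLUGIN_DIR" "$1" "$2"
}

# For the printer defined above:
hpjd_cmdline 100.100.100.101 "-C public"
```

Running the printed command as the nagios user shows the same output and exit status that Nagios would see for the service.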

 
Restart nagios:

>ps -ax | grep nagios
>kill [pid]
>/usr/local/bin/nagios /usr/local/etc/nagios/nagios.cfg
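
Killing the daemon before knowing that the edited configuration is valid can leave you without monitoring. Nagios can verify a configuration with -v and start as a daemon with -d, so a slightly more careful restart might look like the sketch below. The function name and the use of pkill are assumptions; adapt them to your setup.

```shell
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/etc/nagios/nagios.cfg}"

# Restart Nagios only if the configuration passes the built-in check.
safe_restart() {
    # -v parses and verifies the configuration without starting anything.
    if ! "$NAGIOS_BIN" -v "$NAGIOS_CFG" > /dev/null; then
        echo "config check failed, running daemon left untouched" >&2
        return 1
    fi
    pkill -x nagios                  # stop the running instance, if any
    sleep 1                          # give it a moment to exit
    "$NAGIOS_BIN" -d "$NAGIOS_CFG"   # -d starts Nagios in daemon mode
}
```

When the check fails, run the -v command by hand without the redirect to see which object definition Nagios complains about.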

This should result in:

[Screenshot: the new printer and its services in Nagios]

Nagios was installed but still quite voiceless, so I want it to send notifications via sendmail. Sendmail is installed by default with FreeBSD, and as I didn't want to build a whole mail server from scratch, I want to use our MS Exchange server.

First of all check the DNS Setting of your Domain and make sure your FreeBSD machine got a DNS (A) Record.

To make sure it works you should get this response:

>hostname
nagios.domain.at
>host nagios.domain.at
nagios.domain.at has address 100.100.111.111

 
If you don't get this, have a look at your DNS records.

Now allow SMTP connections from your BSD host to the Exchange server, as seen in the screenshot below.

[Screenshot: SMTP access settings on the Exchange server]

After that you have to tell the FreeBSD host to submit its mail to a different mail server:

> vi /etc/mail/freebsd.submit.mc

change the line

FEATURE(`msp', `[127.0.0.1]')dnl

to

FEATURE(`msp', `[Ip of the Exchange server]')dnl

Or use instead of IP the FQDN (not tested by me)

Now install the submit.cf

>cd /etc/mail && make install-submit-cf

 
If everything worked fine, you should see an output like this:

cp freebsd.submit.mc [hostname].[domain].submit.mc
/usr/bin/m4 -D_CF_DIR_=/usr/share/sendmail/cf/   /usr/share/sendmail/cf/m4/cf.m4 [hostname].[domain].submit.mc > [hostname].[domain].submit.cf
install -m 444 [hostname].[domain].submit.cf /etc/mail/submit.cf

 
Now test sending mail with 'mail':

>mail -v [exchangeuser@domain]

 
Now you should see a lot of lines starting with

[exchangeuser@domain]... Connecting to [100.100.111.111] via relay...
220 EXCHANGE.doman.dc Microsoft ESMTP MAIL Service, Version: 6.0.3790.3959 ready

Now check your mail client; there should be a mail sent by the user you are logged in as.

Since we moved our NFS servers from standalone AIX to NetApp, we need separate monitoring for them.

I wrote this script; it runs on the Xymon server and appears as a server test.

You need to change the (v)filer's etc/hosts.equiv file to allow rsh access from your Xymon server.
If that's a problem, the script can run from any system for which the bb client binary is available.

A lot of the code is just for nice Xymon formatting; it could easily be optimized for performance by not using temp files.

#!/usr/bin/ksh                                                        
#Daffi 2009                                                           
#Script reads quota report from a netapp vfiler over rsh
#change the vfiler's hosts.equiv and add the IP address of the rsh client, the xymon server in most cases


#for testing purpose, should be set in your xymon environment
#BBHOME="/home/hobbit/client"
#BB="${BBHOME}/bin/bb"
#BBDISP=xxx.xxx.xxx.xxx
#MACHINE=xxx

TESTNAME="netappquota"
FILERNAME="xxx"
#critical and fatal severity limits in percent, critical=yellow, fatal=red
CLIMIT="70"
FLIMIT="80"

#don't change these values, default status
STAT="green"
STATUSTEXT="all Qtree Quotas OK"


#remove files from the last test
find $BBHOME/tmp -type f -name "netapp*.txt" -exec rm {} \;


rsh ${FILERNAME} "quota report" | awk '/vol/{print $4,$5,$6,$9}' | while read QTREE USED QUOTA RPATH
        do

                PERC=$(echo "scale=3;(${USED} * 100) / $QUOTA" | bc)
                stf=$(echo "${PERC} >= ${FLIMIT}" | bc)
                stc=$(echo "${PERC} >= ${CLIMIT}" | bc)
                if [ ${stf} -eq 1 ]
                then
                                echo "&red ${RPATH} ${QTREE} ${PERC}% ${USED} ${QUOTA}" >> $BBHOME/tmp/netappquotared.txt
                elif [ ${stc} -eq 1 ]
                        then
                                echo "&yellow ${RPATH} ${QTREE} ${PERC}% ${USED} ${QUOTA}" >> $BBHOME/tmp/netappquotayellow.txt
                else
                                echo "&green ${RPATH} ${QTREE} ${PERC}% ${USED} ${QUOTA}" >> $BBHOME/tmp/netappquotagreen.txt
                fi
        done

if [ -f $BBHOME/tmp/netappquotayellow.txt ] ; then STAT=yellow ; STATUSTEXT="One or more Quotas over defined CRITICAL level (${CLIMIT}%)" ; fi
if [ -f $BBHOME/tmp/netappquotared.txt ] ; then STAT=red ; STATUSTEXT="One or more Quotas over defined FATAL level (${FLIMIT}%)" ; fi

${BB} ${BBDISP} "status ${MACHINE}.${TESTNAME} ${STAT} ${STATUSTEXT}

$(echo "Path Qtree %used used[MB] quota[MB]" | awk '{printf ("%-43s" "%-30s" "%-15s" "%-15s" "%-15s\n",$1,$2,$3,$4,$5,$6)}')

$([ -f $BBHOME/tmp/netappquotared.txt ] && cat $BBHOME/tmp/netappquotared.txt | sort -rnk 4 | awk '{printf ("%-5s" "%-40s" "%-30s" "%-15s" "%-15.2f" "%-15.2f\n",$1,$2,$3,$4,$5/1024,$6/1024)}' && echo " ")
$([ -f $BBHOME/tmp/netappquotayellow.txt ] && cat $BBHOME/tmp/netappquotayellow.txt | sort -rnk 4 | awk '{printf ("%-8s" "%-40s" "%-30s" "%-15s" "%-15.2f" "%-15.2f\n",$1,$2,$3,$4,$5/1024,$6/1024)}' && echo " ")
$([ -f $BBHOME/tmp/netappquotagreen.txt ] && cat $BBHOME/tmp/netappquotagreen.txt | sort -rnk 4 | awk '{printf ("%-7s" "%-40s" "%-30s" "%-15s" "%-15.2f" "%-15.2f\n",$1,$2,$3,$4,$5/1024,$6/1024)}')
"

[Screenshot: netappquota status page in Xymon]
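
As noted, the temp files are not strictly needed. A sketch of the threshold logic as a single function, assuming numeric used and quota values in the same unit; classify_quota is a made-up name, and treating a missing or zero quota as green is an assumption of this sketch, not something the script above does:

```shell
# Print green, yellow or red for one qtree.
# $1 = used, $2 = quota, $3 = yellow limit in percent, $4 = red limit in percent
classify_quota() {
    awk -v used="$1" -v quota="$2" -v climit="$3" -v flimit="$4" 'BEGIN {
        # Assumption of this sketch: no usable quota (e.g. "-" or 0) counts as green.
        if (quota + 0 <= 0) { print "green"; exit }
        perc = used * 100 / quota
        color = (perc >= flimit + 0) ? "red" : ((perc >= climit + 0) ? "yellow" : "green")
        print color
    }'
}
```

Inside the while loop this could replace the three bc calls, with the overall status tracked in a variable instead of checking which temp files exist.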
