CRITICAL! NAGIOS xi SEGMENTATION FAULT

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
arnab.roy
Posts: 354
Joined: Sat Apr 30, 2011 10:24 am

Re: CRITICAL! NAGIOS xi SEGMENTATION FAULT

Post by arnab.roy »

OK, since I got very little help from your end, I went ahead and did the following:

1. I upgraded Nagios, hoping it would reset any permission issues. Nothing happened.
2. The fact that I couldn't see services for a particular host raised my suspicion, so I went ahead and deleted it. It was deleted from CCM but not from Nagios XI.
3. Restarted all services, no luck whatsoever.
4. Rebooted the whole system, and now that ghost host is gone and the system is back to normal.

Although I have solved the problem, I am quite sure the system got caught up in some kind of bug.
arnab.roy
Posts: 354
Joined: Sat Apr 30, 2011 10:24 am

Re: CRITICAL! NAGIOS xi SEGMENTATION FAULT

Post by arnab.roy »

OK, I now have a pattern: adding that host back caused the system to crash again!
arnab.roy
Posts: 354
Joined: Sat Apr 30, 2011 10:24 am

Re: CRITICAL! NAGIOS xi SEGMENTATION FAULT

Post by arnab.roy »

It's something in that host group: adding any host to that group causes it to crash.
nscott
Posts: 1040
Joined: Wed May 11, 2011 8:54 am

Re: CRITICAL! NAGIOS xi SEGMENTATION FAULT

Post by nscott »

Is there a way for you to isolate the host that is causing the issue?

Also, the segmentation faults start at 14:00 which coincide. Can you manually connect to the psql database?

psql -U nagiosxi -W -d nagiosxi
password: n@gweb

It appears it's having difficulty connecting. I'm not sure what the cause of that is, so let's start there.
Nicholas Scott
Former Nagios employee
arnab.roy
Posts: 354
Joined: Sat Apr 30, 2011 10:24 am

Re: CRITICAL! NAGIOS xi SEGMENTATION FAULT

Post by arnab.roy »

Hi All,

I have found a major bug in the system. The root cause of this issue was a debug option that I had enabled on one of my own plugins, which was dumping out a table like this:

Code: Select all

WLSX-SWITCH-MIB::apSignalToNoiseRatio.'....[.' = INTEGER: 36
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'....[.' = INTEGER: 36
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'....[.' = INTEGER: 36
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'....[.' = INTEGER: 36
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'....\.' = INTEGER: 36
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'....\.' = INTEGER: 36
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'....\.' = INTEGER: 36
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'....\.' = INTEGER: 36
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'......' = INTEGER: 36
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'......' = INTEGER: 36
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'......' = INTEGER: 36
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'......' = INTEGER: 36
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'..... ' = INTEGER: 30
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'.....!' = INTEGER: 30
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'.....(' = INTEGER: 60
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'.....)' = INTEGER: 60
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'....#.' = INTEGER: 42
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'....#.' = INTEGER: 42
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'....#.' = INTEGER: 60
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'....#.' = INTEGER: 60
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'....$.' = INTEGER: 36
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'....$.' = INTEGER: 36
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'....$.' = INTEGER: 60
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'....$.' = INTEGER: 60
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'....&.' = INTEGER: 30
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'....&.' = INTEGER: 30
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'....&.' = INTEGER: 60
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'....&.' = INTEGER: 60
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'....)@' = INTEGER: 36
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'....)A' = INTEGER: 36
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'....)H' = INTEGER: 60
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'....)I' = INTEGER: 60
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'....,.' = INTEGER: 42
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'....,.' = INTEGER: 42
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'....,.' = INTEGER: 60
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'....,.' = INTEGER: 60
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'......' = INTEGER: 30
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'......' = INTEGER: 30
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'......' = INTEGER: 42
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'......' = INTEGER: 42
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'..../@' = INTEGER: 24
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'..../A' = INTEGER: 24
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'..../H' = INTEGER: 60
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'..../I' = INTEGER: 60
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'..../.' = INTEGER: 36
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'..../.' = INTEGER: 36
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'..../.' = INTEGER: 48
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'..../.' = INTEGER: 48
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'....0`' = INTEGER: 30
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'....0a' = INTEGER: 30
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'....0h' = INTEGER: 42
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'....0i' = INTEGER: 42
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'....0.' = INTEGER: 42
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'....0.' = INTEGER: 42
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'....0.' = INTEGER: 42
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'....0.' = INTEGER: 42
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'....1.' = INTEGER: 42
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'....1.' = INTEGER: 42
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'....1.' = INTEGER: 60
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'....1.' = INTEGER: 60
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'....D.' = INTEGER: 42
WLSX-SWITCH-MIB::apSignalToNoiseRatio.'....D.' = INTEGER: 42

This caused the system to crash. Once I turned the debug off and it stopped dumping this output into Nagios, it started to work normally. Food for thought for your developers :)
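
One way to keep debugging available without it ending up in the check result is to write debug lines to a file rather than to the plugin's stdout. The following is only a rough sketch of that idea, not the plugin discussed above: the function names, the log path, and the threshold of 30 are all made-up assumptions for illustration.

```shell
# Hypothetical plugin skeleton; every name, path and threshold is an assumption.
DEBUG="${DEBUG:-0}"
DEBUG_LOG="${DEBUG_LOG:-/tmp/check_ap_snr.debug.log}"

debug() {
    # Append debug lines to a file so they never reach the plugin's stdout.
    if [ "$DEBUG" = "1" ]; then
        printf '%s\n' "$*" >> "$DEBUG_LOG"
    fi
}

check_snr() {
    # $1 is a sample SNR value; a real plugin would fetch it over SNMP.
    snr="$1"
    debug "raw value: $snr"
    if [ "$snr" -ge 30 ]; then
        echo "OK - SNR is $snr"
        return 0
    fi
    echo "WARNING - SNR is $snr"
    return 1
}
```

With `DEBUG=1`, the raw values land in the log file while the check itself still prints a single status line.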
tonyyarusso
Posts: 1128
Joined: Wed Mar 03, 2010 12:38 pm
Location: St. Paul, MN, USA

Re: CRITICAL! NAGIOS xi SEGMENTATION FAULT

Post by tonyyarusso »

There are two issues here. The first is that Nagios didn't intelligently handle the output of your plugin to limit it nicely. The second is that Red Hat / CentOS have a broken Apache that segfaults when there is a problem instead of nicely throwing an error. The latter is outside of our control. The former is the reason that by default Nagios has a limit on plugin output length, so that a broken plugin such as yours cannot damage the system. However, you chose to bypass that protection (see your previous thread on http://support.nagios.com/forum/viewtopic.php?t=2450), which is why you encountered this problem when your plugin went haywire.
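
As an illustration of capping output on the plugin side, here is a minimal sketch of a wrapper that keeps only the first few bytes of a command's stdout and passes its exit code through. This is not how Nagios enforces its own limit: `run_capped` and `MAX_BYTES` are hypothetical names, the 1024 value is arbitrary rather than a Nagios default, and `head -c` is assumed to be available (it is on GNU and BSD systems but is not required by POSIX).

```shell
# Hypothetical wrapper; MAX_BYTES is an illustrative value, not a Nagios default.
MAX_BYTES="${MAX_BYTES:-1024}"

run_capped() {
    # Run the given command, keep only the first MAX_BYTES bytes of its
    # stdout, and return the command's own exit status.
    rc=0
    out="$("$@")" || rc=$?
    printf '%s\n' "$out" | head -c "$MAX_BYTES"
    return "$rc"
}
```

It would be called as `run_capped /path/to/plugin args`, where the plugin path is whatever check is being wrapped.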
Tony Yarusso
Technical Services
___
TIES
Web: http://ties.k12.mn.us/
arnab.roy
Posts: 354
Joined: Sat Apr 30, 2011 10:24 am

Re: CRITICAL! NAGIOS xi SEGMENTATION FAULT

Post by arnab.roy »

Hi Tony,

I didn't make any changes to the default output lengths. I would like to highlight that only the XI interface broke down, not Nagios / Nagios Core, as everything was working fine when accessed via Nagios Core.

Many Thanks
Arnab
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: CRITICAL! NAGIOS xi SEGMENTATION FAULT

Post by mguthrie »

Just for our own future reference, can you show us the output from running that check from the command line with the debugging turned on?

Also, can you show us the exit code for that plugin after running it like that?

Code: Select all

echo $?
It's always good to know "why" things break : )
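
For anyone repeating this kind of test, a small helper along these lines can capture the exit code and the size of the output in one go. It is a sketch under assumptions: `state_name` and `report_check` are hypothetical helper names, and the mapping follows the usual plugin convention of 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.

```shell
# Map a plugin exit code to the conventional Nagios state name.
state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "INVALID ($1)" ;;
    esac
}

# Run a check command, then report its exit code, state, and output size.
report_check() {
    rc=0
    out="$("$@")" || rc=$?
    bytes="$(printf '%s' "$out" | wc -c | tr -d ' ')"
    echo "exit code: $rc ($(state_name "$rc")), output bytes: $bytes"
}
```

Run it as `report_check` followed by the same plugin command line used in the service definition. A very large byte count next to a normal exit code would point at the debug output rather than the check logic.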