I want to create a dashboard in NLS that can graph incoming logs as well as the perfdata recorded by Nagios XI.
I have already integrated Grafana into my environment, but the current version of Grafana is incompatible with the version of the ELK stack that powers NLS.
Letting NLS graph perfdata would give my end users the best visibility into my environment.
It seems that my options would be to either:
1. have Nagios, via the script that sends syslogs into NLS, also send the perfdata to NLS;
2. through some API magic, have NLS read the perfdata directly from Nagios; or
3. something else.
Is what I am thinking possible, and if so, how can I make it happen?
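Option 1 could look roughly like the sketch below: a small forwarder that Nagios would run as a perfdata command, for example `send_perfdata.py "$HOSTNAME$" "$SERVICEDESC$" "$TIMET$" "$SERVICEPERFDATA$"` (those four are standard Nagios macros). Everything on the NLS side is an assumption: it presumes you have added a Logstash `tcp` input with a `json_lines` codec, and `nls.example.com`, port `2057`, the script name and the output field names are hypothetical placeholders, not anything NLS ships with.

```python
#!/usr/bin/env python3
"""Hypothetical perfdata forwarder: Nagios runs it once per check result and
it ships one JSON line to Nagios Log Server.

Assumes (not from this thread) a Logstash `tcp` input with a `json_lines`
codec has been added on NLS at NLS_HOST:NLS_PORT.
"""
import json
import socket
import sys

NLS_HOST = "nls.example.com"  # placeholder host
NLS_PORT = 2057  # placeholder port for the assumed tcp/json_lines input


def build_event(host, service, timet, perfdata):
    """Package the Nagios macro values as one flat JSON-serialisable event."""
    return {
        "type": "nagios_perfdata",
        "host_name": host,
        "service_description": service,
        "check_time": int(timet),  # $TIMET$ expands to a Unix timestamp
        "perfdata": perfdata,  # raw string; still needs splitting into numbers
    }


def send_event(event, host=NLS_HOST, port=NLS_PORT, timeout=5.0):
    """Send the event as a single newline-terminated JSON line over TCP."""
    line = json.dumps(event) + "\n"
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(line.encode("utf-8"))


if __name__ == "__main__":
    if len(sys.argv) == 5:
        send_event(build_event(*sys.argv[1:5]))
    else:
        print("usage: send_perfdata.py HOSTNAME SERVICEDESC TIMET PERFDATA",
              file=sys.stderr)
```

Spawning one process per check result is fine for a trial, but at 13,000 services it may be worth batching instead; how this coexists with XI's own perfdata processing is also something to test carefully before relying on it.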
NLS and xi perfdata
-
- Posts: 1264
- Joined: Tue Apr 12, 2011 12:29 pm
NLS and xi perfdata
Proudly running:
Nagiosxi 5.4.12 2 node Prod Env 2500 hosts, 13,000 services
Nagiosxi 5.5.7(test env) 2500 hosts, 13,000 services
Nagios Logserver 2 node Prod Env 500 objects sending
Nagios Network Analyser
Nagios Fusion
-
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: NLS and xi perfdata
There isn't a built-in, easy way to do this, but it is on our internal to-do list to be able to send check results and performance data to Log Server.
It really needs to be added to the perfdata processing subsystem in a component to be done correctly; I see no other way.
I suppose it might be possible to use the http_poller Logstash input
https://www.elastic.co/guide/en/logstas ... oller.html
and then poll these APIs:
objects/hoststatus
objects/servicestatus
However, you would need to create a good grok filter that can parse the performance data field properly, which would be a challenge.
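To show what that filter has to cope with: Nagios plugin perfdata is a space-separated list of `'label'=value[UOM];[warn];[crit];[min];[max]` tokens, where labels containing spaces are single-quoted and thresholds may be ranges such as `10:20` or `@5:10`. The sketch below parses that format with a Python regular expression, roughly what an equivalent grok pattern would need to express. It is an illustration rather than a drop-in Logstash filter, and the output field names are hypothetical.

```python
import re

# One perfdata token: a quoted or bare label, '=', the value plus optional
# unit of measure, then up to four optional ';'-separated fields.
PERF_RE = re.compile(
    r"""(?P<label>'(?:[^']|'')+'|[^\s=']+)=
        (?P<value>-?[0-9.]+|U)(?P<uom>[a-zA-Z%]*)
        (?:;(?P<warn>[^;\s]*))?
        (?:;(?P<crit>[^;\s]*))?
        (?:;(?P<min>[^;\s]*))?
        (?:;(?P<max>[^;\s]*))?""",
    re.VERBOSE,
)


def _num(field):
    """Convert a field to float; empty, missing or 'U' becomes None."""
    if field is None or field in ("", "U"):
        return None
    try:
        return float(field)
    except ValueError:
        return field  # e.g. a range threshold such as "10:20" or "@5:10"


def parse_perfdata(perfdata):
    """Split a raw perfdata string into one dict per metric."""
    metrics = []
    for m in PERF_RE.finditer(perfdata):
        label = m.group("label")
        if label.startswith("'"):
            # Strip the quotes and undo '' escaping inside quoted labels.
            label = label[1:-1].replace("''", "'")
        metrics.append({
            "label": label,
            "value": _num(m.group("value")),
            "uom": m.group("uom"),
            "warn": _num(m.group("warn")),
            "crit": _num(m.group("crit")),
            "min": _num(m.group("min")),
            "max": _num(m.group("max")),
        })
    return metrics
```

For example, `parse_perfdata("'/ used'=45%;80;90")` yields a single metric labelled `/ used` with value `45.0`, unit `%`, warn `80.0` and crit `90.0`. Each dict maps naturally onto one numeric document per metric, which is what a Kibana graph would want.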
-
- Posts: 3739
- Joined: Thu May 05, 2016 3:54 pm
Re: NLS and xi perfdata
A neat article:
https://www.elastic.co/blog/integrating ... h-logstash
Unfortunately:
Note that this input plugin requires Logstash 6.2.3 at minimum.
Former Nagios employee
https://www.mcapra.com/
-
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: NLS and xi perfdata
mcapra wrote: A neat article:
https://www.elastic.co/blog/integrating ... h-logstash
Unfortunately: Note that this input plugin requires Logstash 6.2.3 at minimum.
Well, yeah, it is kind of neat, but who would want to manage all of their checks that way? The configuration management needed to configure Nagios checks would be a nightmare; the article even mentions it:
"This use case begs the question, what if I want to programmatically add thousands or more checks into Logstash?"
-
- Posts: 1264
- Joined: Tue Apr 12, 2011 12:29 pm
Re: NLS and xi perfdata
For the following reason:
Users start to report that they can't work remotely: after connecting to the network via the VPN, the "internet" is slooooooowwwww.
Network admins are perplexed; the datacenter is not showing anything down in Nagios.
Now the network is so slow no one can even log in.
Verizon is called, but Verizon says everything is OK on their end. Coffee is drunk, pills are popped (headache meds, we hope), phones are called.
The big wallets never sprang for the development cycles for NNA, so that's out.
Benhank says "Waitaminute!" and cracks open an ice-cold 12-pack of NLS to track syslogs for the affected network gear. Because he has extensive training in creating queries and regex technomagical miracles (in real life he hasn't, and will be making a post about it to you guys in the future, heh heh), he quickly creates a dashboard that shows the throughput for the slow, flaccid, drooping network devices and BAM! There it is! All golden and shiny and sparkling: a dashboard that clearly shows when the drop in throughput occurred, extrapolated from Nagios perfdata, alongside a graph of the syslog errors on those devices at the time of the slowdown.
Presenting the info to the quivering, red-eyed big-wallet boys in charge, he helps them make their case to Verizon, or, on a darker note, to HR (aka Heads Will Roll), pointing out what happened and who is responsible for resolving the problem.
It boils down to: "It's better to have it and not need it than to need it, really badly, right now, idontknowwhatimgonnatellmyboss, and not have it."
-
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: NLS and xi perfdata
lol
yep
-
- Posts: 1264
- Joined: Tue Apr 12, 2011 12:29 pm
Re: NLS and xi perfdata
So fellas, can this be done with the current version of NLS? =D
-
- Posts: 3739
- Joined: Thu May 05, 2016 3:54 pm
Re: NLS and xi perfdata
You could maybe rig up something like a Logstash HTTP poller to hit the various Nagios XI API perfdata endpoints, but I haven't looked at the API docs in a while. What you'd really want is an API endpoint that exposes the "last measurement" for each of your checks; I'd think scraping that every 5/10/30 minutes would be valuable by itself. Combine it with a simple filter to break up each XI check into an individual message.
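A rough stand-in for that poll-then-split idea, written in Python rather than as an `http_poller` config, is sketched below. It assumes the XI 5.x REST endpoint `/nagiosxi/api/v1/objects/servicestatus` returns `{"servicestatuslist": {"servicestatus": [...]}}` with `host_name`, `name`, `status_update_time` and `perfdata` on each record; verify those names against your XI version's API documentation. The server URL and API key are placeholders.

```python
import json
import time
import urllib.parse
import urllib.request

# Placeholders: point these at your own XI server and API key.
XI_URL = "https://nagiosxi.example.com/nagiosxi/api/v1/objects/servicestatus"
API_KEY = "REPLACE_WITH_YOUR_API_KEY"


def fetch_servicestatus(url=XI_URL, apikey=API_KEY, timeout=30):
    """GET the service status list from the XI REST API as parsed JSON."""
    query = urllib.parse.urlencode({"apikey": apikey})
    with urllib.request.urlopen(f"{url}?{query}", timeout=timeout) as resp:
        return json.load(resp)


def split_into_events(payload):
    """Break one API response into one event per service check.

    Assumed layout (verify against your XI version):
    {"servicestatuslist": {"servicestatus": [ {...}, {...} ]}}
    """
    records = payload.get("servicestatuslist", {}).get("servicestatus", [])
    if isinstance(records, dict):
        # Defensive: accept a lone record that is not wrapped in a list.
        records = [records]
    events = []
    for rec in records:
        if not rec.get("perfdata"):
            continue  # checks without perfdata have nothing to graph
        events.append({
            "host_name": rec.get("host_name"),
            "service_description": rec.get("name"),  # assumed field name
            "status_update_time": rec.get("status_update_time"),
            "perfdata": rec["perfdata"],
        })
    return events


def poll_forever(interval_seconds=300):
    """Scrape on a fixed interval and print one JSON line per check."""
    while True:
        for event in split_into_events(fetch_servicestatus()):
            print(json.dumps(event), flush=True)
        time.sleep(interval_seconds)
```

Calling `poll_forever(300)` gives the 5-minute cadence mentioned above. The JSON-lines output would still need to be shipped into NLS, and the raw `perfdata` string would still need to be split into numeric fields before it can be graphed.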
Here's a custom API endpoint I rigged up a while ago:
https://support.nagios.com/forum/viewto ... 93#p214393
All of that still sounds reasonably expensive (in time and performance) to me, though. The most performant solution I can think of would be some sort of custom perfdata handler, which is unlikely to cooperate well with upgrades.
Depending on the database chosen for the perfdata rework scheduled for XI 6, this could all become much easier in a year or so.
¯\_(ツ)_/¯
scottwilkerson wrote: who would want to manage all of their checks that way
It's totally, absolutely not applicable to this particular situation, but if you already had your check definitions held by Puppet/Ansible/Chef/etc., achieving parity between the two systems doesn't sound *toooo* terrible?
-
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: NLS and xi perfdata
This would be possible, but you would still need to figure out some type of grok pattern to extract the perfdata.
-
- Posts: 1264
- Joined: Tue Apr 12, 2011 12:29 pm
Re: NLS and xi perfdata
That's what I thought as well. I wish there was a .rrd plugin or something, but I can't find one. Lord knows I don't want to bring Graphite into my environment...
You can lock this for now, but if there is any way to make this a feature request or something...
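On the .rrd idea: as a stopgap, the RRD files behind the XI graphs can be read with the standard `rrdtool fetch` command and the rows shipped yourself. The sketch below parses the plain-text output of `rrdtool fetch` (a header line of data-source names, a blank line, then `timestamp: value value ...` rows) into Python tuples. The file location, for instance something like `/usr/local/nagios/share/perfdata/<host>/<service>.rrd`, is an assumption to check on your own install, and the data-source names in the header may be generic numbers rather than perfdata labels.

```python
import math
import os
import subprocess


def _value(raw):
    """Parse one RRD cell; rrdtool prints unknown values as nan/-nan."""
    v = float(raw)
    return None if math.isnan(v) else v


def parse_rrd_fetch(text):
    """Turn `rrdtool fetch` text output into [(timestamp, {ds: value})]."""
    lines = text.splitlines()
    idx = 0
    while idx < len(lines) and not lines[idx].strip():
        idx += 1  # skip any leading blank lines
    if idx == len(lines):
        return []
    ds_names = lines[idx].split()  # first non-blank line: data-source names
    rows = []
    for line in lines[idx + 1:]:
        if ":" not in line:
            continue  # blank separator line
        ts, _, values = line.partition(":")
        nums = [_value(v) for v in values.split()]
        rows.append((int(ts), dict(zip(ds_names, nums))))
    return rows


def fetch_rrd(path, cf="AVERAGE", start="-1h"):
    """Run `rrdtool fetch` (binary must be on PATH) and parse its output."""
    result = subprocess.run(
        ["rrdtool", "fetch", path, cf, "--start", start],
        capture_output=True,
        text=True,
        check=True,
        # Defensive: force the C locale so numbers use '.' as decimal point.
        env={**os.environ, "LC_ALL": "C"},
    )
    return parse_rrd_fetch(result.stdout)
```

Each `(timestamp, values)` row could then be wrapped as a JSON event and sent to NLS the same way as any other log line; unknown samples come back as `None` so they can be skipped rather than graphed as zero.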