Re: [Nagios-devel] Test Please: Buffer Slots Variable CVS Code

On 22 Dec 2006, at 01:50, Ethan Galstad wrote:

> Based on the recent thread about hanging Nagios processes, I have
> removed the COMMAND_BUFFER_SLOTS and SERVICE_BUFFER_SLOTS definitions
> out to config file variables:
>
> external_command_buffer_slots=4096
> check_result_buffer_slots=4096
>
> I have also updated nagiostats to report the avail/used number of slots
> for graphing in MRTG. Could folks try out the latest 2.x CVS code and
> give it some testing?

Ethan,

Thanks for applying to CVS. Several comments:

- external_command_buffer_slots and check_result_buffer_slots only need
to be ints, as the circular_buffer struct only uses an int for items.

- in xsddefault.c, when you print out external_command_buffer.items, I
think this is not thread-safe. My thread knowledge is pretty limited, so
please correct me if I am wrong. The main nagios process writes the
status data via xsddefault_save_status_data, which needs to read the
external_command_buffer variable. However, this variable is written to
by the command_file_worker_thread. So I think the
xsddefault_save_status_data routine needs a thread lock on
external_command_buffer before it can read the items data, otherwise
there is the potential for corrupt data. Note, there is a cost to that,
especially if the status data is being written with
aggregate_status_updates=0.

- your output to status.dat is different from mine. You are outputting
max_external_command_buffer_slots (the value defined in nagios.cfg) and
used_external_command_buffer_slots (the current number of items in the
buffer). In my patch, I had a different definition:
max_command_buffer_items meant the "maximum number of items that has
been in the buffer".
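
The locking concern in the second comment above can be illustrated with a minimal sketch. Everything named demo_* here is hypothetical and is not the Nagios source; the struct only stands in for circular_buffer. The reader copies the counter while holding the mutex and releases it before any slow file output, which keeps the cost of the lock small:

```c
#include <pthread.h>

/* Hypothetical stand-in for circular_buffer; only the fields needed
 * for this illustration are shown. */
typedef struct {
    int items;                    /* current number of queued entries */
    pthread_mutex_t buffer_lock;  /* guards items */
} demo_buffer;

/* Writer side: what a worker thread would do when queueing an entry. */
static void demo_buffer_add(demo_buffer *b) {
    pthread_mutex_lock(&b->buffer_lock);
    b->items++;
    pthread_mutex_unlock(&b->buffer_lock);
}

/* Reader side: take a snapshot under the lock, then release the lock
 * so the caller can write the value out without blocking the writer. */
static int demo_buffer_items(demo_buffer *b) {
    int snapshot;
    pthread_mutex_lock(&b->buffer_lock);
    snapshot = b->items;
    pthread_mutex_unlock(&b->buffer_lock);
    return snapshot;
}

static void *demo_worker(void *arg) {
    demo_buffer *b = arg;
    for (int i = 0; i < 1000; i++)
        demo_buffer_add(b);
    return NULL;
}

/* Runs one writer thread while the calling thread reads concurrently;
 * returns the item count seen after the writer has finished. */
int demo_run(void) {
    demo_buffer b;
    pthread_t tid;
    int final_count;

    b.items = 0;
    pthread_mutex_init(&b.buffer_lock, NULL);
    pthread_create(&tid, NULL, demo_worker, &b);
    (void)demo_buffer_items(&b);  /* safe read while the writer is active */
    pthread_join(tid, NULL);
    final_count = demo_buffer_items(&b);
    pthread_mutex_destroy(&b.buffer_lock);
    return final_count;
}
```

Build with -pthread if your toolchain requires it.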

(I would prefer used_external_command_buffer_slots be changed to
current_external_command_buffer_slots because it more accurately
describes "this is the number I have now".)

From now on, I'll call it high_external_command_buffer_items, as it can
also be the "high water mark of the number of items in the buffer". This
is a useful statistic as it tells you what the
max_external_command_buffer_slots should be to get no holdups.

Also, it probably makes sense to put the high water mark within the
circular_buffer struct.
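
As a rough sketch of what keeping the high water mark in the struct could look like (demo_stats, demo_add, demo_remove and DEMO_SLOTS are made-up names for illustration, not taken from the attached patch or from Nagios): the mark is raised whenever an add pushes the item count past the previous maximum, and it is never lowered on removal. Locking is left out here to keep the example short; in a threaded program the update would happen under the buffer's lock.

```c
#define DEMO_SLOTS 8  /* hypothetical capacity; the real value would come from nagios.cfg */

typedef struct {
    int items;  /* entries currently queued */
    int high;   /* most entries ever queued at once (high water mark) */
} demo_stats;

/* Queue one entry. Returns 0 on success, -1 if every slot is in use. */
static int demo_add(demo_stats *s) {
    if (s->items >= DEMO_SLOTS)
        return -1;
    s->items++;
    if (s->items > s->high)
        s->high = s->items;  /* new maximum reached */
    return 0;
}

/* Dequeue one entry; the high water mark is deliberately left alone. */
static void demo_remove(demo_stats *s) {
    if (s->items > 0)
        s->items--;
}
```

If the reported high value stays close to the configured number of slots, that suggests the slot count is too small for the load.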

Please find a patch attached with these changes.

On my small test system, used_check_result_buffer_slots is usually 0.
When I introduce 1 fake slave (128 results per 10 seconds),
used_check_result_buffer_slots fluctuates from 0 up to the 20s and 30s.
Introducing a 2nd fake slave, the high mark moves up into the 100s. A
3rd slave moves the high mark to 192.

If I introduce NDO into the system, I get a large iowait time (in the
80%s), presumably from database writes. The status file is not updated
as regularly (one instance of 60 seconds between writes), but when it
is, the high_* values jump up to the 200-300s. This is a poorly
configured database, so I'm guessing that there are delays due to the
main nagios process passing data to the broker module.

At the moment, with 2 slaves sending 128 packets per 10 seconds, I'm
getting high values of 983 for external commands and 1405 for check
results.

I think these recent changes help with seeing whether there are
bottlenecks at the reading of the command pipe, but I think there are
possibly other slowdowns further down the chain (which Nagios 3 may help
with).

Ton

http://www.altinity.com
T: +44 (0)870 787 9243
F: +44 (0)845 280 1725
Skype: tonvoon

...[email truncated]...


This post was automatically imported from historical nagios-devel mailing list archives
Original poster: ton.voon@altinity.com