In short, the result is as follows: no change, except for the following.
- The messages in the elasticsearch logfile now complain about the 2000 queue limit being reached.
- The load on the nodes is even higher (which kind of makes sense).
Because of the high load, I've checked the following:
- CPU is high, but not at 100% (approx. 75%); half of it is I/O wait.
- The disks are reading at > 250MB/s (streaming?), but are not at 100%. I'm a bit confused, since I'm querying data from the past hour. Shouldn't that be in memory?
- Network traffic is just a couple of MB/s, so that's not it.
- Memory... I've read that NLS should assign 50% to the Java process, as defined in /etc/sysconfig/elasticsearch. We have 128GB in each node, and the calculation in the startup file gives 64GB as the answer:
colog3:root:/etc/sysconfig> $(expr $(free -m|awk '/^Mem:/{print $2}') / 2 )
-bash: 64339: command not found
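The "command not found" above only appears because I pasted the bare $(...) at the prompt, so bash tried to run the result 64339 as a command; the number itself is the half-of-RAM value in MB. A minimal sketch of the same arithmetic that prints the value instead (the function name half_of_mb is just something I made up for illustration, it is not part of the NLS startup file):

```shell
# Half of a memory total given in MB, using integer division like expr does.
# half_of_mb is a hypothetical helper name, not taken from the NLS scripts.
half_of_mb() {
    echo $(( $1 / 2 ))
}

# Total RAM in MB as reported by free, using the same awk filter as above.
# Guarded so the snippet does nothing on systems without free.
if command -v free >/dev/null 2>&1; then
    total_mb=$(free -m | awk '/^Mem:/{print $2}')
    echo "half of RAM: $(half_of_mb "$total_mb")m"
fi
```

On these nodes that should print the same 64339 that shows up in the -Xms/-Xmx options of the java process.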
+nmon-14g---------------------Hostname=colog3-------Refresh= 1secs ---15:06.54----------------------------------------------------------------------------+
| CPU Utilisation ------------------------------------------------------------------------------------------------------------------------------------- |
|---------------------------+-------------------------------------------------+ |
|CPU User% Sys% Wait% Idle|0 |25 |50 |75 100| |
| 1 83.2 0.0 0.0 16.8|UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU > |
| 2 4.0 1.0 0.0 95.0|U > | |
| 3 35.3 0.0 0.0 64.7|UUUUUUUUUUUUUUUUU > |
| 4 4.0 1.0 0.0 95.0|U > |
| 5 1.0 0.0 0.0 99.0| > | |
| 6 5.9 1.0 0.0 93.1|UU > |
| 7 1.0 0.0 0.0 99.0| > | |
| 8 0.0 0.0 0.0 100.0| > |
| 9 0.0 0.0 0.0 100.0| > | |
| 10 1.0 0.0 0.0 99.0| > | |
| 11 1.0 0.0 0.0 99.0| > | |
| 12 3.0 0.0 0.0 97.0|U > | |
| 13 1.0 0.0 0.0 99.0| > | |
| 14 0.0 0.0 0.0 100.0| > |
| 15 1.0 0.0 0.0 99.0| > | |
| 16 5.0 0.0 0.0 95.0|UU > |
|---------------------------+-------------------------------------------------+ |
|Avg 9.2 0.2 0.0 90.6|UUUU > | |
|---------------------------+-------------------------------------------------+ |
| Top Processes Procs=324 mode=4 (1=Basic, 3=Perf 4=Size 5=I/O)-------------------------------------------------------------------------------------------|
| PID %CPU ResSize Command |
| Used KB |
| 20831 115.5 124512128 /bin/java -Xms64339m -Xmx64339m -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFrac|
|b 1634 39.5 778176 /bin/java -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSIniti|
|/ 666 0.0 131776 /usr/lib/systemd/systemd-journald cyFraction|
|n 1506 0.0 75612 /usr/sbin/rsyslogd -n ome=/usr/l|
|/ 19114 0.0 21776 /usr/bin/python -tt /usr/sbin/yum-cron /etc/yum/yum-cron-hourly.conf r/logstash|
| 20368 0.0 14780 /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs |
| 20367 0.0 14776 /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php poller |
| 20369 0.0 14616 /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs/apache |
| 8385 0.0 13624 /usr/sbin/httpd -DFOREGROUND |
| 4315 0.0 13528 /usr/sbin/httpd -DFOREGROUND |
| 7344 0.0 13512 /usr/sbin/httpd -DFOREGROUND |
| 8400 0.0 13388 /usr/sbin/httpd -DFOREGROUND |
| 7906 0.0 13284 /usr/sbin/httpd -DFOREGROUND
I would like to start the nodes with the memory fixed at either 32GB, as advised, or 64GB, as "has always worked", to see if my theory makes sense. So... can you provide me with the correct syntax?


As for the output of the steps provided above: if you want it, let me know. I just need to make it more readable, and I need a little extra time for that.
Greetings, Hans