Cluster Failure

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
Envera IT
Posts: 159
Joined: Wed Jun 19, 2013 10:21 am

Cluster Failure

Post by Envera IT »

We're running a two-host cluster. Currently the GUI is not accessible; if the hosts are rebooted, the "Waiting for Elasticsearch DB" message shows up for a while, but after a bit the page stops loading altogether. I'm pretty sure the issues started on the 17th, when a change made by one of our developers on their systems dumped a bunch of badly formatted data into the cluster. If the DB does start, the Administration tab is unavailable, and one of the two hosts in the cluster is reporting that it's no longer licensed. It appears that both hosts consider themselves to be "master". I've attempted to delete the bad indexes, but the attempts to delete or close an index via curl typically time out. I'm seeing a ton of unassigned shards.

The two hosts are NLS appliances running in VMware with two CPUs, 32GB of memory, and 300GB of disk. All this happened while I was out of town, so I'm just getting around to reaching out.

It's a hot mess; at this point I'm in too deep and need support :|
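The deletes I tried looked roughly like this (illustrative only; the index name and timeout are examples, not the exact commands I ran):

Code: Select all

# Illustrative sketch -- close or delete one of the suspect daily indices with a longer timeout
curl -XPOST 'http://localhost:9200/logstash-2017.07.17/_close?timeout=2m'
curl -XDELETE 'http://localhost:9200/logstash-2017.07.17?timeout=2m'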

Here's some data, let me know what else you need.

Code: Select all


curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'

{
  "cluster_name" : "1427fd37-fc39-4f84-a49e-19562f0bc946",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 69,
  "active_shards" : 69,
  "relocating_shards" : 0,
  "initializing_shards" : 4,
  "unassigned_shards" : 259,
  "number_of_pending_tasks" : 106,
  "number_of_in_flight_fetch" : 0
}

Code: Select all

 curl -XGET 'http://localhost:9200/_cat/shards?v'
index               shard prirep state           docs   store ip         node
nagioslogserver     0     p      UNASSIGNED
nagioslogserver     0     r      UNASSIGNED
logstash-2017.06.18 2     p      UNASSIGNED
logstash-2017.06.18 2     r      UNASSIGNED
logstash-2017.06.18 0     p      STARTED       919006 461.4mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.06.18 0     r      UNASSIGNED
logstash-2017.06.18 3     p      UNASSIGNED
logstash-2017.06.18 3     r      UNASSIGNED
logstash-2017.06.18 1     p      STARTED       918363 461.4mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.06.18 1     r      UNASSIGNED
logstash-2017.06.18 4     p      STARTED       918976 460.5mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.06.18 4     r      UNASSIGNED
logstash-2017.06.19 4     p      UNASSIGNED
logstash-2017.06.19 4     r      UNASSIGNED
logstash-2017.06.19 0     p      UNASSIGNED
logstash-2017.06.19 0     r      UNASSIGNED
logstash-2017.06.19 3     p      STARTED       852691 422.7mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.06.19 3     r      UNASSIGNED
logstash-2017.06.19 1     p      STARTED       852587 421.7mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.06.19 1     r      UNASSIGNED
logstash-2017.06.19 2     p      STARTED       852394 421.6mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.06.19 2     r      UNASSIGNED
logstash-2017.07.01 4     p      UNASSIGNED
logstash-2017.07.01 4     r      UNASSIGNED
logstash-2017.07.01 0     p      UNASSIGNED
logstash-2017.07.01 0     r      UNASSIGNED
logstash-2017.07.01 3     p      UNASSIGNED
logstash-2017.07.01 3     r      UNASSIGNED
logstash-2017.07.01 1     p      STARTED       592377 290.3mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.07.01 1     r      UNASSIGNED
logstash-2017.07.01 2     p      INITIALIZING                 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.07.01 2     r      UNASSIGNED
logstash-2017.07.02 4     p      UNASSIGNED
logstash-2017.07.02 4     r      UNASSIGNED
logstash-2017.07.02 0     p      STARTED       619116 298.4mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.07.02 0     r      UNASSIGNED
logstash-2017.07.02 3     p      STARTED       618853 298.7mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.07.02 3     r      UNASSIGNED
logstash-2017.07.02 1     p      STARTED       619089 298.7mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.07.02 1     r      UNASSIGNED
logstash-2017.07.02 2     p      UNASSIGNED
logstash-2017.07.02 2     r      UNASSIGNED
logstash-2017.07.03 2     p      UNASSIGNED
logstash-2017.07.03 2     r      UNASSIGNED
logstash-2017.07.03 0     p      STARTED       703274 345.4mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.07.03 0     r      UNASSIGNED
logstash-2017.07.03 3     p      STARTED       703500   346mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.07.03 3     r      UNASSIGNED
logstash-2017.07.03 1     p      UNASSIGNED
logstash-2017.07.03 1     r      UNASSIGNED
logstash-2017.07.03 4     p      STARTED       702623 344.9mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.07.03 4     r      UNASSIGNED
logstash-2017.07.04 2     p      UNASSIGNED
logstash-2017.07.04 2     r      UNASSIGNED
logstash-2017.07.04 0     p      UNASSIGNED
logstash-2017.07.04 0     r      UNASSIGNED
logstash-2017.07.04 3     p      UNASSIGNED
logstash-2017.07.04 3     r      UNASSIGNED
logstash-2017.07.04 1     p      STARTED       604238 308.6mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.07.04 1     r      UNASSIGNED
logstash-2017.07.04 4     p      UNASSIGNED
logstash-2017.07.04 4     r      UNASSIGNED
logstash-2017.07.05 4     p      STARTED       684473 341.1mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.07.05 4     r      UNASSIGNED
logstash-2017.07.05 0     p      UNASSIGNED
logstash-2017.07.05 0     r      UNASSIGNED
logstash-2017.07.05 3     p      STARTED       684714 341.8mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.07.05 3     r      UNASSIGNED
logstash-2017.07.05 1     p      STARTED       683795 340.5mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.07.05 1     r      UNASSIGNED
logstash-2017.07.05 2     p      STARTED       684070 341.1mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.07.05 2     r      UNASSIGNED
logstash-2017.07.06 2     p      STARTED       762161 384.9mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.07.06 2     r      UNASSIGNED
logstash-2017.07.06 0     p      UNASSIGNED
logstash-2017.07.06 0     r      UNASSIGNED
logstash-2017.07.06 3     p      UNASSIGNED
logstash-2017.07.06 3     r      UNASSIGNED
logstash-2017.07.06 1     p      UNASSIGNED
logstash-2017.07.06 1     r      UNASSIGNED
logstash-2017.07.06 4     p      STARTED       762838 385.3mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.07.06 4     r      UNASSIGNED
nagioslogserver_log 2     p      UNASSIGNED
nagioslogserver_log 2     r      UNASSIGNED
nagioslogserver_log 0     p      STARTED      1977201 306.5mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
nagioslogserver_log 0     r      UNASSIGNED
nagioslogserver_log 3     p      UNASSIGNED
nagioslogserver_log 3     r      UNASSIGNED
nagioslogserver_log 1     p      UNASSIGNED
nagioslogserver_log 1     r      UNASSIGNED
nagioslogserver_log 4     p      STARTED      1974076 305.9mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
nagioslogserver_log 4     r      UNASSIGNED
logstash-2017.06.25 4     p      STARTED       707972 365.8mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.06.25 4     r      UNASSIGNED
logstash-2017.06.25 0     p      UNASSIGNED
logstash-2017.06.25 0     r      UNASSIGNED
logstash-2017.06.25 3     p      UNASSIGNED
logstash-2017.06.25 3     r      UNASSIGNED
logstash-2017.06.25 1     p      STARTED       707984 365.4mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.06.25 1     r      UNASSIGNED
logstash-2017.06.25 2     p      STARTED       708272 366.1mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.06.25 2     r      UNASSIGNED
logstash-2017.06.24 2     p      UNASSIGNED
logstash-2017.06.24 2     r      UNASSIGNED
logstash-2017.06.24 0     p      STARTED       801081 398.3mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.06.24 0     r      UNASSIGNED
logstash-2017.06.24 3     p      UNASSIGNED
logstash-2017.06.24 3     r      UNASSIGNED
logstash-2017.06.24 1     p      UNASSIGNED
logstash-2017.06.24 1     r      UNASSIGNED
logstash-2017.06.24 4     p      UNASSIGNED
logstash-2017.06.24 4     r      UNASSIGNED
logstash-2017.06.27 4     p      STARTED       862712 435.5mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.06.27 4     r      UNASSIGNED
logstash-2017.06.27 0     p      STARTED       862615 435.7mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.06.27 0     r      UNASSIGNED
logstash-2017.06.27 3     p      STARTED       862631 435.8mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.06.27 3     r      UNASSIGNED
logstash-2017.06.27 1     p      STARTED       862199 436.1mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.06.27 1     r      UNASSIGNED
logstash-2017.06.27 2     p      STARTED       861966 434.6mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.06.27 2     r      UNASSIGNED
logstash-2017.06.26 4     p      UNASSIGNED
logstash-2017.06.26 4     r      UNASSIGNED
logstash-2017.06.26 0     p      UNASSIGNED
logstash-2017.06.26 0     r      UNASSIGNED
logstash-2017.06.26 3     p      UNASSIGNED
logstash-2017.06.26 3     r      UNASSIGNED
logstash-2017.06.26 1     p      UNASSIGNED
logstash-2017.06.26 1     r      UNASSIGNED
logstash-2017.06.26 2     p      STARTED       765507 392.5mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.06.26 2     r      UNASSIGNED
logstash-2017.07.18 2     p      UNASSIGNED
logstash-2017.07.18 2     r      UNASSIGNED
logstash-2017.07.18 0     p      INITIALIZING                 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.07.18 0     r      UNASSIGNED
logstash-2017.07.18 3     p      UNASSIGNED
logstash-2017.07.18 3     r      UNASSIGNED
logstash-2017.07.18 1     p      UNASSIGNED
logstash-2017.07.18 1     r      UNASSIGNED
logstash-2017.07.18 4     p      UNASSIGNED
logstash-2017.07.18 4     r      UNASSIGNED
logstash-2017.06.21 2     p      UNASSIGNED
logstash-2017.06.21 2     r      UNASSIGNED
logstash-2017.06.21 0     p      STARTED       799600 412.2mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.06.21 0     r      UNASSIGNED
logstash-2017.06.21 3     p      UNASSIGNED
logstash-2017.06.21 3     r      UNASSIGNED
logstash-2017.06.21 1     p      STARTED       800684 412.7mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.06.21 1     r      UNASSIGNED
logstash-2017.06.21 4     p      UNASSIGNED
logstash-2017.06.21 4     r      UNASSIGNED
logstash-2017.07.19 4     p      UNASSIGNED
logstash-2017.07.19 4     r      UNASSIGNED
logstash-2017.07.19 0     p      STARTED        17926   6.8mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.07.19 0     r      UNASSIGNED
logstash-2017.07.19 3     p      UNASSIGNED
logstash-2017.07.19 3     r      UNASSIGNED
logstash-2017.07.19 1     p      UNASSIGNED
logstash-2017.07.19 1     r      UNASSIGNED
logstash-2017.07.19 2     p      STARTED        17963  13.5mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.07.19 2     r      UNASSIGNED
logstash-2017.06.20 2     p      UNASSIGNED
logstash-2017.06.20 2     r      UNASSIGNED
logstash-2017.06.20 0     p      STARTED       959499 470.6mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.06.20 0     r      UNASSIGNED
logstash-2017.06.20 3     p      UNASSIGNED
logstash-2017.06.20 3     r      UNASSIGNED
logstash-2017.06.20 1     p      STARTED       960369 469.9mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.06.20 1     r      UNASSIGNED
logstash-2017.06.20 4     p      STARTED                      10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.06.20 4     r      UNASSIGNED
logstash-2017.06.23 4     p      UNASSIGNED
logstash-2017.06.23 4     r      UNASSIGNED
logstash-2017.06.23 0     p      STARTED       785489 404.9mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.06.23 0     r      UNASSIGNED
logstash-2017.06.23 3     p      UNASSIGNED
logstash-2017.06.23 3     r      UNASSIGNED
logstash-2017.06.23 1     p      UNASSIGNED
logstash-2017.06.23 1     r      UNASSIGNED
logstash-2017.06.23 2     p      STARTED       785928 405.1mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.06.23 2     r      UNASSIGNED
logstash-2017.06.22 2     p      UNASSIGNED
logstash-2017.06.22 2     r      UNASSIGNED
logstash-2017.06.22 0     p      UNASSIGNED
logstash-2017.06.22 0     r      UNASSIGNED
logstash-2017.06.22 3     p      STARTED       773050 399.2mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.06.22 3     r      UNASSIGNED
logstash-2017.06.22 1     p      STARTED       773473 399.9mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.06.22 1     r      UNASSIGNED
logstash-2017.06.22 4     p      STARTED       773045 399.5mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.06.22 4     r      UNASSIGNED
logstash-2017.06.28 2     p      UNASSIGNED
logstash-2017.06.28 2     r      UNASSIGNED
logstash-2017.06.28 0     p      STARTED       902020   438mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.06.28 0     r      UNASSIGNED
logstash-2017.06.28 3     p      STARTED       902396 438.6mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.06.28 3     r      UNASSIGNED
logstash-2017.06.28 1     p      STARTED       903235 439.1mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.06.28 1     r      UNASSIGNED
logstash-2017.06.28 4     p      STARTED       901853 438.2mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.06.28 4     r      UNASSIGNED
logstash-2017.06.29 4     p      UNASSIGNED
logstash-2017.06.29 4     r      UNASSIGNED
logstash-2017.06.29 0     p      UNASSIGNED
logstash-2017.06.29 0     r      UNASSIGNED
logstash-2017.06.29 3     p      UNASSIGNED
logstash-2017.06.29 3     r      UNASSIGNED
logstash-2017.06.29 1     p      STARTED       711688 360.3mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.06.29 1     r      UNASSIGNED
logstash-2017.06.29 2     p      STARTED       711792 359.9mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.06.29 2     r      UNASSIGNED
logstash-2017.07.12 4     p      UNASSIGNED
logstash-2017.07.12 4     r      UNASSIGNED
logstash-2017.07.12 0     p      UNASSIGNED
logstash-2017.07.12 0     r      UNASSIGNED
logstash-2017.07.12 3     p      STARTED       728275 366.4mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.07.12 3     r      UNASSIGNED
logstash-2017.07.12 1     p      UNASSIGNED
logstash-2017.07.12 1     r      UNASSIGNED
logstash-2017.07.12 2     p      UNASSIGNED
logstash-2017.07.12 2     r      UNASSIGNED
logstash-2017.07.13 4     p      STARTED       706154   356mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.07.13 4     r      UNASSIGNED
logstash-2017.07.13 0     p      STARTED       706017 355.1mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.07.13 0     r      UNASSIGNED
logstash-2017.07.13 3     p      STARTED       706034 355.8mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.07.13 3     r      UNASSIGNED
logstash-2017.07.13 1     p      STARTED       706270 355.4mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.07.13 1     r      UNASSIGNED
logstash-2017.07.13 2     p      INITIALIZING                 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.07.13 2     r      UNASSIGNED
logstash-2017.07.10 4     p      UNASSIGNED
logstash-2017.07.10 4     r      UNASSIGNED
logstash-2017.07.10 0     p      UNASSIGNED
logstash-2017.07.10 0     r      UNASSIGNED
logstash-2017.07.10 3     p      UNASSIGNED
logstash-2017.07.10 3     r      UNASSIGNED
logstash-2017.07.10 1     p      UNASSIGNED
logstash-2017.07.10 1     r      UNASSIGNED
logstash-2017.07.10 2     p      UNASSIGNED
logstash-2017.07.10 2     r      UNASSIGNED
logstash-2017.07.11 2     p      UNASSIGNED
logstash-2017.07.11 2     r      UNASSIGNED
logstash-2017.07.11 0     p      UNASSIGNED
logstash-2017.07.11 0     r      UNASSIGNED
logstash-2017.07.11 3     p      UNASSIGNED
logstash-2017.07.11 3     r      UNASSIGNED
logstash-2017.07.11 1     p      UNASSIGNED
logstash-2017.07.11 1     r      UNASSIGNED
logstash-2017.07.11 4     p      UNASSIGNED
logstash-2017.07.11 4     r      UNASSIGNED
logstash-2017.07.16 2     p      UNASSIGNED
logstash-2017.07.16 2     r      UNASSIGNED
logstash-2017.07.16 0     p      STARTED       650660   326mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.07.16 0     r      UNASSIGNED
logstash-2017.07.16 3     p      STARTED       651045 326.2mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.07.16 3     r      UNASSIGNED
logstash-2017.07.16 1     p      UNASSIGNED
logstash-2017.07.16 1     r      UNASSIGNED
logstash-2017.07.16 4     p      STARTED       650196   326mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.07.16 4     r      UNASSIGNED
logstash-2017.07.14 2     p      UNASSIGNED
logstash-2017.07.14 2     r      UNASSIGNED
logstash-2017.07.14 0     p      STARTED       685199 345.4mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.07.14 0     r      UNASSIGNED
logstash-2017.07.14 3     p      STARTED       685215 345.4mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.07.14 3     r      UNASSIGNED
logstash-2017.07.14 1     p      INITIALIZING                 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.07.14 1     r      UNASSIGNED
logstash-2017.07.14 4     p      UNASSIGNED
logstash-2017.07.14 4     r      UNASSIGNED
logstash-2017.07.15 4     p      UNASSIGNED
logstash-2017.07.15 4     r      UNASSIGNED
logstash-2017.07.15 0     p      STARTED       674317 344.4mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.07.15 0     r      UNASSIGNED
logstash-2017.07.15 3     p      STARTED       674738   345mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.07.15 3     r      UNASSIGNED
logstash-2017.07.15 1     p      UNASSIGNED
logstash-2017.07.15 1     r      UNASSIGNED
logstash-2017.07.15 2     p      STARTED       675329   345mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.07.15 2     r      UNASSIGNED
logstash-2017.07.09 2     p      UNASSIGNED
logstash-2017.07.09 2     r      UNASSIGNED
logstash-2017.07.09 0     p      UNASSIGNED
logstash-2017.07.09 0     r      UNASSIGNED
logstash-2017.07.09 3     p      UNASSIGNED
logstash-2017.07.09 3     r      UNASSIGNED
logstash-2017.07.09 1     p      UNASSIGNED
logstash-2017.07.09 1     r      UNASSIGNED
logstash-2017.07.09 4     p      UNASSIGNED
logstash-2017.07.09 4     r      UNASSIGNED
logstash-2017.07.08 4     p      UNASSIGNED
logstash-2017.07.08 4     r      UNASSIGNED
logstash-2017.07.08 0     p      UNASSIGNED
logstash-2017.07.08 0     r      UNASSIGNED
logstash-2017.07.08 3     p      STARTED       685427 360.9mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.07.08 3     r      UNASSIGNED
logstash-2017.07.08 1     p      UNASSIGNED
logstash-2017.07.08 1     r      UNASSIGNED
logstash-2017.07.08 2     p      STARTED       684532 359.7mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.07.08 2     r      UNASSIGNED
logstash-2017.07.07 2     p      UNASSIGNED
logstash-2017.07.07 2     r      UNASSIGNED
logstash-2017.07.07 0     p      UNASSIGNED
logstash-2017.07.07 0     r      UNASSIGNED
logstash-2017.07.07 3     p      STARTED       790495 407.8mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.07.07 3     r      UNASSIGNED
logstash-2017.07.07 1     p      UNASSIGNED
logstash-2017.07.07 1     r      UNASSIGNED
logstash-2017.07.07 4     p      UNASSIGNED
logstash-2017.07.07 4     r      UNASSIGNED
logstash-2017.06.30 2     p      UNASSIGNED
logstash-2017.06.30 2     r      UNASSIGNED
logstash-2017.06.30 0     p      UNASSIGNED
logstash-2017.06.30 0     r      UNASSIGNED
logstash-2017.06.30 3     p      STARTED       663007 323.1mb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
logstash-2017.06.30 3     r      UNASSIGNED
logstash-2017.06.30 1     p      UNASSIGNED
logstash-2017.06.30 1     r      UNASSIGNED
logstash-2017.06.30 4     p      UNASSIGNED
logstash-2017.06.30 4     r      UNASSIGNED
kibana-int          4     p      UNASSIGNED
kibana-int          4     r      UNASSIGNED
kibana-int          0     p      UNASSIGNED
kibana-int          0     r      UNASSIGNED
kibana-int          3     p      STARTED           46 224.7kb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
kibana-int          3     r      UNASSIGNED
kibana-int          1     p      UNASSIGNED
kibana-int          1     r      UNASSIGNED
kibana-int          2     p      STARTED           43 169.2kb 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0
kibana-int          2     r      UNASSIGNED

Code: Select all

curl 'localhost:9200/_cluster/health?level=indices&pretty'
{
  "cluster_name" : "1427fd37-fc39-4f84-a49e-19562f0bc946",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 12,
  "active_shards" : 12,
  "relocating_shards" : 0,
  "initializing_shards" : 4,
  "unassigned_shards" : 316,
  "number_of_pending_tasks" : 75,
  "number_of_in_flight_fetch" : 0,
  "indices" : {
    "logstash-2017.06.18" : {
      "status" : "red",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 1,
      "active_shards" : 1,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 9
    },
    "nagioslogserver" : {
      "status" : "red",
      "number_of_shards" : 1,
      "number_of_replicas" : 1,
      "active_primary_shards" : 0,
      "active_shards" : 0,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 2
    },
    "logstash-2017.06.19" : {
      "status" : "red",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 0,
      "active_shards" : 0,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 10
    },
    "logstash-2017.07.01" : {
      "status" : "red",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 1,
      "active_shards" : 1,
      "relocating_shards" : 0,
      "initializing_shards" : 1,
      "unassigned_shards" : 8
    },
    "logstash-2017.07.02" : {
      "status" : "red",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 1,
      "active_shards" : 1,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 9
    },
    "logstash-2017.07.03" : {
      "status" : "red",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 0,
      "active_shards" : 0,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 10
    },
    "logstash-2017.07.04" : {
      "status" : "red",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 0,
      "active_shards" : 0,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 10
    },
    "logstash-2017.07.05" : {
      "status" : "red",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 1,
      "active_shards" : 1,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 9
    },
    "logstash-2017.07.06" : {
      "status" : "red",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 0,
      "active_shards" : 0,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 10
    },
    "nagioslogserver_log" : {
      "status" : "red",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 0,
      "active_shards" : 0,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 10
    },
    "logstash-2017.06.25" : {
      "status" : "red",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 0,
      "active_shards" : 0,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 10
    },
    "logstash-2017.06.24" : {
      "status" : "red",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 0,
      "active_shards" : 0,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 10
    },
    "logstash-2017.06.27" : {
      "status" : "red",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 0,
      "active_shards" : 0,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 10
    },
    "logstash-2017.06.26" : {
      "status" : "red",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 0,
      "active_shards" : 0,
      "relocating_shards" : 0,
      "initializing_shards" : 1,
      "unassigned_shards" : 9
    },
    "logstash-2017.07.18" : {
      "status" : "red",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 0,
      "active_shards" : 0,
      "relocating_shards" : 0,
      "initializing_shards" : 1,
      "unassigned_shards" : 9
    },
    "logstash-2017.06.21" : {
      "status" : "red",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 1,
      "active_shards" : 1,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 9
    },
    "logstash-2017.06.20" : {
      "status" : "red",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 1,
      "active_shards" : 1,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 9
    },
    "logstash-2017.07.19" : {
      "status" : "red",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 0,
      "active_shards" : 0,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 10
    },
    "logstash-2017.06.23" : {
      "status" : "red",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 0,
      "active_shards" : 0,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 10
    },
    "logstash-2017.06.22" : {
      "status" : "red",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 1,
      "active_shards" : 1,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 9
    },
    "logstash-2017.06.28" : {
      "status" : "red",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 0,
      "active_shards" : 0,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 10
    },
    "logstash-2017.06.29" : {
      "status" : "red",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 0,
      "active_shards" : 0,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 10
    },
    "logstash-2017.07.12" : {
      "status" : "red",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 1,
      "active_shards" : 1,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 9
    },
    "logstash-2017.07.13" : {
      "status" : "red",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 0,
      "active_shards" : 0,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 10
    },
    "logstash-2017.07.10" : {
      "status" : "red",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 1,
      "active_shards" : 1,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 9
    },
    "logstash-2017.07.11" : {
      "status" : "red",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 0,
      "active_shards" : 0,
      "relocating_shards" : 0,
      "initializing_shards" : 1,
      "unassigned_shards" : 9
    },
    "logstash-2017.07.16" : {
      "status" : "red",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 1,
      "active_shards" : 1,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 9
    },
    "logstash-2017.07.14" : {
      "status" : "red",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 1,
      "active_shards" : 1,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 9
    },
    "logstash-2017.07.15" : {
      "status" : "red",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 0,
      "active_shards" : 0,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 10
    },
    "logstash-2017.07.09" : {
      "status" : "red",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 0,
      "active_shards" : 0,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 10
    },
    "logstash-2017.07.08" : {
      "status" : "red",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 0,
      "active_shards" : 0,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 10
    },
    "logstash-2017.07.07" : {
      "status" : "red",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 0,
      "active_shards" : 0,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 10
    },
    "logstash-2017.06.30" : {
      "status" : "red",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 0,
      "active_shards" : 0,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 10
    },
    "kibana-int" : {
      "status" : "red",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 1,
      "active_shards" : 1,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 9
    }
  }
}

Code: Select all

 curl -XGET localhost:9200/_nodes/jvm?pretty
{
  "cluster_name" : "1427fd37-fc39-4f84-a49e-19562f0bc946",
  "nodes" : {
    "lI0sN5x1T3qv9XoU62Fm3Q" : {
      "name" : "0978ff82-93a6-4fe9-9a9c-2088927e9d3c",
      "transport_address" : "inet[/10.0.1.172:9300]",
      "host" : "srq-nagios-ls2.envera.local",
      "ip" : "10.0.1.172",
      "version" : "1.6.0",
      "build" : "cdd3ac4",
      "http_address" : "inet[localhost/127.0.0.1:9200]",
      "attributes" : {
        "max_local_storage_nodes" : "1"
      },
      "jvm" : {
        "pid" : 1412,
        "version" : "1.7.0_85",
        "vm_name" : "OpenJDK 64-Bit Server VM",
        "vm_version" : "24.85-b03",
        "vm_vendor" : "Oracle Corporation",
        "start_time_in_millis" : 1500487163313,
        "mem" : {
          "heap_init_in_bytes" : 16900947968,
          "heap_max_in_bytes" : 16883515392,
          "non_heap_init_in_bytes" : 24313856,
          "non_heap_max_in_bytes" : 224395264,
          "direct_max_in_bytes" : 16883515392
        },
        "gc_collectors" : [ "ParNew", "ConcurrentMarkSweep" ],
        "memory_pools" : [ "Code Cache", "Par Eden Space", "Par Survivor Space", "CMS Old Gen", "CMS Perm Gen" ]
      }
    }
  }
}
I like graphs...
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: Cluster Failure

Post by mcapra »

Sharing the contents of the Elasticsearch logs from both machines may be a good place to start:

Code: Select all

/var/log/elasticsearch/
Having a bunch of red indices and missing primary shards is definitely not good. Hopefully the logs can shed some light on why.
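If it's easier to pull them together, something like this (a rough sketch; adjust the paths as needed) will bundle each node's logs for attaching:

Code: Select all

# Rough sketch -- bundle the Elasticsearch logs on each node into one archive for attachment
tar czf /tmp/es-logs-$(hostname).tar.gz /var/log/elasticsearch/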
Former Nagios employee
https://www.mcapra.com/
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Cluster Failure

Post by cdienger »

Thanks for the assist, Matt. I was also going to suggest running the following on both servers to get an idea of the master/node settings:

Code: Select all

curl 'localhost:9200/_cat/master?v'
curl 'localhost:9200/_cat/nodes?v'
curl localhost:9200/_cluster/settings?pretty
From the sound of it, it could be that the indices responsible for the system settings have been corrupted. https://support.nagios.com/kb/article.php?id=68 covers restoring a system backup with the restore_backup.sh script. If the master/node settings look good, we can try running this on the master to restore a backup from the 16th or earlier. I would also suggest stopping the crond service and waiting 5 minutes before running the restore script:

Code: Select all

service crond stop
sleep 300   # wait 5 minutes
/usr/local/nagioslogserver/scripts/restore_backup.sh /store/backups/nagioslogserver/nagioslogserver.2015-07-16orwhatever.tar.gz
service crond start
Let's hold off on running the restore though until we can get a look at the logs and curl results.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Envera IT
Posts: 159
Joined: Wed Jun 19, 2013 10:21 am

Re: Cluster Failure

Post by Envera IT »

mcapra wrote:Sharing the contents of the Elasticsearch logs from both machines may be a good place to start:

Code: Select all

/var/log/elasticsearch/
Having a bunch of red indices and missing primary shards is definitely not good. Hopefully the logs can shed some light on why.
You'll see the ft2events errors; I suspect that's the cause, for obvious reasons. The developer has reverted the change on his end at this point.

Thank you!
You do not have the required permissions to view the files attached to this post.
I like graphs...
Envera IT
Posts: 159
Joined: Wed Jun 19, 2013 10:21 am

Re: Cluster Failure

Post by Envera IT »

cdienger wrote:Thanks for the assist, Matt. I was also going to suggest running the following on both servers to get an idea of the master/node settings:

Code: Select all

curl 'localhost:9200/_cat/master?v'
curl 'localhost:9200/_cat/nodes?v'
curl localhost:9200/_cluster/settings?pretty
From the sound of it, it could be that the indices responsible for the system settings have been corrupted. https://support.nagios.com/kb/article.php?id=68 covers restoring a system backup with the restore_backup.sh script. If the master/node settings look good, we can try running this on the master to restore a backup from the 16th or earlier. I would also suggest stopping the crond service and waiting 5 minutes before running the restore script:

Code: Select all

service crond stop
sleep 300   # wait 5 minutes
/usr/local/nagioslogserver/scripts/restore_backup.sh /store/backups/nagioslogserver/nagioslogserver.2015-07-16orwhatever.tar.gz
service crond start
Let's hold off on running the restore though until we can get a look at the logs and curl results.

NLS-1

Code: Select all

curl 'localhost:9200/_cat/master?v'
id                     host                        ip         node
1-k2mOV8SH-1gQJRGdnKyw srq-nagios-ls1.envera.local 10.0.1.171 881dac7b-e349-4fa5-aaef-41294c3b66e0

Code: Select all

curl 'localhost:9200/_cat/nodes?v'
host                        ip         heap.percent ram.percent load node.role master name
srq-nagios-ls1.envera.local 10.0.1.171           73          64 1.68 d         *      881dac7b-e349-4fa5-aaef-41294c3b66e0

Code: Select all

curl localhost:9200/_cluster/settings?pretty
{
  "persistent" : { },
  "transient" : { }
}
NLS-2

Code: Select all

curl 'localhost:9200/_cat/master?v'
id                     host                        ip         node
lI0sN5x1T3qv9XoU62Fm3Q srq-nagios-ls2.envera.local 10.0.1.172 0978ff82-93a6-4fe9-9a9c-2088927e9d3c

Code: Select all

 curl 'localhost:9200/_cat/nodes?v'
host                        ip         heap.percent ram.percent load node.role master name
srq-nagios-ls2.envera.local 10.0.1.172           66          59 1.82 d         *      0978ff82-93a6-4fe9-9a9c-2088927e9d3c

Code: Select all

curl localhost:9200/_cluster/settings?pretty
{
  "persistent" : { },
  "transient" : { }
}
So I can confirm the location of the backups, where would they be kept? (Locally, as we don't have remote backups configured at this point.) We have the space to set up remote backups moving forward, but just want to be transparent at this point. I have backups of the inputs and filter configs; I'd also like to back up the dashboards if I'm looking at a rebuild. I don't really care if I lose log data at this point, but retaining the server configuration would be ideal.
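I'm assuming the local backups live under the path from the restore command above, so something like this should list whatever is there:

Code: Select all

# Assumption: default local backup location referenced in the restore command above
ls -lh /store/backups/nagioslogserver/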

Thank you!
I like graphs...
Envera IT
Posts: 159
Joined: Wed Jun 19, 2013 10:21 am

Re: Cluster Failure

Post by Envera IT »

OK, I can access the web GUI again, but the Administration page is not loading at all; it returns an HTTP 500. I followed the documentation on increasing PHP's memory allowance, but no dice.
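For reference, the change I tried was along these lines (a sketch of my edit; the php.ini location may differ per install):

Code: Select all

# Sketch of the PHP memory change I tried (file location may vary by install)
sed -i 's/^memory_limit = .*/memory_limit = 512M/' /etc/php.ini
service httpd restart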
I like graphs...
Envera IT
Posts: 159
Joined: Wed Jun 19, 2013 10:21 am

Re: Cluster Failure

Post by Envera IT »

We may be good to go now. I had to delete further indexes from around the time of the bad logs (the 15th through the 18th), and then the Administration page came up just fine. Yay. Going to let it run for a few hours and see what happens.
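The deletes were just plain index deletes by date, something like this (illustrative; adjust the dates to whatever range holds the bad data):

Code: Select all

# Illustrative -- drop the daily indices covering the bad data (15th through 18th)
for d in 2017.07.15 2017.07.16 2017.07.17 2017.07.18; do
  curl -XDELETE "http://localhost:9200/logstash-$d"
done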
I like graphs...
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Cluster Failure

Post by cdienger »

Glad to hear. Keep us posted :)
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Envera IT
Posts: 159
Joined: Wed Jun 19, 2013 10:21 am

Re: Cluster Failure

Post by Envera IT »

Good to close this at this point.

Basically, when the incorrectly formatted logs came in, the cluster came under a lot of stress, to the point that the ES heartbeats timed out. We only have a two-host cluster, so the hosts split into two clusters (a split-brain scenario). From that point on, the issue was that every time the cluster re-formed, the bad logs were replicated from one host to the other. I had to purge all the indices containing those events, slap the developer around a bit after he tried to reintroduce the change (which broke the cluster again), and all has been well since.
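For posterity, the setting I'm looking at to avoid a repeat (a sketch only, not something support told me to apply) is the standard Elasticsearch 1.x split-brain guard; with only two nodes it means the cluster won't elect a master if either node is down:

Code: Select all

# Sketch: require both nodes to agree before electing a master (dynamic cluster setting on ES 1.x).
# On a two-node cluster this prevents split brain but blocks master election if a node is down.
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "persistent" : { "discovery.zen.minimum_master_nodes" : 2 }
}'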
I like graphs...
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Cluster Failure

Post by cdienger »

slap the developer around a bit
Strict but fair. I like it.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.