Problem

Elasticsearch stops writing logs because the disk on the Elasticsearch servers is full.

Reason/Cause

With default settings, Elasticsearch issues a warning when disk usage reaches 85% (low watermark), starts moving shards off the node at 90% (high watermark), and blocks writes to the affected indices when usage reaches 95% (flood stage).

Solution

Free up space on the disk or enlarge the disk. Then run the following command to remove the read-only block from the indices so that they accept writes again.

curl -X PUT "<ELASTIC_IP>:9200/_all/_settings?pretty" -H 'Content-Type: application/json' -d'
{
    "index.blocks.read_only_allow_delete": null
}'
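
To confirm that the block is gone and to see how close each node is to the limits, a small helper script along the following lines may be useful. This is only a sketch: the ES_HOST variable and the function names are assumptions made for illustration, and the thresholds are Elasticsearch's default watermarks (85%, 90%, 95%), which your cluster may override.

```shell
#!/usr/bin/env bash
# ES_HOST is a hypothetical variable for this sketch; point it at your node.
ES_HOST="${ES_HOST:-localhost:9200}"

# Classify a disk usage percentage (integer) against the default watermarks:
# low 85%, high 90%, flood_stage 95%.
disk_stage() {
  local used="$1"
  if [ "$used" -ge 95 ]; then
    echo "flood_stage"
  elif [ "$used" -ge 90 ]; then
    echo "high"
  elif [ "$used" -ge 85 ]; then
    echo "low"
  else
    echo "ok"
  fi
}

# Print per-node disk usage as reported by the cluster.
show_disk_usage() {
  curl -s "http://${ES_HOST}/_cat/allocation?v&h=node,disk.percent,disk.used,disk.avail,disk.total"
}

# Show only the indices that still carry an index.blocks.* setting.
show_blocked_indices() {
  curl -s "http://${ES_HOST}/_all/_settings?filter_path=*.settings.index.blocks&pretty"
}
```

For example, `disk_stage 92` prints `high`. After space has been freed and the block removed, `show_blocked_indices` should no longer list the log indices.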

Additional Suggestion

Because the default limits are percentages, they can leave a significant amount of space unused on large disks, so it is recommended to adjust them for your servers. The limits can be set either as a percentage of disk space used or as an absolute amount of free space remaining.


Setting the limits as absolute free-space values:

curl -X PUT "<ELASTIC_IP>:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "100gb",
    "cluster.routing.allocation.disk.watermark.high": "80gb",
    "cluster.routing.allocation.disk.watermark.flood_stage": "50gb",
    "cluster.info.update.interval": "1m"
}}'


Setting the limits as percentages of disk space used:

curl -X PUT "<ELASTIC_IP>:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "90%",
    "cluster.routing.allocation.disk.watermark.high": "93%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "95%",
    "cluster.info.update.interval": "1m"
}}'


# Afterwards, also enter the same values in the Elasticsearch configuration file (config/elasticsearch.yml). Transient settings applied to the running cluster are lost on a full cluster restart, so this step keeps your limits in place after Elasticsearch is restarted.


cluster.routing.allocation.disk.threshold_enabled: true
cluster.routing.allocation.disk.watermark.low: 90%
cluster.routing.allocation.disk.watermark.high: 93%
cluster.routing.allocation.disk.watermark.flood_stage: 95%
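
As an alternative to editing the configuration file, the limits can be stored as persistent cluster settings, which survive a full cluster restart. The sketch below builds such a request body from three percentages and rejects values that are out of order (Elasticsearch expects low <= high <= flood_stage). The function names and the ES_HOST variable are hypothetical and exist only for this example.

```shell
#!/usr/bin/env bash
# ES_HOST is a hypothetical variable for this sketch; point it at your node.
ES_HOST="${ES_HOST:-localhost:9200}"

# Build a persistent-settings body from three integer percentages.
# Returns non-zero if the values are not in ascending order.
watermark_payload() {
  local low="$1" high="$2" flood="$3"
  if [ "$low" -gt "$high" ] || [ "$high" -gt "$flood" ]; then
    echo "expected low <= high <= flood_stage" >&2
    return 1
  fi
  printf '{"persistent":{"cluster.routing.allocation.disk.watermark.low":"%s%%","cluster.routing.allocation.disk.watermark.high":"%s%%","cluster.routing.allocation.disk.watermark.flood_stage":"%s%%"}}\n' \
    "$low" "$high" "$flood"
}

# Send the body to the cluster settings API.
apply_watermarks() {
  local body
  body="$(watermark_payload "$1" "$2" "$3")" || return 1
  curl -s -X PUT "http://${ES_HOST}/_cluster/settings?pretty" \
    -H 'Content-Type: application/json' -d "$body"
}

# Show the watermark values currently in effect, including defaults.
show_watermarks() {
  curl -s "http://${ES_HOST}/_cluster/settings?include_defaults=true&flat_settings=true&pretty" \
    | grep 'disk.watermark'
}
```

A call such as `apply_watermarks 90 93 95` would send the percentage limits used above, and `show_watermarks` can be run afterwards to check what the cluster reports.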



Problem

Error in log searches in Kibana: "x of y shards failed: The data you are seeing might be incomplete or wrong. The length of [X] field of [Y] doc of [<INDEX_NAME>] index has exceeded [1000000] - maximum allowed to be analyzed for highlighting."

Reason/Cause

By default, Elasticsearch analyzes at most 1,000,000 characters of a field when highlighting a record. This limit exists to keep JVM memory usage and search speed under control.

Solution

The limit can be raised with the following command. If you do not know how large your records are, increase the value gradually until you reach a limit that suits your data.

curl -XPUT "<ELASTIC_IP>:9200/.ds-apinizer-log-apiproxy-AAAA-000*/_settings" -H "Content-Type: application/json" -d'
{
  "index": {
    "highlight.max_analyzed_offset": 2000000
  }
}'
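
One way to raise the limit gradually is to increase it in fixed steps and retry the search after each step. The sketch below grows the value by 50% per step; the step size, the function names and the ES_HOST variable are assumptions chosen for illustration, not values recommended by Elasticsearch.

```shell
#!/usr/bin/env bash
# ES_HOST is a hypothetical variable for this sketch; point it at your node.
ES_HOST="${ES_HOST:-localhost:9200}"

# Return the given limit increased by 50% (integer arithmetic).
next_offset() {
  echo $(( $1 + $1 / 2 ))
}

# Build the index settings body for a given limit.
highlight_payload() {
  printf '{"index":{"highlight.max_analyzed_offset":%d}}\n' "$1"
}

# Apply the limit ($2) to an index name or pattern ($1).
set_highlight_limit() {
  curl -s -X PUT "http://${ES_HOST}/$1/_settings" \
    -H 'Content-Type: application/json' \
    -d "$(highlight_payload "$2")"
}
```

Starting from the default, `next_offset 1000000` prints `1500000`, which can then be passed to `set_highlight_limit` together with the index pattern.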


Problem

The API Traffic pages show no logs and display the warning "Request cannot be executed; I/O reactor status: STOPPED".

Reason/Cause

The memory (JVM heap) allocated to Elasticsearch is insufficient and needs to be increased.

Solution

The heap size is set in the jvm.options file. It is recommended not to exceed half of the system's total RAM. Set -Xms and -Xmx to the same value, then restart Elasticsearch.

sudo vi /opt/elasticsearch/elasticsearch-7.9.2/config/jvm.options

-Xms8g
-Xmx8g

sudo systemctl restart elasticsearch
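
The "half of system RAM" rule can be worked out with a few lines of shell. The sketch below is Linux-only (it reads /proc/meminfo) and uses hypothetical function names. The 31 GB cap is an assumption added here, following the general guidance to keep the heap below the JVM's compressed object pointer threshold.

```shell
#!/usr/bin/env bash

# Total system RAM in whole GB, rounded down (Linux only).
total_ram_gb() {
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
}

# Suggest a heap size in GB: half of total RAM, at least 1, at most 31.
suggest_heap_gb() {
  local total_gb="$1"
  local half=$(( total_gb / 2 ))
  if [ "$half" -lt 1 ]; then half=1; fi
  if [ "$half" -gt 31 ]; then half=31; fi
  echo "$half"
}

# Print the two jvm.options lines for a given heap size in GB.
heap_flags() {
  printf -- '-Xms%sg\n-Xmx%sg\n' "$1" "$1"
}
```

For a server with 16 GB of RAM, `heap_flags "$(suggest_heap_gb 16)"` prints the `-Xms8g` and `-Xmx8g` lines shown above. On the server itself, `suggest_heap_gb "$(total_ram_gb)"` uses the detected memory size.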


Problem

Elasticsearch exception [type=validation_exception, reason=Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [1000]/[1000] maximum shards open;]

Reason/Cause

The cluster has reached the limit set by cluster.max_shards_per_node. To resolve this, add more data nodes, reduce the number of shards in the cluster, or raise the shard limit.

Solution

The recommended solution is to add more data nodes to the cluster.
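
The limit in the error message applies to the whole cluster: it is cluster.max_shards_per_node multiplied by the number of data nodes, which is why adding data nodes resolves the error. The sketch below computes the remaining headroom from those numbers; the function names and the ES_HOST variable are hypothetical.

```shell
#!/usr/bin/env bash
# ES_HOST is a hypothetical variable for this sketch; point it at your node.
ES_HOST="${ES_HOST:-localhost:9200}"

# Remaining shards = data_nodes * max_shards_per_node - open_shards,
# never reported below zero.
shard_headroom() {
  local nodes="$1" per_node="$2" open="$3"
  local left=$(( nodes * per_node - open ))
  if [ "$left" -lt 0 ]; then left=0; fi
  echo "$left"
}

# Show the data node count and shard counts reported by the cluster.
show_shard_counts() {
  curl -s "http://${ES_HOST}/_cluster/health?filter_path=number_of_data_nodes,active_shards,unassigned_shards&pretty"
}
```

Assuming a single data node with the limit of 1000 from the error above, `shard_headroom 1 1000 1000` prints `0`, while a second data node (`shard_headroom 2 1000 1000`) raises the headroom to `1000`.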

Alternative Solution

Since adding data nodes is not always possible, the shard limit can also be managed manually. The following commands raise the limit and then roll over the index.


curl -X PUT http://<ELASTICSEARCH_IP>:9200/_cluster/settings --header "Content-Type:application/json" -d '{
  "persistent" : {
    "cluster.routing.allocation.total_shards_per_node" : 2000,
    "cluster.max_shards_per_node" : 2000
  }
}'

curl -X POST http://<ELASTICSEARCH_IP>:9200/apinizer-log-apiproxy-<INDEX_KEY>/_rollover

Alternative Solution

If neither of the previous solutions is applicable, the last resort is to delete old indices. This is NOT RECOMMENDED because the logs stored in those indices are lost.

curl -XDELETE http://<ELASTICSEARCH_IP>:9200/apinizer-log-apiproxy-<INDEX_KEY>-<INDEX_NUMBER>

Problem

UnassignedShards-CLUSTER-RECOVERED

Solution

There may be more than one solution to this problem. First make sure that all Elasticsearch nodes are running and that no files have been lost.

Alternative Solution

Check the config/elasticsearch.yml file on the Elasticsearch master node. It lists the IP addresses of the other nodes; make sure those nodes are running. Nodes that are down do not appear in the response to the "GET /_nodes" request.


Check the status of the nodes, the cluster and the shards with the following commands.


curl "<ELASTICSEARCH_IP>:9200/_nodes"


curl "<ELASTICSEARCH_IP>:9200/_cluster/allocation/explain"


curl "<ELASTICSEARCH_IP>:9200/_cat/shards?v=true&h=index,shard,prirep,state,node,unassigned.reason&s=state"


# The following command re-enables shard allocation on the nodes.

curl -XPUT "<ELASTICSEARCH_IP>:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d' { "transient" : { "cluster.routing.allocation.enable": "all" } }'


# If the above command is not sufficient, force a retry of the failed shard allocations with the following command.

curl -XPOST "<ELASTICSEARCH_IP>:9200/_cluster/reroute?retry_failed=true&pretty"
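
To see whether these steps had any effect, the number of unassigned shards can be counted before and after. The sketch below parses the output of the _cat/shards request; the function names and the ES_HOST variable are hypothetical, and the parsing assumes the column order requested in the `h` parameter (state in the fourth column, no header row).

```shell
#!/usr/bin/env bash
# ES_HOST is a hypothetical variable for this sketch; point it at your node.
ES_HOST="${ES_HOST:-localhost:9200}"

# List shards without a header row; columns are
# index, shard, prirep, state, node, unassigned.reason.
list_shards() {
  curl -s "http://${ES_HOST}/_cat/shards?h=index,shard,prirep,state,node,unassigned.reason&s=state"
}

# Count lines on stdin whose fourth column (state) is UNASSIGNED.
count_unassigned() {
  awk '$4 == "UNASSIGNED" { n++ } END { print n + 0 }'
}
```

Running `list_shards | count_unassigned` before and after the reroute command shows whether the count is going down. If it does not change, the output of the allocation explain request above is the next place to look.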

Reason/Cause

Shards may fail to be allocated to nodes after a server restart or file loss.