Manually Back Up and Restore Elasticsearch
This document explains the log backup policies that can be used when API Traffic logs are stored in the Elasticsearch database managed by Apinizer.
First, decide on the backup method. Commonly used methods include:
- System administrators can back up the path.data directory specified in the Elasticsearch configuration file, either incrementally or at regular intervals.
PROS Access to historical data will always continue with the same ease.
CONS Both the active and backup disks will continue to grow continuously.
CONS In case of issues with the main disk, reinstallation will be required to access the data on the backup disk.
CONS Elasticsearch keeps changing these files while it runs, so a copy taken from a running node may not be consistent enough to restore.
- The server where Elasticsearch runs can be backed up either with disk mirroring (RAID-1) or at regular intervals.
PROS Access to historical data will always continue with the same ease.
PROS In case of an issue with the main disk, the backup disk can be instantly put into use through network redirection.
CONS Both the active and backup disks will continue to grow continuously.
- Elasticsearch data can be backed up with the Elasticsearch Snapshot API. With a snapshot policy, the backup files are written to a specific location on the file system; these files should then be backed up separately to a different server.
PROS Access to historical data will always continue with the same ease.
CONS Both the active and backup disks will continue to grow continuously.
SUGGESTION Whichever of the above methods is used, once regular backups are in place, logs that have already been backed up can be deleted automatically with Elasticsearch ILM, or any desired amount can be deleted manually at any time with the Elasticsearch API.
PROS The active server will continue to operate with much lower disk resources.
CONS To access historical data, an Elasticsearch instance must be installed on the backup server, or the backups must be transferred to a server that runs Elasticsearch and restored there.
CONS Only the backup disk will continue to grow continuously.
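The ILM option in the suggestion above can be sketched as a small shell function that prints an ILM policy body whose only phase deletes an index once it reaches a given age. This is a minimal sketch: the function name, the example policy name and the age are assumptions. The Apinizer log indices are likely already managed by a policy with a hot phase and rollover, so in practice the delete phase should be merged into that policy rather than replacing it. The age should exceed the backup interval so that indices are captured in a snapshot before they are deleted.

```shell
#!/usr/bin/env bash
# Print the JSON body of an ILM policy with a single "delete" phase.
# $1: minimum index age before deletion, e.g. "30d" (example value).
# Hypothetical helper: merge the phase into your existing policy if one exists.
ilm_delete_policy_body() {
  local min_age="$1"
cat <<EOF
{
  "policy": {
    "phases": {
      "delete": {
        "min_age": "${min_age}",
        "actions": { "delete": {} }
      }
    }
  }
}
EOF
}
```

For example: `curl -X PUT "http://<ELASTICSEARCH_IP_ADDRESS>/_ilm/policy/<POLICY_NAME>?pretty" -H 'Content-Type: application/json' -d "$(ilm_delete_policy_body 30d)"`, where `<POLICY_NAME>` stands for the policy you maintain.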
Elasticsearch Manual Backup and Restore
This section explains the creation of a Snapshot Lifecycle Management (SLM) policy for automatically backing up logs on Elasticsearch through a cron definition and describes methods for taking instant backups and restoring them.
Variables
The dynamic values used in the requests and their descriptions are shown in the table below.
Variable | Description |
---|---|
<ELASTICSEARCH_IP_ADDRESS> | Address of the Elasticsearch cluster as host:port (for example 10.0.0.5:9200) |
<INDEX_KEY> | An identifier that is unique to the cluster and is used in the Apinizer repository, policy and index names (for example apinizer-log-apiproxy-<INDEX_KEY>). Use the same value in all requests. |
1. Specifying the File Location of Backups
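In the elasticsearch.yml configuration file of every node in the cluster, add the path.repo setting with the directory where backup files will be stored. In a multi-node cluster, this directory must be on a shared file system (for example, an NFS mount) mounted at the same path on all master and data nodes.
If the setting is added to a node that is already running, the node must be restarted for it to take effect.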
path:
  repo:
    - /backups/my_backup_location
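Before restarting the nodes, it can help to confirm on each node that the path.repo directory exists and is writable. The following is a hypothetical pre-flight helper, assuming it is run in a shell of the user the Elasticsearch service runs as (for example after "sudo -u elasticsearch -s"); it only checks local permissions and does not contact Elasticsearch.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for a path.repo location.
# Prints "ok: <dir>" and returns 0 if the directory exists and is writable
# by the current user; otherwise prints the problem to stderr and returns 1.
check_repo_dir() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "missing: $dir" >&2
    return 1
  fi
  if [ ! -w "$dir" ]; then
    echo "not writable: $dir" >&2
    return 1
  fi
  echo "ok: $dir"
}
```

For example, `check_repo_dir /backups/my_backup_location` on every node.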
2. Defining Snapshot Repository Information
A snapshot repository defines where the files written by snapshots are stored. Its location must be one of the directories listed under path.repo.
curl -X PUT "http://<ELASTICSEARCH_IP_ADDRESS>/_snapshot/apinizer-repository-<INDEX_KEY>?pretty" -H 'Content-Type: application/json' -d'
{
"type": "fs",
"settings": {
"location": "/backups/my_backup_location",
"compress": true
}
}
'
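When several repositories or keys have to be registered, the request above can be wrapped in a function. A sketch under assumptions: the function name and the CURL override (set CURL=echo to print the request instead of sending it) are illustrative, and the JSON body is the same as in the request above.

```shell
#!/usr/bin/env bash
# Register a shared-file-system snapshot repository for one INDEX_KEY.
# $1: Elasticsearch host:port  $2: INDEX_KEY  $3: repository location
# (the location must be listed under path.repo on every node).
# CURL=echo turns the call into a dry run that only prints the arguments.
register_fs_repo() {
  local es="$1" key="$2" location="$3" body
  body=$(printf '{"type": "fs", "settings": {"location": "%s", "compress": true}}' "$location")
  "${CURL:-curl}" -s -X PUT "http://${es}/_snapshot/apinizer-repository-${key}?pretty" \
    -H 'Content-Type: application/json' -d "$body"
}
```

For example, `register_fs_repo 10.0.0.5:9200 my-key /backups/my_backup_location`, where the address and key are example values.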
3. Verifying Access to the Repository
Check that Elasticsearch can access the repository location.
If verification succeeds, the request returns the list of nodes that can use the repository. If it fails, the request returns an error.
curl -X POST "http://<ELASTICSEARCH_IP_ADDRESS>/_snapshot/apinizer-repository-<INDEX_KEY>/_verify?pretty"
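In scripts, the verification result can be checked without reading the JSON by hand. A minimal sketch that relies only on the response containing a "nodes" object on success and an "error" object on failure; the function name is hypothetical, and the check is a plain text match rather than a JSON parse.

```shell
#!/usr/bin/env bash
# Read a _verify response from stdin. Succeed only if it lists nodes and
# contains no error object. Plain string matching, so no jq is needed.
repo_verified() {
  local body
  body="$(cat)"
  case "$body" in
    *'"error"'*) return 1 ;;
    *'"nodes"'*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example: `curl -s -X POST "http://<ELASTICSEARCH_IP_ADDRESS>/_snapshot/apinizer-repository-<INDEX_KEY>/_verify" | repo_verified && echo "repository OK" || echo "verification failed"`.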
To back up automatically, run the SLM policy commands in step 4. To take a backup at a time of your choosing, run the snapshot commands in step 5.
4. Creating SLM Policy
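4.1. Creating the Snapshot Policy
The schedule uses Elasticsearch cron syntax (seconds, minutes, hours, day of month, month, day of week, year): "0 0 0 ? * 1#1 *" runs the policy at midnight on the first Sunday of every month. Under retention, snapshots older than 30 days are deleted, except that at least 5 snapshots are always kept, and no more than 50 are kept even if they have not expired.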
curl -X PUT "http://<ELASTICSEARCH_IP_ADDRESS>/_slm/policy/apinizer-slm-policy-<INDEX_KEY>?pretty" -H 'Content-Type: application/json' -d'
{
"schedule": "0 0 0 ? * 1#1 *",
"name": "<apinizer-snapshot-<INDEX_KEY>-{now/d}>",
"repository": "apinizer-repository-<INDEX_KEY>",
"config": {
"indices": ["apinizer-log-apiproxy-<INDEX_KEY>"],
"ignore_unavailable": false,
"partial": false
},
"retention": {
"expire_after": "30d",
"min_count": 5,
"max_count": 50
}
}
'
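To create policies with other schedules or retention values, the body can be generated instead of edited by hand. A sketch: the function name and argument order are assumptions; the fields are the ones used in the request above.

```shell
#!/usr/bin/env bash
# Print an SLM policy body for one INDEX_KEY.
# $1: INDEX_KEY  $2: cron schedule  $3: expire_after  $4: min_count  $5: max_count
slm_policy_body() {
  local key="$1" schedule="$2" expire="$3" min="$4" max="$5"
cat <<EOF
{
  "schedule": "${schedule}",
  "name": "<apinizer-snapshot-${key}-{now/d}>",
  "repository": "apinizer-repository-${key}",
  "config": {
    "indices": ["apinizer-log-apiproxy-${key}"],
    "ignore_unavailable": false,
    "partial": false
  },
  "retention": {
    "expire_after": "${expire}",
    "min_count": ${min},
    "max_count": ${max}
  }
}
EOF
}
```

For example, with my-key as an example INDEX_KEY: `curl -X PUT "http://<ELASTICSEARCH_IP_ADDRESS>/_slm/policy/apinizer-slm-policy-my-key?pretty" -H 'Content-Type: application/json' -d "$(slm_policy_body my-key '0 0 0 ? * 1#1 *' 30d 5 50)"`.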
4.2. Running the Policy Manually
curl -X POST "http://<ELASTICSEARCH_IP_ADDRESS>/_slm/policy/apinizer-slm-policy-<INDEX_KEY>/_execute?pretty"
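The execute request responds with the name of the snapshot it started, which can be captured for later status checks. A minimal sketch using sed, so it also works where jq is not installed; the function name is hypothetical and the pattern assumes the value sits on one line, as it does in both compact and pretty-printed responses.

```shell
#!/usr/bin/env bash
# Extract the "snapshot_name" value from an SLM _execute response on stdin.
slm_snapshot_name() {
  sed -n 's/.*"snapshot_name"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

For example: `name=$(curl -s -X POST "http://<ELASTICSEARCH_IP_ADDRESS>/_slm/policy/apinizer-slm-policy-<INDEX_KEY>/_execute" | slm_snapshot_name)`.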
4.3. Viewing Snapshot Records
curl -X GET "http://<ELASTICSEARCH_IP_ADDRESS>/_snapshot/apinizer-repository-<INDEX_KEY>/apinizer-snapshot-<INDEX_KEY>*?pretty"
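For a quick health check, the _cat/snapshots API can list each snapshot's id and status as plain text (h=id,status), which is easy to filter in the shell. A sketch that prints the snapshots whose status is anything other than SUCCESS (for example IN_PROGRESS, PARTIAL or FAILED); the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Read "id status" lines (as returned by _cat/snapshots?h=id,status)
# from stdin and print the ids of snapshots that did not end in SUCCESS.
unsuccessful_snapshots() {
  awk 'NF >= 2 && $2 != "SUCCESS" { print $1 }'
}
```

For example: `curl -s "http://<ELASTICSEARCH_IP_ADDRESS>/_cat/snapshots/apinizer-repository-<INDEX_KEY>?h=id,status" | unsuccessful_snapshots`.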
In most organizations, snapshots are first kept in the same environment where indexing takes place, and may later need to be stored in a different environment. Copying the snapshot files alone does not make the data available in another Elasticsearch cluster: the repository structure must be transferred along with the files, and copying only some of the files can damage that structure and prevent the backup from being restored.
When moving a snapshot to another cluster, first register the repository on the target cluster, then restore the snapshot from it.
Completing a snapshot does not delete the backed-up indices. They are deleted only if the delete phase is enabled in the index's ILM policy.
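The point about moving the whole repository can be illustrated with a copy that preserves the complete directory layout. This is a sketch under assumptions: the destination is a local or mounted directory (for a remote server, scp -r or rsync -a serve the same purpose), no snapshot or deletion should be running on the source during the copy, and the file names in the comment are typical of a shared-file-system repository. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Copy an entire fs snapshot repository directory, keeping its layout
# (typically index-N, index.latest, indices/, snap-*.dat, meta-*.dat).
# Run only while no snapshot or deletion is in progress on the source.
copy_repository() {
  local src="$1" dst="$2"
  [ -d "$src" ] || { echo "source not found: $src" >&2; return 1; }
  mkdir -p "$dst" || return 1
  # "$src/." copies the directory contents, including hidden files.
  cp -a "$src/." "$dst/"
}
```

After copying, register the copied location as a repository on the target cluster (as in step 2) and then restore from it.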
5. Taking Immediate Backups
5.1. Creating a Snapshot
curl -X PUT "http://<ELASTICSEARCH_IP_ADDRESS>/_snapshot/apinizer-repository-<INDEX_KEY>/apinizer-snapshot-<INDEX_KEY>?wait_for_completion=true" -H 'Content-Type: application/json' -d'
{
"indices": "index001,index002,index003",
"ignore_unavailable": true,
"include_global_state": false
}
'
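Snapshot names must be unique within a repository, so repeating the request above with the same fixed name fails once that snapshot exists. A sketch that appends a UTC timestamp to the name; the function name and the timestamp format are assumptions.

```shell
#!/usr/bin/env bash
# Build a per-run snapshot name: apinizer-snapshot-<key>-<timestamp>.
# $1: INDEX_KEY  $2 (optional): timestamp, defaults to the current UTC time.
snapshot_name_for() {
  local key="$1"
  local ts="${2:-$(date -u +%Y.%m.%d-%H%M%S)}"
  printf 'apinizer-snapshot-%s-%s\n' "$key" "$ts"
}
```

For example, `name=$(snapshot_name_for my-key)`, then use `$name` in place of `apinizer-snapshot-<INDEX_KEY>` in the request URL.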
5.2. Restoring Index(es)
curl -X POST "http://<ELASTICSEARCH_IP_ADDRESS>/_snapshot/apinizer-repository-<INDEX_KEY>/apinizer-snapshot-<INDEX_KEY>/_restore?pretty" -H 'Content-Type: application/json' -d'
{
"indices": "logapiproxy20200102",
"ignore_unavailable": true,
"include_global_state": false
}
'
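A restore fails if an open index with the same name already exists in the cluster. The restore API's rename_pattern and rename_replacement parameters, which the migration script in this document also uses, restore indices under new names instead. A sketch of a body builder; the function name and the "restored_" prefix are assumptions.

```shell
#!/usr/bin/env bash
# Print a restore request body that restores the given indices under a
# "restored_" prefix so they do not clash with existing open indices.
# $1: comma-separated index names or patterns.
restore_body_renamed() {
  local indices="$1"
  # Single quotes keep $1 literal: it is Elasticsearch's regex group
  # reference in rename_replacement, not a shell variable.
  printf '{"indices": "%s", "ignore_unavailable": true, "include_global_state": false, "rename_pattern": "(.+)", "rename_replacement": "restored_$1"}\n' "$indices"
}
```

For example: `curl -X POST "http://<ELASTICSEARCH_IP_ADDRESS>/_snapshot/apinizer-repository-<INDEX_KEY>/apinizer-snapshot-<INDEX_KEY>/_restore?pretty" -H 'Content-Type: application/json' -d "$(restore_body_renamed logapiproxy20200102)"`.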
Elasticsearch Snapshot Migration and Restore Script
After the snapshot policy has been set up, this script moves the snapshot files created at the repository location to the backup server, restores them there, and keeps them ready to read.
The script has the following requirements:
- The log and backup servers must be Linux servers with bash, the shell the script is written for.
- The backup server must run Elasticsearch with the same version as the existing Elasticsearch server, or a version that supports restoring its snapshots.
- Communication between the log server and the backup server should be possible using protocols like SSH and SCP.
- The log server should support crontab (it is available by default in many popular Linux distributions).
- Basic Linux shell knowledge.
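Besides the items above, the script calls curl, jq, ssh and scp on the log server. A minimal pre-flight check, with a hypothetical function name, that prints any of these commands that are missing and returns a non-zero status if so.

```shell
#!/usr/bin/env bash
# Print each required command that is not available on this server.
# With no arguments, checks the commands the migration script calls.
missing_tools() {
  local tool rc=0
  if [ "$#" -eq 0 ]; then set -- curl jq ssh scp; fi
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      rc=1
    fi
  done
  return "$rc"
}
```

For example: `missing_tools || echo "Install the commands listed above before running the script."`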
The steps to be applied in the script are as follows:
- Check the repository.
- Get the name and address of the snapshot file.
- Send the snapshot file to the restore server.
- Start the snapshot restore process.
- Check the status of the restore process.
#!/bin/bash
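#Write all script output to a dated log file in the current working directory
current_date=$(date +'%d-%m-%Y')
exec > logfile$current_date.log 2>&1
#Set these values for your environment. Use IP addresses or hostnames without a port; port 9200 is appended below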
es_snapshot_ip="<ELASTICSEARCH_IP_ADDRESS>"
es_restore_ip="<ELASTICSEARCH_BACKUP_SERVER>"
repository_dst_location="<BACKUP_PATH_REPO>"
log_key="<INDEX_KEY>"
time_start=`date +%s`
echo -e "\n\nScript has started on \"`date`\""
es_snapshot_address="http://$es_snapshot_ip:9200"
es_restore_address="http://$es_restore_ip:9200"
echo "Variables:"
echo " es_snapshot_address: $es_snapshot_address"
echo " es_restore_address: $es_restore_address"
echo " repository_dst_location: $repository_dst_location"
##Get the repository created for this INDEX_KEY (searching all repositories could return several names) and its location
repository_name="apinizer-repository-$log_key"
repository_src_location=$(curl -XGET -s "$es_snapshot_address/_snapshot/$repository_name" | jq -r --arg r "$repository_name" '.[$r].settings.location')
echo " repository_name: $repository_name"
echo " repository_src_location: $repository_src_location"
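##Show snapshots in the repository and take the most recent one
echo -e "Command to be used: curl -XGET -s \"$es_snapshot_address/_snapshot/apinizer-repository-$log_key/_all\" | jq -r '.snapshots | sort_by(.start_time_in_millis) | last | .snapshot' \n"
snapshot_name=$(curl -XGET -s "$es_snapshot_address/_snapshot/apinizer-repository-$log_key/_all" | jq -r '.snapshots | sort_by(.start_time_in_millis) | last | .snapshot')
echo " snapshot_name: $snapshot_name"
if [ -z "$snapshot_name" ] || [ "$snapshot_name" = "null" ]; then
echo "No snapshot found in the repository. Script is being terminated."
exit 1
fi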
time_1=`date +%s`
echo -e "\nduration - since beginning: $((time_1-time_start)) seconds"
##Move Snapshot files to remote server
echo "---Moving Snapshot to remote server: Started"
size_snapshot=$(du -sh "$repository_src_location")
echo "Snapshot file size: $size_snapshot"
size_dst_initial=$(ssh elasticsearch@$es_restore_ip "du -sh $repository_dst_location/")
echo "Target disk size before moving snapshot: $size_dst_initial"
echo "scp -r $repository_src_location/* elasticsearch@$es_restore_ip:$repository_dst_location/"
#Check scp's exit code: comparing disk sizes gives a false failure when the target already holds the same files
scp -r "$repository_src_location"/* "elasticsearch@$es_restore_ip:$repository_dst_location/"
scp_status=$?
echo "---Moving Snapshot to remote server: Done"
size_dst_afterscp=$(ssh elasticsearch@$es_restore_ip "du -sh $repository_dst_location/")
echo "Target disk size after moving snapshot: $size_dst_afterscp"
time_2=`date +%s`
echo "---duration - scp: $((time_2-time_1)) seconds"
if [ "$scp_status" -ne 0 ];
then
echo "Moving snapshot files has failed (scp exit code: $scp_status). Script is being terminated."
exit 1
fi
##Register repository on remote server
time_3=`date +%s`
echo -e "Command to be used: curl -XPUT \"$es_restore_address/_snapshot/$repository_name?pretty\" -H \"Content-Type: application/json\" -d '{ \"type\": \"fs\", \"settings\": { \"compress\" : \"true\", \"location\": \"$repository_dst_location\" } }' \n"
curl -XPUT "$es_restore_address/_snapshot/$repository_name?pretty" -H "Content-Type: application/json" -d '{
"type": "fs",
"settings": {
"compress" : "true",
"location": "'$repository_dst_location'"
}
}'
##Start restoring snapshot
echo -e "\nRestore: Started"
echo "Command to be used: curl -XPOST -s \"$es_restore_address/_snapshot/$repository_name/$snapshot_name/_restore?pretty\" -H \"Content-Type: application/json\" -d '{ \"indices\": \".ds-apinizer-log-token-$log_key-*,.ds-apinizer-log-apiproxy-$log_key-*\", \"rename_pattern\": \"(.ds-apinizer-)(.*\$)\", \"rename_replacement\": \"restored_\$1\$2\" }'"
index_to_close_list=()
while true
do
#$log_key is placed outside the single quotes so the shell expands it; $1 and $2 stay literal for Elasticsearch's regex replacement
curl -XPOST -s "$es_restore_address/_snapshot/$repository_name/$snapshot_name/_restore?pretty" -H "Content-Type: application/json" -d '{
"indices": ".ds-apinizer-log-token-'"$log_key"'-*,.ds-apinizer-log-apiproxy-'"$log_key"'-*",
"rename_pattern": "(.ds-apinizer-)(.*$)",
"rename_replacement": "restored_$1$2"
}' -o restore.output
if grep -q '"error"' restore.output; then
##At least one conflicting index is expected (the last, still open one). It must be closed before it can be restored over.
index_to_close=$(grep -oPm 1 'restored_\.ds-apinizer-[^\]\s"]+' restore.output)
if [ -z "$index_to_close" ]; then
echo "Restore request failed with an error other than a conflicting index. Script is being terminated."
cat restore.output
exit 1
fi
curl -XPOST -s "$es_restore_address/$index_to_close/_close?pretty" >> closed_indexes.output
index_to_close_list+=("$index_to_close")
echo "Conflicting index has been closed: $index_to_close"
else
echo "There is no conflicting index. Continuing to script."
break
fi
done
##Check restore process hourly
echo -e "Command to be used: curl -XGET -s \"$es_restore_address/_cluster/state\" | jq -r '.restore.snapshots[]?.state' \n"
echo "Restore process will be checked every hour before continuing to script."
while true
do
sleep 3600
restore_result=$(curl -XGET -s "$es_restore_address/_cluster/state" | jq -r '.restore.snapshots[]?.state')
if [ "$restore_result" = "STARTED" ] || [ "$restore_result" = "INIT" ]; then
echo "Status of Restore process as per _cluster/state: $restore_result. Restore is in progress."
elif [ "$restore_result" = "DONE" ]; then
echo "Status of Restore process as per _cluster/state: $restore_result. Continuing to script."
break
elif [ -z "$restore_result" ]; then
echo "Status of Restore process could not be obtained from _cluster/state. Continuing to script."
break
else
echo "Unexpected status of Restore process as per _cluster/state: $restore_result. Continuing to script."
break
fi
done
time_4=`date +%s`
echo "duration - restore: $((time_4-time_3)) seconds"
##Open closed indexes if there are any
for index in "${index_to_close_list[@]}"; do
curl -XPOST -s "$es_restore_address/$index/_open"
done
##Setting visibility of restored indexes to visible
echo -e "\nSet visibility of restored indexes: Started"
cluster_dst_state=$(curl -s "$es_restore_address/_cluster/state")
restored_indices=$(echo "$cluster_dst_state" | jq '.metadata.indices | keys | .[]' | grep '^"restored_.*"')
restored_indices=${restored_indices//\"}
echo -e "Command to be used: curl -XPUT -s \"$es_restore_address/INDEX/_settings?pretty\" -H 'Content-Type: application/json' -d'{ \"index.hidden\": false }' \n"
for index in $restored_indices; do
curl -XPUT -s "$es_restore_address/$index/_settings?pretty" -H 'Content-Type: application/json' -d'{
"index.hidden": false
}'
done
echo "Set visibility of restored indexes: Done"
time_5=`date +%s`
echo "duration - restored visibility: $((time_5-time_4)) seconds"
##Make sure the restore process is done. The number of restored indices should match the number of indices in the snapshot
echo -e "\nChecking restore results: Started"
echo -e "Command to be used: curl -s \"$es_restore_address/_snapshot/$repository_name/$snapshot_name?pretty\" \n"
snapshot_json=$(curl -s "$es_restore_address/_snapshot/$repository_name/$snapshot_name?pretty")
snapshot_indices_array=$(echo "$snapshot_json" | jq -r '.snapshots[0].indices[]')
snapshot_index_count=$(echo "$snapshot_indices_array" | grep -c ".")
echo "Total number of indices in snapshot: $snapshot_index_count"
recovery_info=$(curl -s "$es_restore_address/_cat/recovery?h=index,snapshot,stage,bytes_percent")
#Keep only the shard recoveries from this snapshot; a shard is finished when its stage is "done"
filtered_lines=$(echo "$recovery_info" | awk -v snap="$snapshot_name" '$2 == snap')
completed_count=$(echo "$filtered_lines" | awk '$3 == "done"' | grep -c .)
uncompleted_count=$(echo "$filtered_lines" | awk '$3 != "done"' | grep -c .)
echo "-Restore completed: $completed_count"
echo "-Restore uncompleted: $uncompleted_count"
if [ "$uncompleted_count" -gt 0 ];
then
echo -e "\nUncompleted Indices:"
echo "$filtered_lines" | awk '$3 != "done"'
echo -e "\n\n---There are indexes that could not be restored!---\n\n"
elif [ "$uncompleted_count" -eq 0 ];
then
then
echo -e "\n\n---Restore was successful."
echo "Snapshot file will be deleted by Elasticsearch according to SLM policy."
echo -e "To manually delete, following command can be used: curl -XDELETE \"$es_snapshot_address/_snapshot/$repository_name/$snapshot_name\" \n"
else
echo "Checking restore results has failed. Please check results manually."
fi
echo "Checking restore results: Done"
time_6=`date +%s`
echo "duration - checking restore results: $((time_6-time_5)) seconds"
time_end=`date +%s`
echo -e "\nduration - total time of script: $((time_end-time_start)) seconds"
##Clear the variables set in the shell, just in case
unset time_1 time_2 time_3 time_4 time_5 time_6 time_start time_end snapshot_json snapshot_indices_array snapshot_index_count recovery_info filtered_lines completed_count uncompleted_count current_date es_snapshot_address es_restore_address repository_dst_location repository_name repository_src_location snapshot_name cluster_dst_state restored_indices size_dst_afterscp size_dst_initial size_snapshot restore_result es_snapshot_ip es_restore_ip is_error_exist index_to_close_list index_to_close log_key
echo "Used variables have been cleared."
echo -e "\n\nScript is done on \"`date`\" \n"
echo "Note: If there is an error log like -All shards failed-, those indexes need to be deleted from the remote cluster and the restore process needs to be run again for them."
##Script Ends##
How It Works:
- Save the script as a file at a suitable location on your log server, with a name such as "ESMoveSnapshotAndRestore.sh". You can paste it in with a Linux editor such as vi or nano, or save it on a Windows machine and transfer it over SFTP with tools such as WinSCP or MobaXterm.
- Make the file executable with the command "chmod +x ESMoveSnapshotAndRestore.sh".
- The script uses SSH key authentication so that scp can connect to the backup server automatically, without prompting for a password.
- Generate a key on the log server with "ssh-keygen", then add it to the backup server with "ssh-copy-id elasticsearch@<ELASTICSEARCH_BACKUP_SERVER>" (the script connects as the elasticsearch user).
- If it is not already present, install the jq package on the log server: "apt install jq" on Ubuntu, or "yum install jq" on Red Hat.
- Check the path.data and path.repo values in the Elasticsearch configuration files on both servers. On the backup server, path.repo must include BACKUP_PATH_REPO, because the script registers a repository at that location.
- Configure the variables in the script for your environment:
- ELASTICSEARCH_IP_ADDRESS
- ELASTICSEARCH_BACKUP_SERVER
- INDEX_KEY
- BACKUP_PATH_REPO
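Before the first run, it is worth confirming that no placeholder is left in the script. A sketch that lists every remaining placeholder in a file; the function name is hypothetical, and the pattern assumes placeholders are written in angle brackets and upper case, like <INDEX_KEY> in this document.

```shell
#!/usr/bin/env bash
# List placeholders such as <INDEX_KEY> that are still present in a file,
# i.e. values that have not yet been replaced with real settings.
unfilled_placeholders() {
  grep -oE '<[A-Z][A-Z_]*>' "$1" | sort -u
}
```

For example, `unfilled_placeholders /path/to/ESMoveSnapshotAndRestore.sh` prints nothing once all values are set.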
Usage:
Before running the script, replace the placeholder values in its variables with your own.
chmod +x /path/to/ESMoveSnapshotAndRestore.sh
cd /path/to && ./ESMoveSnapshotAndRestore.sh &
The script can be run manually as shown above, or scheduled to repeat at specific intervals with a cron job.
Cron Job Usage:
1) Open the cron editor by running the following command in the terminal:
crontab -e
2) In the opened editor, add a line based on how often you want to run the script.
For example, to run it at 23:00 on the 3rd day of every month:
0 23 3 * * cd /path/to && ./ESMoveSnapshotAndRestore.sh
If the editor is vi or vim, save the line by pressing Esc, typing ":wq", and pressing Enter.
In both methods, the script writes its output to a file named "logfile<DATE>.log" in the directory it is started from. Cron starts jobs in the user's home directory, so changing into the script's folder first (as with "cd /path/to" above) keeps the log file next to the script.
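As an alternative to editing the crontab by hand, the entry can be generated and appended non-interactively. A sketch: the function name is hypothetical, and the generated line changes into the script's folder first so the log file is written there.

```shell
#!/usr/bin/env bash
# Build a crontab line that changes into the script's directory first,
# so the script's log and output files are written next to it.
# $1: cron time fields, e.g. "0 23 3 * *"  $2: absolute path of the script.
cron_line() {
  local schedule="$1" script="$2"
  printf '%s cd %s && ./%s\n' "$schedule" "$(dirname "$script")" "$(basename "$script")"
}
```

For example, `(crontab -l 2>/dev/null; cron_line '0 23 3 * *' /path/to/ESMoveSnapshotAndRestore.sh) | crontab -` appends the entry while keeping the existing ones.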