Saving All Query Results to a File with a Bash Script Using the Scroll API
The logs kept in Apinizer may need to be transferred to other environments or examined with other products.
In such cases, the data stored in Apinizer's log database, Elasticsearch, must be queried and saved to a file. Due to the structure of Elasticsearch, a single search request returns at most 10,000 records (the default index.max_result_window).
When the total number of records exceeds this limit, the data must be retrieved with the Scroll API.
This process should be done in a loop, because each batch returned by the Scroll API may need to be processed before the next one is requested.
You can find an implementation of this loop as a Bash script below.
Prerequisite: Installing jq (JSON Processor)
The jq package must be installed on the server for the Bash script to work properly.
You can follow the steps below for this setup:
1. Install the EPEL repository
yum install epel-release -y
2. Update your server
yum update -y
3. Install the jq (JSON Processor) tool
yum install jq -y
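After installation, jq can be verified with a quick sanity check:

```shell
# Show the installed jq version
jq --version

# Extract a field from an inline JSON document to confirm jq
# parses and filters correctly
echo '{"status":"ok"}' | jq -r '.status'
# prints: ok
```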
Scrolling Script
The script below should be saved in a directory as script.sh and made executable with chmod +x script.sh.
#!/bin/bash
es_url='http://172.16.0.49:9200'
index='apinizer-log-apiproxy-lelc'

# Initial search: opens a scroll context that stays alive for 1 minute
response=$(curl -X GET -s "$es_url/$index/_search?scroll=1m" -H 'Content-Type: application/json' -d @query.json)
scroll_id=$(echo "$response" | jq -r '._scroll_id')
hits_count=$(echo "$response" | jq -r '.hits.hits | length')
hits_so_far=$hits_count
echo "Got initial response with $hits_count hits and scroll ID $scroll_id"

# process first page of results here (ex. put the response into result.json)
echo "$response" | jq . >> result.json

# Request the next page until an empty page is returned
while [ "$hits_count" != "0" ]; do
  response=$(curl -X GET -s "$es_url/_search/scroll" -H 'Content-Type: application/json' -d "{ \"scroll\": \"1m\", \"scroll_id\": \"$scroll_id\" }")
  scroll_id=$(echo "$response" | jq -r '._scroll_id')
  hits_count=$(echo "$response" | jq -r '.hits.hits | length')
  hits_so_far=$((hits_so_far + hits_count))
  echo "Got response with $hits_count hits (hits so far: $hits_so_far), new scroll ID $scroll_id"

  # process page of results (ex. put the response into result.json)
  echo "$response" | jq . >> result.json
done
echo "Done!"
#script reference: https://gist.github.com/toripiyo/8b14e8a387069bae372d49296b0077d7
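One optional addition: every scroll search keeps a search context open on the cluster until its 1m timeout expires. If many exports are run back to back, the context can be released explicitly once the loop finishes. A minimal sketch, assuming the same es_url as in script.sh and the last scroll ID printed by it (the placeholder value below is illustrative):

```shell
#!/bin/bash
es_url='http://172.16.0.49:9200'          # same endpoint as in script.sh
scroll_id='REPLACE_WITH_LAST_SCROLL_ID'   # last ID printed by script.sh

# Build the request body with jq so the ID is escaped safely
body=$(jq -n --arg id "$scroll_id" '{ scroll_id: $id }')

# DELETE releases the scroll context immediately instead of letting it
# expire; the fallback message covers an unreachable cluster
curl -X DELETE -s --max-time 10 "$es_url/_search/scroll" \
  -H 'Content-Type: application/json' \
  -d "$body" || echo "Could not reach Elasticsearch at $es_url"
```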
Example Query
The following query must be saved as query.json in the same directory as script.sh. The file contains only the JSON body of the query; the script reads it with curl's @query.json option and sends it to the Elasticsearch address defined in script.sh, so the address, index name, and the field values below (the uok username, the pi proxy ID, and the @timestamp range) must be adjusted for your environment. Note that with the Scroll API the size field sets the number of records per batch, not the total; by default it cannot exceed 10,000.
{
  "from": 0,
  "size": 1000,
  "query": {
    "bool": {
      "filter": [
        {
          "bool": {
            "filter": [
              {
                "bool": {
                  "filter": [
                    {
                      "match": {
                        "uok": {
                          "query": "username",
                          "operator": "OR",
                          "prefix_length": 0,
                          "max_expansions": 50,
                          "fuzzy_transpositions": true,
                          "lenient": false,
                          "zero_terms_query": "NONE",
                          "auto_generate_synonyms_phrase_query": true,
                          "boost": 1.0
                        }
                      }
                    }
                  ],
                  "adjust_pure_negative": true,
                  "boost": 1.0
                }
              }
            ],
            "adjust_pure_negative": true,
            "boost": 1.0
          }
        },
        {
          "bool": {
            "filter": [
              {
                "bool": {
                  "should": [
                    {
                      "term": {
                        "pi": {
                          "value": "6130d19b59f2007bff548d29",
                          "boost": 1.0
                        }
                      }
                    }
                  ],
                  "adjust_pure_negative": true,
                  "boost": 1.0
                }
              },
              {
                "range": {
                  "@timestamp": {
                    "from": "now-4320m/m",
                    "to": "now/m",
                    "include_lower": true,
                    "include_upper": true,
                    "boost": 1.0
                  }
                }
              }
            ],
            "adjust_pure_negative": true,
            "boost": 1.0
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1.0
    }
  },
  "_source": {
    "includes": [
      "@timestamp",
      "uok",
      "fcrb",
      "sc",
      "pet",
      "rt",
      "tch",
      "tcb",
      "hr1ra",
      "et",
      "fcrh"
    ],
    "excludes": []
  }
}
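Before running the script, it is worth confirming that query.json parses as valid JSON; jq fails with a parse error on a malformed file. A small check, using a minimal stand-in query rather than the full one above:

```shell
# Minimal stand-in for query.json (the real file holds the full
# query shown above)
cat > query.json <<'EOF'
{ "size": 1000, "query": { "match_all": {} } }
EOF

# jq -e exits non-zero if the file is not valid JSON
jq -e . query.json > /dev/null && echo "query.json is valid"
```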
You can visit this page to see what the fields in this query mean.
Running the Script
To run the script, execute ./script.sh from the directory that contains it.
After that, informational messages will start to appear as below, and the results will accumulate in the result.json file.
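Since each scroll page is appended to result.json as a separate JSON document, jq can read the file as a stream afterwards. A short sketch with demo data standing in for two scroll pages (field names taken from the _source list above):

```shell
# Demo data: two search responses, as script.sh would append them
cat > result.json <<'EOF'
{"hits":{"hits":[{"_source":{"uok":"user1","sc":200}}]}}
{"hits":{"hits":[{"_source":{"uok":"user2","sc":200}},{"_source":{"uok":"user3","sc":404}}]}}
EOF

# Print every stored record (the _source of each hit), one per line
jq -c '.hits.hits[]._source' result.json

# Count the total number of records across all pages
jq -s '[ .[].hits.hits | length ] | add' result.json
# prints: 3
```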