Managing your Elasticsearch indices with Curator
If you are monitoring your environment with Beats, the default behaviour is to rotate the indices daily, creating a new index at midnight. After some time your disk starts filling up, and it becomes hard to keep track of which indices you want to keep, which you want to snapshot, and which you want to delete. To help with this, Elastic developed a tool called Curator.
In this post I will show you how to install, configure and run Curator. This post assumes that you have Python installed along with pip. If you don't, now is a good time to install them before moving forward. To install Curator, all we need to do is run the following command:
#pip install -U elasticsearch-curator
By default this will install the latest available version of Curator. If you are running an older version of Elasticsearch, you should first check the version compatibility matrix in the Curator documentation.
In case you need a different version of Curator, you can specify it like this:
#pip install -U elasticsearch-curator==<version>
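If you are unsure which version ended up installed, both Curator itself and pip can tell you (this assumes the curator binary landed on your PATH):

```shell
#curator --version
#pip show elasticsearch-curator
```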
Once Curator is installed, it is time to create a configuration file. By default Curator looks for its configuration at ~/.curator/curator.yml, but since the path can be passed explicitly with --config, I will create the file under a slightly different name.
#mkdir ~/.curator
#cat > ~/.curator/config.yml << EOF
---
# Remember, leave a key empty if there is no value. None will be a string,
# not a Python "NoneType"
client:
  hosts:
    - 127.0.0.1
  port: 9200
  url_prefix:
  use_ssl: False
  certificate:
  client_cert:
  client_key:
  ssl_no_validate: False
  http_auth:
  timeout: 30
  master_only: False

logging:
  loglevel: INFO
  logfile:
  logformat: default
  blacklist: ['elasticsearch', 'urllib3']
EOF
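Before pointing Curator at the cluster, it is worth checking that Elasticsearch actually answers on the host and port from the config file; with the plain HTTP, no-auth setup above, a quick curl is enough:

```shell
#curl -s http://127.0.0.1:9200
```

If the cluster is up, this returns a small JSON document with the cluster name and version.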
As you can see, I am using a fairly default Elasticsearch configuration. If your setup is more complex, you can adapt the config file above to your needs.
Curator can be used either as a command line interface or as a singleton command line interface. The command line has the following format (the --dry-run flag is optional):
#curator --config [curator.yml] [--dry-run] ACTION_FILE.yml
Now let's create an action file. I am placing it in the same folder as the configuration file, but that can be changed as needed.
#cat > ~/.curator/delete_indices.yml << EOF
---
actions:
  1:
    action: delete_indices
    description: >-
      Delete indices older than 30 days based on index name
    options:
      ignore_empty_list: True
      disable_action: False
    filters:
    - filtertype: pattern
      kind: regex
      value: '^(metric|heart)beat-.*'
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 30
EOF
With this action file, Curator will delete any index whose name matches metricbeat-* or heartbeat-* and that is older than 30 days.
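Before letting this action delete anything for real, it is a good idea to do a dry run; with the --dry-run flag Curator only logs what it would delete, without touching the indices:

```shell
#curator --config ~/.curator/config.yml --dry-run ~/.curator/delete_indices.yml
```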
Let's add more actions to this file. Since my Beats are configured to send monitoring data to Elasticsearch, I want to delete those indices as well once they are older than 15 days.
#cat >> ~/.curator/delete_indices.yml << EOF
  2:
    action: delete_indices
    description: >-
      Delete indices older than 15 days based on index name
    options:
      ignore_empty_list: True
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: .monitoring-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 15
EOF
After doing some cleanup, let's add another action to create snapshots of the more important indices, which we do not want to lose. Creating a snapshot requires some extra steps: setting up shared storage (or an AWS S3 or Google Cloud Storage bucket) and adding some extra settings to Elasticsearch, which are described in the Elasticsearch snapshot documentation.
Before doing anything we need to create a snapshot repository. For that we will be using a different tool, called es_repo_mgr, which ships with Curator. To add a repo we run the following command:
#es_repo_mgr --config .curator/config.yml create fs --repository filebeat_backup --location /bkp --compression True --skip_repo_fs_check True
To check that the repo was created, we run:
#es_repo_mgr --config .curator/config.yml show
filebeat_backup
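If you prefer, the same check can be done directly against the Elasticsearch snapshot API, which returns the repository settings as JSON:

```shell
#curl -s http://127.0.0.1:9200/_snapshot/filebeat_backup?pretty
```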
Good. Now that the repo is created, we can add another action to the action file to snapshot the latest indices created by Filebeat.
#cat >> ~/.curator/delete_indices.yml << EOF
  3:
    action: snapshot
    description: >-
      Snapshot selected indices to 'repository' with the snapshot name or name pattern in 'name'
    options:
      repository: filebeat_backup
      # Leaving the name blank will result in the default 'curator-%Y%m%d%H%M%S' format
      name: curator-%Y.%m.%d-%H:%M:%S
      wait_for_completion: True
      max_wait: 3600
      wait_interval: 10
      skip_repo_fs_check: True
    filters:
    - filtertype: pattern
      kind: prefix
      value: filebeat-
    - filtertype: age
      source: creation_date
      direction: younger
      unit: days
      unit_count: 1
EOF
Once that is done, it is time to run our action file with Curator.
#curator --config ~/.curator/config.yml ~/.curator/delete_indices.yml
It will run for a short while. If you have configured a log file, all the details will be written there; otherwise the output is shown on the console. Let's check whether the snapshots were created. For this I will be using the singleton command line interface, curator_cli. It works exactly like the curator command, but the actions are provided as command line arguments instead of an action file. Example:
#curator_cli --config ~/.curator/config.yml show_indices --verbose
filebeat-6.5.0-2019.01.01 open 26.2MB 68564 3 1 2019-01-01T00:00:01Z
filebeat-6.5.0-2019.01.02 open 73.5MB 165566 3 1 2019-01-02T00:00:06Z
filebeat-6.5.0-2019.01.03 open 79.2MB 178480 3 1 2019-01-03T00:00:01Z
filebeat-6.5.0-2019.01.04 open 142.5MB 295582 3 1 2019-01-04T00:00:02Z
filebeat-6.5.0-2019.01.05 open 66.7MB 153728 3 1 2019-01-05T00:00:01Z
filebeat-6.5.0-2019.01.06 open 65.6MB 148712 3 1 2019-01-06T00:00:02Z
filebeat-6.5.0-2019.01.07 open 82.6MB 177506 3 1 2019-01-07T00:00:02Z
For checking the snapshots I will be running the following:
#curator_cli --config .curator/config.yml show_snapshots --repository filebeat_backup
curator-2019.01.24-02:59:29
As you can see, the snapshot was created successfully. For more options you can run:
#curator_cli --help
All we need to do now is add a cronjob to run these commands periodically, and Curator will keep cleaning up your environment and creating backups of the important indices as needed.
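A sketch of such a crontab entry, assuming curator was installed to /usr/local/bin and the configuration lives under root's home directory (adjust the paths to your setup):

```shell
# Run the Curator action file every day at 01:00, appending output to a log
0 1 * * * /usr/local/bin/curator --config /root/.curator/config.yml /root/.curator/delete_indices.yml >> /var/log/curator.log 2>&1
```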