Managing your Elasticsearch indices with Curator
If you are monitoring your environment with Beats, the default behaviour is to rotate the indices daily, creating a new index at midnight. After some time your disk starts filling up, and it becomes hard to keep track of which indices you want to keep, which you want to snapshot, and which you want to delete. To help with this, Elastic developed a tool called Curator.
In this post I will show you how to install, configure and run Curator. This post assumes that you have Python installed along with pip. If you don't, now is a good time to install them before moving forward. To install Curator, all we need to do is run the following command:
#pip install -U elasticsearch-curator
By default this will install the latest available version of Curator. If you are running an older version of Elasticsearch, you should first check the version compatibility matrix in the Curator documentation.
In case you need a different version of Curator, you can specify it like this:
#pip install -U elasticsearch-curator==<version>
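If you are unsure which version ended up installed, both Curator itself and pip can tell you (this assumes the curator binary landed on your PATH):

```shell
#curator --version
#pip show elasticsearch-curator
```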
Once Curator is installed, it is time to create a configuration file. By default Curator looks for its configuration at ~/.curator/curator.yml, but since the path can be passed explicitly with --config, I will create the file under a slightly different name.
#mkdir ~/.curator
#cat > ~/.curator/config.yml << EOF
---
# Remember, leave a key empty if there is no value. None will be a string,
# not a Python "NoneType"
client:
  hosts:
    - 127.0.0.1
  port: 9200
  url_prefix:
  use_ssl: False
  certificate:
  client_cert:
  client_key:
  ssl_no_validate: False
  http_auth:
  timeout: 30
  master_only: False

logging:
  loglevel: INFO
  logfile:
  logformat: default
  blacklist: ['elasticsearch', 'urllib3']
EOF
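Before pointing Curator at the cluster, it is worth checking that Elasticsearch actually answers on the host and port from the config file; with the plain HTTP, no-auth setup above, a quick curl is enough:

```shell
#curl -s http://127.0.0.1:9200
```

If the cluster is up, this returns a small JSON document with the cluster name and version.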
As you can see, I am using a fairly default Elasticsearch configuration. If your setup is more complex, you can adapt the config file above to your needs.
Curator can be used either as a command line interface or as a singleton command line interface. The command line has the following format (the --dry-run flag is optional):
#curator --config [curator.yml] [--dry-run] ACTION_FILE.yml
Now let's create an action file. I am placing it in the same folder as the configuration file, but that can be changed as needed.
#cat > ~/.curator/delete_indices.yml << EOF
---
actions:
  1:
    action: delete_indices
    description: >-
      Delete indices older than 30 days based on index name
    options:
      ignore_empty_list: True
      disable_action: False
    filters:
    - filtertype: pattern
      kind: regex
      value: '^(metric|heart)beat-.*'
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 30
EOF
With this action file, Curator will delete any index whose name matches metricbeat-* or heartbeat-* and that is older than 30 days.
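Before letting this action delete anything for real, it is a good idea to do a dry run; with the --dry-run flag Curator only logs what it would delete, without touching the indices:

```shell
#curator --config ~/.curator/config.yml --dry-run ~/.curator/delete_indices.yml
```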
Let's add more actions to this file. Since my Beats are configured to send monitoring data to Elasticsearch, I want to delete those indices as well once they are older than 15 days.
#cat >> ~/.curator/delete_indices.yml << EOF
  2:
    action: delete_indices
    description: >-
      Delete indices older than 15 days based on index name
    options:
      ignore_empty_list: True
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: .monitoring-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 15
EOF
After doing some cleanup, let's add another action to create snapshots of the more important indices, which we do not want to lose. Creating a snapshot requires some extra steps: setting up shared storage (or an AWS S3 or Google Cloud Storage bucket) and adding some extra settings to Elasticsearch, which are described in the Elasticsearch snapshot documentation.
Before doing anything we need to create a snapshot repository. For that we will be using a different tool, called es_repo_mgr, which ships with Curator. To add a repo we run the following command:
#es_repo_mgr --config .curator/config.yml create fs --repository filebeat_backup --location /bkp --compression True --skip_repo_fs_check True
To check that the repo was created, we run:
#es_repo_mgr --config .curator/config.yml show
filebeat_backup
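If you prefer, the same check can be done directly against the Elasticsearch snapshot API, which returns the repository settings as JSON:

```shell
#curl -s http://127.0.0.1:9200/_snapshot/filebeat_backup?pretty
```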
Good. Now that the repo is created, we can add another action to the action file to snapshot the latest indices created by Filebeat.
#cat >> ~/.curator/delete_indices.yml << EOF
  3:
    action: snapshot
    description: >-
      Snapshot selected indices to 'repository' with the snapshot name or name pattern in 'name'
    options:
      repository: filebeat_backup
      # Leaving the name blank will result in the default 'curator-%Y%m%d%H%M%S' format
      name: curator-%Y.%m.%d-%H:%M:%S
      wait_for_completion: True
      max_wait: 3600
      wait_interval: 10
      skip_repo_fs_check: True
    filters:
    - filtertype: pattern
      kind: prefix
      value: filebeat-
    - filtertype: age
      source: creation_date
      direction: younger
      unit: days
      unit_count: 1
EOF
Once that is done, it is time to run our action file with Curator.
#curator --config ~/.curator/config.yml ~/.curator/delete_indices.yml
It will run for a short while. If you have configured a log file, all the details will be written there; otherwise the output is shown on the console. Let's check whether the snapshots were created. For this I will be using the singleton command line interface, curator_cli. It works exactly like the curator command, but the actions are provided as command line arguments instead of an action file. Example:
#curator_cli --config ~/.curator/config.yml show_indices --verbose
filebeat-6.5.0-2019.01.01 open 26.2MB 68564 3 1 2019-01-01T00:00:01Z
filebeat-6.5.0-2019.01.02 open 73.5MB 165566 3 1 2019-01-02T00:00:06Z
filebeat-6.5.0-2019.01.03 open 79.2MB 178480 3 1 2019-01-03T00:00:01Z
filebeat-6.5.0-2019.01.04 open 142.5MB 295582 3 1 2019-01-04T00:00:02Z
filebeat-6.5.0-2019.01.05 open 66.7MB 153728 3 1 2019-01-05T00:00:01Z
filebeat-6.5.0-2019.01.06 open 65.6MB 148712 3 1 2019-01-06T00:00:02Z
filebeat-6.5.0-2019.01.07 open 82.6MB 177506 3 1 2019-01-07T00:00:02Z
For checking the snapshots I will be running the following:
#curator_cli --config .curator/config.yml show_snapshots --repository filebeat_backup
curator-2019.01.24-02:59:29
As you can see, the snapshot was created successfully. For more options you can run:
#curator_cli --help
All we need to do now is add a cronjob to run these commands periodically, and Curator will keep cleaning up your environment and creating backups of the important indices as needed.
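A sketch of such a crontab entry, assuming curator was installed to /usr/local/bin and the configuration lives under root's home directory (adjust the paths to your setup):

```shell
# Run the Curator action file every day at 01:00, appending output to a log
0 1 * * * /usr/local/bin/curator --config /root/.curator/config.yml /root/.curator/delete_indices.yml >> /var/log/curator.log 2>&1
```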