Update the number of shards of an Elasticsearch index

When an Elasticsearch index holds a lot of data, queries can become slow. As long as a single shard hosts the data, each query is essentially single-threaded (several queries can still run in parallel on separate threads). This page explains how to increase the number of shards of an index in order to (hopefully) speed up queries on large datasets.

Find the uuid of the target index

curl -X GET "localhost:9200/_cat/indices?v=true&pretty"
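
To narrow the listing down to the indices that still have a single primary shard, the output can be filtered. The following is a minimal sketch: the ES_URL variable and the helper names list_indices and single_shard_rows are hypothetical, and it assumes curl and awk are available on the host.

```shell
# Base URL of the cluster; the default matches the examples on this page.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# List every index with its primary shard count, replica count,
# document count and size on disk (no header row, one index per line).
list_indices() {
  curl -s "${ES_URL}/_cat/indices?h=index,pri,rep,docs.count,store.size"
}

# Keep only the rows whose primary shard count (2nd column) is 1.
single_shard_rows() {
  awk '$2 == 1'
}

# Usage: list_indices | single_shard_rows
```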

In the following example, we will target uuid 4f47eed0-574d-47b3-86bc-8e5591c67725.

To check the UUID of the asset you are targeting in OKA, go (as an admin) to Admin Panel -> Data manager -> Asset uuids -> <cluster name>, then search for the asset named <cluster name>_log_js_fetch output_elastic.

Migrate data to a new index, with multiple shards

We cannot directly update the number of shards of an existing index. Instead, we need to create a new index, copy all data from the target index to the new index, and then delete the target index.

Warning

During the migration, the data exists in both the target index and the new index, so you will consume roughly twice the disk storage that the target index needs.
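
Before starting, it may be worth comparing the size of the target index with the free disk space on the data nodes. The sketch below is one possible way to do it; ES_URL and the three helper names are hypothetical, and the choice of which node's free space to compare against is left to the operator.

```shell
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# Size of an index on disk, in bytes (bytes=b disables human-readable units).
index_size_bytes() {
  curl -s "${ES_URL}/_cat/indices/$1?h=store.size&bytes=b" | tr -d '[:space:]'
}

# Free disk space reported for each data node, in bytes, one line per node.
node_free_bytes() {
  curl -s "${ES_URL}/_cat/allocation?h=node,disk.avail&bytes=b"
}

# Succeeds when the free space (2nd argument) exceeds the needed size (1st argument).
has_enough_space() {
  [ "$2" -gt "$1" ]
}

# Usage:
#   needed="$(index_size_bytes 4f47eed0-574d-47b3-86bc-8e5591c67725)"
#   node_free_bytes    # read the free space of the relevant node
#   has_enough_space "$needed" <free_bytes> && echo "enough room for the copy"
```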

Let’s first generate a uuid for our new index:

uuid

In this example, we obtained 8a2c1878-4988-11ef-911f-02e6f309883d.
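
The uuid command is not installed on every system. As a hedged alternative, the helper below (new_index_name is a hypothetical name) tries uuidgen, then uuid, then the Linux kernel's random UUID file, and lowercases the result because Elasticsearch index names must be lowercase.

```shell
# Print a lowercase UUID usable as an index name.
new_index_name() {
  if command -v uuidgen >/dev/null 2>&1; then
    uuidgen
  elif command -v uuid >/dev/null 2>&1; then
    uuid
  else
    # Linux-only fallback.
    cat /proc/sys/kernel/random/uuid
  fi | tr 'A-Z' 'a-z'
}

# Usage: NEW_INDEX="$(new_index_name)"
```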

Let’s then create a new index, named 8a2c1878-4988-11ef-911f-02e6f309883d, with the desired number of shards (4 in this example). You can also set the number of replicas if you have multiple data nodes:

curl -X PUT "http://127.0.0.1:9200/8a2c1878-4988-11ef-911f-02e6f309883d" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index": {
      "number_of_shards": 4,
      "number_of_replicas": 0
    }
  },
  "mappings": {
    "properties": {
      "field_name": { "type": "text" }
    }
  }
}'

Reindex the target index to the new index

curl -X POST "http://127.0.0.1:9200/_reindex?pretty" -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "4f47eed0-574d-47b3-86bc-8e5591c67725"
  },
  "dest": {
    "index": "8a2c1878-4988-11ef-911f-02e6f309883d"
  }
}'

This step can take a long time because the request is synchronous. We recommend running the command in a screen or tmux session to avoid any issue in case of a network disconnection. Alternatively, you can add the wait_for_completion=false option to the request. In that case, Elasticsearch performs some preflight checks, launches the request, and returns a task that you can use to cancel the operation or get its status. Elasticsearch creates a record of this task as a document at _tasks/<task_id>. You can monitor your task with:

curl -X GET "http://127.0.0.1:9200/_tasks/<task_id>?pretty"
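
The asynchronous variant can be scripted roughly as follows. This is a sketch under assumptions: ES_URL and the function names are hypothetical, the task id is pulled out of the {"task":"..."} response with sed rather than a JSON parser, and the 30-second polling interval is arbitrary.

```shell
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# Start the reindex in the background.
# The response looks like {"task":"<node_id>:<number>"}.
start_reindex() {
  curl -s -X POST "${ES_URL}/_reindex?wait_for_completion=false" \
    -H 'Content-Type: application/json' \
    -d "{\"source\":{\"index\":\"$1\"},\"dest\":{\"index\":\"$2\"}}"
}

# Read that response on stdin and print only the task id.
extract_task_id() {
  sed -n 's/.*"task"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Read a GET _tasks/<task_id> response on stdin;
# succeed once it reports "completed": true.
task_completed() {
  grep -q '"completed"[[:space:]]*:[[:space:]]*true'
}

# Usage:
#   task_id="$(start_reindex <source_index> <dest_index> | extract_task_id)"
#   until curl -s "${ES_URL}/_tasks/${task_id}" | task_completed; do sleep 30; done
```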

Check the number of elements in each index

curl -X GET "http://127.0.0.1:9200/4f47eed0-574d-47b3-86bc-8e5591c67725/_count?pretty"
curl -X GET "http://127.0.0.1:9200/8a2c1878-4988-11ef-911f-02e6f309883d/_count?pretty"
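
Since the old index is deleted in a later step, comparing the two counts automatically can guard against a mistake. A minimal sketch, assuming the hypothetical helpers below and a _count response of the usual form {"count": N, "_shards": {...}}:

```shell
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# Read a _count response on stdin and print the document count.
extract_count() {
  sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Print the number of documents in the index given as 1st argument.
doc_count() {
  curl -s "${ES_URL}/$1/_count" | extract_count
}

# Succeed only when both counts are non-empty and identical.
counts_match() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Usage:
#   old="$(doc_count 4f47eed0-574d-47b3-86bc-8e5591c67725)"
#   new="$(doc_count 8a2c1878-4988-11ef-911f-02e6f309883d)"
#   counts_match "$old" "$new" && echo "counts match: $old"
```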

Check that the index has the right configuration and mapping

curl -X GET "http://127.0.0.1:9200/8a2c1878-4988-11ef-911f-02e6f309883d?pretty"
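
If only the shard count is of interest, the settings endpoint can be asked for that single setting. The sketch below uses hypothetical helper names and assumes the response has the form {"<index>":{"settings":{"index":{"number_of_shards":"4"}}}}, where the value is returned as a string.

```shell
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# Fetch only the number_of_shards setting of the index given as 1st argument.
shard_setting() {
  curl -s "${ES_URL}/$1/_settings/index.number_of_shards"
}

# Read that response on stdin and print the shard count.
extract_shard_count() {
  sed -n 's/.*"number_of_shards"[[:space:]]*:[[:space:]]*"\([0-9][0-9]*\)".*/\1/p'
}

# Usage: shard_setting 8a2c1878-4988-11ef-911f-02e6f309883d | extract_shard_count
```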

Check if we already have an alias

curl -X GET "localhost:9200/_cat/aliases?v=true&pretty"

Stop OKA

sudo systemctl stop oka

Delete old index

curl -X DELETE "http://127.0.0.1:9200/4f47eed0-574d-47b3-86bc-8e5591c67725"

Create an alias named after the old index, with the new index as the target

curl -X POST "http://127.0.0.1:9200/_aliases" -H 'Content-Type: application/json' -d'
{
  "actions": [
    { "add": { "index": "8a2c1878-4988-11ef-911f-02e6f309883d", "alias": "4f47eed0-574d-47b3-86bc-8e5591c67725" }}
  ]
}'

If we already have an alias (see the "Check if we already have an alias" step), remove it first by adding the following line as the first item of the actions array in the above code snippet:

{ "remove": { "index": "ea2ce3ed-4938-4279-a630-4998edae0f44", "alias": "4f47eed0-574d-47b3-86bc-8e5591c67725" }},
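
Both cases (with or without a previous alias) can be covered by generating the request body. This is a sketch only: alias_actions is a hypothetical helper, and its optional third argument is the index that the existing alias currently points to. Because the remove and add actions are sent in a single _aliases request, they are applied together.

```shell
# Print the JSON body for the _aliases request.
#   $1: new index, $2: alias name, $3 (optional): index the alias currently points to
alias_actions() {
  new_index="$1"
  alias_name="$2"
  previous_index="${3:-}"
  actions="{\"add\":{\"index\":\"${new_index}\",\"alias\":\"${alias_name}\"}}"
  if [ -n "${previous_index}" ]; then
    # Put the remove action first, as described in the text.
    actions="{\"remove\":{\"index\":\"${previous_index}\",\"alias\":\"${alias_name}\"}},${actions}"
  fi
  printf '{"actions":[%s]}\n' "${actions}"
}

# Usage:
#   curl -X POST "http://127.0.0.1:9200/_aliases" -H 'Content-Type: application/json' \
#     -d "$(alias_actions <new_index> <alias> <previous_index>)"
```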

Check newly created alias

curl -X GET "localhost:9200/_cat/aliases?v=true&pretty"

Restart OKA

sudo systemctl restart oka