Update the number of shards of an Elasticsearch index
When an index holds a lot of data in Elasticsearch, queries on it can become slow. As long as the data is hosted in a single shard, each query is basically processed by a single thread (even though several queries can still run in parallel on different threads). This page explains how to increase the number of shards of an index in order to (hopefully) speed up queries on large datasets.
Find the uuid of the target index
curl -X GET "localhost:9200/_cat/indices?v=true&pretty"
In the following example, we target the index with uuid 4f47eed0-574d-47b3-86bc-8e5591c67725.
You can find the UUID of the asset you are targeting in OKA: as an admin, go to Admin Panel -> Data manager -> Asset uuids -> <cluster name>, then search for the asset named <cluster name>_log_js_fetch output_elastic.
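To spot the index that would benefit from more shards, the _cat/indices request above can be restricted to a few columns and sorted by size; the pri column gives the number of primary shards of each index. This is a sketch that assumes Elasticsearch listens on 127.0.0.1:9200; ES_URL, cat_indices_url and list_indices are hypothetical names chosen for this example.

```shell
# ES_URL is a hypothetical variable used by this sketch; override it if needed.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# Build the _cat/indices URL: name, primary shards, replicas, document count
# and size of every index, largest index first.
cat_indices_url() {
  printf '%s/_cat/indices?v=true&h=index,pri,rep,docs.count,store.size&s=store.size:desc\n' "$ES_URL"
}

# Send the request built above to Elasticsearch.
list_indices() {
  curl -s -X GET "$(cat_indices_url)"
}
```

Running list_indices prints one line per index; a large index whose pri column is 1 may be a candidate for the procedure below.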
Migrate data to a new index, with multiple shards
We cannot change the number of shards of an existing index in place. Instead, we create a new index, copy all data from the target index into it, delete the target index, and replace it with an alias pointing to the new index.
Warning
During the migration, you will need about twice the disk space used by the target index, since both indices exist at the same time until the target index is deleted.
Let’s first generate a uuid for our new index:
uuid
In this example, we obtained 8a2c1878-4988-11ef-911f-02e6f309883d.
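The uuid command comes from a package that is not always installed. The sketch below, which assumes a Linux host, falls back on uuidgen (util-linux) or on the kernel's UUID generator; these may produce a random rather than a time-based UUID, which does not matter for an index name. new_index_name is a hypothetical helper.

```shell
# Print a new UUID to use as the name of the new index.
# Uses `uuid` if available, otherwise uuidgen, otherwise the Linux kernel's
# UUID generator. The result is lower-cased because Elasticsearch index
# names must be lowercase.
new_index_name() {
  if command -v uuid >/dev/null 2>&1; then
    uuid
  elif command -v uuidgen >/dev/null 2>&1; then
    uuidgen
  else
    cat /proc/sys/kernel/random/uuid
  fi | tr '[:upper:]' '[:lower:]'
}
```

For example, NEW_INDEX=$(new_index_name) stores the generated name in a shell variable for the following steps.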
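Then create a new index named 8a2c1878-4988-11ef-911f-02e6f309883d with the desired number of primary shards (here 4). If you have several data nodes, you can also set the number of replicas. The mappings section of the request below is only a placeholder: replace it with the mappings object of the target index, as returned by:
curl -X GET "http://127.0.0.1:9200/4f47eed0-574d-47b3-86bc-8e5591c67725/_mapping?pretty"
Create the new index: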
curl -X PUT "http://127.0.0.1:9200/8a2c1878-4988-11ef-911f-02e6f309883d" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index": {
      "number_of_shards": 4,
      "number_of_replicas": 0
    }
  },
  "mappings": {
    "properties": {
      "field_name": { "type": "text" }
    }
  }
}'
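Rather than copying the mapping by hand in place of the field_name example, the new index can be created with the mappings fetched from the target index. This is a sketch under assumptions: jq is installed, ES_URL and the helper names are hypothetical, and only the mappings are copied (other settings of the target index, such as custom analyzers, would have to be added to the body).

```shell
# ES_URL is a hypothetical variable used by this sketch; override it if needed.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# Build the index-creation body from a shard count ($1), a replica
# count ($2) and a mappings object given as compact JSON ($3).
create_index_body() {
  printf '{"settings":{"index":{"number_of_shards":%s,"number_of_replicas":%s}},"mappings":%s}\n' \
    "$1" "$2" "$3"
}

# Print the mappings of index $1 (or of the single index behind an
# alias) as compact JSON. Requires jq.
fetch_mappings() {
  curl -s "$ES_URL/$1/_mapping" | jq -c '[.[]][0].mappings'
}

# Create index $1 with $2 primary shards, $3 replicas and the mappings $4.
create_index() {
  curl -s -X PUT "$ES_URL/$1" -H 'Content-Type: application/json' \
    -d "$(create_index_body "$2" "$3" "$4")"
}
```

With the indices of this example:
create_index 8a2c1878-4988-11ef-911f-02e6f309883d 4 0 "$(fetch_mappings 4f47eed0-574d-47b3-86bc-8e5591c67725)"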
Reindex the target index to the new index
curl -X POST "http://127.0.0.1:9200/_reindex?pretty" -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "4f47eed0-574d-47b3-86bc-8e5591c67725"
  },
  "dest": {
    "index": "8a2c1878-4988-11ef-911f-02e6f309883d"
  }
}'
This step can take a long time, and by default the request is synchronous: curl only returns once all documents have been copied. We recommend running this command inside a screen or tmux session, so that a network disconnection does not interrupt it. Alternatively, you can add the wait_for_completion=false query parameter to the request. In this case, Elasticsearch performs some preflight checks, launches the request, and returns a task that you can use to get the status of the operation or to cancel it. Elasticsearch creates a record of this task as a document at _tasks/<task_id>. You can monitor the task with:
curl -X GET "http://127.0.0.1:9200/_tasks/<task_id>?pretty"
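The asynchronous variant described above can be scripted: start the reindex with wait_for_completion=false, keep the returned task id (of the form node_id:task_number), and poll _tasks/<task_id> until the record reports "completed": true. This is a sketch assuming jq is installed; ES_URL and the helper names are hypothetical. If Elasticsearch becomes unreachable, the loop keeps polling, so interrupt it with Ctrl-C.

```shell
# ES_URL is a hypothetical variable used by this sketch; override it if needed.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# Body of the _reindex request: copy index $1 into index $2.
reindex_body() {
  printf '{"source":{"index":"%s"},"dest":{"index":"%s"}}\n' "$1" "$2"
}

# Start the reindex without waiting and print the returned task id.
# Requires jq.
start_reindex() {
  curl -s -X POST "$ES_URL/_reindex?wait_for_completion=false" \
    -H 'Content-Type: application/json' -d "$(reindex_body "$1" "$2")" |
    jq -r '.task'
}

# Poll task $1 every 30 seconds until its record reports "completed": true,
# then print the full task record, including the reindex result.
wait_for_task() {
  while [ "$(curl -s "$ES_URL/_tasks/$1" | jq -r '.completed')" != "true" ]; do
    sleep 30
  done
  curl -s "$ES_URL/_tasks/$1?pretty"
}
```

With the indices of this example:
TASK=$(start_reindex 4f47eed0-574d-47b3-86bc-8e5591c67725 8a2c1878-4988-11ef-911f-02e6f309883d)
wait_for_task "$TASK"
The failures array in the response part of the printed record should be empty.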
Check the number of documents in each index (both counts should be identical)
curl -X GET "http://127.0.0.1:9200/4f47eed0-574d-47b3-86bc-8e5591c67725/_count?pretty"
curl -X GET "http://127.0.0.1:9200/8a2c1878-4988-11ef-911f-02e6f309883d/_count?pretty"
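The comparison of the two counts can be automated so that the next, destructive steps are only run when they match. In this sketch (ES_URL and the helper names are hypothetical), the new index is refreshed first so that all copied documents are visible to _count, and the count is extracted with sed so that jq is not required.

```shell
# ES_URL is a hypothetical variable used by this sketch; override it if needed.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# Extract the "count" value from a _count response read on stdin
# (works with both compact and ?pretty output).
count_from_json() {
  sed -n 's/.*"count" *: *\([0-9][0-9]*\).*/\1/p'
}

# Refresh the destination index $2, then compare the document counts of
# indices $1 and $2. Returns 0 only if both counts were read and are equal.
counts_match() {
  curl -s -X POST "$ES_URL/$2/_refresh" >/dev/null
  src=$(curl -s "$ES_URL/$1/_count" | count_from_json)
  dst=$(curl -s "$ES_URL/$2/_count" | count_from_json)
  echo "source=$src destination=$dst"
  [ -n "$src" ] && [ "$src" = "$dst" ]
}
```

For example:
counts_match 4f47eed0-574d-47b3-86bc-8e5591c67725 8a2c1878-4988-11ef-911f-02e6f309883d && echo "counts match"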
Check that the index has the right configuration and mapping
curl -X GET "http://127.0.0.1:9200/8a2c1878-4988-11ef-911f-02e6f309883d?pretty"
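The full output of this request can be long. The sketch below shows only the shard and replica settings, and lists the shards of the index with their state: primary shards should be STARTED, while replica shards stay UNASSIGNED on a single data node, since a replica is never allocated on the same node as its primary. ES_URL and the helper names are hypothetical.

```shell
# ES_URL is a hypothetical variable used by this sketch; override it if needed.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# URL returning only the index.number_of_* settings of index $1.
shard_settings_url() {
  printf '%s/%s/_settings/index.number_of_*?flat_settings=true&pretty\n' "$ES_URL" "$1"
}

show_shard_settings() {
  curl -s "$(shard_settings_url "$1")"
}

# One line per shard copy of index $1, including its primary/replica
# flag (prirep column) and its state.
show_shards() {
  curl -s "$ES_URL/_cat/shards/$1?v=true"
}
```

For example, show_shard_settings 8a2c1878-4988-11ef-911f-02e6f309883d should report "index.number_of_shards" : "4".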
Check if we already have an alias
curl -X GET "localhost:9200/_cat/aliases?v=true&pretty"
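To check one specific name rather than reading the whole alias list, the alias exists API (HEAD /_alias/<name>) answers 200 when the name is an alias and 404 otherwise. This is a sketch; ES_URL, is_alias and describe_target are hypothetical names.

```shell
# ES_URL is a hypothetical variable used by this sketch; override it if needed.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# Succeeds (exit status 0) if $1 is an existing alias.
# curl -I sends a HEAD request; -w prints only the HTTP status code
# ("000" if Elasticsearch cannot be reached).
is_alias() {
  code=$(curl -s -o /dev/null -w '%{http_code}' -I "$ES_URL/_alias/$1")
  [ "$code" = "200" ]
}

# Print which index an alias points to, or say that $1 is not an alias.
describe_target() {
  if is_alias "$1"; then
    echo "$1 is an alias:"
    curl -s "$ES_URL/_alias/$1?pretty"
  else
    echo "$1 is not an alias"
  fi
}
```

For example, describe_target 4f47eed0-574d-47b3-86bc-8e5591c67725 tells you whether the remove action described below is needed.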
Stop OKA
sudo systemctl stop oka
Delete old index
curl -X DELETE "http://127.0.0.1:9200/4f47eed0-574d-47b3-86bc-8e5591c67725"
Create an alias with the name of the old index, pointing to the new index
curl -X POST "http://127.0.0.1:9200/_aliases" -H 'Content-Type: application/json' -d'
{
  "actions": [
    { "add": { "index": "8a2c1878-4988-11ef-911f-02e6f309883d", "alias": "4f47eed0-574d-47b3-86bc-8e5591c67725" }}
  ]
}'
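If we already have an alias (see the Check if we already have an alias step), remove it first by adding a remove action as the first item of actions in the above request. Here, ea2ce3ed-4938-4279-a630-4998edae0f44 is the index that the existing alias points to:
curl -X POST "http://127.0.0.1:9200/_aliases" -H 'Content-Type: application/json' -d'
{
  "actions": [
    { "remove": { "index": "ea2ce3ed-4938-4279-a630-4998edae0f44", "alias": "4f47eed0-574d-47b3-86bc-8e5591c67725" }},
    { "add": { "index": "8a2c1878-4988-11ef-911f-02e6f309883d", "alias": "4f47eed0-574d-47b3-86bc-8e5591c67725" }}
  ]
}'
In that case, the Delete old index step cannot be run on the alias name, because Elasticsearch refuses to delete an index through an alias. Delete the index the alias used to point to (here ea2ce3ed-4938-4279-a630-4998edae0f44) once the alias has been moved.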
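Both variants of the request (add only, or remove then add) can be generated by one helper. The actions of a single _aliases request are applied atomically, so the alias never points to both indices or to none. This is a sketch; ES_URL, alias_actions_body and update_alias are hypothetical names.

```shell
# ES_URL is a hypothetical variable used by this sketch; override it if needed.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# Build the _aliases body that points alias $1 at index $2. If a third
# argument is given, a "remove" action for that index is put first.
alias_actions_body() {
  if [ -n "${3:-}" ]; then
    printf '{"actions":[{"remove":{"index":"%s","alias":"%s"}},{"add":{"index":"%s","alias":"%s"}}]}\n' \
      "$3" "$1" "$2" "$1"
  else
    printf '{"actions":[{"add":{"index":"%s","alias":"%s"}}]}\n' "$2" "$1"
  fi
}

# Send the request: update_alias ALIAS NEW_INDEX [PREVIOUS_INDEX]
update_alias() {
  curl -s -X POST "$ES_URL/_aliases" -H 'Content-Type: application/json' \
    -d "$(alias_actions_body "$1" "$2" "${3:-}")"
}
```

With the indices of this example, when no alias existed yet:
update_alias 4f47eed0-574d-47b3-86bc-8e5591c67725 8a2c1878-4988-11ef-911f-02e6f309883d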
Check newly created alias
curl -X GET "localhost:9200/_cat/aliases?v=true&pretty"
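Before restarting OKA, the check can be narrowed to the alias you just created: _cat/aliases/<alias>?h=index prints only the index (or indices) it points to. In this sketch (ES_URL and the helper names are hypothetical), padding spaces are stripped before the exact comparison.

```shell
# ES_URL is a hypothetical variable used by this sketch; override it if needed.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# Succeeds if index name $1 appears as a full line of the _cat output read
# on stdin, after removing the spaces used for column padding.
index_listed() {
  tr -d ' ' | grep -qxF -- "$1"
}

# Succeeds if alias $1 resolves to index $2.
alias_points_to() {
  curl -s "$ES_URL/_cat/aliases/$1?h=index" | index_listed "$2"
}
```

For example:
alias_points_to 4f47eed0-574d-47b3-86bc-8e5591c67725 8a2c1878-4988-11ef-911f-02e6f309883d && echo "alias OK"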
Restart OKA
sudo systemctl restart oka