Elasticsearch
This section presents the Elasticsearch-related actions that you may need to perform while updating/upgrading OKA.
Backup
First, ensure that the Elasticsearch database is running and accessible.
- Backup your Elasticsearch database. To learn more about Elasticsearch snapshots, see How to create a snapshot.
Note: snapshots are incremental, so not all data is saved during each snapshot/backup process.
You must configure your snapshot directory (here we chose /dir_path/backup_repo) before taking a snapshot for the first time:

Create the snapshot directory:

mkdir -p /dir_path/backup_repo

Grant read/write access to the directory for Elasticsearch to store snapshots:

sudo chown elasticsearch: /dir_path/backup_repo

Add this line to /etc/elasticsearch/elasticsearch.yml:

path.repo: /dir_path/backup_repo

Restart Elasticsearch:

sudo systemctl restart elasticsearch

Query Elasticsearch to register the snapshot repository (in our case it points to /dir_path/backup_repo; the directory does not need to exist beforehand, Elasticsearch will create it automatically). You can replace my_backup by the name you want:

curl -X PUT "http://host:port/_snapshot/my_backup" -H 'Content-Type: application/json' -d '
{
  "type": "fs",
  "settings": {
    "location": "/dir_path/backup_repo",
    "compress": "true"
  }
}'
Backup your Elasticsearch database (take a snapshot). We recommend creating a snapshot whose name includes the current date (see the example below with $(date +%Y%m%d)). The wait_for_completion=true option makes the command synchronous: it waits until the backup has completed:

curl -X PUT "http://host:port/_snapshot/my_backup/snapshot_oka_$(date +%Y%m%d)?wait_for_completion=true" -H 'Content-Type: application/json' -d '
{
  "ignore_unavailable": true,
  "include_global_state": true,
  "metadata": {
    "taken_by": "your_name",
    "taken_because": "backup before upgrading OKA"
  }
}'
If you don’t include the wait_for_completion=true option (making the above command asynchronous), you can still check the completion of your snapshot with:

SNAPSHOT_NAME=snapshot_oka_$(date +%Y%m%d)
curl -s "http://host:port/_snapshot/my_backup/${SNAPSHOT_NAME}"
On completion, check the "state" of the backup in the output; it should be "SUCCESS".
Note
If your Elasticsearch is secured with HTTPS (recommended), you need to replace http:// with https:// in the
above commands.
If you also need to specify a username and password to connect (recommended), add the credentials
with -u username:password on the curl command.
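For example, a sketch of the snapshot-status query over HTTPS with basic authentication (the username, password, and CA certificate path are placeholders to adapt to your deployment):

# username, password and /path/to/ca.crt are placeholders
curl -s -u username:password --cacert /path/to/ca.crt "https://host:port/_snapshot/my_backup/${SNAPSHOT_NAME}"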
Warning
The default behavior for selecting indices in a snapshot operation is to include all indices ("indices": "*").
To snapshot only specific indices, set the "indices" field to the list of desired index names, separated by commas
("indices": "index1,index2,index3").
OKA’s main indices that should be backed up can be seen per cluster directly on the Cluster page.
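For example, a sketch of the snapshot command restricted to specific indices (index1, index2, index3 are placeholders for your actual OKA index names):

curl -X PUT "http://host:port/_snapshot/my_backup/snapshot_oka_$(date +%Y%m%d)?wait_for_completion=true" -H 'Content-Type: application/json' -d '
{
  "indices": "index1,index2,index3",
  "ignore_unavailable": true,
  "include_global_state": true
}'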
Note
Snapshots must be identified by unique names (${SNAPSHOT_NAME}). The above command uses the date of the snapshot as a unique identifier (snapshot_oka_$(date +%Y%m%d)).
Note
You can add multiple metadata entries to your snapshot. We recommend adding at least your name and the reason why the snapshot was taken.
Note
Over time, snapshot repositories can accumulate stale data that is no longer referenced by existing snapshots.
Use this command to clean the snapshot repository:

curl -X POST "http://host:port/_snapshot/my_backup/_cleanup"

In case you need to copy the snapshot to another Elasticsearch (e.g., a preproduction VM), we recommend running this command first.
This command can take a long time to complete.
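If you do copy the snapshot repository to another server, a minimal sketch with rsync (the destination host and paths are placeholders; the target directory must be owned by elasticsearch and registered in path.repo on the destination server, as described above):

rsync -a /dir_path/backup_repo/ user@preprod-host:/dir_path/backup_repo/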
Restore
Important
If restoring to a new version of Elasticsearch, repeat the 4 steps from "configure your snapshot directory" above for this new version.
# Here we restore all indices, apart from the system indices (``-.*`` excludes all indices whose name starts with ``.``)
curl -X POST "http://host:port/_snapshot/my_backup/${SNAPSHOT_NAME}/_restore" -H 'Content-Type: application/json' -d '
{
"indices": "*,-.*",
"ignore_unavailable": true,
"include_global_state": false,
"include_aliases": true
}'
Note
Modify the snapshot name (${SNAPSHOT_NAME}) to match the one provided during the backup procedure.
You can list the available snapshots in a repository with:
curl -X GET "http://host:port/_cat/snapshots/my_backup"
Note
You can make this command synchronous by adding ?wait_for_completion=true after _restore.
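For example, the synchronous variant of the restore call above:

curl -X POST "http://host:port/_snapshot/my_backup/${SNAPSHOT_NAME}/_restore?wait_for_completion=true" -H 'Content-Type: application/json' -d '
{
  "indices": "*,-.*",
  "ignore_unavailable": true,
  "include_global_state": false,
  "include_aliases": true
}'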
Migrating from Elasticsearch 7 to Elasticsearch 9
Important
Elasticsearch does not support migrating directly from version 7 to version 9; you will need to migrate to version 8 first, then to version 9.
First of all, create a snapshot of your Elasticsearch database in v7.
On your new Elasticsearch server, install Elasticsearch 8 (see Elasticsearch).
Copy your snapshot from your v7 server to your v8 server, and import your snapshot in v8 (see above on how to restore).
Check deprecations. You should see deprecations in the output of the following command:
curl -X GET "http://localhost:9200/_migration/deprecations?pretty"
You can then use the following script to reindex the indices so that they are v8 compatible while remaining accessible by OKA (via the creation of aliases):
reindex.oka.sh
#!/bin/bash
################################################################################
# Copyright (c) 2017-2025 UCit SAS
# All Rights Reserved
#
# This software is the confidential and proprietary information
# of UCit SAS ("Confidential Information").
# You shall not disclose such Confidential Information
# and shall use it only in accordance with the terms of
# the license agreement you entered into with UCit.
################################################################################

# ================================================================
# OKA Elasticsearch Reindexing Script for ES9 Migration
# ================================================================
# This script reindexes all Elasticsearch indices to make them
# compatible with Elasticsearch 9.x
# ================================================================

# Default configuration
ES_HOST="localhost"
ES_PORT="9200"
ES_SCHEME="http"
ES_USER=""
ES_PASSWORD=""
CONFIRM_EACH_INDEX=false
SKIP_ALREADY_REINDEXED=true
LOG_FILE="/tmp/reindex_oka_$(date +%Y%m%d_%H%M%S).log"

# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color

# ================================================================
# Helper Functions
# ================================================================

show_usage() {
    cat << EOF
Usage: $0 [OPTIONS]

Options:
  -h, --host HOST        Elasticsearch host (default: localhost)
  -p, --port PORT        Elasticsearch port (default: 9200)
  -s, --scheme SCHEME    Connection scheme: http or https (default: http)
  -u, --user USER        Elasticsearch username (if authentication required)
  -w, --password PASS    Elasticsearch password (if authentication required)
  -c, --confirm-each     Ask for confirmation before each index reindexing
  --no-skip-reindexed    Don't skip indices that are already reindexed (default: skip them)
  -l, --log-file FILE    Log file path (default: /tmp/reindex_oka_TIMESTAMP.log)
  --help                 Show this help message

Examples:
  # Basic usage (local ES without auth)
  $0

  # ES with HTTPS and authentication
  $0 --scheme https --user elastic --password mypassword

  # Confirm each index before reindexing
  $0 --confirm-each

  # Re-run and only process failed indices
  $0

  # Process all indices even if already reindexed
  $0 --no-skip-reindexed
EOF
    exit 0
}

log() {
    echo -e "$1" | tee -a "${LOG_FILE}"
}

log_success() {
    log "${GREEN}✓ $1${NC}"
}

log_error() {
    log "${RED}✗ $1${NC}"
}

log_warning() {
    log "${YELLOW}⚠ $1${NC}"
}

log_info() {
    log "${BLUE}ℹ $1${NC}"
}

log_step() {
    log "${BLUE} → $1${NC}"
}

# Build curl command with authentication if needed
curl_es() {
    local -a auth_param=()
    if [[ -n "${ES_USER}" ]] && [[ -n "${ES_PASSWORD}" ]]; then
        auth_param=(-u "${ES_USER}:${ES_PASSWORD}")
    fi
    # Longer timeouts for reindex operations
    curl --connect-timeout 10 --max-time 600 -s -k "${auth_param[@]}" "$@"
}

# Get full ES URL
get_es_url() {
    echo "${ES_SCHEME}://${ES_HOST}:${ES_PORT}"
}

# Check if index is already reindexed (ends with -v8)
is_already_reindexed() {
    local index=$1
    [[ "${index}" =~ -v8$ ]]
}

# Check if corresponding v8 index exists
has_v8_version() {
    local old_index=$1
    local new_index="${old_index}-v8"
    local es_url
    es_url=$(get_es_url)
    local check
    check=$(curl_es "${es_url}/_cat/indices/${new_index}?h=index" | tr -d '[:space:]')
    [[ "${check}" = "${new_index}" ]]
}

# ================================================================
# Main Reindexing Function
# ================================================================

reindex_with_smart_alias() {
    local old_index=$1
    local new_index="${old_index}-v8"
    local es_url
    es_url=$(get_es_url)

    log "\n========================================="
    log "Processing index: ${old_index}"
    log "========================================="

    # Check if already reindexed
    if [[ "${SKIP_ALREADY_REINDEXED}" = true ]] && has_v8_version "${old_index}"; then
        log_warning "Index already has a -v8 version, skipping"
        return 2
    fi

    # Ask for confirmation if enabled
    if [[ "${CONFIRM_EACH_INDEX}" = true ]]; then
        read -r -p "Reindex this index? (y/n): " confirm
        if [[ "${confirm}" != "y" ]] && [[ "${confirm}" != "Y" ]]; then
            log_warning "Skipped by user"
            return 2
        fi
    fi

    # Step 1: Check if index exists
    log_step "Step 1/10: Checking if index exists..."
    local index_check
    index_check=$(curl_es "${es_url}/_cat/indices/${old_index}?h=index" | tr -d '[:space:]')
    if [[ "${index_check}" = "${old_index}" ]]; then
        log_success "Index exists"
    else
        log_error "Index ${old_index} does not exist, skipping"
        return 1
    fi

    # Step 2: Retrieve existing aliases
    log_step "Step 2/10: Retrieving existing aliases..."
    local alias_json
    alias_json=$(curl_es "${es_url}/${old_index}/_alias")
    local existing_aliases
    existing_aliases=$(echo "${alias_json}" | python3 -c "
import sys, json
try:
    data = json.load(sys.stdin)
    for idx, idx_data in data.items():
        aliases = list(idx_data.get('aliases', {}).keys())
        print(','.join(aliases))
except:
    pass
" 2>/dev/null)

    # Step 3: Determine target aliases
    log_step "Step 3/10: Determining target aliases..."
    local target_aliases=""
    local need_name_alias=false
    if [[ -n "${existing_aliases}" ]] && [[ "${existing_aliases}" != "" ]]; then
        target_aliases="${existing_aliases}"
        log_info "Found existing aliases: ${target_aliases}"
    else
        target_aliases="${old_index}"
        need_name_alias=true
        log_info "No existing aliases, will create alias '${old_index}'"
    fi

    # Step 4: Count documents in old index
    log_step "Step 4/10: Counting documents in old index..."
    local old_count
    old_count=$(curl_es "${es_url}/${old_index}/_count" | python3 -c "import sys, json; print(json.load(sys.stdin).get('count', 0))" 2>/dev/null)
    log_info "Document count: ${old_count}"

    # Step 5: Retrieve settings and mappings
    log_step "Step 5/10: Retrieving index settings and mappings..."
    local index_config
    index_config=$(curl_es "${es_url}/${old_index}")
    log_success "Settings and mappings retrieved"

    # Step 6: Create new index with proper settings including analyzers
    log_step "Step 6/10: Creating new index: ${new_index}..."
    local new_index_config
    new_index_config=$(echo "${index_config}" | python3 -c "
import sys, json
try:
    data = json.load(sys.stdin)
    old_idx = list(data.keys())[0]
    settings = data[old_idx].get('settings', {}).get('index', {})
    mappings = data[old_idx].get('mappings', {})

    # Clean settings (remove those that cannot be set at creation)
    clean_settings = {}

    # Copy important settings including analysis
    allowed_settings = [
        'number_of_shards', 'number_of_replicas', 'refresh_interval',
        'max_result_window', 'analysis', 'similarity',
        'max_ngram_diff', 'max_shingle_diff'
    ]
    for key in allowed_settings:
        if key in settings:
            clean_settings[key] = settings[key]

    # Also check for analysis in the root settings object
    root_settings = data[old_idx].get('settings', {})
    if 'analysis' in root_settings and 'analysis' not in clean_settings:
        clean_settings['analysis'] = root_settings['analysis']

    # Keep original number of shards, set 0 replicas for faster reindexing
    # If number_of_shards is not present in settings, default to 1
    if 'number_of_shards' not in clean_settings:
        clean_settings['number_of_shards'] = 1

    # Always set replicas to 0 during reindexing for performance
    clean_settings['number_of_replicas'] = 0

    result = {
        'settings': {
            'index': clean_settings
        },
        'mappings': mappings
    }
    print(json.dumps(result, indent=2))
except Exception as e:
    import traceback
    print(json.dumps({
        'error': str(e),
        'traceback': traceback.format_exc(),
        'settings': {'number_of_shards': 1, 'number_of_replicas': 0}
    }), file=sys.stderr)
    print(json.dumps({'settings': {'number_of_shards': 1, 'number_of_replicas': 0}}))
" 2>/tmp/python_error_$$.log)

    # Check if Python had errors
    if [[ -s /tmp/python_error_$$.log ]]; then
        log_warning "Python processing had warnings:"
        cat /tmp/python_error_$$.log | tee -a "${LOG_FILE}"
        rm -f /tmp/python_error_$$.log
    fi

    local create_result
    create_result=$(curl_es -X PUT "${es_url}/${new_index}" \
        -H 'Content-Type: application/json' \
        -d "${new_index_config}")
    if echo "${create_result}" | grep -q "acknowledged"; then
        log_success "New index created successfully"
        # Display the number of shards used
        local shards_count
        shards_count=$(echo "${new_index_config}" | python3 -c "import sys, json; d=json.load(sys.stdin); print(d.get('settings', {}).get('index', {}).get('number_of_shards', 'unknown'))" 2>/dev/null)
        log_info "Number of shards: ${shards_count}"
    else
        log_error "Failed to create new index"
        log_error "Response: ${create_result}"
        log_error "Config used:"
        echo "${new_index_config}" | tee -a "${LOG_FILE}"
        # Ask user what to do
        echo ""
        log_warning "An error occurred while creating the index."
        read -r -p "Do you want to (c)ontinue with next index, (r)etry this index, or (a)bort? [c/r/a]: " action
        case ${action} in
            r|R)
                log_info "Retrying..."
                reindex_with_smart_alias "${old_index}"
                return $?
                ;;
            a|A)
                log_error "Aborting script as requested by user"
                exit 1
                ;;
            *)
                log_info "Continuing with next index..."
                return 1
                ;;
        esac
    fi

    # Step 7: Reindex data (ASYNC VERSION WITH PROGRESS TRACKING)
    log_step "Step 7/10: Reindexing data asynchronously (this may take a while)..."
    log_info "Started at: $(date '+%Y-%m-%d %H:%M:%S')"
    local reindex_start
    reindex_start=$(date +%s)

    # Start async reindex
    local reindex_task
    reindex_task=$(curl_es -X POST "${es_url}/_reindex?wait_for_completion=false" \
        -H 'Content-Type: application/json' \
        -d "{
            \"source\": {
                \"index\": \"${old_index}\"
            },
            \"dest\": {
                \"index\": \"${new_index}\",
                \"op_type\": \"create\"
            }
        }")

    # Extract task ID
    local task_id
    task_id=$(echo "${reindex_task}" | python3 -c "import sys, json; print(json.load(sys.stdin).get('task', ''))" 2>/dev/null)

    if [[ -z "${task_id}" ]] || [[ "${task_id}" = "" ]]; then
        log_error "Failed to start reindex task"
        log_error "Response: ${reindex_task}"
        echo ""
        read -r -p "Do you want to (c)ontinue with next index, (r)etry this index, or (a)bort? [c/r/a]: " action
        case ${action} in
            r|R)
                curl_es -X DELETE "${es_url}/${new_index}" >/dev/null 2>&1
                log_info "Retrying..."
                reindex_with_smart_alias "${old_index}"
                return $?
                ;;
            a|A)
                log_error "Aborting script as requested by user"
                exit 1
                ;;
            *)
                log_info "Continuing with next index..."
                return 1
                ;;
        esac
    fi

    log_info "Reindex task ID: ${task_id}"

    # Poll task status
    local completed=false
    local check_interval=5
    local max_wait=3600 # 1 hour max
    local elapsed=0
    local last_progress_time=0
    local reindex_result=""

    while [[ "${completed}" = false ]] && [[ ${elapsed} -lt ${max_wait} ]]; do
        sleep "${check_interval}"
        elapsed=$((elapsed + check_interval))

        local task_status
        task_status=$(curl_es "${es_url}/_tasks/${task_id}")

        local is_completed
        is_completed=$(echo "${task_status}" | python3 -c "import sys, json; print(str(json.load(sys.stdin).get('completed', False)))" 2>/dev/null)

        if [[ "${is_completed}" = "True" ]]; then
            completed=true
            reindex_result="${task_status}"
            break
        fi

        # Show progress every 5 seconds
        if [[ $((elapsed - last_progress_time)) -ge 5 ]]; then
            last_progress_time=${elapsed}
            local progress
            progress=$(echo "${task_status}" | python3 -c "
import sys, json
try:
    data = json.load(sys.stdin)
    status = data.get('task', {}).get('status', {})
    created = status.get('created', 0)
    total = status.get('total', 0)
    if total > 0:
        pct = (created * 100) // total
        print(f'{created}/{total} ({pct}%)')
    else:
        print('In progress...')
except:
    print('Checking...')
" 2>/dev/null)
            log_info "Progress: ${progress} (${elapsed}s elapsed)"
        fi
    done

    if [[ "${completed}" = false ]]; then
        log_error "Reindex task did not complete within ${max_wait}s"
        log_warning "Task ${task_id} may still be running in background"
        log_warning "You can check its status with: curl ${es_url}/_tasks/${task_id}"
        echo ""
        read -r -p "Do you want to (c)ontinue with next index or (a)bort? [c/a]: " action
        case ${action} in
            a|A)
                log_error "Aborting script as requested by user"
                exit 1
                ;;
            *)
                log_info "Continuing with next index..."
                return 1
                ;;
        esac
    fi

    local reindex_end
    reindex_end=$(date +%s)
    local reindex_duration=$((reindex_end - reindex_start))
    log_info "Completed at: $(date '+%Y-%m-%d %H:%M:%S')"
    log_info "Duration: ${reindex_duration} seconds"

    # Extract results from completed task
    local new_count
    new_count=$(echo "${reindex_result}" | python3 -c "
import sys, json
try:
    data = json.load(sys.stdin)
    response = data.get('response', {})
    print(response.get('total', 0))
except:
    print(0)
" 2>/dev/null)

    local failures
    failures=$(echo "${reindex_result}" | python3 -c "
import sys, json
try:
    data = json.load(sys.stdin)
    response = data.get('response', {})
    print(len(response.get('failures', [])))
except:
    print(0)
" 2>/dev/null)

    log_info "Documents reindexed: ${new_count}"
    log_info "Failures: ${failures}"

    if [[ "${failures}" = "0" ]] && [[ "${new_count}" = "${old_count}" ]]; then
        log_success "Reindexing successful: ${new_count} documents"
    elif [[ "${new_count}" = "0" ]]; then
        log_error "No documents were reindexed - possible task failure"
        log_warning "Full task result:"
        echo "${reindex_result}" | python3 -m json.tool 2>/dev/null | tee -a "${LOG_FILE}"
        echo ""
        read -r -p "Do you want to (c)ontinue with next index, (r)etry this index, or (a)bort? [c/r/a]: " action
        case ${action} in
            r|R)
                curl_es -X DELETE "${es_url}/${new_index}" >/dev/null 2>&1
                log_info "Retrying..."
                reindex_with_smart_alias "${old_index}"
                return $?
                ;;
            a|A)
                log_error "Aborting script as requested by user"
                exit 1
                ;;
            *)
                log_info "Continuing with next index..."
                return 1
                ;;
        esac
    else
        log_error "Reindexing issue detected (old: ${old_count}, new: ${new_count}, failures: ${failures})"
        log_warning "Full task result:"
        echo "${reindex_result}" | python3 -m json.tool 2>/dev/null | tee -a "${LOG_FILE}"
        echo ""
        read -r -p "Do you want to (c)ontinue with next index, (r)etry this index, or (a)bort? [c/r/a]: " action
        case ${action} in
            r|R)
                curl_es -X DELETE "${es_url}/${new_index}" >/dev/null 2>&1
                log_info "Retrying..."
                reindex_with_smart_alias "${old_index}"
                return $?
                ;;
            a|A)
                log_error "Aborting script as requested by user"
                exit 1
                ;;
            *)
                log_info "Continuing with next index..."
                return 1
                ;;
        esac
    fi

    # Step 8: Handle aliases and delete old index
    if [[ "${need_name_alias}" = true ]]; then
        # Case: No existing aliases - need to create alias with old index name
        log_step "Step 8/10: Deleting old index before creating alias..."
        local delete_result
        delete_result=$(curl_es -X DELETE "${es_url}/${old_index}")
        if echo "${delete_result}" | grep -q "acknowledged"; then
            log_success "Old index deleted: ${old_index}"
        else
            log_error "Failed to delete old index"
            log_error "Response: ${delete_result}"
            return 1
        fi

        log_step "Step 9/10: Creating alias '${old_index}' pointing to '${new_index}'..."
        local alias_result
        alias_result=$(curl_es -X POST "${es_url}/_aliases" \
            -H 'Content-Type: application/json' \
            -d "{
                \"actions\": [
                    {\"add\": {\"index\": \"${new_index}\", \"alias\": \"${old_index}\"}}
                ]
            }")
        if echo "${alias_result}" | grep -q "acknowledged"; then
            log_success "Alias '${old_index}' created and points to '${new_index}'"
        else
            log_error "Failed to create alias"
            log_error "Response: ${alias_result}"
            return 1
        fi
    else
        # Case: Existing aliases - switch them to new index
        log_step "Step 8/10: Switching existing aliases to new index..."
        log_info "Aliases to switch: ${target_aliases}"

        # Build alias actions JSON
        local alias_actions='{"actions":['
        for alias in $(echo "${target_aliases}" | tr ',' ' '); do
            log_info "  Processing alias: ${alias}"
            alias_actions="${alias_actions}{\"remove\":{\"index\":\"${old_index}\",\"alias\":\"${alias}\",\"must_exist\":false}},"
            alias_actions="${alias_actions}{\"add\":{\"index\":\"${new_index}\",\"alias\":\"${alias}\"}},"
        done
        alias_actions="${alias_actions%,}]}"

        local alias_result
        alias_result=$(curl_es -X POST "${es_url}/_aliases" \
            -H 'Content-Type: application/json' \
            -d "${alias_actions}")
        if echo "${alias_result}" | grep -q "acknowledged"; then
            log_success "Aliases switched to new index"
        else
            log_error "Failed to switch aliases"
            log_error "Response: ${alias_result}"
            return 1
        fi

        log_step "Step 9/10: Deleting old index..."
        local delete_result
        delete_result=$(curl_es -X DELETE "${es_url}/${old_index}")
        if echo "${delete_result}" | grep -q "acknowledged"; then
            log_success "Old index deleted: ${old_index}"
        else
            log_error "Failed to delete old index"
            log_error "Response: ${delete_result}"
        fi
    fi

    # Step 10: Verify aliases
    log_step "Step 10/10: Verifying aliases..."
    for alias in $(echo "${target_aliases}" | tr ',' ' '); do
        local check
        check=$(curl_es "${es_url}/_alias/${alias}" | python3 -c "
import sys, json
try:
    data = json.load(sys.stdin)
    indices = list(data.keys())
    print(','.join(indices))
except:
    pass
" 2>/dev/null)
        if echo "${check}" | grep -q "${new_index}"; then
            log_success "Alias '${alias}' correctly points to ${new_index}"
        else
            log_warning "Issue: Alias '${alias}' does not point to new index"
            log_warning "Current target: ${check}"
        fi
    done

    # Configure replicas
    log_step "Configuring number of replicas to 0..."
    curl_es -X PUT "${es_url}/${new_index}/_settings" \
        -H 'Content-Type: application/json' \
        -d '{"index":{"number_of_replicas":0}}' > /dev/null
    log_success "Replicas configured"

    log_success "Index ${old_index} → ${new_index} completed successfully"
    return 0
}

# ================================================================
# Main Function
# ================================================================

main() {
    local es_url
    es_url=$(get_es_url)

    log "========================================="
    log "  OKA REINDEXING FOR ELASTICSEARCH 9"
    log "========================================="
    log "Date: $(date '+%Y-%m-%d %H:%M:%S')"
    log "Elasticsearch URL: ${es_url}"
    log "Authentication: $([[ -n "${ES_USER}" ]] && echo "Enabled (user: ${ES_USER})" || echo "Disabled")"
    log "Log file: ${LOG_FILE}"
    log "Confirm each index: $([[ "${CONFIRM_EACH_INDEX}" = true ]] && echo "Yes" || echo "No")"
    log "Skip already reindexed: $([[ "${SKIP_ALREADY_REINDEXED}" = true ]] && echo "Yes" || echo "No")"
    log ""

    # Verify Elasticsearch is accessible
    log_step "Checking Elasticsearch connectivity..."
    local es_test
    es_test=$(curl_es "${es_url}/")
    if ! echo "${es_test}" | grep -q "cluster_name"; then
        log_error "Cannot connect to Elasticsearch at ${es_url}"
        log_error "Response: ${es_test}"
        log_error ""
        log_error "Please check:"
        log_error "  - Elasticsearch is running: systemctl status elasticsearch"
        log_error "  - Host and port are correct: ${ES_HOST}:${ES_PORT}"
        log_error "  - Scheme (http/https) is correct: ${ES_SCHEME}"
        log_error "  - Network connectivity: ping ${ES_HOST}"
        if [[ -n "${ES_USER}" ]]; then
            log_error "  - Credentials are valid: user=${ES_USER}"
        fi
        exit 1
    fi
    log_success "Connected to Elasticsearch"

    local es_version
    es_version=$(echo "${es_test}" | python3 -c "import sys, json; print(json.load(sys.stdin)['version']['number'])" 2>/dev/null)
    log "Elasticsearch version: ${es_version}"
    log ""

    # Create safety snapshot
    log_warning "IMPORTANT: Creating safety snapshot..."
    local snapshot_name
    snapshot_name="before_reindex_$(date +%Y%m%d_%H%M%S)"
    log_step "Creating snapshot: ${snapshot_name}"
    local snapshot_result
    snapshot_result=$(curl_es -X PUT "${es_url}/_snapshot/oka_backup/${snapshot_name}?wait_for_completion=true" \
        -H 'Content-Type: application/json' \
        -d '{
            "indices": "*,-.*",
            "ignore_unavailable": true,
            "include_global_state": false
        }')
    if echo "${snapshot_result}" | grep -q "SUCCESS"; then
        log_success "Snapshot created: ${snapshot_name}"
    else
        log_warning "Snapshot creation failed or timed out"
        log_warning "It's recommended to have a backup before proceeding"
        read -r -p "Continue anyway? (y/n): " continue_without_snapshot
        if [[ "${continue_without_snapshot}" != "y" ]] && [[ "${continue_without_snapshot}" != "Y" ]]; then
            log "Operation cancelled by user"
            exit 0
        fi
    fi
    log ""

    # List indices to reindex
    log_step "Retrieving list of indices to reindex..."
    local all_indices
    all_indices=$(curl_es "${es_url}/_cat/indices?h=index" | grep -v "^\." | sort)

    # Filter out indices that are already reindexed (-v8) if skip is enabled
    local indices=""
    if [[ "${SKIP_ALREADY_REINDEXED}" = true ]]; then
        log_info "Filtering out already reindexed indices (ending with -v8)..."
        while IFS= read -r idx; do
            if ! is_already_reindexed "${idx}"; then
                if [[ -z "${indices}" ]]; then
                    indices="${idx}"
                else
                    indices="${indices}\n${idx}"
                fi
            fi
        done <<< "${all_indices}"
        indices=$(echo -e "${indices}")
    else
        indices="${all_indices}"
    fi

    local total
    total=$(echo "${indices}" | wc -l)
    log_info "Number of indices to process: ${total}"
    log ""

    # Show sample of indices
    log "First 10 indices:"
    echo "${indices}" | head -10 | while read -r idx; do
        log "  - ${idx}"
    done
    if [[ "${total}" -gt 10 ]]; then
        log "  ... and $((total - 10)) more"
    fi
    log ""

    # Ask for confirmation
    log_warning "This operation will reindex ${total} indices."
    log_warning "Estimated duration: ~$((total * 2)) minutes (depends on data size)"
    log ""
    read -r -p "Do you want to continue? (type YES in uppercase): " confirmation
    if [[ "${confirmation}" != "YES" ]]; then
        log "Operation cancelled by user"
        exit 0
    fi

    log ""
    log "Starting reindexing process..."
    log ""

    # Process each index
    local counter=0
    local success=0
    local failed=0
    local skipped=0
    local start_time
    start_time=$(date +%s)

    while IFS= read -r index; do
        counter=$((counter + 1))
        log "\n╔═══════════════════════════════════════════════════════════╗"
        log "║ Progress: [${counter}/${total}] - $(date '+%H:%M:%S')"
        log "╚═══════════════════════════════════════════════════════════╝"

        reindex_with_smart_alias "${index}"
        local result=$?

        if [[ ${result} -eq 0 ]]; then
            success=$((success + 1))
        elif [[ ${result} -eq 2 ]]; then
            skipped=$((skipped + 1))
        else
            failed=$((failed + 1))
        fi

        # Show progress summary
        log ""
        log_info "Current progress: Success: ${success}, Failed: ${failed}, Skipped: ${skipped}"
    done <<< "${indices}"

    local end_time
    end_time=$(date +%s)
    local total_duration=$((end_time - start_time))
    local duration_minutes=$((total_duration / 60))
    local duration_seconds=$((total_duration % 60))

    # Final summary
    log "\n========================================="
    log "  REINDEXING SUMMARY"
    log "========================================="
    log_success "Successfully processed: ${success} indices"
    if [[ "${skipped}" -gt 0 ]]; then
        log_warning "Skipped: ${skipped} indices"
    fi
    if [[ "${failed}" -gt 0 ]]; then
        log_error "Failed: ${failed} indices"
    fi
    log "Total duration: ${duration_minutes}m ${duration_seconds}s"
    log ""

    # Final alias verification
    log "Final alias verification (first 30):"
    curl_es "${es_url}/_cat/aliases?v&s=alias" | head -31 | tee -a "${LOG_FILE}"
    log ""
    log "Complete log available at: ${LOG_FILE}"
    log ""

    if [[ "${failed}" -eq 0 ]]; then
        log_success "Reindexing completed successfully!"
        log ""
        log "========================================="
        log "  NEXT STEPS"
        log "========================================="
        log ""
        log "1. Verify OKA is working correctly:"
        log "   curl ${es_url}/_cat/indices?v"
        log ""
        log "2. Upgrade Elasticsearch to version 9:"
        log "   sudo systemctl stop elasticsearch"
        log "   sudo yum update elasticsearch -y"
        log "   sudo systemctl start elasticsearch"
        log ""
    else
        log_error "Some indices failed to reindex."
        log_error "Please check the log file: ${LOG_FILE}"
        log ""
        log "You can re-run this script to retry only the failed indices."
        log "The script will automatically skip already reindexed indices."
    fi
}

# ================================================================
# Parse Command Line Arguments
# ================================================================

while [[ $# -gt 0 ]]; do
    case $1 in
        -h|--host)
            ES_HOST="$2"
            shift 2
            ;;
        -p|--port)
            ES_PORT="$2"
            shift 2
            ;;
        -s|--scheme)
            ES_SCHEME="$2"
            shift 2
            ;;
        -u|--user)
            ES_USER="$2"
            shift 2
            ;;
        -w|--password)
            ES_PASSWORD="$2"
            shift 2
            ;;
        -c|--confirm-each)
            CONFIRM_EACH_INDEX=true
            shift
            ;;
        --no-skip-reindexed)
            SKIP_ALREADY_REINDEXED=false
            shift
            ;;
        -l|--log-file)
            LOG_FILE="$2"
            shift 2
            ;;
        --help)
            show_usage
            ;;
        *)
            echo "Unknown option: $1"
            show_usage
            ;;
    esac
done

# ================================================================
# Execute Main Function
# ================================================================
main
Note
If you have large indices, we recommend running this script in tmux to prevent it from being killed on network disconnect.
Also, reindexing requires free disk space, as data is temporarily duplicated. Ensure that your server has enough space to hold at least twice the biggest index.
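For example, a minimal tmux workflow (the session name oka_reindex is arbitrary):

tmux new -s oka_reindex
./reindex.oka.sh --host localhost --port 9200
# Detach with Ctrl-b d, then reattach later with:
tmux attach -t oka_reindex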
Warning
This script is given as an example and will try to migrate all indices in your Elasticsearch.
If you share Elasticsearch with other tools, adapt the process to your needs; in that case, you must keep the same behavior as this script for OKA migrations: the ID of the indices must remain the same for OKA. We achieve this by creating aliases: the ID of the old index becomes the ID of an alias pointing to the newly migrated index.
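To illustrate the alias pattern on a single index (a sketch with a hypothetical index name my_index; the script above automates this for every index, including error handling):

# Reindex my_index into my_index-v8
curl -X POST "http://host:port/_reindex" -H 'Content-Type: application/json' -d '
{
  "source": { "index": "my_index" },
  "dest": { "index": "my_index-v8" }
}'
# Delete the old index, then make its name an alias of the new index
curl -X DELETE "http://host:port/my_index"
curl -X POST "http://host:port/_aliases" -H 'Content-Type: application/json' -d '
{
  "actions": [
    { "add": { "index": "my_index-v8", "alias": "my_index" } }
  ]
}'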
Upgrade to Elasticsearch 9 and start OKA:
systemctl stop elasticsearch
# See Elasticsearch documentation to enable repos for version 9
dnf update --enablerepo=elasticsearch elasticsearch
systemctl start elasticsearch
systemctl start oka
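You can then verify that the upgrade succeeded (a quick check; switch to https:// and add -u username:password if your cluster is secured):

curl -s "http://host:port/" | grep '"number"'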