1. Elasticsearch Snapshot Migration

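Background

1. The source cluster runs on AWS and the target cluster runs on Alibaba Cloud.

2. This is the last environment to be cut over, and the customer accepts a migration with downtime.

3. The source holds a fairly large amount of data, roughly 3 TB.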

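Official migration documentation

Guide to choosing an Elasticsearch migration solution - Alibaba Cloud Elasticsearch (ES) - Alibaba Cloud Help Center

Migrate ES data from AWS to Alibaba Cloud ES - Alibaba Cloud Elasticsearch (ES) - Alibaba Cloud Help Center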

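Overall ES migration flow

Tuning the target ES cluster settings

Changing the settings below can noticeably speed up snapshot restore.

Increase the recovery bandwidth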

PUT /_cluster/settings
{
    "transient": {
        "indices.recovery.max_bytes_per_sec": "300mb"
    }
}
GET /_cluster/settings?include_defaults=true&flat_settings=true

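1. Recovery bandwidth limit: sets indices.recovery.max_bytes_per_sec to 300mb (the default is 40mb).

2. Temporary effect: the value is applied as a transient setting, so it is lost after a full cluster restart.

3. Scope: throttles the per-node network transfer rate of shard recovery, rebalancing, and similar operations.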
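
To get a feel for what the throttle means for the roughly 3 TB in this migration, the arithmetic can be sketched as below. This is only a rough lower bound: it assumes the per-node throttle is the sole bottleneck and that shards are spread evenly. The function name and the node count of 3 are assumptions for illustration, not figures from this migration.

```python
def estimate_restore_hours(data_tb: float, mb_per_sec: float, data_nodes: int = 1) -> float:
    """Lower bound on restore time if the per-node recovery throttle is the only bottleneck."""
    if mb_per_sec <= 0 or data_nodes <= 0:
        raise ValueError("mb_per_sec and data_nodes must be positive")
    total_mb = data_tb * 1024 * 1024                 # TB -> MB, binary units
    seconds = total_mb / (mb_per_sec * data_nodes)   # nodes recover in parallel
    return seconds / 3600


if __name__ == "__main__":
    # data_nodes=3 is an assumed cluster size used only for this example.
    for limit in (40, 300):
        hours = estimate_restore_hours(3, limit, data_nodes=3)
        print(f"{limit} MB/s per node -> at least {hours:.1f} h")
```

Real restores are usually slower than this bound because OSS read throughput, disk write speed, and uneven shard sizes also limit the rate.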

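Adjust shard allocation concurrency: these settings control how many shard recoveries and allocations run concurrently in the Elasticsearch cluster.

The default is 2 for node_concurrent_recoveries and 4 for node_initial_primaries_recoveries.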

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.node_concurrent_recoveries" : "16",
    "cluster.routing.allocation.node_initial_primaries_recoveries" : "16"
  }
}

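cluster.routing.allocation.node_concurrent_recoveries:

Sets the number of concurrent shard recoveries on each node to 16
Affects how fast the cluster rebalances and how fast data recovers after a node restart

cluster.routing.allocation.node_initial_primaries_recoveries:

Sets the number of concurrent initial primary shard recoveries on each node to 16
Mainly affects how fast primary shards are allocated when a new index is created or the cluster starts

Characteristics of this configuration

Persistent: applied as a persistent setting, so it survives a restart
Performance: higher concurrency speeds up cluster recovery and rebalancing
Resource usage: size the values to the node hardware so that recovery does not exhaust system resources

Typical use cases

Speeding up shard reallocation when scaling the cluster out or in
Restoring service quickly after node maintenance
Improving throughput during large-scale data migration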
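
Because the concurrency values are persistent, it may be worth reverting them once the migration is finished. A minimal sketch of the two request bodies for PUT _cluster/settings follows; the helper names are hypothetical, and the reset relies on the fact that assigning null to a cluster setting restores its default.

```python
import json

# Setting names and values are the ones used in this article.
TUNING = {
    "indices.recovery.max_bytes_per_sec": "300mb",
    "cluster.routing.allocation.node_concurrent_recoveries": "16",
    "cluster.routing.allocation.node_initial_primaries_recoveries": "16",
}


def apply_body(settings: dict, scope: str = "persistent") -> dict:
    """Body for PUT _cluster/settings that applies the tuned values."""
    if scope not in ("persistent", "transient"):
        raise ValueError("scope must be 'persistent' or 'transient'")
    return {scope: dict(settings)}


def reset_body(settings: dict) -> dict:
    """Body that sets every key to null in both scopes, restoring the defaults."""
    cleared = {name: None for name in settings}
    return {"persistent": dict(cleared), "transient": dict(cleared)}


if __name__ == "__main__":
    # json.dumps renders None as null, which is what the settings API expects.
    print(json.dumps(reset_body(TUNING), indent=2))
```

Clearing both scopes keeps the sketch simple, since the article sets the bandwidth as transient and the concurrency as persistent.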

Migration steps in summary

First snapshot and restore round

1. Create a snapshot on the AWS ES cluster and upload it to S3

GET _snapshot
# Register the repository
PUT _snapshot/ali_backup
{
  "type": "s3",
  "settings": {
    "bucket": "<S3 bucket name on AWS>",
    "base_path": "<path inside the bucket>",
    "region": "cn-north-1",
    "endpoint": "s3.cn-north-1.amazonaws.com.cn"
  }
}
# First snapshot version
PUT _snapshot/ali_backup/prd_01
{
  "indices": "*",
  "ignore_unavailable": true,
  "include_global_state": false
}
# Check snapshot progress
GET _cat/snapshots/ali_backup?v
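
Before starting the S3 sync it helps to confirm that the snapshot has actually finished. The sketch below interprets the JSON returned by GET _snapshot/ali_backup/prd_01; the function name is hypothetical and the sample response is made up for illustration. Fetching the response (for example with requests) is left out so the logic can be checked offline.

```python
# States after which a snapshot no longer changes.
TERMINAL_STATES = {"SUCCESS", "PARTIAL", "FAILED", "INCOMPATIBLE"}


def snapshot_summary(response: dict, name: str) -> dict:
    """Extract state and failed-shard count for one snapshot from the API response."""
    for snap in response.get("snapshots", []):
        if snap.get("snapshot") == name:
            state = snap.get("state", "UNKNOWN")
            failed = snap.get("shards", {}).get("failed", 0)
            return {
                "state": state,
                "finished": state in TERMINAL_STATES,
                "ok": state == "SUCCESS" and failed == 0,
                "failed_shards": failed,
            }
    raise KeyError(f"snapshot {name!r} not found in response")


if __name__ == "__main__":
    # Illustrative response, not output captured from a real cluster.
    sample = {"snapshots": [{"snapshot": "prd_01", "state": "IN_PROGRESS",
                             "shards": {"total": 10, "failed": 0, "successful": 4}}]}
    print(snapshot_summary(sample, "prd_01"))
```

Only a snapshot for which `ok` is true should be synchronized and restored; a PARTIAL snapshot is finished but missing shards.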

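2. Start the S3-to-OSS data synchronization task: after the previous step the AWS ES data is in S3. Configure a synchronization task that copies the S3 data to Alibaba Cloud OSS, and check the synchronization result.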
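
One way to check the synchronization result is to compare object listings from both sides. The sketch below assumes you already have two mappings of object key to size in bytes, one from S3 and one from OSS; how the listings are obtained is out of scope, and the keys in the example are invented.

```python
def diff_inventories(source: dict, target: dict) -> dict:
    """Compare {object_key: size_in_bytes} listings of the source and target buckets."""
    missing = sorted(key for key in source if key not in target)
    size_mismatch = sorted(
        key for key in source if key in target and source[key] != target[key]
    )
    return {
        "missing": missing,              # present in S3 but not in OSS
        "size_mismatch": size_mismatch,  # present in both, different size
        "in_sync": not missing and not size_mismatch,
    }


if __name__ == "__main__":
    # Hypothetical keys under the repository base_path.
    s3 = {"es/index-1": 512, "es/indices/a/0/seg_1": 1048576}
    oss = {"es/index-1": 512}
    print(diff_inventories(s3, oss))
```

A repository with missing or truncated objects typically fails during restore, so it is cheaper to catch the gap here.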

3. Restore the snapshot on Alibaba Cloud ES

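# Register (or update) the repository
PUT _snapshot/ali_backup
{
  "type": "oss",
  "settings": {
    "base_path": "<path inside the bucket>",
    "endpoint": "http://oss-cn-beijing-internal.aliyuncs.com",
    "access_key_id": "*************",
    "secret_access_key": "*************",
    "bucket": "<OSS bucket name>",
    "compress": true
  }
}

# Inspect the snapshot
GET _snapshot/ali_backup/prd_01

# Restore everything except system indices (this is the first restore, so no indices need to be closed)
POST _snapshot/ali_backup/prd_01/_restore
{
    "indices": "*,-.*,-system*,-ilm*,-kibana*,-.kibana*"
}
# *  restore all indices in the snapshot
# -.*  exclude system indices (names starting with a dot)
# -system*  exclude indices whose names start with "system"
# -ilm*  exclude ILM indices
# -kibana*,-.kibana*  exclude Kibana indices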
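
To preview which indices the include/exclude expression would select, the matching can be approximated locally. This sketch uses Python's fnmatch rather than Elasticsearch's own resolver, so treat it as an approximation; the function name and the index names in the example are hypothetical.

```python
from fnmatch import fnmatchcase


def select_indices(names, expression):
    """Apply a comma-separated wildcard expression, left to right, to a list of index names."""
    selected = []
    for part in expression.split(","):
        part = part.strip()
        if not part:
            continue
        if part.startswith("-"):
            # A leading "-" removes matching names from what has been selected so far.
            selected = [n for n in selected if not fnmatchcase(n, part[1:])]
        else:
            for n in names:
                if fnmatchcase(n, part) and n not in selected:
                    selected.append(n)
    return selected


if __name__ == "__main__":
    names = ["orders-2024", ".kibana_1", "system-audit", "ilm-history-1", "kibana_sample", "logs"]
    print(select_indices(names, "*,-.*,-system*,-ilm*,-kibana*,-.kibana*"))
```

Note that `-.kibana*` is redundant in this expression, because `-.*` already excludes every name that starts with a dot.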

Second snapshot and restore round

1. Create a second snapshot, prd_02, on the AWS ES cluster and upload it to S3

# Second snapshot version
PUT _snapshot/ali_backup/prd_02
{
  "indices": "*",
  "ignore_unavailable": true,
  "include_global_state": false
}
# Check snapshot progress
GET _cat/snapshots/ali_backup?v

2. Start the S3-to-OSS data synchronization task

3. Restore the snapshot on Alibaba Cloud ES

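# Inspect the snapshot
GET _snapshot/ali_backup/prd_02

# Close all indices except system indices; closing the system indices makes Kibana fail and leaves the console unusable
POST /*,-.*,-system*,-ilm*,-kibana*,-.kibana*/_close

# Restore everything except system indices
POST _snapshot/ali_backup/prd_02/_restore
{
    "indices": "*,-.*,-system*,-ilm*,-kibana*,-.kibana*"
}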
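
The restore call returns before the data has been copied, so progress has to be watched separately. One option is GET _cat/recovery?format=json, and the sketch below summarizes such rows. It assumes each row carries `index`, `type`, and `stage` fields and that a finished shard reports the stage `done`; the function name and the sample rows are illustrative.

```python
def restore_progress(rows, recovery_type="snapshot"):
    """Summarize rows returned by GET _cat/recovery?format=json.

    Pass recovery_type=None to count every recovery, not only snapshot restores.
    """
    relevant = [r for r in rows if recovery_type is None or r.get("type") == recovery_type]
    done = [r for r in relevant if r.get("stage") == "done"]
    pending = sorted({r.get("index", "?") for r in relevant if r.get("stage") != "done"})
    return {
        "total_shards": len(relevant),
        "done_shards": len(done),
        "pending_indices": pending,
        "complete": bool(relevant) and len(done) == len(relevant),
    }


if __name__ == "__main__":
    # Illustrative rows, not output captured from a real cluster.
    rows = [
        {"index": "orders", "shard": "0", "type": "snapshot", "stage": "done"},
        {"index": "orders", "shard": "1", "type": "snapshot", "stage": "index"},
        {"index": "logs", "shard": "0", "type": "peer", "stage": "done"},
    ]
    print(restore_progress(rows))
```

Polling this until `complete` is true gives a simple signal for when traffic can be switched to the target cluster.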

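Summary

Snapshot and restore is simple and efficient. With a large data volume, the first snapshot and the first restore take a long time; later snapshots are incremental, so taking and restoring them is much faster.

ES snapshot scripts

Below is a Python tool for creating snapshots, closing indices, restoring indices, opening indices, checking restore status, and checking index versions. It makes repeated snapshot and restore rounds easier and avoids mistakes caused by frequent manual operations in Kibana.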

es_ba_config.toml

[AWS-es01]
es_host = "xxxxx.ssss.net.cn"
es_port = 9200
protocol = "https"
username = "elastic"  # username
password = "xxxxxxxxxxxxxx"  # password

[AWS-es02]
es_host = "xxxxx.ssss.cn"
es_port = 9200
protocol = "https"
username = "elastic"  # username
password = "XXXXXXXXP"  # password

es_re_config.toml

[ALI-es-01]
es_host = "XXXXXXXXXXXXXXXXXXXXXXXXxx.aliyuncs.com"
es_port = 9200
protocol = "https"
username = "elastic"  # username
password = "XXXXXXXXX"  # password

[ALI-es-02]
es_host = "XXXXXXXXXXXXXXXXXXXXXXXXxx.aliyuncs.com"
es_port = 9200
protocol = "https"
username = "elastic"  # username
password = "XXXXXXXXX"  # password
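
The scripts that follow copy one of these sections into module-level globals. As an alternative, a section could be wrapped in a small immutable object, which avoids shared state between requests. This is only a sketch: the `EsConfig` class is hypothetical, and it takes the plain dict that a TOML loader returns for one section.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EsConfig:
    """Connection settings for one cluster, built from one config section."""
    es_host: str
    es_port: int = 9200
    protocol: str = "https"
    username: str = ""
    password: str = field(default="", repr=False)  # excluded from repr() so it stays out of logs

    @classmethod
    def from_mapping(cls, section: dict) -> "EsConfig":
        if not section.get("es_host"):
            raise ValueError("es_host is required")
        return cls(
            es_host=section["es_host"],
            es_port=int(section.get("es_port", 9200)),
            protocol=section.get("protocol", "https"),
            username=section.get("username", ""),
            password=section.get("password", ""),
        )

    @property
    def base_url(self) -> str:
        return f"{self.protocol}://{self.es_host}:{self.es_port}"

    @property
    def auth(self):
        # Tuple usable as the auth argument of requests, or None when no credentials are set.
        return (self.username, self.password) if self.username and self.password else None


if __name__ == "__main__":
    cfg = EsConfig.from_mapping({"es_host": "es.example.internal", "username": "elastic", "password": "secret"})
    print(cfg, cfg.base_url)
```

Each request handler could then build its own `EsConfig`, so a failed lookup can never leave the previous cluster's settings active.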

es-backup.py

import requests
import toml
from flask import Flask, jsonify, request
app = Flask(__name__)

#----------------------------Load configuration---------------------------#
def set_config(config_name):
    """
    Load one section of the config file into module-level globals.
    """
    global ES_HOST, ES_PORT, PROTOCOL, USERNAME, PASSWORD

    try:
        file_path = r"D:\work\es_ba_config.toml"
        config = toml.load(file_path)[config_name]

        # Set the globals
        ES_HOST = config.get('es_host')
        ES_PORT = config.get('es_port')
        PROTOCOL = config.get('protocol')
        USERNAME = config.get('username')
        PASSWORD = config.get('password')

        # Log the active connection; the password is masked so it never reaches the logs
        print({
            'ES_HOST': ES_HOST,
            'ES_PORT': ES_PORT,
            'PROTOCOL': PROTOCOL,
            'USERNAME': USERNAME,
            'PASSWORD': '***' if PASSWORD else None
        })
    except Exception as e:
        print(f"Failed to load configuration: {e}")
        # Re-raise so the request fails instead of silently reusing the previous cluster's settings
        raise
#----------------------------Load configuration---------------------------#

@app.route('/api/get_snapshot_status', methods=['POST'])
def api_get_snapshot_status():
    """
    API endpoint: get snapshot status
    Runs: GET _cat/snapshots/plrs_target_ps_visit_precondition?v
    """
    data = request.get_json()
    repository = data.get('repository')
    config_name = data.get('config_name')
    set_config(config_name)
    # Build the request URL; the "v" flag is passed through params below
    url = f"{PROTOCOL}://{ES_HOST}:{ES_PORT}/_cat/snapshots/{repository}"

    # Query parameters
    params = {
        'v': 'true'  # verbose output with column headers
    }

    # Basic-auth credentials, if configured
    auth = None
    if USERNAME and PASSWORD:
        auth = (USERNAME, PASSWORD)

    try:
        # Send the GET request with credentials and parameters
        response = requests.get(url, params=params, auth=auth)
        response.raise_for_status()  # raise on HTTP errors

        print(response.text)

        return jsonify({
            "code": 200,
            "message": "success",
            "data": response.text
        })

    except requests.exceptions.RequestException as e:
        return jsonify({
            "code": 500,
            "message": f"Failed to get snapshot status: {str(e)}",
            "data": None
        })


@app.route('/api/create_snapshot', methods=['POST'])
def api_create_snapshot():
    """
    API endpoint: create a snapshot
    Runs: PUT _snapshot/plrs_target_ps_visit_precondition/prd_01
    """
    # Fixed options that match the snapshot command used in this migration
    ignore_unavailable = True
    include_global_state = False
    data = request.get_json()
    repository = data.get('repository')
    snapshot = data.get('snapshot')
    indices = data.get('indices')
    config_name = data.get('config_name')
    set_config(config_name)

    # Build the request URL
    url = f"{PROTOCOL}://{ES_HOST}:{ES_PORT}/_snapshot/{repository}/{snapshot}"

    # Basic-auth credentials, if configured
    auth = None
    if USERNAME and PASSWORD:
        auth = (USERNAME, PASSWORD)

    # Request body
    body = {
        "indices": indices,
        "ignore_unavailable": ignore_unavailable,
        "include_global_state": include_global_state
    }

    try:
        # Send the PUT request with credentials and body
        response = requests.put(url, auth=auth, json=body)
        response.raise_for_status()  # raise on HTTP errors

        # The call returns once the snapshot is accepted, not when it completes
        return jsonify({
            "code": 200,
            "message": "Snapshot creation started",
            "data": response.json()
        })

    except requests.exceptions.RequestException as e:
        return jsonify({
            "code": 500,
            "message": f"Failed to create snapshot: {str(e)}",
            "data": None
        })


if __name__ == "__main__":
    app.run(host='0.0.0.0', port=8888, debug=True)

es-restore.py

import requests
import toml
from flask import Flask, jsonify, request
app = Flask(__name__)


#----------------------------Load configuration---------------------------#
def set_config(config_name):
    """
    Load one section of the config file into module-level globals.
    """
    global ES_HOST, ES_PORT, PROTOCOL, USERNAME, PASSWORD

    try:
        file_path = r"D:\work\es_re_config.toml"
        config = toml.load(file_path)[config_name]

        # Set the globals
        ES_HOST = config.get('es_host')
        ES_PORT = config.get('es_port')
        PROTOCOL = config.get('protocol')
        USERNAME = config.get('username')
        PASSWORD = config.get('password')

        # Log the active connection; the password is masked so it never reaches the logs
        print({
            'ES_HOST': ES_HOST,
            'ES_PORT': ES_PORT,
            'PROTOCOL': PROTOCOL,
            'USERNAME': USERNAME,
            'PASSWORD': '***' if PASSWORD else None
        })
    except Exception as e:
        print(f"Failed to load configuration: {e}")
        # Re-raise so the request fails instead of silently reusing the previous cluster's settings
        raise
#----------------------------Load configuration---------------------------#

def get_es_indices_status(es_host, es_port=9200, protocol='http', username=None, password=None):
    """
    Fetch the status of all Elasticsearch indices and return the non-system ones.

    Args:
        es_host (str): Elasticsearch host
        es_port (int): Elasticsearch port, 9200 by default
        protocol (str): protocol, 'http' by default
        username (str): user name for basic auth
        password (str): password for basic auth

    Returns:
        list: names of the non-system indices, or None if the request fails
    """
    # Build the request URL
    url = f"{protocol}://{es_host}:{es_port}/_cat/indices/*"

    # Query parameters
    params = {
        'h': 'index,status',  # return only index name and status
        'expand_wildcards': 'all'  # expand wildcards to every matching index
    }

    # Basic-auth credentials, if configured
    auth = None
    if username and password:
        auth = (username, password)

    try:
        # Send the GET request with credentials
        response = requests.get(url, params=params, auth=auth)
        response.raise_for_status()  # raise on HTTP errors

        # Parse the response
        indices_data = response.text.strip()
        if indices_data:
            # One "index status" pair per line
            lines = indices_data.split('\n')
            indices_status = {}
            for line in lines:
                if line.strip():
                    parts = line.strip().split()
                    if len(parts) >= 2:
                        index_name = parts[0]
                        status = parts[1]
                        indices_status[index_name] = status
            # Keep only the non-system indices
            print("\nFiltering non-system indices...")
            non_system_indices = filter_non_system_indices(indices_status)
            print(f"Found {len(non_system_indices)} non-system indices:")
            for index in non_system_indices:
                print(f"  - {index}")
            return non_system_indices
        else:
            return []

    except requests.exceptions.RequestException as e:
        print(f"Request to Elasticsearch failed: {e}")
        return None


def filter_non_system_indices(indices_status):
    """
    Keep every non-system index (any index whose name does not start with a dot).

    Args:
        indices_status (dict): all indices and their status

    Returns:
        list: names of the non-system indices
    """
    non_system_indices = []
    for index_name in indices_status.keys():
        # Skip system indices, whose names start with a dot
        if not index_name.startswith('.'):
            non_system_indices.append(index_name)

    return non_system_indices

# Close indices one by one
@app.route('/api/close_individual_indices', methods=['POST'])
def close_individual_indices():
    """
    Close indices one by one (the safer approach).

    Request JSON:
        config_name (str): section of the config file to use
        indices_list (list): indices to close; if empty, all non-system indices are closed in one request

    Returns:
        JSON with success and failure counts plus per-index details
    """
    data = request.get_json()
    indices_list = data.get('indices_list')
    config_name = data.get('config_name')
    set_config(config_name)

    # Basic-auth credentials, if configured
    auth = None
    if USERNAME and PASSWORD:
        auth = (USERNAME, PASSWORD)

    success_count = 0
    failed_count = 0
    details = []
    if not indices_list:
        try:
            # Build the request URL
            url = f"{PROTOCOL}://{ES_HOST}:{ES_PORT}/*,-.*,-system*,-ilm*,-kibana*,-.kibana*/_close"

            # Send one POST request that closes every matching index
            response = requests.post(url, auth=auth)
            response.raise_for_status()
            success_count += 1
            details.append({"index": "*", "status": "success"})

        except requests.exceptions.RequestException as e:
            print(f"Failed to close index *: {e}")
            failed_count += 1
            details.append({"index": "*", "status": "failed", "error": str(e)})
    else:
        for index_name in indices_list:
            try:
                # Build the request URL
                url = f"{PROTOCOL}://{ES_HOST}:{ES_PORT}/{index_name}/_close"

                # Send a POST request that closes a single index
                response = requests.post(url, auth=auth)
                response.raise_for_status()

                print(f"Closed index: {index_name}")
                success_count += 1
                details.append({"index": index_name, "status": "success"})

            except requests.exceptions.RequestException as e:
                print(f"Failed to close index {index_name}: {e}")
                failed_count += 1
                details.append({"index": index_name, "status": "failed", "error": str(e)})
    return jsonify({
        "code": 200,
        "message": "success",
        "data": {
            "success": success_count,
            "failed": failed_count,
            "details": details
        }
    })

@app.route('/api/open_individual_indices', methods=['POST'])
def open_individual_indices():
    """
    Open indices one by one (the safer approach).

    Request JSON:
        config_name (str): section of the config file to use
        indices_list (list): indices to open; if empty, all indices are opened in one request

    Returns:
        JSON with success and failure counts plus per-index details
    """
    data = request.get_json()
    indices_list = data.get('indices_list')
    config_name = data.get('config_name')
    set_config(config_name)

    # Basic-auth credentials, if configured
    auth = None
    if USERNAME and PASSWORD:
        auth = (USERNAME, PASSWORD)

    success_count = 0
    failed_count = 0
    details = []

    if not indices_list:
        # Open all indices
        try:
            # Build the request URL
            url = f"{PROTOCOL}://{ES_HOST}:{ES_PORT}/_all/_open"

            # Send one POST request that opens every index
            response = requests.post(url, auth=auth)
            response.raise_for_status()
            success_count += 1
            details.append({"index": "_all", "status": "success"})

        except requests.exceptions.RequestException as e:
            print(f"Failed to open index _all: {e}")
            failed_count += 1
            details.append({"index": "_all", "status": "failed", "error": str(e)})
    else:
        for index_name in indices_list:
            try:
                # Build the request URL
                url = f"{PROTOCOL}://{ES_HOST}:{ES_PORT}/{index_name}/_open"

                # Send a POST request that opens a single index
                response = requests.post(url, auth=auth)
                response.raise_for_status()

                print(f"Opened index: {index_name}")
                success_count += 1
                details.append({"index": index_name, "status": "success"})

            except requests.exceptions.RequestException as e:
                print(f"Failed to open index {index_name}: {e}")
                failed_count += 1
                details.append({"index": index_name, "status": "failed", "error": str(e)})

    return jsonify({
        "code": 200,
        "message": "success",
        "data": {
            "success": success_count,
            "failed": failed_count,
            "details": details
        }
    })


@app.route('/api/restore_snapshot', methods=['POST'])
def api_restore_snapshot():
    """
    API endpoint: restore a snapshot
    Runs: POST _snapshot/ali_es_backup/prd_01/_restore
    """
    # Read the request parameters
    data = request.get_json()
    repository = data.get('repository')
    snapshot = data.get('snapshot')
    indices = data.get('indices')
    config_name = data.get('config_name')
    set_config(config_name)
    # Build the request URLs
    get_url = f"{PROTOCOL}://{ES_HOST}:{ES_PORT}/_snapshot/{repository}/{snapshot}"
    post_url = f"{PROTOCOL}://{ES_HOST}:{ES_PORT}/_snapshot/{repository}/{snapshot}/_restore"
    print(get_url)
    print(post_url)

    # Basic-auth credentials, if configured
    auth = None
    if USERNAME and PASSWORD:
        auth = (USERNAME, PASSWORD)

    # Request body, matching the restore command used in this migration
    body = {
        "indices": indices
    }
    print(body)

    try:
        # Send a GET request first to confirm that the snapshot exists
        response = requests.get(get_url, auth=auth)
        response.raise_for_status()  # raise on HTTP errors
    except requests.exceptions.RequestException as e:
        return jsonify({
            "code": 500,
            "message": f"Failed to get snapshot: {str(e)}",
            "data": None
        })

    try:
        # Send the POST request that starts the restore
        response = requests.post(post_url, auth=auth, json=body)
        response.raise_for_status()  # raise on HTTP errors

        # The call returns once the restore is accepted, not when it completes
        return jsonify({
            "code": 200,
            "message": "Snapshot restore request accepted",
            "data": response.json()
        })

    except requests.exceptions.RequestException as e:
        return jsonify({
            "code": 500,
            "message": f"Failed to restore snapshot: {str(e)}",
            "data": None
        })

@app.route('/api/get_all_snapshots', methods=['POST'])
def get_all_snapshots():
    """
    List all snapshots in the given repository.

    Request JSON:
        config_name (str): section of the config file to use
        repository (str): snapshot repository name

    Returns:
        JSON with the snapshot information
    """
    data = request.get_json()
    repository = data.get('repository')
    config_name = data.get('config_name')
    set_config(config_name)
    # Build the request URL
    url = f"{PROTOCOL}://{ES_HOST}:{ES_PORT}/_snapshot/{repository}/_all"
    # Query parameters
    params = {
        'filter_path': 'snapshots.name,snapshots.snapshot,snapshots.state,snapshots.start_time,snapshots.end_time'
    }

    # Basic-auth credentials, if configured
    auth = None
    if USERNAME and PASSWORD:
        auth = (USERNAME, PASSWORD)

    try:
        # Send the GET request with credentials and parameters
        response = requests.get(url, params=params, auth=auth)
        response.raise_for_status()  # raise on HTTP errors
        # Return the JSON response
        return jsonify({
            "code": 200,
            "message": "success",
            "data": response.json()
        })

    except requests.exceptions.RequestException as e:
        return jsonify({
            "code": 500,
            "message": f"Failed to get snapshot information: {str(e)}",
            "data": None
        })


if __name__ == "__main__":
    app.run(host='0.0.0.0', port=6666, debug=True)
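
To show how the two services fit together, the second round can be written down as an ordered list of POST calls. The sketch below only builds the URLs and JSON payloads. The endpoint paths, field names, and ports come from the scripts above, while the function name and the localhost base URLs are assumptions; the S3-to-OSS sync still has to happen outside this plan.

```python
# Same include/exclude expression as in the restore commands of this article.
NON_SYSTEM = "*,-.*,-system*,-ilm*,-kibana*,-.kibana*"


def second_round_plan(repository, snapshot, source_cfg, target_cfg,
                      backup_api="http://localhost:8888",
                      restore_api="http://localhost:6666"):
    """Return (step, url, json_payload) tuples; every call is an HTTP POST."""
    return [
        ("create snapshot", f"{backup_api}/api/create_snapshot",
         {"config_name": source_cfg, "repository": repository,
          "snapshot": snapshot, "indices": "*"}),
        ("check snapshot", f"{backup_api}/api/get_snapshot_status",
         {"config_name": source_cfg, "repository": repository}),
        # Run the S3-to-OSS sync here, before touching the target cluster.
        ("close indices", f"{restore_api}/api/close_individual_indices",
         {"config_name": target_cfg, "indices_list": []}),
        ("restore snapshot", f"{restore_api}/api/restore_snapshot",
         {"config_name": target_cfg, "repository": repository,
          "snapshot": snapshot, "indices": NON_SYSTEM}),
        ("open indices", f"{restore_api}/api/open_individual_indices",
         {"config_name": target_cfg, "indices_list": []}),
    ]


if __name__ == "__main__":
    for step, url, payload in second_round_plan("ali_backup", "prd_02", "AWS-es01", "ALI-es-01"):
        print(step, url, payload)
```

Each tuple could be sent with `requests.post(url, json=payload)`; keeping the plan as data makes it easy to review the order before anything runs against production.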