Elasticsearch Cluster Configuration: Node Role Separation & Hot-Warm Architecture in Practice

Preface

This article covers the node role separation you can plan for when deploying Elasticsearch nodes. Without such planning, every node takes on every role (master-eligible, data, coordinating, and so on), which makes later maintenance and scaling harder. So the article starts with an introduction to the node roles and then puts a Hot-Warm architecture into practice, to give you a concrete picture of a role-separated ES cluster. To be clear, this article is only a primer that tells you the feature exists: once you have read it, take any further questions straight to the official Elasticsearch Nodes documentation, which is thorough enough to answer most of them. Much of the node material below is lightly reorganized from that documentation and deliberately left in English; reading English docs is, after all, part of the job.

One role per node (Elasticsearch 8.14.3)

Node roles

You define a node's roles by setting node.roles in elasticsearch.yml. If you set node.roles, the node is only assigned the roles you specify. If you don't set node.roles, the node is assigned the following roles:

  • master
  • data
  • data_content
  • data_hot
  • data_warm
  • data_cold
  • data_frozen
  • ingest
  • ml
  • remote_cluster_client
  • transform

If you set node.roles, ensure you specify every node role your cluster needs. Every cluster requires the following node roles:

  • master
  • (data_content and data_hot) OR data
Master-eligible node

The master node is responsible for lightweight cluster-wide actions, and a stable master node is important for cluster health. These actions include:

  • creating or deleting an index
  • tracking which nodes are part of the cluster
  • deciding which shards to allocate to which nodes

To create a dedicated master-eligible node, set:

node.roles: [ master ]
Coordinating Node

Requests like search requests or bulk-indexing requests may involve data held on different data nodes. A search request, for example, is executed in two phases, which are coordinated by the node that receives the client request: the coordinating node.

  1. In the scatter phase, the coordinating node forwards the request to the data nodes which hold the data. Each data node executes the request locally and returns its results to the coordinating node.
  2. In the gather phase, the coordinating node reduces each data node's results into a single global result set.

Every node is implicitly a coordinating node, and this behavior cannot be disabled. A node with an explicitly empty node.roles list acts only as a coordinating node. Such a node needs enough memory and CPU to handle the gather phase.

To create a dedicated coordinating node, set:

node.roles: [ ]
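To see the scatter/gather in action, you can point any search at a coordinating-only node; it forwards the request to the data nodes and merges their responses. A minimal sketch, assuming the cluster deployed later in this article (coordinating-only node on port 9210) and a hypothetical `books` index:

```http
GET /books/_search HTTP/1.1
Host: 123.123.8.2:9210
Content-Type: application/json

{
    "query": { "match_all": {} }
}
```

The response is the same no matter which node receives the request; routing searches through coordinating-only nodes simply keeps the merge work off the data nodes.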
Data Node

Data nodes hold the shards that contain the documents you have indexed. Data nodes handle data related operations like CRUD, search, and aggregations. These operations are I/O-, memory-, and CPU-intensive. It is important to monitor these resources and to add more data nodes if they are overloaded.

In a multi-tier deployment architecture, you use specialized data roles to assign data nodes to specific tiers: data_content, data_hot, data_warm, data_cold, or data_frozen. A node can belong to multiple tiers.

If you want to include a node in all tiers, or if your cluster does not use multiple tiers, then you can use the generic data role.

Generic data node

Generic data nodes are included in all content tiers.

To create a dedicated generic data node, set:

node.roles: [ data ]
Content data node

Content data nodes are part of the content tier. Data stored in the content tier is generally a collection of items such as a product catalog or article archive. Content data typically has long data retention requirements, and you want to be able to retrieve items quickly regardless of how old they are. Content tier nodes are usually optimized for query performance: they prioritize processing power over IO throughput so they can process complex searches and aggregations and return results quickly.

The content tier is required and is often deployed within the same node grouping as the hot tier. System indices and other indices that aren't part of a data stream are automatically allocated to the content tier.

To create a dedicated content node, set:

node.roles: ["data_content"]
Hot data node

Hot data nodes are part of the hot tier. The hot tier is the Elasticsearch entry point for time series data and holds your most-recent, most-frequently-searched time series data. Nodes in the hot tier need to be fast for both reads and writes, which requires more hardware resources and faster storage (SSDs). For resiliency, indices in the hot tier should be configured to use one or more replicas.

The hot tier is required. New indices that are part of a data stream are automatically allocated to the hot tier.

To create a dedicated hot node, set:

node.roles: [ "data_hot" ]
Warm data node

Warm data nodes are part of the warm tier. Time series data can move to the warm tier once it is being queried less frequently than the recently-indexed data in the hot tier. The warm tier typically holds data from recent weeks. Updates are still allowed, but likely infrequent. Nodes in the warm tier generally don't need to be as fast as those in the hot tier. For resiliency, indices in the warm tier should be configured to use one or more replicas.

To create a dedicated warm node, set:

node.roles: [ "data_warm" ]
Cold data node

Cold data nodes are part of the cold tier. When you no longer need to search time series data regularly, it can move from the warm tier to the cold tier. While still searchable, this tier is typically optimized for lower storage costs rather than search speed.

For better storage savings, you can keep fully mounted indices of searchable snapshots on the cold tier. Unlike regular indices, these fully mounted indices don't require replicas for reliability. In the event of a failure, they can recover data from the underlying snapshot instead. This potentially halves the local storage needed for the data. A snapshot repository is required to use fully mounted indices in the cold tier. Fully mounted indices are read-only.

Alternatively, you can use the cold tier to store regular indices with replicas instead of using searchable snapshots. This lets you store older data on less expensive hardware but doesn't reduce required disk space compared to the warm tier.

To create a dedicated cold node, set:

node.roles: [ "data_cold" ]
Frozen data node

Frozen data nodes are part of the frozen tier. Once data is no longer being queried, or being queried rarely, it may move from the cold tier to the frozen tier where it stays for the rest of its life.

The frozen tier requires a snapshot repository. The frozen tier uses partially mounted indices to store and load data from a snapshot repository. This reduces local storage and operating costs while still letting you search frozen data. Because Elasticsearch must sometimes fetch frozen data from the snapshot repository, searches on the frozen tier are typically slower than on the cold tier.

To create a dedicated frozen node, set:

node.roles: [ "data_frozen" ]

WARNING: Adding too many coordinating-only nodes to a cluster can increase the burden on the entire cluster, because the elected master node must await acknowledgement of cluster state updates from every node! The benefit of coordinating-only nodes should not be overstated: data nodes can happily serve the same purpose.

Ingest Node

An ingest node pre-processes and transforms documents before they are indexed. It supports pipelines, which you can use to filter, transform, and enrich data.
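As a sketch of what a pipeline looks like (assuming the cluster deployed below, a node that carries the ingest role, and a hypothetical pipeline name `add-ingested-at`), the following uses a set processor to stamp each document at ingest time:

```http
PUT /_ingest/pipeline/add-ingested-at HTTP/1.1
Host: 123.123.8.2:9210
Content-Type: application/json

{
    "description": "Stamp each document with the time it was ingested",
    "processors": [
        {
            "set": {
                "field": "ingested_at",
                "value": "{{{_ingest.timestamp}}}"
            }
        }
    ]
}
```

Documents indexed with `?pipeline=add-ingested-at` then pass through it. Note that none of the nodes in the deployment below carries the ingest role, so you would have to add it to one of them (e.g. node.roles=["data_hot","ingest"]) before pipelines can execute.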

Hot Warm 架构实践

Deploy 3 master-eligible nodes, 2 coordinating-only nodes, 2 data_content nodes (acting as warm nodes), and 3 data_hot nodes. This is more or less the minimum layout if you want role-separated nodes.

  • Dedicated master-eligible nodes: manage the cluster state
    • low-spec CPU, RAM, and disk
  • Dedicated data nodes: store data and handle client requests
    • high-spec CPU, RAM, and disk
  • Dedicated ingest nodes: pre-process data
    • high-spec CPU; medium RAM; low-spec disk
  • Dedicated coordinating-only nodes (client nodes)
    • high-spec CPU; high RAM; low-spec disk
Docker setup scripts
docker network create elastic

docker run -d ^
  --name es-master-01 ^
  --hostname es-master-01 ^
  --restart unless-stopped ^
  --net elastic ^
  -p 9200:9200 ^
  -e node.name=es-master-01 ^
  -e node.roles=["master"]  ^
  -e cluster.initial_master_nodes=es-master-01,es-master-02,es-master-03 ^
  -e discovery.seed_hosts=es-master-02,es-master-03 ^
  -e cluster.name=docker-cluster ^
  -e xpack.security.enabled=false ^
  -v C:\Users\wayne\docker\elasticsearch\docker-cluster\es-master-01\data:/usr/share/elasticsearch/data ^
  -m 1GB ^
  docker.elastic.co/elasticsearch/elasticsearch:8.14.3
  
docker run -d ^
  --name es-master-02 ^
  --hostname es-master-02 ^
  --restart unless-stopped ^
  --net elastic ^
  -p 9201:9200 ^
  -e node.name=es-master-02 ^
  -e node.roles=["master"]  ^
  -e cluster.initial_master_nodes=es-master-01,es-master-02,es-master-03 ^
  -e discovery.seed_hosts=es-master-01,es-master-03 ^
  -e cluster.name=docker-cluster ^
  -e xpack.security.enabled=false ^
  -v C:\Users\wayne\docker\elasticsearch\docker-cluster\es-master-02\data:/usr/share/elasticsearch/data ^
  -m 1GB ^
  docker.elastic.co/elasticsearch/elasticsearch:8.14.3
  
  
docker run -d ^
  --name es-master-03 ^
  --hostname es-master-03 ^
  --restart unless-stopped ^
  --net elastic ^
  -p 9202:9200 ^
  -e node.name=es-master-03 ^
  -e node.roles=["master"]  ^
  -e cluster.initial_master_nodes=es-master-01,es-master-02,es-master-03 ^
  -e discovery.seed_hosts=es-master-01,es-master-02 ^
  -e cluster.name=docker-cluster ^
  -e xpack.security.enabled=false ^
  -v C:\Users\wayne\docker\elasticsearch\docker-cluster\es-master-03\data:/usr/share/elasticsearch/data ^
  -m 1GB ^
  docker.elastic.co/elasticsearch/elasticsearch:8.14.3  
  

docker run -d ^
  --name es-coordinating-only-01 ^
  --hostname es-coordinating-only-01 ^
  --restart unless-stopped ^
  --net elastic ^
  -p 9210:9200 ^
  -e node.name=es-coordinating-only-01 ^
  -e node.roles=[]  ^
  -e discovery.seed_hosts=es-master-01,es-master-02,es-master-03 ^
  -e cluster.name=docker-cluster ^
  -e xpack.security.enabled=false ^
  -v C:\Users\wayne\docker\elasticsearch\docker-cluster\es-coordinating-only-01\data:/usr/share/elasticsearch/data ^
  -m 1GB ^
  docker.elastic.co/elasticsearch/elasticsearch:8.14.3  
  
docker run -d ^
  --name es-coordinating-only-02 ^
  --hostname es-coordinating-only-02 ^
  --restart unless-stopped ^
  --net elastic ^
  -p 9211:9200 ^
  -e node.name=es-coordinating-only-02 ^
  -e node.roles=[]  ^
  -e discovery.seed_hosts=es-master-01,es-master-02,es-master-03 ^
  -e cluster.name=docker-cluster ^
  -e xpack.security.enabled=false ^
  -v C:\Users\wayne\docker\elasticsearch\docker-cluster\es-coordinating-only-02\data:/usr/share/elasticsearch/data ^
  -m 1GB ^
  docker.elastic.co/elasticsearch/elasticsearch:8.14.3  
  

docker run -d ^
  --name es-data-hot-01 ^
  --hostname es-data-hot-01 ^
  --restart unless-stopped ^
  --net elastic ^
  -p 9220:9200 ^
  -e node.name=es-data-hot-01 ^
  -e node.roles=["data_hot"]  ^
  -e discovery.seed_hosts=es-master-01,es-master-02,es-master-03 ^
  -e cluster.name=docker-cluster ^
  -e xpack.security.enabled=false ^
  -v C:\Users\wayne\docker\elasticsearch\docker-cluster\es-data-hot-01\data:/usr/share/elasticsearch/data ^
  -m 1GB ^
  docker.elastic.co/elasticsearch/elasticsearch:8.14.3  
 
docker run -d ^
  --name es-data-hot-02 ^
  --hostname es-data-hot-02 ^
  --restart unless-stopped ^
  --net elastic ^
  -p 9221:9200 ^
  -e node.name=es-data-hot-02 ^
  -e node.roles=["data_hot"]  ^
  -e discovery.seed_hosts=es-master-01,es-master-02,es-master-03 ^
  -e cluster.name=docker-cluster ^
  -e xpack.security.enabled=false ^
  -v C:\Users\wayne\docker\elasticsearch\docker-cluster\es-data-hot-02\data:/usr/share/elasticsearch/data ^
  -m 1GB ^
  docker.elastic.co/elasticsearch/elasticsearch:8.14.3  

docker run -d ^
  --name es-data-hot-03 ^
  --hostname es-data-hot-03 ^
  --restart unless-stopped ^
  --net elastic ^
  -p 9222:9200 ^
  -e node.name=es-data-hot-03 ^
  -e node.roles=["data_hot"]  ^
  -e discovery.seed_hosts=es-master-01,es-master-02,es-master-03 ^
  -e cluster.name=docker-cluster ^
  -e xpack.security.enabled=false ^
  -v C:\Users\wayne\docker\elasticsearch\docker-cluster\es-data-hot-03\data:/usr/share/elasticsearch/data ^
  -m 1GB ^
  docker.elastic.co/elasticsearch/elasticsearch:8.14.3  
  
docker run -d ^
  --name es-data-content-01 ^
  --hostname es-data-content-01 ^
  --restart unless-stopped ^
  --net elastic ^
  -p 9230:9200 ^
  -e node.name=es-data-content-01 ^
  -e node.roles=["data_content"]  ^
  -e discovery.seed_hosts=es-master-01,es-master-02,es-master-03 ^
  -e cluster.name=docker-cluster ^
  -e xpack.security.enabled=false ^
  -v C:\Users\wayne\docker\elasticsearch\docker-cluster\es-data-content-01\data:/usr/share/elasticsearch/data ^
  -m 1GB ^
  docker.elastic.co/elasticsearch/elasticsearch:8.14.3  

docker run -d ^
  --name es-data-content-02 ^
  --hostname es-data-content-02 ^
  --restart unless-stopped ^
  --net elastic ^
  -p 9231:9200 ^
  -e node.name=es-data-content-02 ^
  -e node.roles=["data_content"]  ^
  -e discovery.seed_hosts=es-master-01,es-master-02,es-master-03 ^
  -e cluster.name=docker-cluster ^
  -e xpack.security.enabled=false ^
  -v C:\Users\wayne\docker\elasticsearch\docker-cluster\es-data-content-02\data:/usr/share/elasticsearch/data ^
  -m 1GB ^
  docker.elastic.co/elasticsearch/elasticsearch:8.14.3  
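Once all the containers are up, you can confirm through any node (here the coordinating-only node on port 9210) that each node carries only the roles it was assigned:

```http
GET /_cat/nodes?v&h=name,node.role,master HTTP/1.1
Host: 123.123.8.2:9210
```

The node.role column uses one-letter abbreviations: m for master-eligible, h for data_hot, s for data_content, and - for a coordinating-only node; the master column marks the elected master with *.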
Create an index template
PUT /_index_template/books_template HTTP/1.1
Host: 123.123.8.2:9210
Content-Type: application/json
Content-Length: 887

{
   "index_patterns" : ["books"],
   "template": {
       "settings": {
           "index": {
               "number_of_shards": "1",
               "number_of_replicas": "1",
               "routing": {
                   "allocation": {
                       "include": {
                            "_tier_preference": "data_hot"
                       }
                   }
               }
           }
       },
       "mappings": {
           "properties": {
               "author": {
                   "type": "text"
               },
               "page_count": {
                   "type": "integer"
               },
               "name": {
                   "type": "text"
               },
               "release_date": {
                   "type": "date"
               }
           }
       }
   }
}
Add a document
POST /books/_doc HTTP/1.1
Host: 123.123.8.2:9210
Content-Type: application/json
Content-Length: 123

{
    "name": "Snow Crash",
    "author": "Neal Stephenson",
    "release_date": "1992-06-01",
    "page_count": 470
} 
Set a tier preference for existing indices (move an index to other data nodes)

You can use the ILM (index lifecycle management) mechanism to migrate indices automatically; here we just demonstrate manually moving the books index from the data_hot nodes to the data_content nodes with an HTTP request.

PUT /books/_settings HTTP/1.1
Host: 123.123.8.2:9210
Content-Type: application/json
Content-Length: 79

{
    "index.routing.allocation.include._tier_preference": "data_content"
}
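To confirm the migration, check where the books shards are now allocated; after the settings change above they should relocate from the es-data-hot-* nodes to the es-data-content-* nodes:

```http
GET /_cat/shards/books?v&h=index,shard,prirep,state,node HTTP/1.1
Host: 123.123.8.2:9210
```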