Elasticsearch Basics

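Elasticsearch is written in Java and uses Lucene internally for indexing and search. It is:

A distributed, real-time document store in which every field can be indexed and searched
A distributed, real-time analytics and search engine
Capable of scaling to hundreds of server nodes and handling petabytes of structured or unstructured data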

Check whether Elasticsearch has started successfully:

# curl 'http://localhost:9200/?pretty'
{
  "status" : 200,
  "name" : "node1",
  "version" : {
    "number" : "1.3.4",
    "build_hash" : "a70f3ccb52200f8f2c87e9c370c6597448eb3e45",
    "build_timestamp" : "2014-09-30T09:07:17Z",
    "build_snapshot" : false,
    "lucene_version" : "4.9"
  },
  "tagline" : "You Know, for Search"
}

Like any HTTP request, an Elasticsearch request is made up of the same basic parts:

curl -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'

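| Part | Description |
|--------------|--------------------------------------------------------------------------------------|
| VERB | The appropriate HTTP method or verb: `GET`, `POST`, `PUT`, `HEAD`, or `DELETE`. |
| PROTOCOL | `http` or `https` (if you have an `https` proxy in front of Elasticsearch). |
| HOST | The host name of any node in your Elasticsearch cluster, or `localhost` for a node on your local machine. |
| PORT | The port on which the Elasticsearch HTTP service runs; `9200` by default. |
| PATH | The API endpoint path (for example, `_count` returns the number of documents in the cluster). The path may contain multiple components, such as `_cluster/stats` and `_nodes/stats/jvm`. |
| QUERY_STRING | Any optional query-string parameters (for example, `?pretty` pretty-prints the JSON response so it is easier to read). |
| BODY | A JSON-encoded request body (if the request needs one). |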
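To show how these parts fit together outside of curl, here is a minimal Python sketch that uses only the standard library. The helper `build_request` is hypothetical and not part of any Elasticsearch client. It only assembles a `urllib.request.Request` and does not send it; passing the result to `urllib.request.urlopen` would.

```python
import json
import urllib.request


def build_request(verb, protocol, host, port, path, query_string="", body=None):
    """Assemble (but do not send) an Elasticsearch HTTP request from its parts."""
    url = f"{protocol}://{host}:{port}/{path}"
    if query_string:
        url += "?" + query_string
    data = None
    headers = {}
    if body is not None:
        # Serialize the body as JSON, like curl's -d '<BODY>'
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=verb)


req = build_request("GET", "http", "localhost", 9200, "_count", "pretty")
print(req.get_method(), req.full_url)  # GET http://localhost:9200/_count?pretty
```

Keeping assembly separate from sending means the sketch can be tried without a running cluster.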

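Elasticsearch is document-oriented, meaning that it stores entire objects, or documents. It does not just store documents; it also indexes the content of each document so that it can be searched.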

By default, every field in a document is indexed (it has an inverted index) and is therefore searchable.
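An inverted index maps each term to the documents that contain it, so a search looks up terms instead of scanning every document. The Python sketch below illustrates the idea. The regex tokenizer is a crude stand-in for an Elasticsearch analyzer, and the function names and sample documents are hypothetical. Real Lucene postings also store term frequencies and positions.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Crude stand-in for an analyzer: lowercase, then split into word characters
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs):
    """Map each (field, term) pair to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for field, value in doc.items():
            for term in tokenize(str(value)):
                index[(field, term)].add(doc_id)
    return index


docs = {
    1: {"first_name": "John", "about": "I love to go rock climbing"},
    2: {"first_name": "Jane", "about": "I like to collect rock albums"},
}
index = build_inverted_index(docs)
print(sorted(index[("about", "rock")]))      # [1, 2]
print(sorted(index[("about", "climbing")]))  # [1]
```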

Search all documents of type doc (by default, a search returns ten results):

# curl -X GET "localhost:9200/info_log/doc/_search"

Here info_log is the index name and doc is the type name.

Search for documents whose application name (the app field) is statistics, passing the query in the query string:

# curl -X GET "localhost:9200/info_log-2018.05.07/doc/_search?q=app:statistics"
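In this URI search, `q` takes a Lucene query-string expression of the form `field:value`. When such a URL is built in code rather than typed by hand, the parameter should be URL-encoded, and the server decodes it back to `app:statistics`. Below is a minimal Python sketch; `uri_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def uri_search_url(host, index, doc_type, query):
    """Build a URI-search URL; q holds a Lucene query-string expression."""
    # urlencode percent-encodes reserved characters such as ':'
    return f"http://{host}/{index}/{doc_type}/_search?" + urlencode({"q": query})


print(uri_search_url("localhost:9200", "info_log-2018.05.07", "doc", "app:statistics"))
# http://localhost:9200/info_log-2018.05.07/doc/_search?q=app%3Astatistics
```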

Search with a query expression (the query DSL):

# curl -X GET "localhost:9200/info_log/doc/_search" -H 'Content-Type: application/json' -d'
{
    "query" : {
        "match" : {
            "app" : "statistics"
        }
    }
}
'

A query with a filter (last_name must match smith, and age must be greater than 30):

curl -X GET "localhost:9200/megacorp/employee/_search" -H 'Content-Type: application/json' -d'
{
    "query" : {
        "bool": {
            "must": {
                "match" : {
                    "last_name" : "smith"
                }
            },
            "filter": {
                "range" : {
                    "age" : { "gt" : 30 }
                }
            }
        }
    }
}
'
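The two clauses behave differently. `must` has to match and contributes to the relevance score. `filter` is a plain yes/no condition that does not affect the score. The Python sketch below evaluates that logic over a few in-memory documents. The function name and sample data are hypothetical, and the "score" is simply the number of matching query terms, not Elasticsearch's real scoring.

```python
def bool_query(docs, must_field, must_text, filter_field, gt):
    """Toy evaluation of bool {must: match, filter: range gt}."""
    terms = must_text.lower().split()
    hits = []
    for doc_id, doc in docs.items():
        # filter: a yes/no check that does not change the score
        if not doc.get(filter_field, 0) > gt:
            continue
        words = str(doc.get(must_field, "")).lower().split()
        # must: the document has to match, and the match feeds the score
        score = sum(words.count(t) for t in terms)
        if score > 0:
            hits.append((doc_id, score))
    return sorted(hits, key=lambda h: h[1], reverse=True)


employees = {
    1: {"last_name": "Smith", "age": 25},
    2: {"last_name": "Smith", "age": 32},
    3: {"last_name": "Fir", "age": 35},
}
print(bool_query(employees, "last_name", "smith", "age", 30))  # [(2, 1)]
```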

Full-text search (similarity matching):

curl -X GET "localhost:9200/megacorp/employee/_search" -H 'Content-Type: application/json' -d'
{
    "query" : {
        "match" : {
            "about" : "rock climbing"
        }
    }
}
'

By default, Elasticsearch sorts results by relevance score, that is, by how well each document matches the query.
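Since version 5.0, Elasticsearch uses BM25 as its default similarity (earlier versions used TF/IDF). Below is a simplified Python sketch of the classic BM25 formula, with the usual parameters k1 = 1.2 and b = 0.75, applied to a match-style query. It also shows the default behavior of `match`: a document containing any one of the query terms matches. Here doc 2 contains only "rock", and documents with more matching terms tend to rank higher. The sketch ignores Lucene details such as norm encoding and boosts, so its numbers will not equal real `_score` values.

```python
import math


def bm25_search(docs, query, k1=1.2, b=0.75):
    """Rank documents for a match-style query using the classic BM25 formula."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized.values()) / n_docs
    scores = {}
    for term in query.lower().split():
        # Document frequency: how many documents contain the term
        df = sum(1 for tokens in tokenized.values() if term in tokens)
        if df == 0:
            continue
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for doc_id, tokens in tokenized.items():
            tf = tokens.count(term)
            if tf == 0:
                continue
            # Longer-than-average documents are penalized via length normalization
            norm = k1 * (1 - b + b * len(tokens) / avg_len)
            scores[doc_id] = scores.get(doc_id, 0.0) + idf * tf * (k1 + 1) / (tf + norm)
    # Highest score first, mirroring the default sort by _score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


about = {
    1: "I love to go rock climbing",
    2: "I like to collect rock albums",
    3: "I like to build cabinets",
}
for doc_id, score in bm25_search(about, "rock climbing"):
    print(doc_id, round(score, 3))
```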

Phrase search (matches only documents that contain both "rock" and "climbing", with the two words next to each other as the phrase "rock climbing"):

curl -X GET "localhost:9200/megacorp/employee/_search" -H 'Content-Type: application/json' -d'
{
    "query" : {
        "match_phrase" : {
            "about" : "rock climbing"
        }
    }
}
'
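Phrase matching relies on term positions stored in the index. A document matches only if the query terms occur at consecutive positions, in order. The Python sketch below shows this with a lowercase whitespace tokenizer standing in for the analyzer. The function names and documents are hypothetical, and `slop` (allowing gaps) is not handled.

```python
def positional_index(docs):
    """Map term -> {doc_id: [positions]} using lowercase whitespace tokens."""
    index = {}
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index.setdefault(term, {}).setdefault(doc_id, []).append(pos)
    return index


def match_phrase(index, phrase):
    """Return IDs of documents where the phrase terms appear at consecutive positions."""
    terms = phrase.lower().split()
    if not terms:
        return []
    hits = []
    for doc_id, starts in index.get(terms[0], {}).items():
        for start in starts:
            # Term i must appear exactly i positions after the first term
            if all(start + i in index.get(term, {}).get(doc_id, [])
                   for i, term in enumerate(terms)):
                hits.append(doc_id)
                break
    return sorted(hits)


docs = {
    1: "I love to go rock climbing",
    2: "rock albums and climbing shoes",
    3: "climbing rock",
}
idx = positional_index(docs)
print(match_phrase(idx, "rock climbing"))  # [1]
```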

Search with highlighting:

curl -X GET "localhost:9200/megacorp/employee/_search" -H 'Content-Type: application/json' -d'
{
    "query" : {
        "match_phrase" : {
            "about" : "rock climbing"
        }
    },
    "highlight": {
        "fields" : {
            "about" : {}
        }
    }
}
'
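In the response, each hit gets a `highlight` section in which the matched terms are wrapped in tags, `<em>` and `</em>` by default. The toy Python sketch below reproduces only that wrapping step with a regular expression. Real highlighters work on analyzed terms and offsets and return fragments, so treat this as an illustration. The `highlight` function is hypothetical.

```python
import re


def highlight(text, terms, pre="<em>", post="</em>"):
    """Wrap whole-word, case-insensitive occurrences of the terms in tags."""
    if not terms:
        return text
    pattern = r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b"
    return re.sub(pattern, lambda m: pre + m.group(1) + post, text, flags=re.IGNORECASE)


print(highlight("I love to go rock climbing", ["rock", "climbing"]))
# I love to go <em>rock</em> <em>climbing</em>
```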

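Aggregation on the interests field. In Elasticsearch 5.x and later, a terms aggregation on a text field fails unless fielddata is enabled on it; with default dynamic mapping you can aggregate on the interests.keyword sub-field instead: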

curl -X GET "localhost:9200/megacorp/employee/_search" -H 'Content-Type: application/json' -d'
{
  "aggs": {
    "all_interests": {
      "terms": { "field": "interests" }
    }
  }
}
'
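Conceptually, a `terms` aggregation creates one bucket per distinct value of the field and reports how many documents fall into each (`doc_count`), largest first. By default it returns up to 10 buckets. The Python sketch below runs over hypothetical in-memory employee records; the function name and data are illustrative, and alphabetical tie-breaking is a choice made for this sketch.

```python
def terms_agg(docs, field, size=10):
    """Bucket documents by each distinct value of `field` and count them."""
    counts = {}
    for doc in docs:
        values = doc.get(field, [])
        if not isinstance(values, list):
            values = [values]
        # A multi-valued field puts the document in several buckets, once each
        for value in set(values):
            counts[value] = counts.get(value, 0) + 1
    # Largest doc_count first; ties broken alphabetically in this sketch
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": k, "doc_count": c} for k, c in ordered[:size]]


employees = [
    {"first_name": "John", "age": 25, "interests": ["sports", "music"]},
    {"first_name": "Jane", "age": 32, "interests": ["music"]},
    {"first_name": "Douglas", "age": 35, "interests": ["forestry"]},
]
print(terms_agg(employees, "interests"))
```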

Aggregation with a sub-aggregation (the average age of the employees who share each interest):

curl -X GET "localhost:9200/megacorp/employee/_search" -H 'Content-Type: application/json' -d'
{
    "aggs" : {
        "all_interests" : {
            "terms" : { "field" : "interests" },
            "aggs" : {
                "avg_age" : {
                    "avg" : { "field" : "age" }
                }
            }
        }
    }
}
'
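The nested `avg` aggregation is computed per bucket. For each `interests` value, it averages `age` over only the documents in that bucket. Below is a Python sketch of that two-level computation. The function name, the `avg_name` parameter, and the sample data are hypothetical.

```python
def terms_with_avg(docs, term_field, avg_field, avg_name="avg_age"):
    """terms buckets on term_field, each with an avg sub-aggregation of avg_field."""
    buckets = {}
    for doc in docs:
        values = doc.get(term_field, [])
        if not isinstance(values, list):
            values = [values]
        for value in set(values):
            buckets.setdefault(value, []).append(doc[avg_field])
    result = [
        # The average is taken only over the documents in this bucket
        {"key": key, "doc_count": len(nums), avg_name: {"value": sum(nums) / len(nums)}}
        for key, nums in buckets.items()
    ]
    # Largest doc_count first; ties broken alphabetically in this sketch
    return sorted(result, key=lambda b: (-b["doc_count"], b["key"]))


employees = [
    {"first_name": "John", "age": 25, "interests": ["sports", "music"]},
    {"first_name": "Jane", "age": 32, "interests": ["music"]},
    {"first_name": "Douglas", "age": 35, "interests": ["forestry"]},
]
for bucket in terms_with_avg(employees, "interests", "age"):
    print(bucket)
```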

Elasticsearch hides as much of the complexity of a distributed system as possible. Behind the scenes, it:

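Distributes documents across different containers, or shards, which can be stored on one or more nodes
Balances these shards across the nodes in the cluster to spread the load of indexing and search
Replicates each shard to provide data redundancy, protecting against data loss when hardware fails
Routes a request from any node in the cluster to the nodes that hold the relevant data
Integrates new nodes seamlessly as the cluster grows, and reallocates shards to recover from node loss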
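Elasticsearch decides which primary shard a document belongs to with the formula `shard = hash(_routing) % number_of_primary_shards`. By default, `_routing` is the document's `_id`. This is why the number of primary shards is fixed when an index is created: changing it would map existing documents to different shards. The Python sketch below uses `zlib.crc32` as a stand-in for the Murmur3 hash that Elasticsearch actually uses, so the shard numbers it prints will not match a real cluster. The `shard_for` helper is hypothetical.

```python
import zlib


def shard_for(routing, number_of_primary_shards):
    """Pick a primary shard: hash(_routing) % number_of_primary_shards."""
    # crc32 stands in for Elasticsearch's Murmur3 hash in this sketch
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


for doc_id in ["1", "2", "3", "4"]:
    print(doc_id, "-> shard", shard_for(doc_id, 5))
```

Because the hash is deterministic, any node can compute where a document lives. That is what lets a request sent to any node be routed to the right place.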