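I. Data Aggregation
1. Categories of aggregation
① Aggregations compute statistics and analyses over document data.
② Categories of aggregation
- Bucket aggregation: groups documents by a field value
- Metric aggregation: computes values such as max, min, and avg
- Pipeline aggregation: aggregates over the results of other aggregations
③ Field types that can be aggregated (analyzed text fields are not aggregated by default)
- keyword
- numeric
- date
- boolean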
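2. Bucket aggregation with DSL
Example: count how many hotel brands exist across all documents by aggregating on the hotel brand.
By default the buckets come back sorted by document count, from largest to smallest.
① Sorting the result: set the order property to change the sort
② Limiting the aggregation scope: add a query condition
// aggregate only documents priced under 200 yuan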
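Summary
① aggs declares the aggregations and sits at the same level as query. Here the query restricts which documents are aggregated.
② Three required elements of an aggregation: its name, its type, and the field to aggregate on
③ Configurable properties of an aggregation
size: number of buckets to return
order: how the buckets are sorted
field: the field to aggregate on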
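To make size, order, and field concrete, here is a plain-Java sketch that imitates what a terms (bucket) aggregation computes. It makes no Elasticsearch calls; the class name TermsAggSketch, the Hotel type, and the sample data are assumptions made up for illustration, and maxPrice stands in for the query that limits the aggregation scope.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Plain-Java imitation of a terms aggregation; no Elasticsearch calls are made.
class TermsAggSketch {

    // Hypothetical minimal document: only the fields the sketch needs.
    static final class Hotel {
        final String brand;
        final int price;
        Hotel(String brand, int price) { this.brand = brand; this.price = price; }
    }

    // Sample data, invented for the example.
    static List<Hotel> sampleHotels() {
        return Arrays.asList(
                new Hotel("7Days", 150), new Hotel("7Days", 180), new Hotel("7Days", 320),
                new Hotel("Home Inn", 190), new Hotel("Home Inn", 260),
                new Hotel("Hilton", 900));
    }

    /**
     * Groups hotels by brand (field), sorts the buckets by document count (order),
     * and keeps at most `size` buckets. Only hotels priced below maxPrice are
     * aggregated, which is the role the query plays next to aggs.
     */
    static Map<String, Long> brandBuckets(List<Hotel> hotels, int maxPrice, boolean ascending, int size) {
        Map<String, Long> counts = hotels.stream()
                .filter((Hotel h) -> h.price < maxPrice)
                .collect(Collectors.groupingBy((Hotel h) -> h.brand, Collectors.counting()));
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Long.compare(a.getValue(), b.getValue());
            if (!ascending) byCount = -byCount;
            // Ties are broken by key so the output is deterministic.
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        Map<String, Long> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : entries.subList(0, Math.min(size, entries.size()))) {
            result.put(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // All hotels, largest bucket first, top 2 buckets: {7Days=3, Home Inn=2}
        System.out.println(brandBuckets(sampleHotels(), Integer.MAX_VALUE, false, 2));
        // Only hotels under 200, smallest bucket first: {Home Inn=1, 7Days=2}
        System.out.println(brandBuckets(sampleHotels(), 200, true, 10));
    }
}
```

The second call shows why the query matters: the 320-yuan and 260-yuan hotels are filtered out before any bucket is counted, so the counts themselves change, not only the ordering.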
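3. Metrics aggregation with DSL
Get the min, max, avg, sum, and similar values of the user score for each brand (bucket).
Each brand bucket in the result then carries its own statistics.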
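As a rough picture of what "statistics per bucket" means, the plain-Java sketch below groups hotels by brand and computes min, max, avg, and sum of the score inside each group. It only imitates the result shape and does not use the Elasticsearch API; StatsAggSketch, the Hotel type, and the scores are invented for the example.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java imitation of a stats metric nested inside brand buckets.
class StatsAggSketch {

    // Hypothetical minimal document: brand and user score only.
    static final class Hotel {
        final String brand;
        final int score;
        Hotel(String brand, int score) { this.brand = brand; this.score = score; }
    }

    // Sample data, invented for the example.
    static List<Hotel> sampleHotels() {
        return Arrays.asList(
                new Hotel("7Days", 38), new Hotel("7Days", 42),
                new Hotel("Hilton", 46), new Hotel("Hilton", 48), new Hotel("Hilton", 47));
    }

    // One bucket per brand; each bucket holds min, max, avg, sum, and count of score.
    static Map<String, IntSummaryStatistics> scoreStatsByBrand(List<Hotel> hotels) {
        return hotels.stream().collect(Collectors.groupingBy(
                (Hotel h) -> h.brand,
                TreeMap::new,                                   // sorted keys for stable output
                Collectors.summarizingInt((Hotel h) -> h.score)));
    }

    public static void main(String[] args) {
        scoreStatsByBrand(sampleHotels()).forEach((brand, s) ->
                System.out.println(brand + " min=" + s.getMin() + " max=" + s.getMax()
                        + " avg=" + s.getAverage() + " sum=" + s.getSum()));
    }
}
```

The metric is computed inside each bucket rather than over all documents, which is why the bucket aggregation comes first and the metric is nested under it.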
4. Aggregation with the RestAPI
Building the DSL
Parsing the result
Code
java
@Test
void test() throws IOException {
    // 1. Build the request
    SearchRequest request = new SearchRequest("hotel");
    // size(0): return no document hits, only the aggregation result
    request.source().size(0);
    request.source().aggregation(AggregationBuilders
            .terms("brand_agg")
            .field("brand")
            .size(20));
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // 2. Parse the aggregation result
    Aggregations aggregations = response.getAggregations();
    // Look up the terms aggregation by its name
    Terms brandTerms = aggregations.get("brand_agg");
    // Get the buckets
    List<? extends Terms.Bucket> buckets = brandTerms.getBuckets();
    // Iterate and print each bucket key (the brand name)
    for (Terms.Bucket bucket : buckets) {
        String brandName = bucket.getKeyAsString();
        System.out.println(brandName);
    }
}
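5. Example: define a method in IUserService that aggregates brand, city, and star rating, and returns the values for the front-end page to display as filter keywords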
java
/**
 * Aggregated fields shown on the front end
 */
@Override
public Map<String, List<String>> filters() throws IOException {
    // Result map: display label -> aggregated values
    Map<String, List<String>> map = new HashMap<>();
    // Create the request
    SearchRequest request = new SearchRequest("hotel");
    // Build the DSL: size(0) returns no document hits
    request.source().size(0);
    // Add the aggregations
    buildAggregation(request);
    // Send the request and get the response
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // Parse the result, one aggregation per name
    Aggregations aggregations = response.getAggregations();
    List<String> cityList = getAggList(aggregations, "cityAgg");
    List<String> brandList = getAggList(aggregations, "brandAgg");
    List<String> starList = getAggList(aggregations, "starAgg");
    map.put("城市", cityList);   // city
    map.put("品牌", brandList);  // brand
    map.put("星级", starList);   // star rating
    return map;
}

/**
 * Extracted helper: collects the bucket keys of one terms aggregation
 * @param aggregations all aggregations of the response
 * @param aggName name of the aggregation to read
 * @return the bucket keys as strings
 */
private List<String> getAggList(Aggregations aggregations, String aggName) {
    Terms terms = aggregations.get(aggName);
    final List<? extends Terms.Bucket> buckets = terms.getBuckets();
    List<String> list = new ArrayList<>();
    for (Terms.Bucket bucket : buckets) {
        list.add(bucket.getKeyAsString());
    }
    return list;
}

/**
 * Extracted helper: adds the brand, city, and star aggregations
 *
 * @param request the search request to extend
 */
private void buildAggregation(SearchRequest request) {
    request.source().aggregation(AggregationBuilders
            .terms("brandAgg")
            .field("brand")
            .size(100)
    );
    request.source().aggregation(AggregationBuilders
            .terms("cityAgg")
            .field("city")
            .size(100)
    );
    request.source().aggregation(AggregationBuilders
            .terms("starAgg")
            .field("starName")
            .size(100)
    );
}
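II. Autocomplete
1. Installing the pinyin analyzer
To complete suggestions from typed letters, documents must be tokenized by pinyin.
The pinyin analysis plugin for elasticsearch is medcl/elasticsearch-analysis-pinyin on GitHub; it converts between Chinese characters and pinyin.
Installation takes three steps, followed by a test:
① Unzip the plugin package
② Upload it to the elasticsearch plugins directory on the virtual machine
③ Restart elasticsearch
④ Test
json
POST /_analyze
{
  "text": "如家酒店",
  "analyzer": "pinyin"
}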
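2. Custom analyzer
The default pinyin analyzer is not smart enough and needs tuning:
it produces pinyin only for each single character, or the first letters of the whole phrase, with no word-level segmentation.
① An es analyzer is made of three parts:
- character filters: delete or replace characters before tokenizing
- tokenizer: cuts the text into terms according to some rule. For example, keyword does not split at all; ik_smart is another tokenizer
- token filters (filter in the settings): further process the terms emitted by the tokenizer, for example case conversion, synonym handling, or pinyin conversion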
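The order of the three parts matters, so here is a toy pipeline in plain Java that runs them in sequence. It is only an illustration of the stages, not how Elasticsearch implements them; the class AnalyzerPipelineSketch and the rules chosen for each stage (replacing "&", splitting on whitespace, lowercasing) are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy analyzer showing the three stages in order; not the Elasticsearch implementation.
class AnalyzerPipelineSketch {

    // Stage 1, character filter: works on the raw text before tokenizing.
    static String charFilter(String text) {
        return text.replace("&", " and ").replaceAll("[!?.,]", "");
    }

    // Stage 2, tokenizer: cuts the text into terms (here: on whitespace).
    static List<String> tokenizer(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // Stage 3, token filter: post-processes each term (here: lowercasing).
    static List<String> tokenFilter(List<String> terms) {
        List<String> out = new ArrayList<>();
        for (String t : terms) out.add(t.toLowerCase(Locale.ROOT));
        return out;
    }

    // The full analyzer is the composition of the three stages.
    static List<String> analyze(String text) {
        return tokenFilter(tokenizer(charFilter(text)));
    }

    public static void main(String[] args) {
        // prints [bed, and, breakfast, shanghai]
        System.out.println(analyze("Bed & Breakfast, Shanghai!"));
    }
}
```

A pinyin filter sits in stage 3: it receives terms that the tokenizer has already cut, which is why combining ik_max_word with a pinyin filter yields word-level pinyin.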
② Defining a custom analyzer
- A custom analyzer is configured through settings. Create the index test, whose name field uses the custom analyzer
json
PUT /test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "ik_max_word",
          "filter": "py"
        }
      },
      "filter": {
        "py": {
          "type": "pinyin",
          "keep_full_pinyin": false,
          "keep_joined_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "remove_duplicated_term": true,
          "none_chinese_pinyin_tokenize": false
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}
Test
json
POST /test/_analyze
{
  "text": "如家酒店",
  "analyzer": "my_analyzer"
}
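The result contains both the Chinese terms and their pinyin.
So the pinyin analyzer can be used when building the inverted index (at index time) but not at search time, because a query converted to pinyin also matches homophones. The two analyzers have to be configured separately.
③ The field uses the custom pinyin analyzer when building the inverted index (so a document can be found in several ways) and the ik_smart analyzer at search time (a Chinese query should match only one word, whereas a pinyin abbreviation can stand for several).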
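The homophone problem can be reproduced with a tiny in-memory inverted index. The sketch below is plain Java and does not use the pinyin plugin; the hand-written PINYIN table, the class PinyinSearchSketch, and the two sample words 狮子 (lion) and 虱子 (louse), which share the pinyin "shizi", are assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// In-memory illustration of index-time pinyin versus search-time pinyin.
class PinyinSearchSketch {

    // Hypothetical hand-written pinyin table standing in for the pinyin plugin.
    static final Map<String, String> PINYIN = new HashMap<>();
    static {
        PINYIN.put("狮子", "shizi");   // lion
        PINYIN.put("虱子", "shizi");   // louse, a homophone of 狮子
    }

    // Analyzer with pinyin: keeps the original term and adds its pinyin.
    static Set<String> withPinyin(String term) {
        Set<String> terms = new LinkedHashSet<>();
        terms.add(term);
        String py = PINYIN.get(term);
        if (py != null) terms.add(py);
        return terms;
    }

    // Analyzer without pinyin: the term is kept exactly as typed.
    static Set<String> plain(String term) {
        return new LinkedHashSet<>(Arrays.asList(term));
    }

    // Inverted index: term -> ids of the documents containing it (built with pinyin).
    static Map<String, Set<Integer>> buildIndex(List<String> docs) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : withPinyin(docs.get(id))) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    // A document matches if it contains any of the analyzed query terms.
    static Set<Integer> search(Map<String, Set<Integer>> index, Set<String> queryTerms) {
        Set<Integer> hits = new TreeSet<>();
        for (String term : queryTerms) {
            hits.addAll(index.getOrDefault(term, new TreeSet<>()));
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> index = buildIndex(Arrays.asList("狮子", "虱子"));
        System.out.println(search(index, withPinyin("狮子"))); // [0, 1]: the homophone is hit too
        System.out.println(search(index, plain("狮子")));      // [0]: only the intended word
        System.out.println(search(index, plain("shizi")));     // [0, 1]: typed pinyin still works
    }
}
```

Because the index already contains the pinyin terms, a user who types pinyin is still served without a pinyin analyzer on the search side; only the unwanted expansion of Chinese queries is removed.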
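Summary
① How is a custom analyzer defined?
Configure it in settings when creating the index. It can contain three parts:
character filter
tokenizer
filter
② What should be kept in mind with the pinyin analyzer?
To avoid matching homophones, do not use the pinyin analyzer at search time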
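3. Autocomplete with the completion suggester query
① es provides the completion suggester query for autocomplete. It matches terms that begin with the user's input and returns them.
Fields that take part in completion must be of type completion.
The field content is usually an array of the terms to be completed.
A query names the completion field and supplies the prefix typed by the user.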
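The matching rule itself (return stored entries that start with the typed prefix, drop duplicates, cap the count) can be sketched in plain Java. This is a simplification and not the completion suggester implementation: the class CompletionSketch, the sample entries, the case-insensitive comparison, and returning hits in stored order are all assumptions of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java imitation of prefix completion; not the Elasticsearch suggester.
class CompletionSketch {

    /**
     * Returns up to `size` distinct entries that start with `prefix`,
     * ignoring case, in the order they were stored.
     */
    static List<String> suggest(List<String> entries, String prefix, int size) {
        String p = prefix.toLowerCase(Locale.ROOT);
        Set<String> hits = new LinkedHashSet<>();        // drops duplicates, keeps order
        for (String entry : entries) {
            if (hits.size() == size) break;
            if (entry.toLowerCase(Locale.ROOT).startsWith(p)) hits.add(entry);
        }
        return new ArrayList<>(hits);
    }

    public static void main(String[] args) {
        // Sample entries, invented for the example.
        List<String> entries = Arrays.asList("Sony", "WH-1000XM3", "SK-II", "PITERA", "Sony", "Switch");
        System.out.println(suggest(entries, "s", 10));   // [Sony, SK-II, Switch]
        System.out.println(suggest(entries, "s", 2));    // [Sony, SK-II]
    }
}
```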
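4. Adding autocomplete and pinyin search to the hotel index
The approach:
- Change the hotel index structure and configure the custom pinyin analyzers
- Change the name and all fields of the index to use the custom analyzer
- Add a new field suggestion of type completion to the index, using a custom analyzer
- Add a suggestion field to the HotelDoc class, containing brand and business
- Re-import the data into the hotel index
Note: name and all are analyzed fields, whereas brand and business, which feed autocomplete, are not tokenized, so the two need different analyzer combinations
① Change the hotel index structure and configure the custom pinyin analyzers
json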
PUT /hotel
{
  "settings": {
    "analysis": {
      "analyzer": {
        "text_anlyzer": {
          "tokenizer": "ik_max_word",
          "filter": "py"
        },
        "completion_analyzer": {
          "tokenizer": "keyword",
          "filter": "py"
        }
      },
      "filter": {
        "py": {
          "type": "pinyin",
          "keep_full_pinyin": false,
          "keep_joined_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "remove_duplicated_term": true,
          "none_chinese_pinyin_tokenize": false
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "id": { "type": "keyword" },
      "name": {
        "type": "text",
        "analyzer": "text_anlyzer",
        "search_analyzer": "ik_smart",
        "copy_to": "all"
      },
      "address": { "type": "keyword", "index": false },
      "price": { "type": "integer" },
      "score": { "type": "integer" },
      "brand": { "type": "keyword", "copy_to": "all" },
      "city": { "type": "keyword" },
      "starName": { "type": "keyword" },
      "business": { "type": "keyword", "copy_to": "all" },
      "location": { "type": "geo_point" },
      "pic": { "type": "keyword", "index": false },
      "all": {
        "type": "text",
        "analyzer": "text_anlyzer",
        "search_analyzer": "ik_smart"
      },
      "suggestion": {
        "type": "completion",
        "analyzer": "completion_analyzer"
      }
    }
  }
}
④ Add a suggestion field to the HotelDoc class, containing brand and business
java
import lombok.Data;
import lombok.NoArgsConstructor;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

@Data
@NoArgsConstructor
public class HotelDoc {
    private Long id;
    private String name;
    private String address;
    private Integer price;
    private Integer score;
    private String brand;
    private String city;
    private String starName;
    private String business;
    private String location;
    private String pic;
    private Object distance;
    private Boolean isAD;
    // Entries offered by autocomplete: the brand plus the business districts
    private List<String> suggestion;

    public HotelDoc(Hotel hotel) {
        this.id = hotel.getId();
        this.name = hotel.getName();
        this.address = hotel.getAddress();
        this.price = hotel.getPrice();
        this.score = hotel.getScore();
        this.brand = hotel.getBrand();
        this.city = hotel.getCity();
        this.starName = hotel.getStarName();
        this.business = hotel.getBusiness();
        // geo_point as a "lat, lon" string
        this.location = hotel.getLatitude() + ", " + hotel.getLongitude();
        this.pic = hotel.getPic();
        if (this.business.contains("/")) {
            // Several business districts separated by "/": add each one
            this.suggestion = new ArrayList<>();
            suggestion.add(this.brand);
            String[] arr = this.business.split("/");
            Collections.addAll(this.suggestion, arr);
        } else {
            // A single business district
            this.suggestion = Arrays.asList(this.brand, this.business);
        }
    }
}
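The suggestion-building step can also be pulled into a small standalone helper, which makes it easy to try out and lets it tolerate a missing business value (the constructor above would throw a NullPointerException in that case). This is one possible variant, not the author's code; the class SuggestionBuilderSketch, the method buildSuggestions, and the sample district names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone variant of the suggestion-building logic, with null handling added.
class SuggestionBuilderSketch {

    // Builds the completion entries: the brand plus each business district.
    static List<String> buildSuggestions(String brand, String business) {
        List<String> suggestions = new ArrayList<>();
        if (brand != null && !brand.isEmpty()) {
            suggestions.add(brand);
        }
        if (business != null) {
            // "A/B" lists several districts; a value without "/" yields one entry.
            for (String district : business.split("/")) {
                if (!district.isEmpty()) suggestions.add(district);
            }
        }
        return suggestions;
    }

    public static void main(String[] args) {
        System.out.println(buildSuggestions("7Days", "Pudong/Lujiazui")); // [7Days, Pudong, Lujiazui]
        System.out.println(buildSuggestions("7Days", "Bund"));            // [7Days, Bund]
        System.out.println(buildSuggestions("7Days", null));              // [7Days]
    }
}
```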
⑤ Implementation
java
/**
 * Autocomplete
 *
 * @param key prefix typed by the user
 * @return the matching suggestion texts
 */
@Override
public List<String> getSuggestion(String key) {
    List<String> suggestionList = new ArrayList<>();
    // Create the request
    SearchRequest request = new SearchRequest("hotel");
    // Build the DSL: a completion suggestion on the suggestion field
    request.source().suggest(new SuggestBuilder().addSuggestion(
            "hotelSuggestion",
            SuggestBuilders.completionSuggestion("suggestion")
                    .prefix(key)
                    .skipDuplicates(true)
                    .size(10)
    ));
    // Send the request
    try {
        SearchResponse response = restHighLevelClient.search(request, RequestOptions.DEFAULT);
        // Parse the response
        Suggest suggest = response.getSuggest();
        // Look up the suggestion by its name
        CompletionSuggestion suggestion = suggest.getSuggestion("hotelSuggestion");
        // Collect the text of every option
        for (CompletionSuggestion.Entry.Option option : suggestion.getOptions()) {
            String text = option.getText().string();
            suggestionList.add(text);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return suggestionList;
}
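III. Data Synchronization
Analysis of the data synchronization problem
The data in es comes from mysql, so whenever mysql changes, es must change with it. es and mysql therefore have to be kept in sync.
Asynchronous notification: the service that writes to mysql publishes a message to mq, and the search service listens for it and updates es.
Pros: loose coupling, moderate implementation difficulty.
Cons: depends on the reliability of the mq.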
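The flow can be imitated in a single JVM to see the decoupling: the writer only touches the database and the queue, and a separate consumer brings the index up to date. The sketch below uses plain JDK classes instead of mysql, RabbitMQ, and elasticsearch; the class AsyncSyncSketch, its fields, and the STOP marker are assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// In-JVM imitation of asynchronous notification between a database and a search index.
class AsyncSyncSketch {

    static final long STOP = -1L;                       // hypothetical shutdown marker

    // Stand-ins: `db` plays mysql, `index` plays elasticsearch, `mq` plays the broker.
    final Map<Long, String> db = new ConcurrentHashMap<>();
    final Map<Long, String> index = new ConcurrentHashMap<>();
    final BlockingQueue<Long> mq = new LinkedBlockingQueue<>();

    // Producer side: write to the database, then publish only the id.
    void saveHotel(long id, String name) throws InterruptedException {
        db.put(id, name);
        mq.put(id);
    }

    // Consumer side: take ids from the queue and copy the current row into the index.
    Thread startConsumer() {
        Thread t = new Thread(() -> {
            try {
                for (long id = mq.take(); id != STOP; id = mq.take()) {
                    index.put(id, db.get(id));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }

    // Runs the whole flow and returns the index content once the consumer has finished.
    static Map<Long, String> runDemo() {
        AsyncSyncSketch s = new AsyncSyncSketch();
        Thread consumer = s.startConsumer();
        try {
            s.saveHotel(1L, "7Days Inn");
            s.saveHotel(2L, "Hilton");
            s.mq.put(STOP);          // let the consumer drain the queue and exit
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return s.index;
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

Sending only the id keeps the message small and lets the consumer read the latest row itself. It also shows the stated drawback: if the queue loses a message, the index silently stays out of date.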
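Example: synchronizing mysql and elasticsearch with MQ
Use the hotel-admin project from the course materials as the hotel management microservice. Whenever hotel data is added, deleted, or modified, the same operation must be applied to the data in elasticsearch.
Steps:
- Import the hotel-admin project from the course materials, start it, and test the CRUD operations on hotel data
- Declare the exchange, queues, and RoutingKeys
- Send messages from the add, delete, and update operations in hotel-admin
- Listen for the messages in hotel-demo and update the data in elasticsearch
- Start the services and test data synchronization
1. Declare the exchange, queues, and RoutingKeys
A message is sent whenever a hotel is added, modified, or deleted.
Consumer: hotel-demo declares the exchange
① Add the dependency
xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-amqp</artifactId>
</dependency>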
② Configuration file: configure the amqp address
③ Declare the queues and the exchange
MqConstants constants
java
public class MqConstants {
    /**
     * Exchange
     */
    public final static String HOTEL_EXCHANGE = "hotel.topic";
    /**
     * Queue listened to for
     * inserts and updates
     */
    public final static String HOTEL_INSERT_QUEUE = "hotel.insert.queue";
    /**
     * Queue listened to for
     * deletes
     */
    public final static String HOTEL_DELETE_QUEUE = "hotel.delete.queue";
    /**
     * RoutingKey for inserts and updates
     */
    public final static String HOTEL_INSERT_KEY = "hotel.insert";
    /**
     * RoutingKey for deletes
     */
    public final static String HOTEL_DELETE_KEY = "hotel.delete";
}
Declare the exchange, queues, and bindings as beans
java
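import org.springframework.amqp.core.Binding;
import org.springframework.amqp.core.BindingBuilder;
import org.springframework.amqp.core.Queue;
import org.springframework.amqp.core.TopicExchange;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class MqConfiguration {
    /**
     * Exchange definition (durable, not auto-deleted)
     *
     * @return the topic exchange
     */
    @Bean
    public TopicExchange topicExchange() {
        return new TopicExchange(MqConstants.HOTEL_EXCHANGE, true, false);
    }

    /**
     * Queue definition: insert and update (durable)
     */
    @Bean
    public Queue insertQueue() {
        return new Queue(MqConstants.HOTEL_INSERT_QUEUE, true);
    }

    /**
     * Queue definition: delete (durable)
     */
    @Bean
    public Queue deleteQueue() {
        return new Queue(MqConstants.HOTEL_DELETE_QUEUE, true);
    }

    /**
     * Binding for insertQueue:
     * bind(queue).to(exchange).with(RoutingKey)
     */
    @Bean
    public Binding insertQueueBinding() {
        return BindingBuilder.bind(insertQueue()).to(topicExchange()).with(MqConstants.HOTEL_INSERT_KEY);
    }

    /**
     * Binding for deleteQueue:
     * bind(queue).to(exchange).with(RoutingKey)
     */
    @Bean
    public Binding deleteQueueBinding() {
        return BindingBuilder.bind(deleteQueue()).to(topicExchange()).with(MqConstants.HOTEL_DELETE_KEY);
    }
}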
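2. Send messages from the add, delete, and update operations in hotel-admin
① Copy the MqConstants constants into hotel-admin
② Add the same amqp maven dependency
③ Configure the same amqp address
④ The message-sending code lives in the controller
Add / update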
java
@Autowired
private RabbitTemplate rabbitTemplate;
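RabbitTemplate is the helper that client applications use to send and receive messages.
java
@PostMapping
public void saveHotel(@RequestBody Hotel hotel) {
    hotelService.save(hotel);
    // Publish the id of the saved hotel with the insert RoutingKey
    rabbitTemplate.convertAndSend(MqConstants.HOTEL_EXCHANGE, MqConstants.HOTEL_INSERT_KEY, hotel.getId());
}
The call is rabbitTemplate.convertAndSend(exchange, insert RoutingKey, message body); here the body is the hotel id.
Delete
java
@DeleteMapping("/{id}")
public void deleteById(@PathVariable("id") Long id) {
    hotelService.removeById(id);
    // Publish the id of the removed hotel with the delete RoutingKey
    rabbitTemplate.convertAndSend(MqConstants.HOTEL_EXCHANGE, MqConstants.HOTEL_DELETE_KEY, id);
}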
3. Listen for the messages in hotel-demo and update the data in elasticsearch
① Create the listener class HotelListener and define the consumers that receive the messages
java
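import org.springframework.amqp.rabbit.annotation.RabbitListener;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

@Component
public class HotelListener {
    @Autowired
    private IHotelService hotelService;

    /**
     * Listens for hotel inserts and updates
     * @param id id of the hotel that was added or changed
     */
    @RabbitListener(queues = MqConstants.HOTEL_INSERT_QUEUE)
    public void listenHotelInsertOrUpdate(Long id) {
        hotelService.insertById(id);
    }

    /**
     * Listens for hotel deletes
     * @param id id of the hotel that was removed
     */
    @RabbitListener(queues = MqConstants.HOTEL_DELETE_QUEUE)
    public void listenHotelDelete(Long id) {
        // Remove the document from the index
        hotelService.deleteById(id);
    }
}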
Add or update: hotelService.insertById(id);
java
@Override
public void insertById(Long id) {
    try {
        // 0. Load the hotel from the database by id
        Hotel hotel = getById(id);
        HotelDoc hotelDoc = new HotelDoc(hotel);
        // 1. Prepare the request
        IndexRequest request = new IndexRequest("hotel").id(hotel.getId().toString());
        // 2. Prepare the DSL: the document as JSON
        request.source(JSON.toJSONString(hotelDoc), XContentType.JSON);
        // 3. Send the request
        restHighLevelClient.index(request, RequestOptions.DEFAULT);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}
Delete the document from the index: hotelService.deleteById(id);
java
@Override
public void deleteById(Long id) {
    try {
        // 1. Prepare the request
        DeleteRequest request = new DeleteRequest("hotel", id.toString());
        // 2. Send the request
        restHighLevelClient.delete(request, RequestOptions.DEFAULT);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}
4. Start the services and test data synchronization