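Background: I wanted to build a full-text search feature, mainly so that attachments can be searched.

These notes summarize how I used Elasticsearch for this, so I don't have to relearn it later.

1. Get familiar with the basic APIs

Start with the basics: creating an index, configuring mappings, indexing documents, and running simple searches.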

2. Understand ingest pipelines and use them to process attachments

Single-attachment pipeline

PUT _ingest/pipeline/attachment
{
  "description": "Extract attachment information",
  "processors": [
    {
      "attachment": {
        "field": "content",
        "ignore_missing": true
      }
    },
    {
      "remove": {
        "field": "content",
        "ignore_missing": true
      }
    }
  ]
}

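This one is straightforward: the attachment processor parses the base64-encoded content field and, by default, writes the extracted information to an attachment object. The remove processor then deletes the source content field.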
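Because the attachment processor reads base64-encoded data, the file bytes have to be encoded before they are placed in the content field. A minimal sketch, assuming Java is the client language; AttachmentEncoder and its methods are hypothetical names, not part of any Elasticsearch library:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of any Elasticsearch client library.
final class AttachmentEncoder {
    private AttachmentEncoder() {}

    // Encodes raw file bytes as the base64 string to put in the "content" field.
    static String encode(byte[] fileBytes) {
        return Base64.getEncoder().encodeToString(fileBytes);
    }

    // Convenience overload for plain text: the text is converted to UTF-8 bytes first.
    static String encodeText(String text) {
        return encode(text.getBytes(StandardCharsets.UTF_8));
    }
}
```

For a real file, the bytes could come from Files.readAllBytes(path) before being passed to encode.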

Multi-attachment pipeline

PUT _ingest/pipeline/test_attachment
{
  "description": "Extract attachment information",
  "processors": [
    {
      "foreach": {
        "field": "attachments",
        "processor": {
          "attachment": {
            "field": "_ingest._value.content",
            "target_field": "_ingest._value.attachment"
          }
        }
      }
    },
    {
      "foreach": {
        "field": "attachments",
        "processor": {
          "remove": {
            "field": [
              "_ingest._value.content"
            ]
          }
        }
      }
    }
  ]
}

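After several attachments are added, the document looks as shown in the figure.

The first foreach in this pipeline runs the attachment processor over every entry in attachments: each entry's content is parsed and the result is written to that entry's attachment object (attachments[].attachment, which holds the extracted file information). The second foreach then removes the original content from each entry.

Handling single and multiple attachments together

Merging the two pipelines gives one that handles a single attachment and a list of attachments at the same time. The merged pipeline looks like this: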

PUT _ingest/pipeline/test_attachment
{
  "description": "Extract attachment information",
  "processors": [
    {
      "attachment": {
        "field": "content",
        "ignore_missing": true
      }
    },
    {
      "remove": {
        "field": "content",
        "ignore_missing": true
      }
    },
    {
      "foreach": {
        "field": "attachments",
        "ignore_missing": true,
        "processor": {
          "attachment": {
            "field": "_ingest._value.content",
            "target_field": "_ingest._value.attachment"
          }
        }
      }
    },
    {
      "foreach": {
        "field": "attachments",
        "ignore_missing": true,
        "processor": {
          "remove": {
            "field": [
              "_ingest._value.content"
            ]
          }
        }
      }
    }
  ]
}

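Example:

When indexing a document, remember to specify the pipeline:

<index name>/_doc?pipeline=<pipeline name>

Send three attachments: content holds attachment 1, and attachments holds the list of further attachments (attachments 2 and 3).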

POST test_create_index/_doc?pipeline=test_attachment
{
  "content": "eyJzdWNjZXNzIjpmYWxzZSwiY29kZSI6MCwibXNnIjoi5LiL6L295aSx6LSlIn0=",
  "attachments": [
    {
      "content": "aGFoYeS9oOWlvWhlbGxvd29ybGQ="
    },
    {
      "content": "aGFoYeS9oOWlvWhlbGxvd29ybGQ="
    }
  ]
}
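The same request could be built from Java with the JDK's built-in HTTP client (Java 11 or later). This is only a sketch under assumptions: IndexRequests is a hypothetical helper, the base URL (for example http://localhost:9200) depends on your cluster, and the index and pipeline names are assumed to be URL-safe.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper that only builds the request; it does not send it.
final class IndexRequests {
    private IndexRequests() {}

    // Builds POST {baseUrl}/{index}/_doc?pipeline={pipeline} with a JSON body.
    static HttpRequest indexWithPipeline(String baseUrl, String index,
                                         String pipeline, String jsonBody) {
        // Assumption: index and pipeline contain only URL-safe characters.
        URI uri = URI.create(baseUrl + "/" + index + "/_doc?pipeline=" + pipeline);
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }
}
```

The returned request can then be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).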

The document as finally stored (the extracted text is shown as placeholders):

{
  "_index" : "test_create_index",
  "_type" : "_doc",
  "_id" : "-H0eqowB6o1OPMJBsiNg",
  "_version" : 1,
  "_seq_no" : 46,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "attachments" : [
      {
        "attachment" : {
          "content_type" : "text/plain; charset=UTF-8",
          "language" : "hu",
          "content" : "Document 2",
          "content_length" : 17
        }
      },
      {
        "attachment" : {
          "content_type" : "text/plain; charset=UTF-8",
          "language" : "et",
          "content" : "Document 3",
          "content_length" : 607
        }
      }
    ],
    "attachment" : {
      "content_type" : "text/plain; charset=UTF-8",
      "language" : "gl",
      "content" : "Document 1",
      "content_length" : 40
    }
  }
}

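3. Summary

When processing data, we can build a dedicated pipeline for each index to handle that index's attachment information.

This also means the data has to be prepared before the document is built.

Suppose we have an article object, Article: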

import java.io.File;
import java.util.List;

class Article {
    String id;
    String name;
    File img;               // cover image
    List<File> attachments; // attachments
}

Suppose the data we receive looks like this:

 [
     {
         "id":"1",
         "name":"Today's Headlines",
         "img": "<file>",
         "attachments": [
             {
                 "fileName":"Headline 1",
                 "InputStream":""
             },
             {
                 "fileName":"Headline 2",
                 "InputStream":""
             }
         ]
     },
     {
         "id":"2",
         "name":"Next Issue Preview",
         "img": "<file>",
         "attachments": [
             {
                 "fileName":"Preview 1",
                 "InputStream":""
             },
             {
                 "fileName":"Preview 2",
                 "InputStream":""
             }
         ]
     }
 ]
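One possible way to turn such an article into the _source that the pipeline expects is sketched below. It rests on several assumptions: ArticleDocuments and toSource are hypothetical names, the cover image is sent in the single content field, the file bytes have already been read (for example with Files.readAllBytes), and file names are left out for brevity.

```java
import java.util.ArrayList;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: builds the document body as nested maps.
final class ArticleDocuments {
    private ArticleDocuments() {}

    static Map<String, Object> toSource(String id, String name,
                                        byte[] cover, List<byte[]> attachments) {
        Base64.Encoder encoder = Base64.getEncoder();
        Map<String, Object> source = new LinkedHashMap<>();
        source.put("id", id);
        source.put("name", name);
        // Assumption: the cover goes into "content", the field the
        // single-attachment processor reads. It is skipped when absent.
        if (cover != null) {
            source.put("content", encoder.encodeToString(cover));
        }
        // Each list entry becomes {"content": "<base64>"}, matching the
        // _ingest._value.content path used by the foreach processors.
        List<Map<String, Object>> items = new ArrayList<>();
        for (byte[] bytes : attachments) {
            Map<String, Object> item = new LinkedHashMap<>();
            item.put("content", encoder.encodeToString(bytes));
            items.add(item);
        }
        source.put("attachments", items);
        return source;
    }
}
```

The resulting map can be serialized with whatever JSON library the project already uses and posted to the index with the pipeline parameter set.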

The mapping we built:

 "mappings": {
     "properties": {
         "attachments": { // multi-attachment mapping
             "properties": {
                 "attachment": {
                     "properties": {
                         "content_type": {
                             "type": "text",
                             "fields": {
                                 "keyword": {
                                     "ignore_above": 256,
                                     "type": "keyword"
                                 }
                             }
                         },
                         "language": {
                             "type": "text",
                             "fields": {
                                 "keyword": {
                                     "ignore_above": 256,
                                     "type": "keyword"
                                 }
                             }
                         },
                         "content": {
                             "type": "text",
                             "fields": {
                                 "keyword": {
                                     "ignore_above": 256,
                                     "type": "keyword"
                                 }
                             }
                         },
                         "content_length": {
                             "type": "long"
                         }
                     }
                 }
             }
         },
         "attachment": { // single-attachment mapping
             "properties": {
                 "date": {
                     "type": "date"
                 },
                 "content_type": {
                     "type": "text",
                     "fields": {
                         "keyword": {
                             "ignore_above": 256,
                             "type": "keyword"
                         }
                     }
                 },
                 "language": {
                     "type": "text",
                     "fields": {
                         "keyword": {
                             "ignore_above": 256,
                             "type": "keyword"
                         }
                     }
                 },
                 "content": {
                     "type": "text"
                 },
                 "content_length": {
                     "type": "long"
                 }
             }
         },
         "createTime": {
             "type": "text"
         },
         "name": {
             "type": "text"
         },
         "id": {
             "type": "keyword"
         },
         "type": {
             "type": "keyword"
         },
         "content": {
             "type": "text",
             "fields": {
                 "keyword": {
                     "ignore_above": 256,
                     "type": "keyword"
                 }
             }
         }
     }
 }
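Since the goal is full-text search over attachments, a search body that targets both extracted content fields from the mapping might look like the sketch below. AttachmentQueries is a hypothetical helper, and multi_match is just one reasonable choice of query type; the field names are the ones defined in the mapping above.

```java
// Hypothetical helper that builds a search request body as a JSON string.
final class AttachmentQueries {
    private AttachmentQueries() {}

    // Searches the single attachment and every entry of the attachment list.
    static String matchAttachmentContent(String text) {
        return "{\"query\":{\"multi_match\":{\"query\":\"" + escape(text)
                + "\",\"fields\":[\"attachment.content\","
                + "\"attachments.attachment.content\"]}}}";
    }

    // Minimal JSON string escaping: quotes, backslashes, and control characters.
    private static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') {
                out.append('\\').append(c);
            } else if (c < 0x20) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

The returned string would be sent as the body of a request to test_create_index/_search.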
