使用阿里云试用Elasticsearch学习:使用内置模型 lang_ident_model_1 创建管道并使用

文档:https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-deploy-model.html

部署刚刚下载好的内置模型


部署内存不够用

还得花钱,拉几把倒吧。就用自带的吧。

测试模型

bash 复制代码
POST _ml/trained_models/lang_ident_model_1/_infer
{
  "docs":[{"text": "The fool doth think he is wise, but the wise man knows himself to be a fool."}]
}

以下是高概率预测英语的结果。

bash 复制代码
{
  "inference_results": [
    {
      "predicted_value": "en",
      "prediction_probability": 0.9999658805366392,
      "prediction_score": 0.9999658805366392
    }
  ]
}

创建管道

添加处理器

reference 推理

bash 复制代码
# Field map
{
  "message": "text"
}
# Inference configuration
{
  "classification":{
    "num_top_classes":5
  }
}

set 设置

bash 复制代码
#  field 
event.ingested
# value 
{{{_ingest.timestamp}}}

失败处理器

测试

bash 复制代码
[
  {
    "_source": {
    "text_field":"Hello, my name is Josh and I live in Berlin."
    }
  }
]
bash 复制代码
[
 {
   "_source":{
     "message":"Sziasztok! Ez egy rövid magyar szöveg. Nézzük, vajon sikerül-e azonosítania a language identification funkciónak? Annak ellenére is sikerülni fog, hogy a szöveg két angol szót is tartalmaz."
     }
  }
]


测试没问题,创建管道

使用

安装插件

注意版本号与es版本一直,都是8.9.1。安装完会自行重启。
下载mapper-annotated-text安装包

映射索引

注意message字段别写错

bash 复制代码
PUT ner-test
{
  "mappings": {
    "properties": {
      "ml.inference.predicted_value": {"type": "annotated_text"},
      "ml.inference.model_id": {"type": "keyword"},
      "message": {"type": "text"},
      "event.ingested": {"type": "date"}
    }
  }
}

索引文档

通过管道 lang_ident_model_1 索引一批文档

bash 复制代码
POST /_bulk?pipeline=lang_ident_model_1
{"create":{"_index":"ner-test","_id":"1"}}
{"message":"Hello, my name is Josh and I live in Berlin."}
{"create":{"_index":"ner-test","_id":"2"}}
{"message":"I work for Elastic which was founded in Amsterdam."}
{"create":{"_index":"ner-test","_id":"3"}}
{"message":"Elastic has headquarters in Mountain View, California."}
{"create":{"_index":"ner-test","_id":"4"}}
{"message":"Elastic's founder, Shay Banon, created Elasticsearch to solve a simple need: finding recipes!"}
{"create":{"_index":"ner-test","_id":"5"}}
{"message":"Elasticsearch is built using Lucene, an open source search library."}

或者用query

bash 复制代码
POST lang-test/_doc?pipeline=ner-test
{
  "message": "Mon pays ce n'est pas un pays, c'est l'hiver"
}

查看数据

bash 复制代码
"hits": [
      {
        "_index": "ner-test",
        "_id": "1",
        "_score": 1,
        "_source": {
          "message": "Hello, my name is Josh and I live in Berlin.",
          "event": {
            "ingested": "2024-04-13T20:31:48.855089336Z"
          },
          "ml": {
            "inference": {
              "predicted_value": "en",
              "top_classes": [
                {
                  "class_name": "en",
                  "class_probability": 0.9854748734614491,
                  "class_score": 0.9854748734614491
                },
                {
                  "class_name": "tg",
                  "class_probability": 0.003855695585908385,
                  "class_score": 0.003855695585908385
                },
                {
                  "class_name": "ig",
                  "class_probability": 0.0036940515396614113,
                  "class_score": 0.0036940515396614113
                },
                {
                  "class_name": "sw",
                  "class_probability": 0.0021393582129747924,
                  "class_score": 0.0021393582129747924
                },
                {
                  "class_name": "it",
                  "class_probability": 0.0011839650697029283,
                  "class_score": 0.0011839650697029283
                }
              ],
              "prediction_probability": 0.9854748734614491,
              "prediction_score": 0.9854748734614491,
              "model_id": "lang_ident_model_1"
            }
          }
        }
      },......

文档重新索引到新目标

bash 复制代码
POST _reindex
{
  "source": {
    "index": "ner-test-new",
    "size": 50
  },
  "dest": {
    "index": "ner-test",
    "pipeline": "lang_ident_model_1"
  }
}
相关推荐
Elastic 中国社区官方博客12 小时前
用 Elasticsearch 构建一个 ChatGPT connector 来查询 GitHub issues
大数据·人工智能·elasticsearch·搜索引擎·chatgpt·github·全文检索
武子康12 小时前
大数据-172 Elasticsearch 索引操作与 IK 分词器落地实战:7.3/8.15 全流程速查
大数据·后端·elasticsearch
Elasticsearch13 小时前
Elastic 与 Accenture 在 GenAI 数据准备方面的合作
elasticsearch
Elastic 中国社区官方博客13 小时前
Elasticsearch:在隔离环境中安装 ELSER 模型
大数据·数据库·人工智能·elasticsearch·搜索引擎·ai·全文检索
Dxy123931021614 小时前
Elasticsearch删除数据介绍
大数据·elasticsearch·搜索引擎
泻水置平地14 小时前
Docker下安装ES和kibana详细教程
elasticsearch·docker·容器
Wang's Blog15 小时前
Elastic Stack梳理:深度解析Elasticsearch分布式查询机制与相关性算分优化实践
分布式·elasticsearch
yumgpkpm15 小时前
腾讯TBDS和CMP(Cloud Data AI Platform,类Cloudera CDP,如华为鲲鹏 ARM 版)比较的缺陷在哪里?
hive·hadoop·elasticsearch·zookeeper·oracle·kafka·hbase
Elasticsearch16 小时前
Elasticsearch:专用向量数据库很快就会被遗忘,事实上它从未流行过
elasticsearch
Elasticsearch16 小时前
ES|QL 在 9.2:智能查找连接和时间序列支持
elasticsearch