[Search Engine] Elasticsearch (Part 1): Index Creation, Data Insertion, and Request Examples

Contents

    • I. Index design
      • 1. User index `user`
      • 2. Video index `video`
      • 3. Live room index `live_room`
    • II. Building test data
    • III. Search examples and expected response structure
      • 1. User search: by follower count, descending
      • 2. User search: by wealth level, ascending
      • 3. Video search: featured videos only, by like count descending
      • 4. Video search: content match + featured first + like count sort
      • 5. Live room search: title matches "Tai Chi", by anchor follower count descending
      • 6. Live room search: anchor nickname matches "Zhang", by live level ascending
      • 7. Combined sort example: users by follower count descending, then wealth level descending (multi-level sort)
    • IV. Cleaning up the indices

I. Index design
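
Every command in this article reads the cluster address from `$ES` and the credentials from `$AUTH`, but neither variable is defined in the text. A minimal setup sketch, assuming a local single-node cluster; the URL and the `elastic:changeme` credentials are placeholders, not values from the article:

```bash
# Placeholder defaults (assumptions); export your own values to override them.
ES="${ES:-http://localhost:9200}"
AUTH="${AUTH:-elastic:changeme}"

# Drop a trailing slash so "$ES/user" never expands to ".../​/user"-style double slashes.
ES="${ES%/}"
```

A quick connectivity check before creating any index is `curl -s -u "$AUTH" "$ES" | jq .`, which should print the cluster name and version.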

1. User index `user`

New fields:

  • fans_count (long): follower count
  • wealth_level (integer): wealth level (1-100)
  • live_level (integer): live-streaming level (1-100)

curl -s -u "$AUTH" -X PUT "$ES/user" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "analysis": {
      "analyzer": {
        "nickname_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "uid": { "type": "keyword" },
      "nickname": {
        "type": "text",
        "analyzer": "nickname_analyzer",
        "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
      },
      "avatar": { "type": "keyword", "index": false },
      "fans_count": { "type": "long" },
      "wealth_level": { "type": "integer" },
      "live_level": { "type": "integer" },
      "created_at": { "type": "date" }
    }
  }
}
' | jq .
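
Because `nickname_analyzer` combines the `standard` tokenizer with the `lowercase` filter, a nickname such as "Zhang Sanfeng" is indexed as the tokens `zhang` and `sanfeng`, which is why a `match` on "Zhang" finds it. One way to confirm this is the `_analyze` API. In this sketch `analyze_body` and `analyze_nickname` are hypothetical helper names, and `$ES` and `$AUTH` are assumed to be set as in the other commands:

```bash
# Build the JSON body for the _analyze API.
# The text is pasted in unescaped, so it must not contain double quotes or backslashes.
analyze_body() {
  printf '{"analyzer":"%s","text":"%s"}\n' "$1" "$2"
}

# Ask the user index how nickname_analyzer tokenizes a string (needs a running cluster).
analyze_nickname() {
  analyze_body "nickname_analyzer" "$1" |
    curl -s -u "$AUTH" "$ES/user/_analyze" -H 'Content-Type: application/json' -d @-
}
```

With the index created, `analyze_nickname "Zhang Sanfeng" | jq -c '[.tokens[].token]'` should print `["zhang","sanfeng"]`.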

2. Video index `video`

New fields:

  • is_featured (boolean): whether the video is featured
  • like_count (long): historical like count (already present; make sure it is typed as long)

curl -s -u "$AUTH" -X PUT "$ES/video" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "analysis": {
      "analyzer": {
        "content_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "video_id": { "type": "keyword" },
      "title": { "type": "text", "analyzer": "content_analyzer" },
      "content": { "type": "text", "analyzer": "content_analyzer" },
      "author_uid": { "type": "keyword" },
      "duration": { "type": "integer" },
      "is_featured": { "type": "boolean" },
      "like_count": { "type": "long" },
      "publish_time": { "type": "date" }
    }
  }
}
' | jq .

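3. Live room index `live_room`

New fields:

  • anchor_fans_count (long): the anchor's follower count (stored redundantly so it can be sorted on)
  • anchor_wealth_level (integer): the anchor's wealth level
  • anchor_live_level (integer): the anchor's live-streaming level
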
curl -s -u "$AUTH" -X PUT "$ES/live_room" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "analysis": {
      "analyzer": {
        "title_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "room_id": { "type": "keyword" },
      "title": {
        "type": "text",
        "analyzer": "title_analyzer",
        "fields": { "keyword": { "type": "keyword", "ignore_above": 512 } }
      },
      "anchor_uid": { "type": "keyword" },
      "anchor_nickname": {
        "type": "text",
        "analyzer": "title_analyzer",
        "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
      },
      "status": { "type": "keyword" },
      "viewer_count": { "type": "integer" },
      "anchor_fans_count": { "type": "long" },
      "anchor_wealth_level": { "type": "integer" },
      "anchor_live_level": { "type": "integer" },
      "start_time": { "type": "date" }
    }
  }
}
' | jq .

II. Building test data

User data (with follower counts and levels)

curl -s -u "$AUTH" -X POST "$ES/user/_doc" -H 'Content-Type: application/json' -d'
{"uid": "1001", "nickname": "Zhang Sanfeng", "avatar": "avatar1.jpg", "fans_count": 1500000, "wealth_level": 80, "live_level": 75, "created_at": "2024-01-01T00:00:00Z"}
' | jq .

curl -s -u "$AUTH" -X POST "$ES/user/_doc" -H 'Content-Type: application/json' -d'
{"uid": "1002", "nickname": "Zhang Wuji", "avatar": "avatar2.jpg", "fans_count": 980000, "wealth_level": 92, "live_level": 88, "created_at": "2024-01-02T00:00:00Z"}
' | jq .

curl -s -u "$AUTH" -X POST "$ES/user/_doc" -H 'Content-Type: application/json' -d'
{"uid": "1003", "nickname": "Linghu Chong", "avatar": "avatar3.jpg", "fans_count": 320000, "wealth_level": 65, "live_level": 70, "created_at": "2024-01-03T00:00:00Z"}
' | jq .
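
Posting to `/user/_doc` lets Elasticsearch generate the document ID, so running these commands twice stores every user twice. A possible alternative, sketched here, is a single `_bulk` request that reuses `uid` as `_id` (an assumption of this sketch, not the article's choice) and passes `refresh=true` so the documents are searchable immediately. `users_ndjson` and `bulk_load_users` are hypothetical helper names:

```bash
# Emit the three users in _bulk NDJSON form: one action line, then one source line.
users_ndjson() {
  cat <<'EOF'
{"index":{"_index":"user","_id":"1001"}}
{"uid":"1001","nickname":"Zhang Sanfeng","avatar":"avatar1.jpg","fans_count":1500000,"wealth_level":80,"live_level":75,"created_at":"2024-01-01T00:00:00Z"}
{"index":{"_index":"user","_id":"1002"}}
{"uid":"1002","nickname":"Zhang Wuji","avatar":"avatar2.jpg","fans_count":980000,"wealth_level":92,"live_level":88,"created_at":"2024-01-02T00:00:00Z"}
{"index":{"_index":"user","_id":"1003"}}
{"uid":"1003","nickname":"Linghu Chong","avatar":"avatar3.jpg","fans_count":320000,"wealth_level":65,"live_level":70,"created_at":"2024-01-03T00:00:00Z"}
EOF
}

# Send the payload; --data-binary keeps the newlines that _bulk requires.
bulk_load_users() {
  users_ndjson |
    curl -s -u "$AUTH" -X POST "$ES/_bulk?refresh=true" \
      -H 'Content-Type: application/x-ndjson' --data-binary @-
}
```

Running `bulk_load_users | jq '.errors'` should print `false` when all three documents were accepted.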

Video data (with featured flag and like count)

curl -s -u "$AUTH" -X POST "$ES/video/_doc" -H 'Content-Type: application/json' -d'
{
  "video_id": "v001",
  "title": "Tai Chi Chuan Tutorial",
  "content": "Tai Chi is a traditional Chinese martial art that combines slow movements and deep breathing.",
  "author_uid": "1001",
  "duration": 600,
  "is_featured": true,
  "like_count": 125000,
  "publish_time": "2024-02-01T10:00:00Z"
}
' | jq .

curl -s -u "$AUTH" -X POST "$ES/video/_doc" -H 'Content-Type: application/json' -d'
{
  "video_id": "v002",
  "title": "Python Programming for Beginners",
  "content": "Learn Python basics: variables, loops, functions, and data structures.",
  "author_uid": "1002",
  "duration": 1800,
  "is_featured": false,
  "like_count": 89000,
  "publish_time": "2024-02-10T14:30:00Z"
}
' | jq .

curl -s -u "$AUTH" -X POST "$ES/video/_doc" -H 'Content-Type: application/json' -d'
{
  "video_id": "v003",
  "title": "Cooking Braised Pork Belly",
  "content": "Braised pork belly is a classic Chinese dish made with pork belly, sugar, and soy sauce.",
  "author_uid": "1003",
  "duration": 900,
  "is_featured": true,
  "like_count": 234000,
  "publish_time": "2024-02-15T18:00:00Z"
}
' | jq .

Live room data (with the anchor's follower count and levels)

curl -s -u "$AUTH" -X POST "$ES/live_room/_doc" -H 'Content-Type: application/json' -d'
{
  "room_id": "r001",
  "title": "Tai Chi Class with Zhang Sanfeng",
  "anchor_uid": "1001",
  "anchor_nickname": "Zhang Sanfeng",
  "status": "live",
  "viewer_count": 1234,
  "anchor_fans_count": 1500000,
  "anchor_wealth_level": 80,
  "anchor_live_level": 75,
  "start_time": "2024-03-01T20:00:00Z"
}
' | jq .

curl -s -u "$AUTH" -X POST "$ES/live_room/_doc" -H 'Content-Type: application/json' -d'
{
  "room_id": "r002",
  "title": "Qian Kun Great Move by Zhang Wuji",
  "anchor_uid": "1002",
  "anchor_nickname": "Zhang Wuji",
  "status": "ended",
  "viewer_count": 5678,
  "anchor_fans_count": 980000,
  "anchor_wealth_level": 92,
  "anchor_live_level": 88,
  "start_time": "2024-03-02T19:00:00Z"
}
' | jq .

curl -s -u "$AUTH" -X POST "$ES/live_room/_doc" -H 'Content-Type: application/json' -d'
{
  "room_id": "r003",
  "title": "Linghu Chong Playing Guqin",
  "anchor_uid": "1003",
  "anchor_nickname": "Linghu Chong",
  "status": "live",
  "viewer_count": 890,
  "anchor_fans_count": 320000,
  "anchor_wealth_level": 65,
  "anchor_live_level": 70,
  "start_time": "2024-03-03T21:30:00Z"
}
' | jq .

III. Search examples and expected response structure

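1. User search: by follower count, descending

Request: find users whose nickname contains "Zhang", sorted by follower count from highest to lowest.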

curl -s -u "$AUTH" "$ES/user/_search" -H 'Content-Type: application/json' -d'
{
  "query": { "match": { "nickname": "Zhang" } },
  "sort": [
    { "fans_count": { "order": "desc" } }
  ]
}
' | jq .

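Expected response structure (`hits` section only). `max_score` is null because sorting on a field skips relevance scoring; `_id` and `_score` are omitted here, and `...` stands for the remaining `_source` fields: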

{
  "hits": {
    "total": { "value": 2, "relation": "eq" },
    "max_score": null,
    "hits": [
      {
        "_index": "user",
        "_source": { "uid": "1001", "nickname": "Zhang Sanfeng", "fans_count": 1500000, ... },
        "sort": [1500000]
      },
      {
        "_index": "user",
        "_source": { "uid": "1002", "nickname": "Zhang Wuji", "fans_count": 980000, ... },
        "sort": [980000]
      }
    ]
  }
}
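
The structure above is only part of what Elasticsearch returns; a real response also carries `took`, `timed_out`, `_shards`, and per-hit metadata. To get output closer to this trimmed form, the `filter_path` query parameter can be used. A small sketch, where `search_url` is a hypothetical helper and `$ES` and `$AUTH` are assumed to be set:

```bash
# Print a _search URL for the given index that keeps only the total,
# the document sources and the sort values in the response.
search_url() {
  printf '%s/%s/_search?filter_path=%s\n' \
    "$ES" "$1" "hits.total,hits.hits._source,hits.hits.sort"
}

# Example call (needs a running cluster):
#   curl -s -u "$AUTH" "$(search_url user)" -H 'Content-Type: application/json' \
#     -d '{"query":{"match":{"nickname":"Zhang"}},"sort":[{"fans_count":"desc"}]}' | jq .
```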

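2. User search: by wealth level, ascending

Request: list all users sorted by wealth level from lowest to highest. With the test data above, the expected order is 1003 (65), 1001 (80), 1002 (92).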

curl -s -u "$AUTH" "$ES/user/_search" -H 'Content-Type: application/json' -d'
{
  "query": { "match_all": {} },
  "sort": [
    { "wealth_level": { "order": "asc" } }
  ]
}
' | jq .

3. Video search: featured videos only, by like count descending

curl -s -u "$AUTH" "$ES/video/_search" -H 'Content-Type: application/json' -d'
{
  "query": { "term": { "is_featured": true } },
  "sort": [
    { "like_count": { "order": "desc" } }
  ]
}
' | jq .

Expected result: v003 (like_count 234000) comes first, followed by v001 (125000).

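4. Video search: content match + featured first + like count sort

Request: match "Chinese" in the video content, keep only featured videos, and sort by like count descending. With the test data above, expect v003 (234000), then v001 (125000).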

curl -s -u "$AUTH" "$ES/video/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": { "match": { "content": "Chinese" } },
      "filter": { "term": { "is_featured": true } }
    }
  },
  "sort": [
    { "like_count": { "order": "desc" } }
  ]
}
' | jq .
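
The `filter` on `is_featured` removes non-featured videos from the result altogether. If "featured first" is meant as an ordering, with non-featured matches still listed afterwards, one option is to drop the filter and sort on the boolean field first. A hedged sketch, where `featured_first_body` is a hypothetical helper and `$ES` and `$AUTH` are assumed to be set as in the other commands:

```bash
# Build a search body that ranks featured videos first, then by like_count.
# $1 is the text to match in "content"; it is pasted in unescaped,
# so it must not contain double quotes or backslashes.
featured_first_body() {
  cat <<EOF
{
  "query": { "match": { "content": "$1" } },
  "sort": [
    { "is_featured": { "order": "desc" } },
    { "like_count": { "order": "desc" } }
  ]
}
EOF
}

# Example call (needs a running cluster):
#   featured_first_body "Chinese" |
#     curl -s -u "$AUTH" "$ES/video/_search" -H 'Content-Type: application/json' -d @- | jq .
```

With the test data the result is unchanged (v003, then v001), since both videos that mention "Chinese" are featured; the difference only shows once a non-featured video matches.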

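5. Live room search: title matches "Tai Chi", by anchor follower count descending

Request: find live rooms whose title matches "Tai Chi", sorted by the anchor's follower count from highest to lowest. `match` combines the terms `tai` and `chi` with OR by default; with the test data above only r001 matches.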

curl -s -u "$AUTH" "$ES/live_room/_search" -H 'Content-Type: application/json' -d'
{
  "query": { "match": { "title": "Tai Chi" } },
  "sort": [
    { "anchor_fans_count": { "order": "desc" } }
  ]
}
' | jq .

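6. Live room search: anchor nickname matches "Zhang", by live level ascending

Request: find live rooms whose anchor nickname contains "Zhang", sorted by the anchor's live level from lowest to highest. With the test data above, expect r001 (75), then r002 (88).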

curl -s -u "$AUTH" "$ES/live_room/_search" -H 'Content-Type: application/json' -d'
{
  "query": { "match": { "anchor_nickname": "Zhang" } },
  "sort": [
    { "anchor_live_level": { "order": "asc" } }
  ]
}
' | jq .
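
The mapping keeps `anchor_uid` as a `keyword`, yet none of the examples searches by it, although the requirement asks for lookup by title, anchor uid, and anchor nickname. One possible single-input query combines an exact `term` on `anchor_uid` with a `multi_match` over the two text fields. This is only a sketch: `live_room_query` is a hypothetical helper, and the input must not contain double quotes or backslashes because it is pasted into the JSON unescaped.

```bash
# Build a search body that matches $1 either exactly against anchor_uid
# or as analyzed text against title / anchor_nickname.
live_room_query() {
  cat <<EOF
{
  "query": {
    "bool": {
      "should": [
        { "term": { "anchor_uid": "$1" } },
        { "multi_match": { "query": "$1", "fields": ["title", "anchor_nickname"] } }
      ],
      "minimum_should_match": 1
    }
  },
  "sort": [
    { "anchor_fans_count": { "order": "desc" } }
  ]
}
EOF
}

# Example call (needs a running cluster):
#   live_room_query "1001" |
#     curl -s -u "$AUTH" "$ES/live_room/_search" -H 'Content-Type: application/json' -d @- | jq .
```

With the test data, "1001" should match only r001 through the `term` clause, while "Zhang" should match r001 and r002 through `multi_match`.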

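7. Combined sort example: users by follower count descending, then wealth level descending (multi-level sort)

Request: list all users sorted by follower count, then by wealth level, both from highest to lowest. With the test data above, the expected order is 1001, 1002, 1003.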

curl -s -u "$AUTH" "$ES/user/_search" -H 'Content-Type: application/json' -d'
{
  "query": { "match_all": {} },
  "sort": [
    { "fans_count": { "order": "desc" } },
    { "wealth_level": { "order": "desc" } }
  ]
}
' | jq .
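
Elasticsearch only consults the second sort key when two documents tie on the first. The three test users all have different `fans_count` values, so the request above never exercises that case. As a purely local illustration of the ordering (no cluster involved), the same two-key comparison can be mimicked with `sort`. User 1004 is a hypothetical extra row that ties with 1002 on followers, and `rank_users` is a hypothetical helper:

```bash
# Input columns: uid fans_count wealth_level.
# Order by column 2 descending, then column 3 descending, and print the uid.
rank_users() {
  sort -k2,2nr -k3,3nr | cut -d' ' -f1
}

printf '%s\n' \
  '1001 1500000 80' \
  '1002 980000 92' \
  '1003 320000 65' \
  '1004 980000 95' | rank_users
# prints 1001, 1004, 1002, 1003 (one per line): 1004 outranks 1002 on wealth_level
```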

IV. Cleaning up the indices

curl -s -u "$AUTH" -X DELETE "$ES/user" | jq .
curl -s -u "$AUTH" -X DELETE "$ES/video" | jq .
curl -s -u "$AUTH" -X DELETE "$ES/live_room" | jq .
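
The delete index API also accepts a comma-separated list of names, so the three calls can be collapsed into one. A sketch, with `join_indices` and `cleanup_indices` as hypothetical helpers; `ignore_unavailable=true` is added so that an index that is already gone does not fail the request:

```bash
# Join the arguments with commas: user video live_room -> user,video,live_room
join_indices() {
  local IFS=,
  printf '%s\n' "$*"
}

# Delete all three indices in one call (needs a running cluster).
cleanup_indices() {
  curl -s -u "$AUTH" -X DELETE \
    "$ES/$(join_indices user video live_room)?ignore_unavailable=true"
}
```

Calling `cleanup_indices | jq .` should report `"acknowledged": true` on success.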

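Recipe:

Requirement: I want to design a search engine that supports (1) searching users by user uid and nickname; (2) searching videos by video content; and (3) searching live rooms by room title, anchor uid, and anchor nickname. Users and live rooms should be sortable by follower count, anchor wealth level, and anchor live level; videos should be sortable by featured flag and historical like count.

Question: how should the indices be designed?

PS: also construct some test data, give the curl commands, and give the expected response structure (note: use the built-in tokenizers).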

