[Search Engines] Elasticsearch (Part 2): Search Ranking with function_score

- I. Index design (reusing the earlier `user` index)
- II. Weighted ranking query
  - Design approach
  - Query command
- III. Expected response structure
- IV. Adjusting weights and smoothing
  - Example: a custom formula with `script_score`
- V. Other considerations

I. Index design (reusing the earlier `user` index)

```bash
curl -s -u "$AUTH" -X PUT "$ES/user" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "analysis": {
      "analyzer": {
        "nickname_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "uid": { "type": "keyword" },
      "nickname": {
        "type": "text",
        "analyzer": "nickname_analyzer",
        "fields": { "keyword": { "type": "keyword" } }
      },
      "fans_count": { "type": "long" },
      "wealth_level": { "type": "integer" },
      "live_level": { "type": "integer" }
    }
  }
}
' | jq .
```

Insert test data (several users with different levels):

```bash
curl -s -u "$AUTH" -X POST "$ES/user/_doc" -H 'Content-Type: application/json' -d'
{"uid": "1001", "nickname": "Zhang Sanfeng", "fans_count": 1500000, "wealth_level": 80, "live_level": 75}
' | jq .

curl -s -u "$AUTH" -X POST "$ES/user/_doc" -H 'Content-Type: application/json' -d'
{"uid": "1002", "nickname": "Zhang Wuji", "fans_count": 980000, "wealth_level": 92, "live_level": 88}
' | jq .

curl -s -u "$AUTH" -X POST "$ES/user/_doc" -H 'Content-Type: application/json' -d'
{"uid": "1003", "nickname": "Linghu Chong", "fans_count": 320000, "wealth_level": 65, "live_level": 70}
' | jq .
```

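II. Weighted ranking query

Design approach

- Match score: the `_score` produced by the `match` query (higher means more relevant).
- Fan count: the range is wide (hundreds of thousands to millions), so it is smoothed with ln(1 + fans_count), the natural logarithm, to keep large values from dominating the ranking.
- Wealth level: range 1-100, used linearly.
- Live level: range 1-100, used linearly.

Example weights (adjust to your business needs):

- Match score: 2.0
- Fan count: 1.0
- Wealth level: 1.5
- Live level: 1.2

`final_score = _score * 2.0 + ln(1 + fans_count) * 1.0 + wealth_level * 1.5 + live_level * 1.2`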
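To sanity-check the formula by hand before touching the cluster, it can be evaluated offline. The sketch below is only an illustration: `final_score` is a hypothetical helper name, the match scores 0.5 and 0.6 are made-up values, and it assumes `awk` is available (its `log()` is the natural logarithm).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Elasticsearch): evaluates the weighted formula offline.
# Arguments: match_score fans_count wealth_level live_level
final_score() {
  awk -v s="$1" -v f="$2" -v w="$3" -v l="$4" 'BEGIN {
    # awk log() is the natural logarithm, so log(1 + f) is ln(1 + fans_count)
    printf "%.1f\n", s * 2.0 + log(1 + f) * 1.0 + w * 1.5 + l * 1.2
  }'
}

# Match scores 0.5 and 0.6 are assumed values for illustration only
final_score 0.5 980000 92 88    # Zhang Wuji, prints 258.4
final_score 0.6 1500000 80 75   # Zhang Sanfeng, prints 225.4
```

The two level fields contribute over 200 points each time, while the match score contributes about 1, so with these weights the levels largely decide the order.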

Query command

```bash
curl -s -u "$AUTH" "$ES/user/_search" -H 'Content-Type: application/json' -d'
{
  "size": 10,
  "query": {
    "function_score": {
      "query": {
        "match": { "nickname": "Zhang" }
      },
      "functions": [
        {
          "field_value_factor": {
            "field": "fans_count",
            "modifier": "ln1p",
            "factor": 1.0
          },
          "weight": 1.0
        },
        {
          "field_value_factor": {
            "field": "wealth_level",
            "modifier": "none",
            "factor": 1.0
          },
          "weight": 1.5
        },
        {
          "field_value_factor": {
            "field": "live_level",
            "modifier": "none",
            "factor": 1.0
          },
          "weight": 1.2
        }
      ],
      "score_mode": "sum",
      "boost_mode": "sum",
      "boost": 2.0
    }
  }
}
' | jq .
```

Parameter notes:

- `field_value_factor`: transforms a numeric field before it enters the score. `ln1p` is ln(1 + value), the natural logarithm. Note that Elasticsearch's `log1p` is log10(1 + value), which would give about 6 rather than 13.8 for 980000 fans. `none` uses the raw value.
- `weight`: the multiplier applied to each function's result.
- `score_mode: sum`: adds the results of all functions together.
- `boost_mode: sum`: adds the original `_score` (match score) to the function total. With `boost: 2.0`, the original `_score` is multiplied by 2 before the addition.
- Resulting formula: `final_score = 2.0 * original_score + ln1p(fans_count) * 1.0 + wealth_level * 1.5 + live_level * 1.2`

Note: `score_mode` and `boost_mode` can be combined in several ways; using `sum` for both gives the linear weighting described here.
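The `log1p` and `ln1p` modifiers differ only in their base (10 versus e), but that changes how much `fans_count` contributes next to the level fields. A quick offline comparison can make the difference concrete; this is a sketch assuming `awk` is available, and `compare_logs` is a hypothetical name, not an Elasticsearch feature.

```shell
# Hypothetical helper: prints log10(1 + x) and ln(1 + x) for a fan count,
# mirroring what the log1p and ln1p modifiers compute.
compare_logs() {
  awk -v x="$1" 'BEGIN {
    # awk only has a natural log, so base 10 is obtained by dividing by log(10)
    printf "log1p=%.2f ln1p=%.2f\n", log(1 + x) / log(10), log(1 + x)
  }'
}

compare_logs 980000    # prints log1p=5.99 ln1p=13.80
compare_logs 1500000   # prints log1p=6.18 ln1p=14.22
```

Either way the fan term stays small compared with a level of 80 or 90, so the weight on `fans_count` probably needs raising if fans should matter more.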

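III. Expected response structure

Because the numeric fields differ from user to user, the final order can differ from a ranking by match score alone. For example, with illustrative match scores of 0.5 and 0.6:

| User | Match score (original `_score`) | fans_count | wealth_level | live_level | Final score |
| --- | --- | --- | --- | --- | --- |
| Zhang Wuji | 0.5 | 980000 (ln ≈ 13.8) | 92 | 88 | 0.5 × 2.0 + 13.8 × 1.0 + 92 × 1.5 + 88 × 1.2 = 1.0 + 13.8 + 138.0 + 105.6 = 258.4 |
| Zhang Sanfeng | 0.6 | 1500000 (ln ≈ 14.2) | 80 | 75 | 0.6 × 2.0 + 14.2 × 1.0 + 80 × 1.5 + 75 × 1.2 = 1.2 + 14.2 + 120.0 + 90.0 = 225.4 |
| Linghu Chong | 0 (no match, not returned) | - | - | - | - |

Only documents that match the keyword "Zhang" enter the result set. The returned `hits` are sorted by `_score` (the final score) in descending order.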

Sample response (pretty-printed):

```json
{
  "took": 5,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 2, "relation": "eq" },
    "max_score": 258.4,
    "hits": [
      {
        "_index": "user",
        "_id": "...",
        "_score": 258.4,
        "_source": {
          "uid": "1002",
          "nickname": "Zhang Wuji",
          "fans_count": 980000,
          "wealth_level": 92,
          "live_level": 88
        }
      },
      {
        "_index": "user",
        "_id": "...",
        "_score": 225.4,
        "_source": {
          "uid": "1001",
          "nickname": "Zhang Sanfeng",
          "fans_count": 1500000,
          "wealth_level": 80,
          "live_level": 75
        }
      }
    ]
  }
}
```

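As the example shows, Zhang Sanfeng has a slightly higher match score (0.6 vs 0.5), but Zhang Wuji overtakes him on the strength of higher wealth and live levels and ranks first.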

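IV. Adjusting weights and smoothing

- Changing weights: adjust `boost` (the match score weight) and the individual `weight` values.
- Logarithmic smoothing: a logarithmic `modifier` (`ln1p` for the natural log, `log1p` for base 10) already compresses the fan count so that values in the millions do not dominate.
- More complex normalization (for example min-max normalization) can be done with `script_score`, at the cost of slower queries.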
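Since tuning mostly means re-running the same query with different numbers, it may help to generate the request body from parameters. The following is a minimal sketch, not a recommended tool: `build_query` is a hypothetical helper, the search term "Zhang" is hard-coded for brevity, and it uses `ln1p` on the assumption that a natural logarithm is wanted (swap in `log1p` for base 10).

```shell
# Hypothetical helper: prints a function_score request body with tunable weights.
# Arguments: match_boost fans_weight wealth_weight live_weight
build_query() {
  cat <<EOF
{
  "query": {
    "function_score": {
      "query": { "match": { "nickname": "Zhang" } },
      "functions": [
        { "field_value_factor": { "field": "fans_count", "modifier": "ln1p" }, "weight": $2 },
        { "field_value_factor": { "field": "wealth_level" }, "weight": $3 },
        { "field_value_factor": { "field": "live_level" }, "weight": $4 }
      ],
      "score_mode": "sum",
      "boost_mode": "sum",
      "boost": $1
    }
  }
}
EOF
}

# Print the body for the weights used in this article
build_query 2.0 1.0 1.5 1.2
```

To send it, pipe the output into curl, which reads the body from stdin with `-d @-`: `build_query 2.0 1.0 1.5 1.2 | curl -s -u "$AUTH" "$ES/user/_search" -H 'Content-Type: application/json' -d @- | jq .`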

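Example: a custom formula with `script_score`

```json
"script_score": {
  "script": {
    "source": """
      double matchScore = _score;
      double fans = Math.log(1 + doc['fans_count'].value);
      double wealth = doc['wealth_level'].value;
      double live = doc['live_level'].value;
      return matchScore * 2.0 + fans * 1.0 + wealth * 1.5 + live * 1.2;
    """
  }
}
```

Place this fragment inside `function_score` and set `boost_mode` to `replace`, because the script already folds `_score` into its result; with `boost_mode: sum` the match score would be counted a second time. The triple-quoted string is Kibana Dev Tools console syntax; when sending the request with curl, write the script as a single-line JSON string. `Math.log` is the natural logarithm, so the fan term matches the `ln1p` modifier.

That said, `field_value_factor` performs better, so prefer it whenever it can express the formula.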

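V. Other considerations

- Empty or match-all queries: with `match_all`, the match score is a constant 1.0. The weighted ranking still applies; the order is then decided by the function terms alone.
- Paging stability: when several documents have the same score, add a secondary sort field (for example `uid` ascending) to keep the order stable across pages.

```json
"sort": [
  { "_score": "desc" },
  { "uid": "asc" }
]
```

- Performance: `function_score` evaluates its functions for every matching document. If the result set is large, consider storing pre-normalized numeric fields, or applying the weighting only to the top hits in a `rescore` phase.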
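For the pre-normalization option, the arithmetic can be done before indexing so that the query only reads a ready-made value. Here is a small sketch of min-max normalization under stated assumptions: `minmax` is a hypothetical helper, the bounds 0 and 2,000,000 are invented for illustration, and it assumes `awk` is available.

```shell
# Hypothetical helper: min-max normalizes a value into [0, 1], clamping out-of-range input.
# Arguments: value min max
minmax() {
  awk -v v="$1" -v lo="$2" -v hi="$3" 'BEGIN {
    # Degenerate range: return 0 instead of dividing by zero
    if (hi + 0 <= lo + 0) { print "0.0000"; exit }
    n = (v - lo) / (hi - lo)
    if (n < 0) n = 0
    if (n > 1) n = 1
    printf "%.4f\n", n
  }'
}

# Assumed bounds for illustration: fans_count between 0 and 2,000,000
minmax 980000 0 2000000   # prints 0.4900
```

The result could be stored in an extra numeric field at index time and read with `field_value_factor` and `"modifier": "none"`; the bounds would need refreshing as the data grows.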
