[Search Engines] Elasticsearch (Part 3): Custom Search Ranking with script_score

- 1. Index Creation and Test Data
- 2. Querying with `script_score` (`boost_mode: replace`)
  - Parameters
  - Formula
- 3. Querying with `script_score` (`boost_mode: sum` + `boost: 2.0`)
  - Parameters
  - Formula
- 4. Expected Results Compared
- 5. Advanced: Dynamic Weights via Parameters (Both Modes)
  - 5.1 Dynamic weights in `replace` mode
  - 5.2 Dynamic weights in `sum` mode (with the `boost` parameter)
- 6. Notes

1. Index Creation and Test Data

```bash
# Assumes two shell variables are already set:
#   ES   = base URL of the cluster
#   AUTH = credentials in user:password form

# Create the index
curl -s -u "$AUTH" -X PUT "$ES/user" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "analysis": {
      "analyzer": {
        "nickname_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "uid": { "type": "keyword" },
      "nickname": {
        "type": "text",
        "analyzer": "nickname_analyzer",
        "fields": { "keyword": { "type": "keyword" } }
      },
      "fans_count": { "type": "long" },
      "wealth_level": { "type": "integer" },
      "live_level": { "type": "integer" }
    }
  }
}
' | jq .

# Insert test data (refresh=true makes each document searchable right away)
curl -s -u "$AUTH" -X POST "$ES/user/_doc?refresh=true" -H 'Content-Type: application/json' -d'
{"uid": "1001", "nickname": "Zhang Sanfeng", "fans_count": 1500000, "wealth_level": 80, "live_level": 75}
' | jq .

curl -s -u "$AUTH" -X POST "$ES/user/_doc?refresh=true" -H 'Content-Type: application/json' -d'
{"uid": "1002", "nickname": "Zhang Wuji", "fans_count": 980000, "wealth_level": 92, "live_level": 88}
' | jq .

curl -s -u "$AUTH" -X POST "$ES/user/_doc?refresh=true" -H 'Content-Type: application/json' -d'
{"uid": "1003", "nickname": "Linghu Chong", "fans_count": 320000, "wealth_level": 65, "live_level": 70}
' | jq .
```

2. Querying with `script_score` (`boost_mode: replace`)

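```bash
# The Painless script computes a weighted sum:
#   _score                     relevance score from the match query, weight 2.0
#   Math.log(1 + fans_count)   natural-log smoothing of the follower count, weight 1.0
#   wealth_level, live_level   used as-is (range 1-100), weights 1.5 and 1.2
# boost_mode "replace" makes the script result the final _score.
# The body goes through a quoted heredoc so the single quotes in the script
# reach Elasticsearch unchanged. The script is a single JSON string because
# the triple-quote syntax is only understood by the Kibana Dev Tools Console.
curl -s -u "$AUTH" -X POST "$ES/user/_search" -H 'Content-Type: application/json' --data-binary @- <<'EOF' | jq .
{
  "size": 10,
  "query": {
    "function_score": {
      "query": {
        "match": { "nickname": "Zhang" }
      },
      "script_score": {
        "script": {
          "lang": "painless",
          "source": "double matchScore = _score; double fans = Math.log(1 + doc['fans_count'].value); double wealth = doc['wealth_level'].value; double live = doc['live_level'].value; return matchScore * 2.0 + fans * 1.0 + wealth * 1.5 + live * 1.2;"
        }
      },
      "boost_mode": "replace"
    }
  }
}
EOF
```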

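Parameters

| Parameter | Value | Effect |
| --- | --- | --- |
| `script_score.script.source` | Painless script | Custom scoring formula |
| `boost_mode` | `replace` | The script result overwrites the original `_score` (nothing is added to it) |
| `_score` inside the script | Passed in automatically | Relevance score computed by the original query (`match`) |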

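Formula

Here ln is the natural logarithm, which is what Painless `Math.log` computes.

final score =
matchScore × 2.0 (match relevance weight)
+ ln(1 + fans_count) × 1.0 (follower count, log-smoothed)
+ wealth_level × 1.5 (wealth level, linear)
+ live_level × 1.2 (live-streaming level, linear)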
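To check the formula by hand, the Painless expression can be mirrored in a few lines of Python. This is only a sketch for doing the arithmetic offline: `replace_mode_score` is a hypothetical helper name, and the match score of 0.5 is an arbitrary example value, not something Elasticsearch is guaranteed to return.

```python
import math


def replace_mode_score(match_score, fans_count, wealth_level, live_level):
    """Mirror of the Painless formula used with boost_mode: replace."""
    fans = math.log(1 + fans_count)  # natural log, like Painless Math.log
    return match_score * 2.0 + fans * 1.0 + wealth_level * 1.5 + live_level * 1.2


# Field values of the "Zhang Wuji" test document, with an assumed match score of 0.5
score = replace_mode_score(0.5, 980000, 92, 88)
print(round(score, 1))  # 258.4
```

The log term contributes about 13.8 here, while the two level fields contribute 243.6, so with these weights the levels dominate the ranking.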

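3. Querying with `script_score` (`boost_mode: sum` + `boost: 2.0`)

In this mode the `boost` parameter multiplies the original match score by 2, and `boost_mode: sum` then adds the custom score computed by the script. The final score is the same as in the `replace` version.

The script handles only the business fields and does not reference `_score`.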

```bash
# boost 2.0 doubles the original _score; the script returns only the
# business-field part (no match score); boost_mode "sum" adds the two:
#   final score = boost * _score + script result
curl -s -u "$AUTH" -X POST "$ES/user/_search" -H 'Content-Type: application/json' --data-binary @- <<'EOF' | jq .
{
  "size": 10,
  "query": {
    "function_score": {
      "query": {
        "match": { "nickname": "Zhang" }
      },
      "boost": 2.0,
      "script_score": {
        "script": {
          "lang": "painless",
          "source": "double fans = Math.log(1 + doc['fans_count'].value); double wealth = doc['wealth_level'].value; double live = doc['live_level'].value; return fans * 1.0 + wealth * 1.5 + live * 1.2;"
        }
      },
      "boost_mode": "sum"
    }
  }
}
EOF
```

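Parameters

| Parameter | Value | Effect |
| --- | --- | --- |
| `boost` | `2.0` | Multiplies the original `_score` by this factor |
| `script_score.script.source` | Script (without the `matchScore` term) | Computes only the weighted sum of follower count, wealth level, and live level |
| `boost_mode` | `sum` | Final score = `boost × _score` + script result |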

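Formula

final score =
_score × 2.0 (match relevance weight, supplied by the boost parameter)
+ ln(1 + fans_count) × 1.0
+ wealth_level × 1.5
+ live_level × 1.2

Note: `boost_mode: sum` adds `boost × _score` to the value returned by the script. If `boost` is not set explicitly, it defaults to 1.0. Adding `"explain": true` to the search body shows how each term contributes on your cluster.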
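The relationship between the two modes can be modelled with a small function. This is a toy model rather than Elasticsearch code: `combine` and its arguments are hypothetical names, it covers only three of the available `boost_mode` values, and it assumes, as this article describes, that `boost` scales the query score before the modes are applied.

```python
def combine(query_score, script_score, boost_mode="multiply", boost=1.0):
    """Toy model of how function_score merges the query and script scores.

    Assumption taken from the article: boost scales the query score first.
    """
    boosted = boost * query_score
    if boost_mode == "replace":
        return script_score          # query score is discarded
    if boost_mode == "sum":
        return boosted + script_score
    if boost_mode == "multiply":
        return boosted * script_score
    raise ValueError(f"mode not modelled here: {boost_mode}")


match = 0.5        # assumed match score
business = 257.4   # fans + wealth + live part for one document (example value)

# replace: the script itself adds _score * 2.0
replace_result = combine(match, match * 2.0 + business, "replace")
# sum: boost supplies the factor 2.0, the script returns only the business part
sum_result = combine(match, business, "sum", boost=2.0)
print(replace_result == sum_result)  # True
```

Under that assumption both paths evaluate `1.0 + 257.4`, which is why the two queries are expected to rank documents identically.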

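4. Expected Results Compared

Both modes use the same weights, so their final scores are identical. `Linghu Chong` does not contain the term `zhang`, so only two documents are returned.

Assume the following match scores (illustrative values):

| User | Original `_score` (match) | fans_count | ln(1 + fans) | wealth | live | Business-field score | Final score |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Zhang Wuji | 0.5 | 980000 | ≈13.8 | 92 | 88 | 13.8 + 138 + 105.6 = 257.4 | 0.5 × 2 + 257.4 = 258.4 |
| Zhang Sanfeng | 0.6 | 1500000 | ≈14.2 | 80 | 75 | 14.2 + 120 + 90 = 224.2 | 0.6 × 2 + 224.2 = 225.4 |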

Shape of the returned JSON (abridged; identical for both modes):

```json
{
  "hits": {
    "total": { "value": 2, "relation": "eq" },
    "max_score": 258.4,
    "hits": [
      {
        "_score": 258.4,
        "_source": {
          "uid": "1002",
          "nickname": "Zhang Wuji",
          "fans_count": 980000,
          "wealth_level": 92,
          "live_level": 88
        }
      },
      {
        "_score": 225.4,
        "_source": {
          "uid": "1001",
          "nickname": "Zhang Sanfeng",
          "fans_count": 1500000,
          "wealth_level": 80,
          "live_level": 75
        }
      }
    ]
  }
}
```
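Application code usually needs only the ranking from a response of this shape. The sketch below pulls out `uid`, `nickname`, and `_score` in the order returned; `ranked_users` is a hypothetical helper, and the sample is a trimmed copy of the illustrative response above, not real cluster output.

```python
import json


def ranked_users(response):
    """Return (uid, nickname, score) tuples in the order Elasticsearch returned them."""
    return [
        (hit["_source"]["uid"], hit["_source"]["nickname"], hit["_score"])
        for hit in response["hits"]["hits"]
    ]


# Trimmed copy of the illustrative response (only the fields used here)
sample = json.loads("""
{
  "hits": {
    "total": { "value": 2, "relation": "eq" },
    "max_score": 258.4,
    "hits": [
      { "_score": 258.4, "_source": { "uid": "1002", "nickname": "Zhang Wuji" } },
      { "_score": 225.4, "_source": { "uid": "1001", "nickname": "Zhang Sanfeng" } }
    ]
  }
}
""")

for uid, nickname, score in ranked_users(sample):
    print(uid, nickname, score)
```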

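5. Advanced: Dynamic Weights via Parameters (Both Modes)

Passing the weights through `params` lets you tune them per request without editing the script. Because the script source stays the same, Elasticsearch can also reuse the compiled script from its cache.

5.1 Dynamic weights in `replace` mode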

```bash
# Same formula as section 2, with every weight read from params.
curl -s -u "$AUTH" -X POST "$ES/user/_search" -H 'Content-Type: application/json' --data-binary @- <<'EOF' | jq .
{
  "size": 10,
  "query": {
    "function_score": {
      "query": {
        "match": { "nickname": "Zhang" }
      },
      "script_score": {
        "script": {
          "lang": "painless",
          "source": "double matchScore = _score; double fans = Math.log(1 + doc['fans_count'].value); double wealth = doc['wealth_level'].value; double live = doc['live_level'].value; return matchScore * params.matchWeight + fans * params.fansWeight + wealth * params.wealthWeight + live * params.liveWeight;",
          "params": {
            "matchWeight": 2.0,
            "fansWeight": 1.0,
            "wealthWeight": 1.5,
            "liveWeight": 1.2
          }
        }
      },
      "boost_mode": "replace"
    }
  }
}
EOF
```

5.2 Dynamic weights in `sum` mode (with the `boost` parameter)

```bash
# Same formula as section 3: boost carries the match weight,
# params carry the business-field weights.
curl -s -u "$AUTH" -X POST "$ES/user/_search" -H 'Content-Type: application/json' --data-binary @- <<'EOF' | jq .
{
  "size": 10,
  "query": {
    "function_score": {
      "query": {
        "match": { "nickname": "Zhang" }
      },
      "boost": 2.0,
      "script_score": {
        "script": {
          "lang": "painless",
          "source": "double fans = Math.log(1 + doc['fans_count'].value); double wealth = doc['wealth_level'].value; double live = doc['live_level'].value; return fans * params.fansWeight + wealth * params.wealthWeight + live * params.liveWeight;",
          "params": {
            "fansWeight": 1.0,
            "wealthWeight": 1.5,
            "liveWeight": 1.2
          }
        }
      },
      "boost_mode": "sum"
    }
  }
}
EOF
```
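When the weights come from application configuration, both request bodies can be generated by one helper. The following is a minimal sketch under that assumption: `build_query`, `FIELD_TERMS`, and the argument names are hypothetical, and the helper only builds the JSON body, leaving the HTTP call to whichever client you use.

```python
import json

# Business-field part of the Painless script, shared by both modes
FIELD_TERMS = (
    "Math.log(1 + doc['fans_count'].value) * params.fansWeight"
    " + doc['wealth_level'].value * params.wealthWeight"
    " + doc['live_level'].value * params.liveWeight"
)


def build_query(keyword, match_weight, field_weights, mode="sum"):
    """Build a function_score search body for replace or sum mode."""
    params = dict(field_weights)  # copy so the caller's dict is not modified
    function_score = {"query": {"match": {"nickname": keyword}}, "boost_mode": mode}
    if mode == "replace":
        # The script applies the match weight itself
        source = "return _score * params.matchWeight + " + FIELD_TERMS + ";"
        params["matchWeight"] = match_weight
    elif mode == "sum":
        # boost carries the match weight; the script never touches _score
        source = "return " + FIELD_TERMS + ";"
        function_score["boost"] = match_weight
    else:
        raise ValueError(f"unsupported mode: {mode}")
    function_score["script_score"] = {
        "script": {"lang": "painless", "source": source, "params": params}
    }
    return {"size": 10, "query": {"function_score": function_score}}


weights = {"fansWeight": 1.0, "wealthWeight": 1.5, "liveWeight": 1.2}
print(json.dumps(build_query("Zhang", 2.0, weights, mode="sum"), indent=2))
```

Keeping the script source constant and varying only `params` and `boost` means a weight change never alters the script text that Elasticsearch has to compile.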

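6. Notes

- Performance: `script_score` runs the script once for every matching document, so performance drops noticeably on large result sets (for example, tens of thousands of hits). Prefer a combination of `field_value_factor` functions and use a script only when the formula is too complex for them.
- Value normalization: `log(1 + value)` is a common way to smooth follower counts. To scale linearly into [0, 1] instead, divide by a maximum passed in as `params.maxFans`.
- Missing values: if a field may be absent from some documents, guard it in the script with `doc['field'].size() != 0 ? doc['field'].value : defaultValue` to avoid errors.
- Script compilation: Elasticsearch caches compiled scripts, but frequently changing the script source forces recompilation and consumes resources.
- `sum` mode with `boost`: setting `boost` adjusts the weight of the original relevance score without touching the script. This is the recommended approach, since the logic is clearer and cheaper to maintain.
- Equivalence of the two modes: a `boost_mode: replace` query whose script contains `_score * W` and a `boost_mode: sum` query with `boost: W` whose script does not reference `_score` produce the same scores. Choose whichever you prefer.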
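The performance and missing-value notes can be combined in a script-free variant built from `field_value_factor` functions. This is a sketch, not a benchmarked replacement: it relies on the `ln1p` modifier (natural log of 1 + value), the `missing` default, per-function `weight`, and `score_mode: sum`, and it should be checked with `"explain": true` before relying on it. `build_fvf_query` and `emulate_functions` are hypothetical helper names, and the emulation only reproduces the arithmetic locally.

```python
import math


def build_fvf_query(keyword):
    """Script-free variant of the sum-mode query using field_value_factor."""
    return {
        "size": 10,
        "query": {
            "function_score": {
                "query": {"match": {"nickname": keyword}},
                "boost": 2.0,
                "functions": [
                    # ln1p: natural log of (1 + value); missing: default for absent fields
                    {"field_value_factor": {"field": "fans_count", "modifier": "ln1p", "missing": 0},
                     "weight": 1.0},
                    {"field_value_factor": {"field": "wealth_level", "missing": 0}, "weight": 1.5},
                    {"field_value_factor": {"field": "live_level", "missing": 0}, "weight": 1.2},
                ],
                "score_mode": "sum",   # add the three weighted functions together
                "boost_mode": "sum",   # then add the result to the query score
            }
        },
    }


def emulate_functions(body, doc):
    """Local arithmetic check of the functions' combined score for one document."""
    total = 0.0
    for fn in body["query"]["function_score"]["functions"]:
        fvf = fn["field_value_factor"]
        value = doc.get(fvf["field"], fvf["missing"])
        if fvf.get("modifier") == "ln1p":
            value = math.log1p(value)
        total += fn["weight"] * value
    return total


body = build_fvf_query("Zhang")
wuji = {"fans_count": 980000, "wealth_level": 92, "live_level": 88}
print(round(emulate_functions(body, wuji), 1))  # 257.4
```

A document with none of the three fields scores 0 from the functions here, because every `missing` default is 0 and ln(1 + 0) is 0.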

The examples above show two ways to implement the same weighted ranking; you can run them against your own cluster to test.
