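Author: Andre Luiz, Elastic
Exploring the pros and cons of automatically creating filters and facets in a search experience with machine learning models versus the traditional hardcoded approach.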
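Filters and facets are mechanisms for refining search results, helping users find relevant content or products faster. In the traditional approach, the rules are defined manually. In a movie catalog, for example, attributes such as genre are predefined for use in filters and facets. With AI models, on the other hand, new attributes can be extracted automatically from the characteristics of each movie, which makes the process more dynamic and personalized. In this blog, we explore the pros and cons of each approach, focusing on their use cases and challenges.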
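Filters and facets
Before we begin, let's define what filters and facets are. Filters are predefined attributes used to restrict a result set. In an online marketplace, for example, filters are available even before the user runs a search. A user can select a category such as "Video games" before searching for "PS5", which narrows the search to a more specific subset instead of the entire database. This significantly increases the chances of getting relevant results.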
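Facets work like filters, but they only become available after a search has been executed. In other words, the search returns results first, and a new list of refinement options is then generated from those results. For example, when searching for a PS5 console, the system might display facets such as storage capacity, shipping cost, and color to help the user pick the ideal product.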
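The difference in timing can be sketched in a few lines of Python. This is only an illustration over a made-up in-memory catalog (the `CATALOG` data and the `search` and `facets` helpers are hypothetical, not part of any Elastic API): the filter narrows the candidate set before the query is matched, while the facets are counted from whatever the search returned.

```python
from collections import Counter

# Hypothetical in-memory catalog; products and field names are illustrative only.
CATALOG = [
    {"name": "PS5 console", "category": "Video games", "color": "White", "storage": "1TB"},
    {"name": "PS5 console", "category": "Video games", "color": "Black", "storage": "825GB"},
    {"name": "PS5 controller", "category": "Video games", "color": "White", "storage": None},
    {"name": "PS5 sticker set", "category": "Stationery", "color": "Blue", "storage": None},
]


def search(query, category=None):
    """Apply the category filter first (if given), then match the query text."""
    candidates = [p for p in CATALOG if category is None or p["category"] == category]
    return [p for p in candidates if query.lower() in p["name"].lower()]


def facets(results, fields):
    """Count the values present in the returned results, one counter per field."""
    return {
        field: dict(Counter(p[field] for p in results if p[field] is not None))
        for field in fields
    }


# The filter is chosen before searching; the facets exist only after results come back.
hits = search("ps5", category="Video games")
print(len(hits))
print(facets(hits, ["color", "storage"]))
```

Without the category filter, the same query would also return the sticker set, and the color facet would gain a "Blue" option.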
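Now that we have defined filters and facets, let's discuss how the classical approach and the machine learning (ML) based approach affect their implementation and use. Each has its own advantages and challenges, which influence search efficiency.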
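Classical approach
In this approach, filters and facets are set manually according to predefined rules. This means the attributes available for refining a search are fixed and must be planned in advance based on the catalog structure and user needs.
For example, in a marketplace, categories such as Electronics or Fashion may have specific filters such as brand, format, and price range. These rules are created statically, which guarantees a consistent search experience but requires manual adjustments whenever new products or categories appear.
Although this approach provides predictability and control over the filters and facets displayed, it can fall short when new trends emerge or user needs change and the refinement options have to adapt dynamically.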
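Pros:
- Predictability and control: because filters and facets are defined manually, they are easier to manage.
- Low complexity: no model training is required, and the overall implementation is simple.
- Ease of maintenance: the rules are predefined, so adjustments and corrections can be made quickly.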
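Cons:
- Reindexing required for new filters: whenever a new attribute is to be used as a filter, the entire dataset must be reindexed to ensure that all documents contain that information.
- Lack of dynamic adaptation: filters are static and do not adjust automatically to changes in user behavior.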
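Implementing filters/facets with the classical approach
Using Dev Tools in Kibana, we will walk through an example of implementing filters and facets with the classical approach.
First, we define the mapping, which sets the field types for the index: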
```json
PUT videogames
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "brand": { "type": "keyword" },
      "storage": { "type": "keyword" },
      "price": { "type": "float" },
      "description": { "type": "text" }
    }
  }
}
```
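In the previous step, we mapped the brand and storage fields as keyword so that they can be used directly in aggregations (that is, facets). The price field is a float, which lets us create filter options based on price ranges. Next, we index a few sample products: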
```json
POST videogames/_bulk
{ "index": { "_id": 1 } }
{ "name": "Play Station 5", "brand": "Sony", "storage": "1TB", "price": 499.99, "description": "Stunning Gaming: Marvel at stunning graphics and experience the features of the new PS5. Breathtaking Immersion: Discover a deeper gaming experience with support for haptic feedback, adaptive triggers, and 3D Audio technology. Slim Design: With the PS5 Digital Edition, gamers get powerful gaming technology in a sleek, compact design. 1TB of Storage: Have your favorite games ready and waiting for you to play with 1TB of built-in SSD storage. Backward Compatibility and Game Boost: The PS5 console can play over 4,000 PS4 games. With Game Boost, you can even enjoy faster, smoother frame rates in some of the best PS4 console games." }
{ "index": { "_id": 2 } }
{ "name": "Xbox Series X", "brand": "Microsoft", "storage": "1TB", "price": 499.99, "description": "Fastest, most powerful Xbox console ever. Play thousands of titles: Every game looks and plays better on Xbox Series X. At the heart of Series X is the Xbox Velocity. Architecture, which combines a custom SSD and built-in software to significantly reduce load times in and out of game. Switch between multiple games in an instant with Quick Resume. Explore new worlds and experience the action like never before with an unparalleled 12 teraflops of graphics processing power. Enjoy 4K gaming at up to 120 frames per second, premium advanced 3D sound, and more. 4K at 120 FPS: requires compatible content and display X version - with disc drive" }
{ "index": { "_id": 3 } }
{ "name": "Nintendo Switch", "brand": "Nintendo", "storage": "512GB", "price": 299.99, "description": "SHARPER, VIBRANT VISUALS. The new 7-inch screen on the Nintendo Switch OLED takes your gaming to the next level: vibrant colors with sharp contrasts for every moment. INTEGRATED GAMEPLAY. Enjoy the console's many multiplayer modes and connect with other players. Online or locally, the fun on the Nintendo Switch is guaranteed. ENJOY IMMERSION FOR LONGER. In addition to delivering an unparalleled experience, thanks to its improved audio, the Nintendo Switch has a rechargeable battery while you play. From 4.5 hours to 9 hours of battery life. INCLUDES SUPER MARIO BROS. WONDER. Transform your world with the phenomenal flowers in this new Mario game, full of amazing adventures, power-ups and new abilities. NINTENDO SWITCH ONLINE SUBSCRIPTION. Access online games, play with friends and enjoy the exclusive benefits of the Nintendo Switch Online subscription." }
{ "index": { "_id": 4 } }
{ "name": "Steam Deck", "brand": "Valve", "storage": "512GB", "price": 399.99, "description": "You can save games, apps, photos and videos without worrying about space. High-Level Performance: The 4-core processor and graphics ensure a dynamic experience and fast responses. High-Definition Images: Smooth transitions and sharp images provide complete immersion in the game. Wireless Connectivity: Wi-Fi technology allows you to play wherever you want, without wires or cables limiting your fun" }
{ "index": { "_id": 5 } }
{ "name": "Nintendo Switch Lite", "brand": "Nintendo", "storage": "512GB", "price": 299.99, "description": "MADE TO BE PORTABLE. Nintendo Switch Lite is designed specifically for portable gaming. The console lets you jump into your favorite games wherever you are. COMPACT AND LIGHTWEIGHT. With its sleek, lightweight design, this console is ready to hit the road wherever you are. COMPATIBLE GAMES. The Nintendo Switch Lite system plays the library of Nintendo Switch games that work in handheld mode. A WORLD OF COLOR TO CHOOSE FROM. Available in a variety of vibrant and unique colors, Nintendo Switch Lite lets you bring even more personality wherever you go." }
```
With the documents indexed, we can request the facet counts with aggregations:
```json
POST videogames/_search
{
  "size": 0,
  "aggs": {
    "brands": {
      "terms": { "field": "brand" }
    },
    "storage_sizes": {
      "terms": { "field": "storage" }
    },
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "to": 300 },
          { "from": 300, "to": 500 },
          { "from": 500 }
        ]
      }
    }
  }
}
```
The response contains counts by brand, storage, and price, which can be used to build the filters and facets.
```json
"aggregations": {
  "brands": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": "Nintendo",
        "doc_count": 2
      },
      {
        "key": "Microsoft",
        "doc_count": 1
      },
      {
        "key": "Sony",
        "doc_count": 1
      },
      {
        "key": "Valve",
        "doc_count": 1
      }
    ]
  },
  "storage_sizes": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": "512GB",
        "doc_count": 3
      },
      {
        "key": "1TB",
        "doc_count": 2
      }
    ]
  },
  "price_ranges": {
    "buckets": [
      {
        "key": "*-300.0",
        "to": 300,
        "doc_count": 2
      },
      {
        "key": "300.0-500.0",
        "from": 300,
        "to": 500,
        "doc_count": 3
      },
      {
        "key": "500.0-*",
        "from": 500,
        "doc_count": 0
      }
    ]
  }
}
```
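To connect such a response to a search UI, the buckets need to become selectable options, and a selected option needs to become a filter on the next search. The sketch below shows one possible way to do both in Python. The helper names and the small `sample_aggregations` object are hypothetical; the generated query uses the standard `bool` query with `match` and `term` clauses and assumes the `videogames` mapping shown earlier.

```python
def facets_from_aggregations(aggregations):
    """Turn an aggregations object into {facet_name: [(label, count), ...]}."""
    facets = {}
    for name, agg in aggregations.items():
        facets[name] = [
            (bucket["key"], bucket["doc_count"])
            for bucket in agg.get("buckets", [])
            if bucket["doc_count"] > 0  # hide options that would return nothing
        ]
    return facets


def build_filtered_search(text, selected):
    """Build a search body: full-text match on name plus one term filter per selection."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"name": text}}],
                "filter": [{"term": {field: value}} for field, value in selected.items()],
            }
        }
    }


# Hypothetical, shortened aggregations object used only to exercise the helpers.
sample_aggregations = {
    "brands": {"buckets": [{"key": "Nintendo", "doc_count": 2}, {"key": "Sony", "doc_count": 1}]},
    "price_ranges": {
        "buckets": [
            {"key": "*-300.0", "to": 300, "doc_count": 2},
            {"key": "500.0-*", "from": 500, "doc_count": 0},
        ]
    },
}

print(facets_from_aggregations(sample_aggregations))
print(build_filtered_search("switch", {"brand": "Nintendo"}))
```

Placing the selections in the `filter` clause rather than in `must` keeps them out of relevance scoring, which is usually what a facet click is meant to do.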
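Machine learning/AI-based approach
In this approach, machine learning (ML) models and artificial intelligence (AI) techniques analyze data attributes to generate relevant filters and facets dynamically. Instead of relying on predefined rules, ML/AI works from the characteristics of the indexed data, which allows the system to discover new filter options and groupings automatically.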
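Pros:
- Automatic updates: new filters and facets are generated automatically, with no manual adjustments.
- Discovery of new attributes: valuable data characteristics that had not been considered before can be identified and used as filters, enriching the search experience.
- Reduced manual effort: the team does not need to keep defining and maintaining filtering rules, because the AI learns from the existing data.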
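Cons:
- Maintenance complexity: model output needs to be validated before use to ensure that the filters are consistent and sensible.
- Requires ML and AI expertise: the solution needs people with specialized knowledge to tune the model and monitor its results.
- Risk of irrelevant filters: a poorly calibrated model may generate facets that are useless or even misleading to users.
- Cost: using ML and AI may depend on third-party services, which increases operating costs.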
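It is worth noting that even with a well-trained model or a high-quality prompt, the generated facets should still be reviewed before they are shown to users. This validation can be done manually or automated with rules, to ensure that the content is accurate and safe. While this is not necessarily a drawback, it is an important step in guaranteeing facet quality and suitability.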
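As a rough illustration of what rule-based validation might look like, the Python sketch below filters a list of model-suggested facets. Everything in it is an assumption made for the example: the allow-list of facet names, the length limit, and the "grounding" rule that every word of a value must appear in the product description. A real deployment would choose its own rules.

```python
# Hypothetical rules; a real catalog would define its own allow-list and limits.
ALLOWED_FACET_NAMES = {"Storage Capacity", "Audio Technology", "Gaming Experience"}
MAX_VALUE_LENGTH = 40


def validate_facets(candidates, description):
    """Keep only facets that pass every rule; drop duplicates (case-insensitive)."""
    text = description.lower()
    approved, seen = [], set()
    for facet in candidates:
        name = str(facet.get("name", "")).strip()
        value = str(facet.get("value", "")).strip()
        if not name or not value:
            continue  # incomplete entry
        if name not in ALLOWED_FACET_NAMES:
            continue  # facet name not approved for display
        if len(value) > MAX_VALUE_LENGTH:
            continue  # too long to be a usable option label
        if not all(word.lower() in text for word in value.split()):
            continue  # value is not grounded in the description
        key = (name.lower(), value.lower())
        if key in seen:
            continue  # duplicate of an already approved facet
        seen.add(key)
        approved.append({"name": name, "value": value})
    return approved


description = "Discover 3D Audio technology. 1TB of built-in SSD storage."
candidates = [
    {"name": "Storage Capacity", "value": "1TB SSD"},
    {"name": "Graphics Technology", "value": "Stunning Graphics"},  # not on the allow-list
    {"name": "Audio Technology", "value": "3D Audio"},
    {"name": "Audio Technology", "value": "3d audio"},  # duplicate
    {"name": "Audio Technology", "value": "Dolby Atmos"},  # not in the description
]
print(validate_facets(candidates, description))
```

An allow-list trades away some of the "discovery" benefit in exchange for control; a softer variant could route unknown facet names to a manual review queue instead of dropping them.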
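Implementing filters/facets with the AI approach
In this demonstration, we use an AI model to automatically analyze product characteristics and suggest relevant attributes. With a well-structured prompt, we can extract meaningful information from the product catalog and turn it into filters and facets suitable for display. Each step of the process is shown below, starting with the inference endpoint that connects to the model: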
```json
PUT _inference/completion/generate_filter_ia
{
  "service": "openai",
  "service_settings": {
    "api_key": "your-key",
    "model_id": "gpt-4o-mini"
  }
}
```
Now we define a pipeline that executes the prompt and retrieves the new filters generated by the model.
```json
PUT /_ingest/pipeline/generate_filter_ai
{
  "processors": [
    {
      "script": {
        "source": """ctx.prompt = "You are an expert in data organization for search and product categorization. Your task is to analyze the following product and identify the best dynamic facets that can be used in an e-commerce search experience. Product: " + ctx.name + "description: " + ctx.description + "Instructions: - Analyze the product name and description. - Extract only the dynamic facets (technological features or product characteristics that can be inferred from the description, try to create max 3 facets by characteristics found). Put the values into an array. Using key and value, e.g. dynamic_facets: [{ \"name\": \"Gaming Experience\", \"value\": \"Haptic Feedback\" },{ \"name\": \"Gaming Experience\", \"value\": \"Adaptive Triggers\" } - Return only a JSON."
        """
      }
    },
    {
      "inference": {
        "model_id": "generate_filter_ia",
        "input_output": {
          "input_field": "prompt",
          "output_field": "result"
        }
      }
    },
    {
      "gsub": {
        "field": "result",
        "pattern": "```json",
        "replacement": ""
      }
    },
    {
      "json" : {
        "field" : "result",
        "strict_json_parsing": false,
        "add_to_root" : true
      }
    },
    {
      "remove": {
        "field": "result"
      }
    },
    {
      "remove": {
        "field": "prompt"
      }
    }
  ]
}
```
We run a simulation of this pipeline for the "PlayStation 5" product, using the following description:
```
Stunning Gaming: Marvel at stunning graphics and experience the features of the new PS5.
Breathtaking Immersion: Discover a deeper gaming experience with support for haptic feedback, adaptive triggers, and 3D Audio technology.
Slim Design: With the PS5 Digital Edition, gamers get powerful gaming technology in a sleek, compact design.
1TB of Storage: Have your favorite games ready and waiting for you to play with 1TB of built-in SSD storage.
Backward Compatibility and Game Boost: The PS5 console can play over 4,000 PS4 games. With Game Boost, you can even enjoy faster, smoother frame rates in some of the best PS4 console games.
```
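One way to assemble such a simulation request is sketched below in Python. It assumes the standard simulate pipeline endpoint (`POST /_ingest/pipeline/<pipeline>/_simulate` with a `docs` array) and the pipeline name `generate_filter_ai` defined above. The helper name is hypothetical, the `_index` and `_id` values are placeholders, and the description is shortened here for readability.

```python
import json


def build_simulate_request(pipeline, name, description):
    """Return (method, path, body) for simulating an ingest pipeline on one document."""
    path = f"/_ingest/pipeline/{pipeline}/_simulate"
    body = {
        "docs": [
            {
                "_index": "index",  # placeholder values; nothing is indexed by a simulation
                "_id": "1",
                "_source": {"name": name, "description": description},
            }
        ]
    }
    return "POST", path, body


method, path, body = build_simulate_request(
    "generate_filter_ai",
    "Play Station 5",
    "Stunning Gaming: Marvel at stunning graphics and experience the features of the new PS5.",
)

# Print in the request-line-plus-JSON form used by Kibana Dev Tools.
print(f"{method} {path}")
print(json.dumps(body, indent=2))
```

The printed text can be pasted into Dev Tools, or the same body can be sent with any HTTP client.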
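Let's look at the output of this simulation. It shows the intermediate state of the document, with the generated prompt and the raw model response in the result field still present, that is, before the gsub, json, and remove processors clean them up.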
````json
{
  "docs": [
    {
      "doc": {
        "_index": "index",
        "_version": "-3",
        "_id": "1",
        "_source": {
          "name": "Play Station 5",
          "result": """```json
{
  "dynamic_facets": [
    { "name": "Storage Capacity", "value": "1TB SSD" },
    { "name": "Graphics Technology", "value": "Stunning Graphics" },
    { "name": "Audio Technology", "value": "3D Audio" }
  ]
}
          ```""",
          "description": "Stunning Gaming: Marvel at stunning graphics and experience the features of the new PS5. Breathtaking Immersion: Discover a deeper gaming experience with support for haptic feedback, adaptive triggers, and 3D Audio technology. Slim Design: With the PS5 Digital Edition, gamers get powerful gaming technology in a sleek, compact design. 1TB of Storage: Have your favorite games ready and waiting for you to play with 1TB of built-in SSD storage. Backward Compatibility and Game Boost: The PS5 console can play over 4,000 PS4 games. With Game Boost, you can even enjoy faster, smoother frame rates in some of the best PS4 console games.",
          "model_id": "generate_filter_ia",
          "prompt": """You are an expert in data organization for search and product categorization. Your task is to analyze the following product and identify the best dynamic facets that can be used in an e-commerce search experience. Product: Play Station 5description: Stunning Gaming: Marvel at stunning graphics and experience the features of the new PS5. Breathtaking Immersion: Discover a deeper gaming experience with support for haptic feedback, adaptive triggers, and 3D Audio technology. Slim Design: With the PS5 Digital Edition, gamers get powerful gaming technology in a sleek, compact design. 1TB of Storage: Have your favorite games ready and waiting for you to play with 1TB of built-in SSD storage. Backward Compatibility and Game Boost: The PS5 console can play over 4,000 PS4 games. With Game Boost, you can even enjoy faster, smoother frame rates in some of the best PS4 console games.Instructions: - Analyze the product name and description. - Extract only the dynamic facets (technological features or product characteristics that can be inferred from the description, try create max 3 facets by characteristics found). Put the values like arrays. Using key and value, e.g. dynamic_facets: [{ "name": "Gaming Experience", "value": "Haptic Feedback" },{ "name": "Gaming Experience", "value": "Adaptive Triggers" } - Return only a JSON."""
        },
        "_ingest": {
          "timestamp": "2025-03-19T22:14:32.0161803Z"
        }
      }
    }
  ]
}
````
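Because the model wraps its answer in a Markdown code fence, the response has to be cleaned before it can be parsed. In the pipeline this is the job of the gsub and json processors. For comparison, the same cleanup can be sketched on the client side in Python. The function name is hypothetical, and the sample string is a shortened copy of the model response shown in the simulation output.

```python
import json
import re

# A Markdown code-fence marker (three backticks), built this way to keep the listing readable.
TICKS = "`" * 3

# Matches an opening fence (optionally tagged "json") at the start, or a closing fence at the end.
FENCE = re.compile(r"^\s*" + TICKS + r"(?:json)?\s*|\s*" + TICKS + r"\s*$")


def parse_model_json(raw):
    """Strip a surrounding Markdown code fence, if any, and parse the remaining JSON."""
    cleaned = FENCE.sub("", raw.strip())
    return json.loads(cleaned)


raw_result = (
    TICKS + "json\n"
    + '{ "dynamic_facets": [ { "name": "Audio Technology", "value": "3D Audio" } ] }\n'
    + TICKS
)
parsed = parse_model_json(raw_result)
print(parsed["dynamic_facets"])
```

Unlike the lenient parsing configured in the pipeline, `json.loads` raises an error on malformed output, which can be a convenient signal to reject a response and retry.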
To illustrate how the facets can be presented, here is a simple front end:
The UI code is available here.
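Before a front end can render these facets, the per-document `dynamic_facets` arrays have to be grouped into options with counts. The Python sketch below shows one possible way to do that in application code. It is not the UI code referred to above; the function name and the sample documents are made up for illustration.

```python
from collections import defaultdict


def group_dynamic_facets(documents):
    """Group dynamic_facets across documents into {facet_name: [(value, count), ...]}."""
    counts = defaultdict(lambda: defaultdict(int))
    for doc in documents:
        for facet in doc.get("dynamic_facets", []):
            counts[facet["name"]][facet["value"]] += 1
    # Sort facet names alphabetically; sort values by count (descending), then by label.
    return {
        name: sorted(values.items(), key=lambda item: (-item[1], item[0]))
        for name, values in sorted(counts.items())
    }


# Made-up documents in the shape produced by the pipeline (dynamic_facets at the root).
documents = [
    {
        "name": "Console A",
        "dynamic_facets": [
            {"name": "Audio Technology", "value": "3D Audio"},
            {"name": "Storage Capacity", "value": "1TB SSD"},
        ],
    },
    {"name": "Console B", "dynamic_facets": [{"name": "Audio Technology", "value": "3D Audio"}]},
    {"name": "Console C"},  # no facets were generated for this document
]
print(group_dynamic_facets(documents))
```

For larger catalogs, the grouping would more likely be pushed down to the search engine with aggregations rather than done in application code.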
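Conclusion
Both approaches to creating filters and facets have their strengths and their points of attention. The classical approach, based on manual rules, offers control and lower costs, but it requires constant updates and cannot adapt dynamically to new products or characteristics.
The AI and machine learning approach, on the other hand, extracts facets automatically, which makes search more flexible and allows new attributes to be discovered without manual intervention. However, it can be more complex to implement and maintain, and it needs calibration to ensure consistent results.
The choice between the classical and the AI-based approach depends on the needs and complexity of the business. In simple scenarios where data attributes are stable and predictable, the classical approach may be more efficient and easier to maintain, avoiding unnecessary infrastructure and AI model costs. In other scenarios, using ML/AI to extract facets can add significant value and improve the search experience by making filtering smarter.
What matters is evaluating whether automation is worth the investment, or whether a more traditional solution already meets the business needs effectively.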
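Elasticsearch is packed with new features to help you build the best search solution for your use case. Dive into our sample notebooks, start a free cloud trial, or try Elastic on your local machine now.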