Elasticsearch:Jira 连接器教程第二部分 - 6 个优化技巧

作者:来自 Elastic Gustavo Llermaly

将 Jira 连接到 Elasticsearch 后,我们现在将回顾最佳实践以升级此部署。

本系列的第一部分中,我们配置了 Jira 连接器并将对象索引到 Elasticsearch 中。在第二部分中,我们将回顾一些最佳实践和高级配置以升级连接器。这些实践是对当前文档的补充,将在索引阶段使用。

运行连接器只是第一步。当你想要索引大量数据时,每个细节都很重要,当你从 Jira 索引文档时,你可以使用许多优化点。

优化点

  1. 通过应用高级同步过滤器仅索引你需要的文档
  2. 仅索引你将使用的字段
  3. 根据你的需求优化映射
  4. 自动化文档级别安全性
  5. 卸载附件提取
  6. 监控连接器的日志

1. 通过应用高级同步过滤器仅索引你需要的文档

默认情况下,Jira 会发送所有项目、问题和附件。如果你只对其中一些感兴趣,或者例如只对 "In Progress - 正在进行" 的问题感兴趣,我们建议不要索引所有内容。

在将文档放入 Elasticsearch 之前,有三个实例可以过滤文档:

  1. 远程:我们可以使用原生 Jira 过滤器来获取我们需要的内容。这是最好的选择,你应该尽可能尝试使用此选项,因为这样,文档在进入 Elasticsearch 之前甚至不会从源中出来。我们将为此使用高级同步规则。
  2. 集成:如果源没有原生过滤器来提供我们需要的内容,我们仍然可以使用基本同步规则在集成级别进行过滤,然后再将其导入 Elasticsearch。
  3. 摄入管道:在索引数据之前处理数据的最后一个选项是使用 Elasticsearch 摄入管道(ingest pipeline)。通过使用 Painless 脚本,我们可以非常灵活地过滤或操作文档。这样做的缺点是数据已经离开源并通过连接器,因此可能会给系统带来沉重的负担并产生安全问题。

让我们快速回顾一下 Jira 问题:

复制代码
GET bank/_search
{
  "_source": ["Issue.status.name", "Issue.summary"],
  "query": {
    "exists": {
      "field": "Issue.status.name"
    }
  }
}

注意:我们使用 "exists" 查询仅返回具有我们过滤的字段的文档。

你可以看到 "To Do" 中有很多我们不需要的问题:

复制代码
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 2,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 6,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "bank",
        "_id": "Marketing Mars-MM-1",
        "_score": 1,
        "_source": {
          "Issue": {
            "summary": "Conquer Mars",
            "status": {
              "name": "To Do"
            }
          }
        }
      },
      {
        "_index": "bank",
        "_id": "Marketing Mars-MM-3",
        "_score": 1,
        "_source": {
          "Issue": {
            "summary": "Conquering Earth",
            "status": {
              "name": "In Progress"
            }
          }
        }
      },
      {
        "_index": "bank",
        "_id": "Marketing Mars-MM-2",
        "_score": 1,
        "_source": {
          "Issue": {
            "summary": "Conquer the moon",
            "status": {
              "name": "To Do"
            }
          }
        }
      },
      {
        "_index": "bank",
        "_id": "Galactic Banking Project-GBP-3",
        "_score": 1,
        "_source": {
          "Issue": {
            "summary": "Intergalactic Security and Compliance",
            "status": {
              "name": "In Progress"
            }
          }
        }
      },
      {
        "_index": "bank",
        "_id": "Galactic Banking Project-GBP-2",
        "_score": 1,
        "_source": {
          "Issue": {
            "summary": "Bank Application Frontend",
            "status": {
              "name": "To Do"
            }
          }
        }
      },
      {
        "_index": "bank",
        "_id": "Galactic Banking Project-GBP-1",
        "_score": 1,
        "_source": {
          "Issue": {
            "summary": "Development of API for International Transfers",
            "status": {
              "name": "To Do"
            }
          }
        }
      }
    ]
  }
}

为了仅获取 "In Progress" 的问题,我们将使用 JQL 查询(Jira 查询语言)创建高级同步规则:

转到连接器并单击 sync rules 选项卡,然后单击 Draft Rules 。进入后,转到 Advanced Sync Rules并添加以下内容:

复制代码
  [
    {
      "query": "status IN ('In Progress')"
    }
  ]

应用规则后,运行 Full Content Sync

此规则将排除所有非 "In Progress" 的问题。你可以通过再次运行查询来检查:

复制代码
GET bank/_search
{
  "_source": ["Issue.status.name", "Issue.summary"],
  "query": {
    "exists": {
      "field": "Issue.status.name"
    }
  }
}

以下是新的回应:

复制代码
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 2,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "bank",
        "_id": "Marketing Mars-MM-3",
        "_score": 1,
        "_source": {
          "Issue": {
            "summary": "Conquering Earth",
            "status": {
              "name": "In Progress"
            }
          }
        }
      },
      {
        "_index": "bank",
        "_id": "Galactic Banking Project-GBP-3",
        "_score": 1,
        "_source": {
          "Issue": {
            "summary": "Intergalactic Security and Compliance",
            "status": {
              "name": "In Progress"
            }
          }
        }
      }
    ]
  }
}

2. 仅索引你将使用的字段

现在我们只有我们想要的文档,你可以看到我们仍然会得到很多我们不需要的字段。我们可以在运行查询时使用 _source 隐藏它们,但最好的选择是不索引它们。

为此,我们将使用摄取管道(ingest pipeline)。我们可以创建一个删除所有我们不会使用的字段的管道。假设我们只想要来自问题的以下信息:

  • Assignee
  • Title
  • Status

我们可以创建一个新的摄取管道,仅使用摄取管道的 Content UI 获取这些字段:

单击复 Copy and customize ,然后修改名为 index-name@custom 的管道,该管道应该刚刚创建且为空。我们可以使用 Kibana DevTools 控制台执行此操作,运行以下命令:

复制代码
PUT _ingest/pipeline/bank@custom
{
  "description": "Only keep needed fields for jira issues and move them to root",
  "processors": [
    {
      "remove": {
        "keep": [
          "Issue.assignee.displayName",
          "Issue.summary",
          "Issue.status.name"
        ],
        "ignore_missing": true
      }
    },
    {
      "rename": {
        "field": "Issue.assignee.displayName",
        "target_field": "assignee",
        "ignore_missing": true
      }
    },
    {
      "rename": {
        "field": "Issue.summary",
        "target_field": "summary",
        "ignore_missing": true
      }
    },
    {
      "rename": {
        "field": "Issue.status.name",
        "target_field": "status",
        "ignore_missing": true
      }
    },
    {
      "remove": {
        "field": "Issue"
      }
    }
  ]
}

让我们删除不需要的字段,并将需要的字段移至文档的根目录。

带有 keep 参数的 remove 处理器将从文档中删除除 keep 数组中的字段之外的所有字段。

我们可以通过运行模拟来检查这是否有效。从索引中添加其中一个文档的内容:

复制代码
POST /_ingest/pipeline/bank@custom/_simulate
{
  "docs": [
    {
      "_index": "bank",
      "_id": "Galactic Banking Project-GBP-3",
      "_score": 1,
      "_source": {
        "Type": "Epic",
        "Custom_Fields": {
          "Satisfaction": null,
          "Approvals": null,
          "Change reason": null,
          "Epic Link": null,
          "Actual end": null,
          "Design": null,
          "Campaign assets": null,
          "Story point estimate": null,
          "Approver groups": null,
          "[CHART] Date of First Response": null,
          "Request Type": null,
          "Campaign goals": null,
          "Project overview key": null,
          "Related projects": null,
          "Campaign type": null,
          "Impact": null,
          "Request participants": [],
          "Locked forms": null,
          "Time to first response": null,
          "Work category": null,
          "Audience": null,
          "Open forms": null,
          "Details": null,
          "Sprint": null,
          "Stakeholders": null,
          "Marketing asset type": null,
          "Submitted forms": null,
          "Start date": null,
          "Actual start": null,
          "Category": null,
          "Change risk": null,
          "Target start": null,
          "Issue color": "purple",
          "Parent Link": {
            "hasEpicLinkFieldDependency": false,
            "showField": false,
            "nonEditableReason": {
              "reason": "EPIC_LINK_SHOULD_BE_USED",
              "message": "To set an epic as the parent, use the epic link instead"
            }
          },
          "Format": null,
          "Target end": null,
          "Approvers": null,
          "Team": null,
          "Change type": null,
          "Satisfaction date": null,
          "Request language": null,
          "Amount": null,
          "Rank": "0|i0001b:",
          "Affected services": null,
          "Type": null,
          "Time to resolution": null,
          "Total forms": null,
          "[CHART] Time in Status": null,
          "Organizations": [],
          "Flagged": null,
          "Project overview status": null
        },
        "Issue": {
          "statuscategorychangedate": "2024-11-07T16:59:54.786-0300",
          "issuetype": {
            "avatarId": 10307,
            "hierarchyLevel": 1,
            "name": "Epic",
            "self": "https://tomasmurua.atlassian.net/rest/api/2/issuetype/10008",
            "description": "Epics track collections of related bugs, stories, and tasks.",
            "entityId": "f5637521-ec75-48b8-a1b8-de18520807ca",
            "id": "10008",
            "iconUrl": "https://tomasmurua.atlassian.net/rest/api/2/universal_avatar/view/type/issuetype/avatar/10307?size=medium",
            "subtask": false
          },
          "components": [],
          "timespent": null,
          "timeoriginalestimate": null,
          "project": {
            "simplified": true,
            "avatarUrls": {
              "48x48": "https://tomasmurua.atlassian.net/rest/api/2/universal_avatar/view/type/project/avatar/10415",
              "24x24": "https://tomasmurua.atlassian.net/rest/api/2/universal_avatar/view/type/project/avatar/10415?size=small",
              "16x16": "https://tomasmurua.atlassian.net/rest/api/2/universal_avatar/view/type/project/avatar/10415?size=xsmall",
              "32x32": "https://tomasmurua.atlassian.net/rest/api/2/universal_avatar/view/type/project/avatar/10415?size=medium"
            },
            "name": "Galactic Banking Project",
            "self": "https://tomasmurua.atlassian.net/rest/api/2/project/10001",
            "id": "10001",
            "projectTypeKey": "software",
            "key": "GBP"
          },
          "description": null,
          "fixVersions": [],
          "aggregatetimespent": null,
          "resolution": null,
          "timetracking": {},
          "security": null,
          "aggregatetimeestimate": null,
          "attachment": [],
          "resolutiondate": null,
          "workratio": -1,
          "summary": "Intergalactic Security and Compliance",
          "watches": {
            "self": "https://tomasmurua.atlassian.net/rest/api/2/issue/GBP-3/watchers",
            "isWatching": true,
            "watchCount": 1
          },
          "issuerestriction": {
            "issuerestrictions": {},
            "shouldDisplay": true
          },
          "lastViewed": "2024-11-08T02:04:25.247-0300",
          "creator": {
            "accountId": "712020:88983800-6c97-469a-9451-79c2dd3732b5",
            "emailAddress": "contornan_cliche.0y@icloud.com",
            "avatarUrls": {
              "48x48": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png",
              "24x24": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png",
              "16x16": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png",
              "32x32": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png"
            },
            "displayName": "Tomas Murua",
            "accountType": "atlassian",
            "self": "https://tomasmurua.atlassian.net/rest/api/2/user?accountId=712020%3A88983800-6c97-469a-9451-79c2dd3732b5",
            "active": true,
            "timeZone": "Chile/Continental"
          },
          "subtasks": [],
          "created": "2024-10-29T15:52:40.306-0300",
          "reporter": {
            "accountId": "712020:88983800-6c97-469a-9451-79c2dd3732b5",
            "emailAddress": "contornan_cliche.0y@icloud.com",
            "avatarUrls": {
              "48x48": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png",
              "24x24": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png",
              "16x16": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png",
              "32x32": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png"
            },
            "displayName": "Tomas Murua",
            "accountType": "atlassian",
            "self": "https://tomasmurua.atlassian.net/rest/api/2/user?accountId=712020%3A88983800-6c97-469a-9451-79c2dd3732b5",
            "active": true,
            "timeZone": "Chile/Continental"
          },
          "aggregateprogress": {
            "total": 0,
            "progress": 0
          },
          "priority": {
            "name": "Medium",
            "self": "https://tomasmurua.atlassian.net/rest/api/2/priority/3",
            "iconUrl": "https://tomasmurua.atlassian.net/images/icons/priorities/medium.svg",
            "id": "3"
          },
          "labels": [],
          "environment": null,
          "timeestimate": null,
          "aggregatetimeoriginalestimate": null,
          "versions": [],
          "duedate": null,
          "progress": {
            "total": 0,
            "progress": 0
          },
          "issuelinks": [],
          "votes": {
            "hasVoted": false,
            "self": "https://tomasmurua.atlassian.net/rest/api/2/issue/GBP-3/votes",
            "votes": 0
          },
          "comment": {
            "total": 0,
            "comments": [],
            "maxResults": 0,
            "self": "https://tomasmurua.atlassian.net/rest/api/2/issue/10008/comment",
            "startAt": 0
          },
          "assignee": {
            "accountId": "712020:88983800-6c97-469a-9451-79c2dd3732b5",
            "emailAddress": "contornan_cliche.0y@icloud.com",
            "avatarUrls": {
              "48x48": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png",
              "24x24": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png",
              "16x16": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png",
              "32x32": "https://secure.gravatar.com/avatar/f098101294d1a0da282bb2388df8c257?d=https%3A%2F%2Favatar-management--avatars.us-west-2.prod.public.atl-paas.net%2Finitials%2FTM-3.png"
            },
            "displayName": "Tomas Murua",
            "accountType": "atlassian",
            "self": "https://tomasmurua.atlassian.net/rest/api/2/user?accountId=712020%3A88983800-6c97-469a-9451-79c2dd3732b5",
            "active": true,
            "timeZone": "Chile/Continental"
          },
          "worklog": {
            "total": 0,
            "maxResults": 20,
            "startAt": 0,
            "worklogs": []
          },
          "updated": "2024-11-07T16:59:54.786-0300",
          "status": {
            "name": "In Progress",
            "self": "https://tomasmurua.atlassian.net/rest/api/2/status/10004",
            "description": "",
            "iconUrl": "https://tomasmurua.atlassian.net/",
            "id": "10004",
            "statusCategory": {
              "colorName": "yellow",
              "name": "In Progress",
              "self": "https://tomasmurua.atlassian.net/rest/api/2/statuscategory/4",
              "id": 4,
              "key": "indeterminate"
            }
          }
        },
        "id": "Galactic Banking Project-GBP-3",
        "_timestamp": "2024-11-07T16:59:54.786-0300",
        "Key": "GBP-3",
        "_allow_access_control": [
          "account_id:63c04b092341bff4fff6e0cb",
          "account_id:712020:88983800-6c97-469a-9451-79c2dd3732b5",
          "name:Gustavo",
          "name:Tomas-Murua"
          ]
      }
    }
    ]
}

响应将是:

复制代码
{
  "docs": [
    {
      "doc": {
        "_index": "bank",
        "_version": "-3",
        "_id": "Galactic Banking Project-GBP-3",
        "_source": {
          "summary": "Intergalactic Security and Compliance",
          "assignee": "Tomas Murua",
          "status": "In Progress"
        },
        "_ingest": {
          "timestamp": "2024-11-10T06:58:25.494057572Z"
        }
      }
    }
  ]
}

这看起来好多了!现在,让我们运行full content sync 来应用更改。

3. 根据你的需求优化映射

文档很干净。但是,我们可以进一步优化。我们可以进入 "it depends" 的领域。有些映射可以适用于你的用例,而其他映射则不行。找出答案的最佳方法是进行实验。

假设我们测试并得到了这个映射设计:

  • assignee:全文搜索和过滤器
  • summary:全文搜索
  • status:过滤器和排序

默认情况下,连接器将使用 dynamic_templates 创建映射,这些映射将配置所有文本字段以进行全文搜索、过滤和排序,这是一个坚实的基础,但如果我们知道我们想要用我们的字段做什么,它可以进行优化。

这是规则:

复制代码
{
  "all_text_fields": {
    "match_mapping_type": "string",
    "mapping": {
      "analyzer": "iq_text_base",
      "fields": {
        "delimiter": {
          "analyzer": "iq_text_delimiter",
          "type": "text",
          "index_options": "freqs"
        },
        "joined": {
          "search_analyzer": "q_text_bigram",
          "analyzer": "i_text_bigram",
          "type": "text",
          "index_options": "freqs"
        },
        "prefix": {
          "search_analyzer": "q_prefix",
          "analyzer": "i_prefix",
          "type": "text",
          "index_options": "docs"
        },
        "enum": {
          "ignore_above": 2048,
          "type": "keyword"
        },
        "stem": {
          "analyzer": "iq_text_stem",
          "type": "text"
        }
      }
    }
  }
}

让我们为所有文本字段创建用于不同目的的不同子字段。你可以在文档中找到有关分析器的其他信息。

要使用这些映射,你必须:

  1. 在创建连接器之前创建索引
  2. 创建连接器时,选择该索引而不是创建新索引
  3. 创建摄取管道以获取所需的字段
  4. 运行 Full Content Sync*

*Full Content Sync 会将所有文档发送到 Elasticsearch。*Incremental Sync*只会将上次增量或完整内容同步后更改的文档发送到 Elasticsearch。这两种方法都将从数据源获取所有数据。

我们的优化映射如下:

复制代码
PUT bank-optimal
{
  "mappings": {
    "properties": {
      "assignee": {
        "type": "text",
        "fields": {
          "delimiter": {
            "type": "text",
            "index_options": "freqs",
            "analyzer": "iq_text_delimiter"
          },
          "enum": {
            "type": "keyword",
            "ignore_above": 2048
          },
          "joined": {
            "type": "text",
            "index_options": "freqs",
            "analyzer": "i_text_bigram",
            "search_analyzer": "q_text_bigram"
          },
          "prefix": {
            "type": "text",
            "index_options": "docs",
            "analyzer": "i_prefix",
            "search_analyzer": "q_prefix"
          },
          "stem": {
            "type": "text",
            "analyzer": "iq_text_stem"
          }
        },
        "analyzer": "iq_text_base"
      },
      "summary": {
        "type": "text",
        "fields": {
          "delimiter": {
            "type": "text",
            "index_options": "freqs",
            "analyzer": "iq_text_delimiter"
          },
          "joined": {
            "type": "text",
            "index_options": "freqs",
            "analyzer": "i_text_bigram",
            "search_analyzer": "q_text_bigram"
          },
          "prefix": {
            "type": "text",
            "index_options": "docs",
            "analyzer": "i_prefix",
            "search_analyzer": "q_prefix"
          },
          "stem": {
            "type": "text",
            "analyzer": "iq_text_stem"
          }
        },
        "analyzer": "iq_text_base"
      },
      "status": {
        "type": "keyword"
      }
    }
  }
}

对于 assignee ,我们保留了原有的映射,因为我们希望此字段针对搜索和过滤器进行优化。对于 summary ,我们删除了 "enum" 关键字字段,因为我们不打算过滤摘要。我们将 status 映射为关键字,因为我们只打算过滤该字段。

注意:如果你不确定如何使用字段,基线分析器应该没问题。

4. 自动化文档级安全性

在第一部分中,我们学习了使用文档级安全性 (Document Level Security - DLS) 为用户手动创建 API 密钥并根据该密钥限制访问权限。但是,如果你想在每次用户访问我们的网站时自动创建具有权限的 API 密钥,则需要创建一个脚本来接收请求,使用用户 ID 生成 API 密钥,然后使用它在 Elasticsearch 中搜索。

这是 Python 中的参考文件:

复制代码
import os
import requests
class ElasticsearchKeyGenerator:
   def __init__(self):
       self.es_url = "https://xxxxxxx.es.us-central1.gcp.cloud.es.io" # Your Elasticsearch URL
       self.es_user = "" # Your Elasticsearch User
       self.es_password = "" # Your Elasticsearch password

       # Basic configuration for requests
       self.auth = (self.es_user, self.es_password)
       self.headers = {'Content-Type': 'application/json'}

   def create_api_key(self, user_id, index, expiration='1d', metadata=None):
       """
       Create an Elasticsearch API key for a single index with user-specific filters.

       Args:
           user_id (str): User identifier on the source system
           index (str): Index name
           expiration (str): Key expiration time (default: '1d')
           metadata (dict): Additional metadata for the API key

       Returns:
           str: Encoded API key if successful, None if failed
       """
       try:
           # Get user-specific ACL filters
           acl_index = f'.search-acl-filter-{index}'
           response = requests.get(
               f'{self.es_url}/{acl_index}/_doc/{user_id}',
               auth=self.auth,
               headers=self.headers
           )
           response.raise_for_status()

           # Build the query
           query = {
               'bool': {
                   'must': [
                       {'term': {'_index': index}},
                       response.json()['_source']['query']
                   ]
               }
           }

           # Set default metadata if none provided
           if not metadata:
               metadata = {'created_by': 'create-api-key'}

           # Prepare API key request body
           api_key_body = {
               'name': user_id,
               'expiration': expiration,
               'role_descriptors': {
                   f'jira-role': {
                       'index': [{
                           'names': [index],
                           'privileges': ['read'],
                           'query': query
                       }]
                   }
               },
               'metadata': metadata
           }

           print(api_key_body)

           # Create API key
           api_key_response = requests.post(
               f'{self.es_url}/_security/api_key',
               json=api_key_body,
               auth=self.auth,
               headers=self.headers
           )
           api_key_response.raise_for_status()

           return api_key_response.json()['encoded']

       except requests.exceptions.RequestException as e:
           print(f"Error creating API key: {str(e)}")
           return None

# Example usage
if __name__ == "__main__":
   key_generator = ElasticsearchKeyGenerator()

   encoded_key = key_generator.create_api_key(
       user_id="63c04b092341bff4fff6e0cb", # User id on Jira
       index="bank",
       expiration="1d",
       metadata={
           "application": "my-search-app",
           "namespace": "dev",
           "foo": "bar"
       }
   )

   if encoded_key:
       print(f"Generated API key: {encoded_key}")
   else:
       print("Failed to generate API key")

你可以在每个 API 请求上调用此 create_api_key 函数来生成 API 密钥,用户可以在后续请求中使用该密钥查询 Elasticsearch。你可以设置到期时间,还可以设置任意元数据,以防你想要注册有关用户或生成密钥的 API 的一些信息。

5. 卸载附件提取

对于内容提取,例如从 PDF 和 Powerpoint 文件中提取文本,Elastic 提供了一种开箱即用的服务,该服务运行良好,但有大小限制。

默认情况下,本机连接器的提取服务支持每个附件最大 10MB。如果你有更大的附件,例如里面有大图像的 PDF,或者你想要托管提取服务,Elastic 提供了一个工具,可让你部署自己的提取服务。

此选项仅与连接器客户端兼容,因此如果你使用的是本机连接器,则需要将其转换为连接器客户端并将其托管在你自己的基础架构中。

请按照以下步骤操作:

a. 配置自定义提取服务并使用 Docker 运行

复制代码
docker run \
  -p 8090:8090 \
  -it \
  --name extraction-service \
  docker.elastic.co/enterprise-search/data-extraction-service:$EXTRACTION_SERVICE_VERSION

EXTRACTION_SERVICE_VERSION 你应该使用 Elasticsearch 8.15 的 0.3.x。

b. 配置 yaml con 提取服务自定义并运行

转到连接器客户端并将以下内容添加到 config.yml 文件以使用提取服务:

复制代码
extraction_service:
  host: http://localhost:8090

c. 按照步骤运行连接器客户端

配置完成后,你可以使用要使用的连接器运行连接器客户端。

复制代码
docker run \
-v "</absolute/path/to>/connectors-config:/config" \ # NOTE: change absolute path to match where config.yml is located on your machine
--tty \
--rm \
docker.elastic.co/enterprise-search/elastic-connectors:{version}.0 \
/app/bin/elastic-ingest \
-c /config/config.yml # Path to your configuration file in the container

你可以参考文档中的完整流程。

6. 监控连接器的日志

在出现问题时,查看连接器的日志非常重要,Elastic 提供了开箱即用的功能。

第一步是在集群中激活日志记录。建议将日志发送到其他集群(监控部署),但在开发环境中,你也可以将日志发送到索引文档的同一集群。

默认情况下,连接器会将日志发送到 elastic-cloud-logs-8 索引。如果你使用的是 Cloud,则可以在新的 Logs Explorer 中检查日志:

结论

在本文中,我们了解了在生产环境中使用连接器时需要考虑的不同策略。优化资源、自动化安全性和集群监控是正确运行大型系统的关键机制。

想要获得 Elastic 认证?了解下一期 Elasticsearch 工程师培训的时间!

Elasticsearch 包含许多新功能,可帮助你为你的用例构建最佳搜索解决方案。深入了解我们的示例笔记本以了解更多信息,开始免费云试用,或立即在你的本地机器上试用 Elastic。

原文:Jira connector tutorial part II: 6 optimization tips - Elasticsearch Labs

相关推荐
666和7775 分钟前
Struts2 工作总结
java·数据库
还听珊瑚海吗10 分钟前
SpringMVC(一)
数据库
数智顾问1 小时前
【73页PPT】美的简单高效的管理逻辑(附下载方式)
大数据·人工智能·产品运营
和科比合砍81分1 小时前
ES模块(ESM)、CommonJS(CJS)和UMD三种格式
大数据·elasticsearch·搜索引擎
星期天要睡觉1 小时前
MySQL 综合练习
数据库·mysql
Y4090011 小时前
数据库基础知识——聚合函数、分组查询
android·数据库
瓦哥架构实战2 小时前
从 Prompt 到 Context:LLM OS 时代的核心工程范式演进
大数据
JosieBook2 小时前
【数据库】MySQL 数据库创建存储过程及使用场景详解
数据库·mysql