Elasticsearch: Document Chunking with LangChain Document Splitters

Using Elasticsearch nested dense vector support

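This interactive notebook will:

  • Load the model "sentence-transformers__all-minilm-l6-v2" from Hugging Face into an Elasticsearch ML node
  • Use a LangChain splitter to chunk passages into sentences and index them into Elasticsearch as nested dense vectors
  • Run a search and return the documents that contain the most relevant passages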
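The chunking step can be sketched in plain Python. This is only a stand-in under stated assumptions: the regex splitter below is a simplification, not the LangChain splitter the notebook uses, and the field names `passages`, `text`, and `vector` are hypothetical. The 384 dimensions match the output size of all-MiniLM-L6-v2.

```python
import re

# Hypothetical mapping fragment for a nested field that holds one entry per
# sentence. Field names are assumptions chosen for illustration.
PASSAGES_MAPPING = {
    "type": "nested",
    "properties": {
        "text": {"type": "text"},
        "vector": {
            "type": "dense_vector",
            "dims": 384,  # embedding size of all-MiniLM-L6-v2
            "index": True,
            "similarity": "cosine",
        },
    },
}


def split_into_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def to_nested_document(doc, text_field="content", nested_field="passages"):
    """Return a copy of doc with one nested entry per sentence of text_field."""
    enriched = dict(doc)  # shallow copy, the input document is left untouched
    enriched[nested_field] = [
        {"text": sentence} for sentence in split_into_sentences(doc[text_field])
    ]
    return enriched


doc = {
    "name": "April Work From Home Update",
    "content": (
        "Starting May 2022, the company will be implementing a two-day "
        "in-office work requirement per week for all eligible employees. "
        "Please coordinate with your supervisor and HR department to schedule "
        "your in-office workdays while continuing to follow all safety protocols.\n"
    ),
}
nested = to_nested_document(doc)
for passage in nested["passages"]:
    print(passage["text"])
```

In the actual workflow, each nested entry would also receive an embedding computed by the deployed model before or during indexing; that step is omitted here.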

Dependencies

In this notebook, we will use LangChain and the Elasticsearch Python client.

We also need a running Elasticsearch instance with an ML node and the model deployed on it.

python3 -m pip install -qU langchain elasticsearch eland load_dotenv jq
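The model ID used in this notebook looks like a Hugging Face model name with the "/" separator replaced by "__" and all letters lowercased. The sketch below reproduces that transformation. The Hugging Face ID `sentence-transformers/all-MiniLM-L6-v2` and the helper name are assumptions for illustration; the rule itself is inferred from the ID shown above rather than taken from any documented API.

```python
def to_elasticsearch_model_id(hub_model_id: str) -> str:
    """Derive an Elasticsearch-style model ID from a Hugging Face model name.

    Hypothetical helper: replaces the org/model separator with a double
    underscore and lowercases the result.
    """
    return hub_model_id.replace("/", "__").lower()


model_id = to_elasticsearch_model_id("sentence-transformers/all-MiniLM-L6-v2")
print(model_id)
```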

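Installation

If you have not yet installed Elasticsearch and Kibana, refer to the following article:

Install Elasticsearch and Kibana

Choose Elastic Stack 8.x when installing. On first startup, Elasticsearch prints its security setup information to the console, including the generated password for the elastic user.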

Environment variables

Before starting Jupyter, set the following environment variables:

export ES_USER="elastic"
export ES_PASSWORD="xnLj56lTrH98Lf_6n76y"
export ES_ENDPOINT="localhost"

Replace the values above with the ones for your own deployment.
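Inside the notebook, these variables can be read back with the standard library. The following is a minimal sketch, assuming Elasticsearch listens on the default port 9200 over HTTPS; the helper name `load_es_settings` and the shape of the returned dictionary are hypothetical.

```python
import os


def load_es_settings(env=None, port=9200):
    """Collect connection settings from ES_USER, ES_PASSWORD and ES_ENDPOINT.

    env defaults to os.environ; passing a plain dict makes the function easy
    to try out. The port is an assumption (Elasticsearch default).
    """
    env = os.environ if env is None else env
    required = ("ES_USER", "ES_PASSWORD", "ES_ENDPOINT")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {
        "url": f"https://{env['ES_ENDPOINT']}:{port}",
        "basic_auth": (env["ES_USER"], env["ES_PASSWORD"]),
    }


# Example with placeholder values instead of real credentials
settings = load_es_settings(
    {"ES_USER": "elastic", "ES_PASSWORD": "changeme", "ES_ENDPOINT": "localhost"}
)
print(settings["url"])
```

Failing early on a missing variable tends to give a clearer error than a connection failure later in the notebook.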

Copy the Elasticsearch certificate

Copy the Elasticsearch CA certificate into the current directory:

$ pwd
/Users/liuxg/python/elser
$ cp ~/elastic/elasticsearch-8.12.0/config/certs/http_ca.crt .
$ ls http_ca.crt
http_ca.crt
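Before connecting, it can help to confirm that the certificate really is where the notebook expects it. The check below is a small sketch; the helper name `require_ca_cert` is hypothetical, and the demo uses a throwaway directory with a placeholder file so that it runs anywhere.

```python
import tempfile
from pathlib import Path


def require_ca_cert(directory=".", filename="http_ca.crt"):
    """Return the path to the CA certificate, or raise if it is missing."""
    cert = Path(directory) / filename
    if not cert.is_file():
        raise FileNotFoundError(
            f"{cert} not found; copy it from the Elasticsearch config/certs directory"
        )
    return cert


# Demo against a temporary directory (the file is a placeholder, not a real cert)
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "http_ca.crt").write_text("placeholder")
    found = require_ca_cert(tmp).name
    try:
        require_ca_cert(tmp, filename="missing.crt")
        missing_detected = False
    except FileNotFoundError:
        missing_detected = True

print(found, missing_detected)
```

The returned path could then be passed to the Elasticsearch Python client through its `ca_certs` argument, together with `basic_auth`.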

Prepare the data

Create the following file in the project root directory:

workplace-docs.json

[
  {
    "content": "Effective: March 2020\nPurpose\n\nThe purpose of this full-time work-from-home policy is to provide guidelines and support for employees to conduct their work remotely, ensuring the continuity and productivity of business operations during the COVID-19 pandemic and beyond.\nScope\n\nThis policy applies to all employees who are eligible for remote work as determined by their role and responsibilities. It is designed to allow employees to work from home full time while maintaining the same level of performance and collaboration as they would in the office.\nEligibility\n\nEmployees who can perform their work duties remotely and have received approval from their direct supervisor and the HR department are eligible for this work-from-home arrangement.\nEquipment and Resources\n\nThe necessary equipment and resources will be provided to employees for remote work, including a company-issued laptop, software licenses, and access to secure communication tools. Employees are responsible for maintaining and protecting the company's equipment and data.\nWorkspace\n\nEmployees working from home are responsible for creating a comfortable and safe workspace that is conducive to productivity. This includes ensuring that their home office is ergonomically designed, well-lit, and free from distractions.\nCommunication\n\nEffective communication is vital for successful remote work. Employees are expected to maintain regular communication with their supervisors, colleagues, and team members through email, phone calls, video conferences, and other approved communication tools.\nWork Hours and Availability\n\nEmployees are expected to maintain their regular work hours and be available during normal business hours, unless otherwise agreed upon with their supervisor. Any changes to work hours or availability must be communicated to the employee's supervisor and the HR department.\nPerformance Expectations\n\nEmployees working from home are expected to maintain the same level of performance and productivity as if they were working in the office. Supervisors and team members will collaborate to establish clear expectations and goals for remote work.\nTime Tracking and Overtime\n\nEmployees are required to accurately track their work hours using the company's time tracking system. Non-exempt employees must obtain approval from their supervisor before working overtime.\nConfidentiality and Data Security\n\nEmployees must adhere to the company's confidentiality and data security policies while working from home. This includes safeguarding sensitive information, securing personal devices and internet connections, and reporting any security breaches to the IT department.\nHealth and Well-being\n\nThe company encourages employees to prioritize their health and well-being while working from home. This includes taking regular breaks, maintaining a work-life balance, and seeking support from supervisors and colleagues when needed.\nPolicy Review and Updates\n\nThis work-from-home policy will be reviewed periodically and updated as necessary, taking into account changes in public health guidance, business needs, and employee feedback.\nQuestions and Concerns\n\nEmployees are encouraged to direct any questions or concerns about this policy to their supervisor or the HR department.\n",
    "summary": "This policy outlines the guidelines for full-time remote work, including eligibility, equipment and resources, workspace requirements, communication expectations, performance expectations, time tracking and overtime, confidentiality and data security, health and well-being, and policy reviews and updates. Employees are encouraged to direct any questions or concerns",
    "name": "Work From Home Policy",
    "url": "./sharepoint/Work from home policy.txt",
    "created_on": "2020-03-01",
    "category": "teams",
    "_run_ml_inference": true,
    "rolePermissions": ["demo", "manager"]
  },
  {
    "content": "Starting May 2022, the company will be implementing a two-day in-office work requirement per week for all eligible employees. Please coordinate with your supervisor and HR department to schedule your in-office workdays while continuing to follow all safety protocols.\n",
    "summary": "Starting May 2022, employees will need to work two days a week in the office. Coordinate with your supervisor and HR department for these days while following safety protocols.",
    "name": "April Work From Home Update",
    "url": "./sharepoint/April work from home update.txt",
    "created_on": "2022-04-29",
    "category": "teams",
    "_run_ml_inference": true,
    "rolePermissions": ["demo", "manager"]
  },
  {
    "content": "As we continue to prioritize the well-being of our employees, we are making a slight adjustment to our hybrid work policy. Starting May 1, 2023, employees will be required to work from the office three days a week, with two days designated for remote work. Please communicate with your supervisor and HR department to establish your updated in-office workdays.\n",
    "summary": "Starting May 1, 2023, our hybrid work policy will require employees to work from the office three days a week and two days remotely.",
    "name": "Wfh Policy Update May 2023",
    "url": "./sharepoint/WFH policy update May 2023.txt",
    "created_on": "2023-05-01",
    "category": "teams",
    "_run_ml_inference": true,
    "rolePermissions": ["demo", "manager"]
  },
  {
    "content": "Executive Summary:\nThis sales strategy document outlines the key objectives, focus areas, and action plans for our tech company's sales operations in fiscal year 2024. Our primary goal is to increase revenue, expand market share, and strengthen customer relationships in our target markets.\n\nI. Objectives for Fiscal Year 2024\n\nIncrease revenue by 20% compared to fiscal year 2023.\nExpand market share in key segments by 15%.\nRetain 95% of existing customers and increase customer satisfaction ratings.\nLaunch at least two new products or services in high-demand market segments.\n\nII. Focus Areas\nA. Target Markets:\nContinue to serve existing markets with a focus on high-growth industries.\nIdentify and penetrate new markets with high potential for our products and services.\n\nB. Customer Segmentation:\nStrengthen relationships with key accounts and strategic partners.\nPursue new customers in underserved market segments.\nDevelop tailored offerings for different customer segments based on their needs and preferences.\n\nC. Product/Service Portfolio:\nOptimize the existing product/service portfolio by focusing on high-demand solutions.\nDevelop and launch innovative products/services in emerging technology areas.\nEnhance post-sales support and customer service to improve customer satisfaction.\n\nIII. Action Plans\nA. Sales Team Development:\nExpand the sales team to cover new markets and industries.\nProvide ongoing training to sales staff on product knowledge, sales techniques, and industry trends.\nImplement a performance-based incentive system to reward top performers.\n\nB. Marketing and Promotion:\nDevelop targeted marketing campaigns for different customer segments and industries.\nLeverage digital marketing channels to increase brand visibility and lead generation.\nParticipate in industry events and trade shows to showcase our products and services.\n\nC. Partner Ecosystem:\nStrengthen existing partnerships and establish new strategic alliances to expand market reach.\nCollaborate with partners on joint marketing and sales initiatives.\nProvide partner training and support to ensure they effectively represent our products and services.\n\nD. Customer Success:\nImplement a proactive customer success program to improve customer retention and satisfaction.\nDevelop a dedicated customer support team to address customer inquiries and concerns promptly.\nCollect and analyze customer feedback to identify areas for improvement in our products, services, and processes.\n\nIV. Monitoring and Evaluation\nEstablish key performance indicators (KPIs) to track progress toward our objectives.\nConduct regular sales team meetings to review performance, share best practices, and address challenges.\nConduct quarterly reviews of our sales strategy to ensure alignment with market trends and adjust as needed.\n\nBy following this sales strategy for fiscal year 2024, our tech company aims to achieve significant growth and success in our target markets, while also providing exceptional value and service to our customers.\n",
    "summary": "This sales strategy document outlines objectives, focus areas, and action plans for our tech company's sales operations in fiscal year 2024. Our primary goal is to increase revenue, expand market share, and strengthen customer relationships in our target markets. Focus areas include targeting new markets, segmenting customers, enhancing",
    "name": "Fy2024 Company Sales Strategy",
    "url": "./sharepoint/FY2024 Company Sales Strategy.txt",
    "category": "teams",
    "created_on": "2023-04-15",
    "_run_ml_inference": true,
    "rolePermissions": ["demo", "manager"]
  },
  {
    "content": "Purpose\n\nThe purpose of this vacation policy is to outline the guidelines and procedures for requesting and taking time off from work for personal and leisure purposes. This policy aims to promote a healthy work-life balance and encourage employees to take time to rest and recharge.\nScope\n\nThis policy applies to all full-time and part-time employees who have completed their probationary period.\nVacation Accrual\n\nFull-time employees accrue vacation time at a rate of [X hours] per month, equivalent to [Y days] per year. Part-time employees accrue vacation time on a pro-rata basis, calculated according to their scheduled work hours.\n\nVacation time will begin to accrue from the first day of employment, but employees are eligible to take vacation time only after completing their probationary period. Unused vacation time will be carried over to the next year, up to a maximum of [Z days]. Any additional unused vacation time will be forfeited.\nVacation Scheduling\n\nEmployees are required to submit vacation requests to their supervisor at least [A weeks] in advance, specifying the start and end dates of their vacation. Supervisors will review and approve vacation requests based on business needs, ensuring adequate coverage during the employee's absence.\n\nEmployees are encouraged to plan their vacations around the company's peak and non-peak periods to minimize disruptions. Vacation requests during peak periods may be subject to limitations and require additional advance notice.\nVacation Pay\n\nEmployees will receive their regular pay during their approved vacation time. Vacation pay will be calculated based on the employee's average earnings over the [B weeks] preceding their vacation.\nUnplanned Absences and Vacation Time\n\nIn the event of an unplanned absence due to illness or personal emergencies, employees may use their accrued vacation time, subject to supervisor approval. Employees must inform their supervisor as soon as possible and provide any required documentation upon their return to work.\nVacation Time and Termination of Employment\n\nIf an employee's employment is terminated, they will be paid out for any unused vacation time, calculated based on their current rate of pay.\nPolicy Review and Updates\n\nThis vacation policy will be reviewed periodically and updated as necessary, taking into account changes in labor laws, business needs, and employee feedback.\nQuestions and Concerns\n\nEmployees are encouraged to direct any questions or concerns about this policy to their supervisor or the HR department.\n",
    "summary": ": This policy outlines the guidelines and procedures for requesting and taking time off from work for personal and leisure purposes. Full-time employees accrue vacation time at a rate of [X hours] per month, equivalent to [Y days] per year. Vacation requests must be submitted to supervisors at least",
    "name": "Company Vacation Policy",
    "url": "https://enterprisesearch.sharepoint.com/:t:/s/MSBuilddemo/ES6rw9bKZxVBobG1WUoJpikBF9Bhx1pw_GvJWbsg-Z_HNA?e=faSHVt",
    "created_on": "2018-04-15",
    "category": "sharepoint",
    "_run_ml_inference": true,
    "rolePermissions": ["demo", "manager"]
  },
  {
    "content": "This career leveling matrix provides a framework for understanding the various roles and responsibilities of Software Engineers, as well as the skills and experience required for each level. This matrix is intended to support employee development, facilitate performance evaluations, and provide a clear career progression path.\nJunior Software Engineer\n\nResponsibilities:\nCollaborate with team members to design, develop, and maintain software applications and components.\nWrite clean, well-structured, and efficient code following established coding standards.\nParticipate in code reviews, providing and receiving constructive feedback.\nTroubleshoot and resolve software defects and issues.\nAssist with the creation of technical documentation.\nContinuously learn and stay up-to-date with new technologies and best practices.\n\nSkills & Experience:\nBachelor\u2019s degree in Computer Science or a related field, or equivalent work experience.\nBasic understanding of software development principles and methodologies.\nProficiency in at least one programming language.\nStrong problem-solving and analytical skills.\nEffective communication and collaboration skills.\nEagerness to learn and grow within the field.\nSenior Software Engineer\n\nResponsibilities:\nDesign, develop, and maintain complex software applications and components.\nLead and mentor junior team members in software development best practices and techniques.\nConduct code reviews and ensure adherence to coding standards and best practices.\nCollaborate with cross-functional teams to define, design, and deliver software solutions.\nIdentify, troubleshoot, and resolve complex software defects and issues.\nContribute to the creation and maintenance of technical documentation.\nEvaluate and recommend new technologies, tools, and practices to improve software quality and efficiency.\n\nSkills & Experience:\nBachelor\u2019s degree in Computer Science or a related field, or equivalent work experience.\n5+ years of software development experience.\nProficiency in multiple programming languages and technologies.\nDemonstrated ability to design and implement complex software solutions.\nStrong leadership, mentoring, and collaboration skills.\nExcellent problem-solving, analytical, and communication skills.\nPrincipal Software Engineer\n\nResponsibilities:\nLead the design, development, and maintenance of large-scale, mission-critical software applications and components.\nProvide technical leadership and mentorship to software engineering teams.\nDrive the adoption of advanced software development practices and technologies.\nCollaborate with product management, architecture, and other stakeholders to define and deliver strategic software initiatives.\nIdentify, troubleshoot, and resolve the most complex software defects and issues.\nCreate and maintain technical documentation, including architectural designs and best practice guidelines.\nRepresent [Company Name] as a thought leader in the software engineering community, including speaking at conferences, publishing articles, and contributing to open-source projects.\n\nSkills & Experience:\nBachelor\u2019s degree in Computer Science or a related field, or equivalent work experience.\n10+ years of software development experience, with a focus on large-scale, mission-critical applications.\nExpertise in multiple programming languages, technologies, and software development methodologies.\nProven ability to lead and mentor high-performing software engineering teams.\nExceptional problem-solving, analytical, and communication skills.\nStrong business acumen and ability to influence decision-making at the executive level.\n\nBy following this career leveling matrix, we aim to support the growth and development of Software Engineers, enabling them to reach their full potential and contribute meaningfully to the success of the organization.\n",
    "summary": "\nThis career leveling matrix provides a framework for understanding the various roles and responsibilities of Software Engineers, as well as the skills and experience required for each level. It is intended to support employee development, facilitate performance evaluations, and provide a clear career progression path.",
    "name": "Swe Career Matrix",
    "url": "https://enterprisesearch.sharepoint.com/:t:/s/MSBuilddemo/EVYuEyRhHh5Aqc3a39sqbGcBkqKIHRWtJBjjUjNs6snpMg?e=nv1mf4",
    "created_on": "2018-04-15",
    "category": "sharepoint",
    "_run_ml_inference": true,
    "rolePermissions": ["demo", "manager"]
  },
  {
    "content": "Title: Working with the Sales Team as an Engineer in a Tech Company\n\nIntroduction:\nAs an engineer in a tech company, collaboration with the sales team is essential to ensure the success of the company's products and services. This guidance document aims to provide an overview of how engineers can effectively work with the sales team, fostering a positive and productive working environment.\nUnderstanding the Sales Team's Role:\nThe sales team is responsible for promoting and selling the company's products and services to potential clients. Their role involves establishing relationships with customers, understanding their needs, and ensuring that the offered solutions align with their requirements.\n\nAs an engineer, it is important to understand the sales team's goals and objectives, as this will help you to provide them with the necessary information, tools, and support to successfully sell your company's products and services.\nCommunication:\nEffective communication is key to successfully working with the sales team. Make sure to maintain open lines of communication, and be responsive to their questions and concerns. This includes:\n\na. Attending sales meetings and conference calls when required.\nb. Providing regular product updates and training sessions to the sales team.\nc. Being available to answer technical questions and clarifications.\nCollaboration:\nCollaborate with the sales team in developing and refining sales materials, such as product presentations, demos, and technical documents. This will ensure that the sales team has accurate and up-to-date information to present to clients.\n\nAdditionally, work closely with the sales team on customer projects or product customizations, providing technical guidance, and ensuring that the solutions meet the customer's requirements.\nCustomer Engagement:\nAt times, engineers may be asked to join sales meetings or calls with potential clients to provide technical expertise. In these situations, it is important to:\n\na. Be prepared and understand the customer's needs and pain points.\nb. Clearly explain the technical aspects of the product or solution in a simple language that the customer can understand.\nc. Address any concerns or questions the customer may have.\nContinuous Improvement:\nActively seek feedback from the sales team regarding product performance, customer experiences, and market trends. Use this feedback to identify areas of improvement and collaborate with other engineers to enhance the product or service offerings.\nMutual Respect and Support:\nIt is essential to treat your colleagues in the sales team with respect and professionalism. Recognize and appreciate their efforts in promoting and selling the company's products and services. In turn, the sales team should also respect and appreciate the technical expertise and knowledge of the engineering team.\n\nBy working together, both the engineering and sales teams can contribute to the overall success of the company.\n\nConclusion:\nCollaboration between engineers and the sales team is crucial for a tech company's success. By understanding each other's roles, maintaining effective communication, collaborating on projects, and supporting one another, both teams can work together to achieve the company's goals and ensure customer satisfaction.\n",
    "summary": ": This guide provides an overview of how engineers can effectively collaborate with the sales team to ensure the success of a tech company. It includes understanding the sales team's role, communicating and collaborating on projects, engaging customers, and providing mutual respect and support.",
    "name": "Sales Engineering Collaboration",
    "url": "https://enterprisesearch.sharepoint.com/:t:/s/MSBuilddemo/EW21-KJnfHBFoRiF49_uJMcBfHyPKimuPOFsCcJypQWaBQ?e=mGdIqe",
    "created_on": "2019-04-15",
    "category": "sharepoint",
    "_run_ml_inference": true,
    "rolePermissions": ["demo", "manager"]
  },
  {
    "content": "Purpose\nThe purpose of this Intellectual Property Policy is to establish guidelines and procedures for the ownership, protection, and utilization of intellectual property generated by employees during their employment. This policy aims to encourage creativity and innovation while ensuring that the interests of both the company and its employees are protected.\n\nScope\nThis policy applies to all employees, including full-time, part-time, temporary, and contract employees.\n\nDefinitions\na. Intellectual Property (IP): Refers to creations of the mind, such as inventions, literary and artistic works, designs, symbols, and images, that are protected by copyright, trademark, patent, or other forms of legal protection.\nb. Company Time: Refers to the time during which an employee is actively engaged in performing their job duties.\nc. Outside Company Time: Refers to the time during which an employee is not engaged in performing their job duties.\n\nOwnership of Intellectual Property\na. Work Generated on Company Time\ni. Any intellectual property created, conceived, or developed by an employee during company time or using company resources, equipment, or facilities shall be considered the property of the Company.\nii. Employees are required to promptly disclose any such intellectual property to their supervisor or the appropriate department head.\nb. Work Generated Outside Company Time\ni. Intellectual property created, conceived, or developed by an employee outside of company time and without the use of company resources, equipment, or facilities shall generally remain the property of the employee.\nii. However, if the intellectual property is directly related to the employee's job responsibilities, or if the employee has used company resources, equipment, or facilities in its creation, it may be considered the property of the Company.\nProtection and Utilization of Intellectual Property\na. The Company shall have the right to protect, license, and commercialize any intellectual property owned by the company as it deems appropriate.\nb. Employees are expected to cooperate with the Company in obtaining any necessary legal protection for intellectual property owned by the company, including by signing any documents or providing any necessary information or assistance.\nConfidentiality\nEmployees are expected to maintain the confidentiality of any intellectual property owned by the Company and not disclose it to any third parties without the express written consent of an authorized representative of the company.\nEmployee Acknowledgment\nAll employees are required to sign an acknowledgment of this Intellectual Property Policy as a condition of their employment with [Company Name]. By signing the acknowledgment, employees agree to abide by the terms of this policy and understand that any violations may result in disciplinary action, up to and including termination of employment.\nPolicy Review\nThis Intellectual Property Policy shall be reviewed periodically and may be amended as necessary to ensure its continued effectiveness and compliance with applicable laws and regulations. Employees will be notified of any significant changes to this policy.\n",
    "summary": "This Intellectual Property Policy outlines guidelines and procedures for the ownership, protection, and utilization of intellectual property generated by employees during their employment. It establishes the company's ownership of work generated on company time, while recognizing employee ownership of work generated outside of company time without the use of company resources. The policy",
    "name": "Intellectual Property Policy",
    "url": "https://enterprisesearch.sharepoint.com/:t:/s/MSBuilddemo/EWz3cYEVdzBNsiHsYbKhms4BVYGhravyrUw3T3lzxL4pTg?e=mPIgbO",
    "created_on": "2021-06-15",
    "category": "sharepoint",
    "_run_ml_inference": true,
    "rolePermissions": ["demo", "manager"]
  },
  {
    "content": "Code of Conduct\nPurpose\n\nThe purpose of this code of conduct is to establish guidelines for professional and ethical behavior in the workplace. It outlines the principles and values that all employees are expected to uphold in their interactions with colleagues, customers, partners, and other stakeholders.\nScope\n\nThis code of conduct applies to all employees, contractors, and volunteers within the organization, regardless of their role or seniority.\nCore Values\n\nEmployees are expected to adhere to the following core values:\n\na. Integrity: Act honestly, ethically, and in the best interests of the organization at all times.\nb. Respect: Treat all individuals with dignity, courtesy, and fairness, regardless of their background, beliefs, or position.\nc. Accountability: Take responsibility for one's actions and decisions, and be willing to learn from mistakes.\nd. Collaboration: Work cooperatively with colleagues and partners to achieve shared goals and promote a positive work environment.\ne. Excellence: Strive for the highest standards of performance and continuously seek opportunities for improvement.\nCompliance with Laws and Regulations\n\nEmployees must comply with all applicable laws, regulations, and organizational policies in the course of their work. This includes, but is not limited to, employment laws, data protection regulations, and industry-specific guidelines.\nConflicts of Interest\n\nEmployees should avoid situations where their personal interests may conflict with or influence their professional judgment. If a potential conflict of interest arises, employees must disclose it to their supervisor or the appropriate authority within the organization.\nConfidentiality and Information Security\n\nEmployees are responsible for safeguarding the organization's confidential information, as well as any sensitive information entrusted to them by clients, partners, or other third parties. This includes adhering to data protection policies and using secure communication channels.\nHarassment and Discrimination\n\nThe organization is committed to providing a workplace free from harassment, discrimination, and bullying. Employees are expected to treat others with respect and report any incidents of inappropriate behavior to their supervisor or the human resources department.\nHealth and Safety\n\nEmployees must follow all health and safety guidelines and procedures to maintain a safe and healthy work environment. This includes reporting any hazards or unsafe conditions to the appropriate personnel.\nUse of Company Resources\n\nEmployees are expected to use company resources, including time, equipment, and funds, responsibly and for their intended purposes. Misuse or theft of company resources is strictly prohibited.\nReporting Violations\n\nEmployees have a responsibility to report any suspected violations of this code of conduct, as well as any illegal or unethical behavior, to their supervisor or the appropriate authority within the organization. The organization will protect the confidentiality of employees who report violations and will not tolerate retaliation against those who raise concerns.\nConsequences of Non-Compliance\n\nFailure to adhere to this code of conduct may result in disciplinary action, up to and including termination of employment. The organization reserves the right to take legal action against individuals who engage in illegal or unethical conduct.\nPolicy Review and Updates\n\nThis code of conduct will be reviewed periodically and updated as necessary to ensure it remains relevant and effective in promoting ethical behavior and professional standards within the organization.\nQuestions and Concerns\n\nEmployees are encouraged to seek guidance from their supervisor or the human resources department if they have questions or concerns about this code of conduct or its application to specific situations.\n",
    "summary": "This code of conduct outlines the principles and values that all employees are expected to uphold in their interactions with colleagues, customers, partners, and other stakeholders. It sets out core values such as integrity, respect, accountability, collaboration and excellence. Employees must comply with all applicable laws, regulations, and organizational",
    "name": "Code Of Conduct",
    "url": "https://enterprisesearch.sharepoint.com/:t:/s/MSBuilddemo/ER3xmeKaZ_pAqPeJWyyNR0QBg6QmoWIGPhwfEyCABWHrPA?e=cvzrgV",
    "created_on": "2018-01-12",
    "category": "sharepoint",
    "_run_ml_inference": true,
    "rolePermissions": ["demo", "manager"]
  },
  {
    "content": "Content:\nThe purpose of this office pet policy is to outline the guidelines and procedures for bringing pets into the workplace. This policy aims to create a positive and inclusive work environment while ensuring the comfort, safety, and well-being of all employees, visitors, and pets.\nScope\n\nThis policy applies to all employees who wish to bring their pets to the office. Pets covered under this policy include dogs, cats, and other small, non-exotic animals, subject to approval by the HR department.\nPet Approval Process\n\nEmployees must obtain prior approval from their supervisor and the HR department before bringing their pets to the office. The approval process includes:\n\na. Submitting a written request, including a description of the pet, its breed, age, and temperament.\nb. Providing proof of up-to-date vaccinations and any required licenses or permits.\nc. Obtaining written consent from all employees who share the workspace with the pet owner.\n\nThe HR department reserves the right to deny or revoke pet approval based on the specific circumstances or concerns raised by other employees.\nPet Behavior and Supervision\n\nEmployees are responsible for the behavior and well-being of their pets while in the office. Pets must be:\n\na. Well-behaved, non-aggressive, and not disruptive to the work environment.\nb. House-trained and able to eliminate waste in designated areas outside the office.\nc. Kept on a leash or in a secure enclosure when not in the employee's immediate work area.\n\nEmployees must closely supervise their pets and promptly address any issues or concerns raised by other staff members.\nAllergies and Phobias\n\nEmployees with allergies or phobias related to pets must inform the HR department, which will work with the affected employees and pet owners to find a suitable solution. This may include adjusting workspaces, limiting the number or types of pets allowed, or implementing additional safety measures.\nCleanliness and Hygiene\n\nEmployees are responsible for maintaining a clean and hygienic work environment. This includes:\n\na. Cleaning up after their pets, both indoors and outdoors.\nb. Regularly grooming their pets to minimize shedding and odors.\nc. Ensuring their pets are free of pests, such as fleas and ticks.\nLiability\n\nPet owners are liable for any damage or injury caused by their pets. Employees are encouraged to obtain pet liability insurance to cover potential incidents.\nRestricted Areas\n\nPets are not allowed in certain areas of the office, including meeting rooms, restrooms, kitchen and dining areas, and any other designated spaces. Signage will be posted to indicate these restricted areas.\nPolicy Review and Updates\n\nThis office pet policy will be reviewed periodically and updated as necessary, taking into account employee feedback, changes in legislation, and best practices for maintaining a safe and inclusive work environment.\nQuestions and Concerns\n\nEmployees are encouraged to direct any questions or concerns about this policy to their supervisor or the HR department.\n",
    "summary": "This policy outlines the guidelines and procedures for bringing pets into the workplace. It covers approval process, pet behavior and supervision, allergies and phobias, cleanliness and hygiene, liability, restricted areas, and policy review. Employees must obtain prior approval from their supervisor and the HR department before bringing their",
    "name": "Office Pet Policy",
    "url": "https://enterprisesearch.sharepoint.com/:t:/s/MSBuilddemo/ETf-69wBeaZJpAn3CY7ExRABQWvav-p24VOnB6C0A4l2pQ?e=X72WuK",
    "created_on": "2018-01-12",
    "category": "sharepoint",
    "_run_ml_inference": true,
    "rolePermissions": ["demo", "manager"]
  },
102.    {
103.      "content": "Performance Management Policy\nPurpose and Scope\nThe purpose of this Performance Management Policy is to establish a consistent and transparent process for evaluating, recognizing, and rewarding employee performance. This policy applies to all employees and aims to foster a culture of continuous improvement, professional growth, and open communication between employees and management.\nPerformance Planning and Goal Setting\nAt the beginning of each performance cycle, employees and their supervisors will collaborate to set clear, achievable, and measurable performance goals. These goals should align with the company\u2019s strategic objectives and take into account the employee\u2019s job responsibilities, professional development, and career aspirations.\nOngoing Feedback and Communication\nThroughout the performance cycle, employees and supervisors are encouraged to engage in regular, constructive feedback and open communication. This includes discussing progress towards goals, addressing challenges, and identifying opportunities for improvement or additional support. Regular check-ins and updates help ensure that employees stay on track and receive the guidance they need to succeed.\nPerformance Evaluation\nAt the end of each performance cycle, employees will participate in a formal performance evaluation with their supervisor. This evaluation will assess the employee\u2019s overall performance, including their achievements, areas for improvement, and progress towards goals. Both the employee and supervisor should come prepared to discuss specific examples, accomplishments, and challenges from the performance period.\nPerformance Ratings\nBased on the performance evaluation, employees will receive a performance rating that reflects their overall performance during the cycle. The rating system should be clearly defined and consistently applied across the organization. 
Performance ratings will be used to inform decisions regarding promotions, salary increases, and other rewards or recognition.\nPromotions and Advancements\nHigh-performing employees who consistently demonstrate strong performance, leadership, and a commitment to the company\u2019s values may be considered for promotions or other advancement opportunities. Promotions will be based on factors such as performance ratings, skills, experience, and the needs of the organization. Employees interested in pursuing a promotion should discuss their career goals and development plans with their supervisor.\nPerformance Improvement Plans\nEmployees who receive a low performance rating or are struggling to meet their performance goals may be placed on a Performance Improvement Plan (PIP). A PIP is a structured plan designed to help the employee address specific areas of concern, set achievable improvement goals, and receive additional support or resources as needed. Employees on a PIP will be closely monitored and re-evaluated at the end of the improvement period to determine if satisfactory progress has been made.\nRecognition and Rewards\nOur company believes in recognizing and rewarding employees for their hard work and dedication. In addition to promotions and salary increases, employees may be eligible for other forms of recognition or rewards based on their performance. This may include bonuses, awards, or other incentives designed to motivate and celebrate employee achievements. The specific criteria and eligibility for these rewards will be communicated by the HR department or management.\n",
104.      "summary": "This Performance Management Policy outlines a consistent and transparent process for evaluating, recognizing, and rewarding employees. It includes goal setting, ongoing feedback, performance evaluations, ratings, promotions, and rewards. The policy applies to all employees and encourages open communication and professional growth.",
105.      "name": "Performance Management Policy",
106.      "url": "https://enterprisesearch.sharepoint.com/:t:/s/MSBuilddemo/ERsxt9p1uehJqeJu4JlxkakBavbKwcldrYv_hpv3xHikAw?e=pf5R2C",
107.      "created_on": "2018-01-12",
108.      "category": "sharepoint",
109.      "_run_ml_inference": true,
110.      "rolePermissions": ["demo", "manager"]
111.    },
112.    {
113.      "content": "Our sales organization is structured to effectively serve our customers and achieve our business objectives across multiple regions. The organization is divided into the following main regions:\n\nThe Americas: This region includes the United States, Canada, Mexico, as well as Central and South America. The North America South America region (NASA) has two Area Vice-Presidents: Laura Martinez is the Area Vice-President of North America, and Gary Johnson is the Area Vice-President of South America.\n\nEurope: Our European sales team covers the entire continent, including the United Kingdom, Germany, France, Spain, Italy, and other countries. The team is responsible for understanding the unique market dynamics and cultural nuances, enabling them to effectively target and engage with customers across the region. The Area Vice-President for Europe is Rajesh Patel.\nAsia-Pacific: This region encompasses countries such as China, Japan, South Korea, India, Australia, and New Zealand. Our sales team in the Asia-Pacific region works diligently to capitalize on growth opportunities and address the diverse needs of customers in this vast and rapidly evolving market. The Area Vice-President for Asia-Pacific is Mei Li.\nMiddle East & Africa: This region comprises countries across the Middle East and Africa, such as the United Arab Emirates, Saudi Arabia, South Africa, and Nigeria. Our sales team in this region is responsible for navigating the unique market challenges and identifying opportunities to expand our presence and better serve our customers. The Area Vice-President for Middle East & Africa is Jamal Abdi.\n\nEach regional sales team consists of dedicated account managers, sales representatives, and support staff, led by their respective Area Vice-Presidents. They are responsible for identifying and pursuing new business opportunities, nurturing existing client relationships, and ensuring customer satisfaction. 
The teams collaborate closely with other departments, such as marketing, product development, and customer support, to ensure we consistently deliver high-quality products and services to our clients.\n",
114.      "summary": "\nOur sales organization is divided into four regions: The Americas, Europe, Asia-Pacific, and Middle East & Africa. Each region is led by an Area Vice-President and consists of dedicated account managers, sales representatives, and support staff. They collaborate with other departments to ensure the delivery of high",
115.      "name": "Sales Organization Overview",
116.      "url": "https://enterprisesearch.sharepoint.com/:t:/s/MSBuilddemo/EYsr1eqgn9hMslMJFLR-k54BBX-O3iC26bK7xNEBtYIBkg?e=xeAjiT",
117.      "created_on": "2018-01-15",
118.      "category": "sharepoint",
119.      "_run_ml_inference": true,
120.      "rolePermissions": ["demo", "manager"]
121.    },
122.    {
123.      "content": "Introduction:\nThis document outlines the compensation bands strategy for the various teams within our IT company. The goal is to establish a fair and competitive compensation structure that aligns with industry standards, rewards performance, and attracts top talent. By implementing this strategy, we aim to foster employee satisfaction and retention while ensuring the company's overall success.\n\nPurpose:\nThe purpose of this compensation bands strategy is to:\na. Define clear guidelines for salary ranges based on job levels and market benchmarks.\nb. Support equitable compensation practices across different teams.\nc. Encourage employee growth and performance.\nd. Enable effective budgeting and resource allocation.\n\nJob Levels:\nTo establish a comprehensive compensation structure, we have defined distinct job levels within each team. These levels reflect varying degrees of skills, experience, and responsibilities. The levels include:\na. Entry-Level: Employees with limited experience or early career professionals.\nb. Intermediate-Level: Employees with moderate experience and demonstrated competence.\nc. Senior-Level: Experienced employees with advanced skills and leadership capabilities.\nd. Leadership-Level: Managers and team leaders responsible for strategic decision-making.\n\nCompensation Bands:\nBased on the job levels, the following compensation bands have been established:\na. Entry-Level Band: This band encompasses salary ranges for employees in entry-level positions. It aims to provide competitive compensation for individuals starting their careers within the company.\n\nb. Intermediate-Level Band: This band covers salary ranges for employees who have gained moderate experience and expertise in their respective roles. It rewards employees for their growing skill set and contributions.\n\nc. 
Senior-Level Band: The senior-level band includes salary ranges for experienced employees who have attained advanced skills and have a proven track record of delivering results. It reflects the increased responsibilities and expectations placed upon these individuals.\n\nd. Leadership-Level Band: This band comprises salary ranges for managers and team leaders responsible for guiding and overseeing their respective teams. It considers their leadership abilities, strategic thinking, and the impact they have on the company's success.\n\nMarket Benchmarking:\nTo ensure our compensation remains competitive, regular market benchmarking will be conducted. This involves analyzing industry salary trends, regional compensation data, and market demand for specific roles. The findings will inform periodic adjustments to our compensation bands to maintain alignment with the market.\n\nPerformance-Based Compensation:\nIn addition to the defined compensation bands, we emphasize a performance-based compensation model. Performance evaluations will be conducted regularly, and employees exceeding performance expectations will be eligible for bonuses, incentives, and salary increases. This approach rewards high achievers and motivates employees to excel in their roles.\n\nConclusion:\nBy implementing this compensation bands strategy, our IT company aims to establish fair and competitive compensation practices that align with market standards and foster employee satisfaction. Regular evaluations and market benchmarking will enable us to adapt and refine the strategy to meet the evolving needs of our organization.",
124.      "summary": "This document outlines a compensation framework for IT teams. It includes job levels, compensation bands, and performance-based incentives to ensure fair and competitive wages. Regular market benchmarking will be conducted to adjust the bands according to industry trends.",
125.      "name": "Compensation Framework For It Teams",
126.      "url": "https://enterprisesearch.sharepoint.com/:t:/s/MSBuilddemo/EaAFec6004tAg21g4i67rfgBBRqCm1yY7AZLLQyyaMtsEQ?e=wTMb4z",
127.      "created_on": "2018-01-12",
128.      "category": "sharepoint",
129.      "restricted": true,
130.      "_run_ml_inference": true,
131.      "rolePermissions": ["manager"]
132.    },
133.    {
134.      "content": "As an employee in Canada, it's essential to understand how to update your tax elections forms to ensure accurate tax deductions from your pay. This guide will help you navigate the process of updating your TD1 Personal Tax Credits Return form.\n\nStep 1: Access the TD1 form\nThe TD1 form is available on the Canada Revenue Agency (CRA) website. Your employer might provide you with a paper copy or a link to the online form. You can access the form directly through the following link: https://www.canada.ca/en/revenue-agency/services/forms-publications/td1-personal-tax-credits-returns.html\n\nStep 2: Choose the correct form version\nYou'll need to fill out the federal TD1 form and, if applicable, the provincial or territorial TD1 form. Select the appropriate version based on your province or territory of residence.\n\nStep 3: Download and open the form\nFor the best experience, download and open the TD1 form in Adobe Reader. If you have visual impairments, consider using the large print version available on the CRA website.\n\nStep 4: Complete the form\nFill out the form by entering your personal information, such as your name, Social Insurance Number (SIN), and address. Then, go through each section to claim any personal tax credits that apply to you. These credits may include:\nBasic personal amount\nAmount for an eligible dependant\nAmount for infirm dependants age 18 or older\nCaregiver amount\nDisability amount\nTuition and education amounts\n\nRead the instructions carefully for each section to ensure you claim the correct amounts.\n\nStep 5: Sign and date the form\nOnce you've completed the form, sign and date it at the bottom.\n\nStep 6: Submit the form to your employer\nSubmit the completed and signed TD1 form to your employer. You can either scan and send it electronically, or provide a printed copy. 
Your employer will use the information on your TD1 form to calculate the correct amount of tax to be deducted from your pay.\n\nStep 7: Update your TD1 form as needed\nIt's essential to update your TD1 form whenever your personal circumstances change, such as getting married, having a child, or becoming eligible for a new tax credit. Inform your employer of these changes and submit an updated TD1 form to ensure accurate tax deductions.\n\nUpdating your tax elections forms is a crucial step in ensuring the correct tax deductions from your pay as a new employee in Canada. Follow this guide and keep your TD1 form up to date to avoid any discrepancies in your tax filings.\n",
135.      "summary": ": This guide gives a step-by-step explanation of how to update your TD1 Personal Tax Credits Return form. Access the form from the CRA website and choose the correct version based on your province or territory of residence. Download and open the form in Adobe Reader, fill out the form by entering",
136.      "name": "Updating Your Tax Elections Forms",
137.      "url": "./github/Updating Your Tax Elections Forms.txt",
138.      "created_on": "2022-12-20",
139.      "category": "github",
140.      "_run_ml_inference": true,
141.      "rolePermissions": ["demo", "manager"]
142.    },
143.    {
144.      "content": "Welcome to our team! We are excited to have you on board and look forward to your valuable contributions. This onboarding guide is designed to help you get started by providing essential information about our policies, procedures, and resources. Please read through this guide carefully and reach out to the HR department if you have any questions.\nIntroduction to Our Company Culture and Values\nOur company is committed to creating a diverse, inclusive, and supportive work environment. We believe that our employees are our most valuable asset and strive to foster a culture of collaboration, innovation, and continuous learning. Our core values include:\nIntegrity: We act ethically and honestly in all our interactions.\nTeamwork: We work together to achieve common goals and support each other's growth.\nExcellence: We strive for the highest quality in our products, services, and relationships.\nInnovation: We encourage creativity and embrace change to stay ahead in the market.\nRespect: We treat each other with dignity and value the unique perspectives of all our colleagues.\nKey Onboarding Steps\nTo ensure a smooth onboarding process, please complete the following steps within your first week:\nAttend orientation: You will be invited to an orientation session to meet your colleagues and learn more about our company's history, mission, and values.\nReview policies and procedures: Familiarize yourself with our employee handbook, which contains important information about our policies and procedures. Please read it thoroughly and adhere to the guidelines.\nComplete required training: You may be required to complete mandatory training sessions, such as safety training or anti-harassment training. Ensure that you attend and complete these sessions as soon as possible.\nUpdating Tax Elections and Documents\nIt is crucial to ensure your tax information is accurate and up-to-date, regardless of the country you work in. 
Please follow these steps to update your tax elections and documents:\nComplete tax forms: Fill out the necessary tax forms for your country or region, which determine the amount of income tax withheld from your paycheck. You should complete new tax forms if your personal or financial situation changes, such as marriage, divorce, or a change in the number of dependents.\nSubmit regional tax forms: Depending on your location, you may be required to complete additional regional or local tax forms. Check with the HR department to determine which forms are necessary.\nUpdate your address: If you move, make sure to update your address with the HR department to ensure accurate tax reporting.\nBenefits Enrollment\nAs a new employee, you are eligible for various benefits, including health insurance, retirement plans, and paid time off. You will receive detailed information about our benefits package during orientation. To enroll in the benefits, please follow these steps:\nReview benefits options: Carefully review the benefits package and choose the options that best meet your needs.\nComplete enrollment forms: Fill out the necessary forms to enroll in your chosen benefits. Submit these forms to the HR department within 30 days of your start date.\nDesignate beneficiaries: If applicable, designate beneficiaries for your life insurance and retirement plans.\nGetting Settled in Your Workspace\nTo help you feel comfortable and productive in your new workspace, take the following steps:\nSet up your workstation: Organize your desk, chair, and computer according to your preferences. If you require any additional equipment or accommodations, please contact the HR department.\nObtain necessary supplies: Request any necessary office supplies, such as pens, notepads, or folders, from the designated supply area or by contacting the appropriate department.\nFamiliarize yourself with office resources: Locate common areas, such as break rooms, restrooms, and meeting rooms. 
Familiarize yourself with office equipment, including printers, scanners, and telephones.\n",
145.      "summary": "\nThis onboarding guide provides essential information to new employees on our company culture and values, key onboarding steps, tax elections and documents, benefits enrollment, and setting up their workspace.",
146.      "name": "New Employee Onboarding Guide",
147.      "url": "./github/New Employee Onboarding guide.txt",
148.      "created_on": "2018-01-12",
149.      "category": "github",
150.      "_run_ml_inference": true,
151.      "rolePermissions": ["demo", "manager"]
152.    }
153.  ]
$ pwd
/Users/liuxg/python/elser
$ ls workplace-docs.json
workplace-docs.json

Create and run the application

In the current directory, run the following command to start Jupyter and create a notebook:

$ pwd
/Users/liuxg/python/elser
$ jupyter notebook

Connect to Elasticsearch

from elasticsearch import Elasticsearch
from dotenv import load_dotenv
import os

# Read ES_USER, ES_PASSWORD and ES_ENDPOINT from the environment (or a .env file)
load_dotenv()

elastic_user = os.getenv("ES_USER")
elastic_password = os.getenv("ES_PASSWORD")
elastic_endpoint = os.getenv("ES_ENDPOINT")

url = f"https://{elastic_user}:{elastic_password}@{elastic_endpoint}:9200"
client = Elasticsearch(url, ca_certs="./http_ca.crt", verify_certs=True)

print(client.info())

From the output above, we can see that the connection succeeded.
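One caveat when credentials are embedded in the URL as above: a password containing characters such as @, : or / breaks the URL unless it is percent-encoded. The following is a minimal sketch using only the standard library; `build_es_url` is a hypothetical helper name and not part of the Elasticsearch client.

```python
from urllib.parse import quote


def build_es_url(user: str, password: str, host: str, port: int = 9200) -> str:
    """Return an https URL with percent-encoded credentials."""
    # safe="" makes quote() also encode "/", which it leaves alone by default.
    return f"https://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"


print(build_es_url("elastic", "p@ss:word/1", "localhost"))
# https://elastic:p%40ss%3Aword%2F1@localhost:9200
```

Alternatively, the 8.x Python client accepts credentials separately through its `basic_auth` parameter, which sidesteps the encoding question entirely.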

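Prepare the dataset

We will use LangChain tooling to ingest the raw documents and split them into smaller chunks. The data is a sample workplace search dataset.

LangChain has many other loaders for fetching data from other sources. For more information, see its documentation on core loaders and loader integrations.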

import json

# Load data into a JSON object
with open("workplace-docs.json") as f:
    data = json.load(f)

print(f"Successfully loaded {len(data)} documents")

# Write a copy to temp.json
with open("temp.json", "w") as json_file:
    json.dump(data, json_file)

The code above creates a file named temp.json in the project root directory.
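Each record in workplace-docs.json mixes the text to embed (`content`) with descriptive fields such as `name` and `category`. Before splitting, a loader has to separate the two. The sketch below shows that separation with the standard library only; `SimpleDoc` and `records_to_docs` are hypothetical stand-ins for a LangChain document and loader, and the choice of metadata keys is an assumption based on the fields visible in the dataset.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleDoc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


# Assumption: these dataset fields are worth keeping as metadata.
METADATA_KEYS = ("name", "summary", "url", "category", "created_on")


def records_to_docs(records: list) -> list:
    """Split each record into text content and a metadata dictionary."""
    docs = []
    for record in records:
        # Missing keys become None instead of raising KeyError
        metadata = {key: record.get(key) for key in METADATA_KEYS}
        docs.append(SimpleDoc(page_content=record["content"], metadata=metadata))
    return docs


sample = [{"content": "Pets are welcome.", "name": "Office Pet Policy", "category": "sharepoint"}]
docs = records_to_docs(sample)
print(docs[0].metadata["name"])  # Office Pet Policy
```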

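Load the model from Hugging Face

The first thing you need is a model that creates text embeddings from the chunks. You can use any model you like, but this example runs end to end on the minilm-l6-v2 model. We can upload the text embedding model with the eland library.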

MODEL_ID = "sentence-transformers__all-minilm-l6-v2"

!eland_import_hub_model --url https://elastic:xnLj56lTrH98Lf_6n76y@localhost:9200 \
    --hub-model-id sentence-transformers/all-MiniLM-L6-v2 \
    --task-type text_embedding \
    --ca-cert ./http_ca.crt \
    --clear-previous \
    --start

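Adjust the username and password above to match your own Elasticsearch configuration. The download takes some time. You can open Kibana to watch the model being downloaded.

From the output of the command above, we can see that the deployment succeeded.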
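Note that MODEL_ID differs from the Hugging Face hub ID passed to `--hub-model-id`: it is lowercase and uses a double underscore in place of the slash. The helper below reproduces that mapping for the ID used here. It is inferred from the two IDs in this post, and `to_es_model_id` is a hypothetical name rather than an eland function.

```python
def to_es_model_id(hub_model_id: str) -> str:
    """Lowercase the hub ID and replace "/" with "__"."""
    return hub_model_id.replace("/", "__").lower()


MODEL_ID = to_es_model_id("sentence-transformers/all-MiniLM-L6-v2")
print(MODEL_ID)  # sentence-transformers__all-minilm-l6-v2
```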

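Create the Elasticsearch index

In this example we use an ingest pipeline to run inference and store the embeddings in the index.

The pipeline uses the sentence-transformers minilm-l6-v2 model, which must be running on an ML node. With this model in place, we set up an ingest pipeline that runs inference on each passage and stores the resulting embedding in our index.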

PIPELINE_ID = "chunk_text_to_passages"
MODEL_DIMS = 384
INDEX_NAME = "nb_parent_retriever_index"

# Create the pipeline: run inference on every element of the "passages" array
client.ingest.put_pipeline(
  id=PIPELINE_ID,
  processors=[
    {
      "foreach": {
        "field": "passages",
        "processor": {
          "inference": {
            "field_map": {
              "_ingest._value.text": "text_field"
            },
            "model_id": MODEL_ID,
            "target_field": "_ingest._value.vector",
            "on_failure": [
              {
                "append": {
                  "field": "_source._ingest.inference_errors",
                  "value": [
                    {
                      # Report failures under this pipeline's own name
                      "message": "Processor 'inference' in pipeline '" + PIPELINE_ID + "' failed with message '{{ _ingest.on_failure_message }}'",
                      "pipeline": PIPELINE_ID,
                      "timestamp": "{{{ _ingest.timestamp }}}"
                    }
                  ]
                }
              }
            ]
          }
        }
      }
    }
  ]
)

# Create the index; every indexed document passes through the pipeline above
client.indices.create(
  index=INDEX_NAME,
  settings={
    "index": {
      "default_pipeline": PIPELINE_ID
    }
  },
  mappings={
    "dynamic": "true",
    "properties": {
      "passages": {
        "type": "nested",
        "properties": {
          "vector": {
            "properties": {
              "predicted_value": {
                "type": "dense_vector",
                "index": True,
                "dims": MODEL_DIMS,
                "similarity": "dot_product"
              }
            }
          }
        }
      }
    }
  }
)

Note the nested field type in the mapping above.
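The mapping also sets `similarity` to `dot_product`, which in Elasticsearch expects unit-length vectors. The sketch below uses made-up 3-dimensional vectors instead of real 384-dimensional embeddings to show why that matters: once vectors are normalized, their dot product equals their cosine similarity. Whether a particular model already emits normalized vectors should be checked for that model; the function names here are illustrative.

```python
import math


def normalize(vector: list) -> list:
    """Scale a vector to unit length."""
    length = math.sqrt(sum(x * x for x in vector))
    if length == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / length for x in vector]


def dot(a: list, b: list) -> float:
    """Plain dot product of two equally long vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list, b: list) -> float:
    """Cosine similarity computed from the raw (unnormalized) vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0, 0.0], [1.0, 2.0, 2.0]
unit_a, unit_b = normalize(a), normalize(b)

# For unit vectors the dot product and the cosine similarity coincide
print(math.isclose(dot(unit_a, unit_b), cosine(a, b)))  # True
```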

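Utility: parent-child splitter function

This function splits a document into multiple passages and returns the parent document together with its child passages.

It can also optionally chunk the parent document into smaller documents, which means the parent is split into multiple indexed documents. We will use this in Example 2.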

from langchain.text_splitter import RecursiveCharacterTextSplitter

def parent_child_splitter(documents, chunk_size: int = 200):
    """Split each document into passages; return parents with nested child passages."""
    child_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size)

    docs = []
    for doc in documents:
        # Each chunk of the document becomes one nested passage
        passages = [
            {"text": chunk.page_content}
            for chunk in child_splitter.split_documents([doc])
        ]
        docs.append({
            "content": doc.page_content,
            "metadata": doc.metadata,
            "passages": passages,
        })

    return docs
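To make the output shape concrete without installing LangChain, here is a simplified stand-in. It packs whole words greedily into chunks of at most `chunk_size` characters, which is much cruder than RecursiveCharacterTextSplitter (that splitter tries paragraph and line separators first). `split_words` and `to_parent` are hypothetical names.

```python
def split_words(text: str, chunk_size: int = 200) -> list:
    """Greedily pack whole words into chunks of at most chunk_size characters.

    A single word longer than chunk_size becomes its own oversized chunk.
    """
    chunks = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if current and len(candidate) > chunk_size:
            # Adding this word would overflow: close the chunk, start a new one
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def to_parent(content: str, metadata: dict, chunk_size: int = 200) -> dict:
    """Build the parent document with its nested passages."""
    passages = [{"text": chunk} for chunk in split_words(content, chunk_size)]
    return {"content": content, "metadata": metadata, "passages": passages}


parent = to_parent(
    "Pets must be well-behaved. Pets must be house-trained.",
    {"name": "Office Pet Policy"},
    chunk_size=30,
)
for passage in parent["passages"]:
    print(passage["text"])
```

The resulting dictionary has the same three keys (`content`, `metadata`, `passages`) as the documents returned by parent_child_splitter.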

Utility: pretty-print the response

This function prints the Elasticsearch response in a format that is easier to read.

def pretty_response(response, show_parent_text=False):
    """Print each hit with its matching passages in a readable form."""
    hits = response["hits"]["hits"]
    if len(hits) == 0:
        print("Your search returned no results.")
        return

    for hit in hits:
        doc_id = hit["_id"]
        score = hit["_score"]
        doc_title = hit["_source"]["metadata"]["name"]
        parent_text = hit["_source"]["content"] if show_parent_text else ""

        # Matching passages come back as inner hits of the nested field
        passage_text = ""
        for passage in hit["inner_hits"]["passages"]["hits"]["hits"]:
            passage_text += passage["fields"]["passages"][0]["text"][0] + "\n\n"

        pretty_output = f"\nID: {doc_id}\nDoc Title: {doc_title}\nparent text:\n{parent_text}\nPassage Text:\n{passage_text}\nScore: {score}\n"
        print(pretty_output)
        print("---")
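pretty_response relies on a specific response layout: each hit carries the parent document in `_source`, while the matching passages arrive under `inner_hits` as `fields`, because the search requests `_source: False` for them. The hand-written dictionary below is a hypothetical, heavily trimmed response (the ID, score and text are invented) that shows the path the function walks. `passage_texts` is an illustrative helper name.

```python
# Hypothetical, trimmed search response; every value here is invented.
sample_response = {
    "hits": {
        "hits": [
            {
                "_id": "doc-1",
                "_score": 0.87,
                "_source": {"metadata": {"name": "Work From Home Policy"}},
                "inner_hits": {
                    "passages": {
                        "hits": {
                            "hits": [
                                {"fields": {"passages": [{"text": ["Employees may work remotely."]}]}}
                            ]
                        }
                    }
                },
            }
        ]
    }
}


def passage_texts(hit: dict) -> list:
    """Collect the text of every nested passage returned for one hit."""
    inner = hit["inner_hits"]["passages"]["hits"]["hits"]
    return [p["fields"]["passages"][0]["text"][0] for p in inner]


for hit in sample_response["hits"]["hits"]:
    print(hit["_source"]["metadata"]["name"], passage_texts(hit))
```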

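Full documents, nested passages

In this example we split each document into passages and store the full document as the parent. The passages are stored as nested documents that link back to the parent document.

Below we use the parent-child splitter to split the full documents into passages. The parent_child_splitter function returns a list of documents, each containing an array of nested passages.

We then index these documents into Elasticsearch. This indexes the full document, with the passages stored in a nested field.

Our ingest pipeline processor then runs inference on the passages and stores the embeddings in the index.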

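from elasticsearch import helpers
from langchain.document_loaders import JSONLoader
import time

# Assumption: the post never defines `loader`. This JSONLoader reads temp.json
# and keeps "name" as metadata, which pretty_response needs later.
loader = JSONLoader(
    file_path="temp.json",
    jq_schema=".[]",
    content_key="content",
    metadata_func=lambda record, metadata: {**metadata, "name": record.get("name")},
)

chunked_docs = parent_child_splitter(loader.load(), chunk_size=600)

count, errors = helpers.bulk(
    client,
    chunked_docs,
    index=INDEX_NAME,
)

print(f"Indexed {count} documents with {errors} errors")

time.sleep(5)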

We can inspect the format of the ingested documents in Kibana.
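For orientation, the sketch below imitates what the foreach and inference processors do to a document at ingest time: each passage gains a `vector.predicted_value` entry, which is the path the index mapping declares as a dense vector. `fake_embed` is a toy placeholder that derives a 4-dimensional vector from vowel counts. It is not the real model, whose vectors have 384 dimensions, and the real processor may add further fields next to `predicted_value`.

```python
import copy


def fake_embed(text: str) -> list:
    """Toy placeholder for the embedding model (NOT a real embedding)."""
    counts = [float(text.count(ch)) for ch in "aeio"]
    total = sum(counts) or 1.0  # avoid division by zero for vowel-free text
    return [c / total for c in counts]


def simulate_pipeline(doc: dict) -> dict:
    """Mimic foreach + inference: add vector.predicted_value to each passage."""
    result = copy.deepcopy(doc)  # leave the caller's document untouched
    for passage in result["passages"]:
        passage["vector"] = {"predicted_value": fake_embed(passage["text"])}
    return result


doc = {
    "content": "Pets are welcome. Keep pets on a leash.",
    "metadata": {"name": "Office Pet Policy"},
    "passages": [{"text": "Pets are welcome."}, {"text": "Keep pets on a leash."}],
}
indexed = simulate_pipeline(doc)
print(sorted(indexed["passages"][0].keys()))  # ['text', 'vector']
```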

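Run a nested search

We can now run a nested search to find the passages that match our query; they are returned in inner_hits. In the example below, only one passage is requested per parent document.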

response = client.search(
  index=INDEX_NAME,
  knn={
    "inner_hits": {
      "size": 1,  # return only the best passage per parent document
      "_source": False,
      "fields": [
        "passages.text"
      ]
    },
    "field": "passages.vector.predicted_value",
    "k": 5,
    "num_candidates": 100,
    "query_vector_builder": {
      "text_embedding": {
        "model_id": "sentence-transformers__all-minilm-l6-v2",
        "model_text": "Whats the work from home policy?"
      }
    }
  }
)

pretty_response(response)
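Conceptually, a nested kNN search scores every passage against the query vector, keeps the best passage of each parent, and ranks parents by that best score. The brute-force sketch below illustrates the idea on made-up 2-dimensional unit vectors. Elasticsearch itself uses an approximate index rather than an exhaustive scan and rescales the raw similarity into its `_score`, so absolute values will differ. `nested_knn` is a hypothetical name.

```python
def dot(a: list, b: list) -> float:
    """Plain dot product of two equally long vectors."""
    return sum(x * y for x, y in zip(a, b))


def nested_knn(parents: list, query_vector: list, k: int = 5) -> list:
    """Rank parents by their best-matching passage (exhaustive scan)."""
    results = []
    for parent in parents:
        if not parent["passages"]:
            continue  # a parent without passages cannot match
        best = max(parent["passages"], key=lambda p: dot(p["vector"], query_vector))
        results.append({
            "name": parent["name"],
            "score": dot(best["vector"], query_vector),
            "best_passage": best["text"],  # plays the role of inner_hits with size 1
        })
    results.sort(key=lambda r: r["score"], reverse=True)
    return results[:k]


parents = [
    {"name": "Pet Policy", "passages": [
        {"text": "Pets need approval.", "vector": [1.0, 0.0]},
        {"text": "Keep pets on a leash.", "vector": [0.6, 0.8]},
    ]},
    {"name": "Remote Work", "passages": [
        {"text": "Employees may work from home.", "vector": [0.0, 1.0]},
    ]},
]

hits = nested_knn(parents, query_vector=[0.0, 1.0], k=2)
for hit in hits:
    print(hit["name"], hit["score"], hit["best_passage"])
```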

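Search with LangChain

We can also run this search from within LangChain by adjusting the query.

We also override doc_builder so that page_content is populated with the matching passages instead of the full document.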

from typing import List, Union

from langchain.vectorstores.elasticsearch import ElasticsearchStore, ApproxRetrievalStrategy
from langchain_core.documents import Document


class CustomRetrievalStrategy(ApproxRetrievalStrategy):

    def query(
        self,
        query: Union[str, None],
        filter: List[dict],
        **kwargs,
    ):
        # Nested kNN query; the matching passages come back as inner hits
        es_query = {
            "knn": {
                "inner_hits": {
                    "_source": False,
                    "fields": [
                        "passages.text"
                    ]
                },
                "field": "passages.vector.predicted_value",
                "filter": filter,
                "k": 5,
                "num_candidates": 100,
                "query_vector_builder": {
                    "text_embedding": {
                        "model_id": "sentence-transformers__all-minilm-l6-v2",
                        "model_text": query
                    }
                }
            }
        }

        return es_query


vector_store = ElasticsearchStore(
    index_name=INDEX_NAME,
    es_connection=client,
    query_field="content",
    strategy=CustomRetrievalStrategy(),
)


def doc_builder(hit):
    # Build page_content from the matching passages rather than the full document
    passage_hits = hit.get("inner_hits", {}).get("passages", {}).get("hits", {}).get("hits", [])
    page_content = ""
    for passage_hit in passage_hits:
        passage_fields = passage_hit.get("fields", {}).get("passages", [])[0]
        page_content += passage_fields.get("text", [])[0] + "\n\n"

    # Return after the loop so that every passage is included
    return Document(
        page_content=page_content,
        metadata=hit["_source"]["metadata"],
    )


results = vector_store.similarity_search(query="Whats the work from home policy?", doc_builder=doc_builder)
for result in results:
    print(f'Doc title: {result.metadata["name"]}')
    print(f'Text:\n{result.page_content}')
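The request body assembled inside CustomRetrievalStrategy can also be produced by a plain function, which is handy for inspecting or unit-testing the query without LangChain or a running cluster. This is a minimal sketch: `build_nested_knn` and its defaults are illustrative choices that mirror the values used above, and the guard on `k` reflects the Elasticsearch requirement that `num_candidates` be at least `k`.

```python
import json
from typing import Optional


def build_nested_knn(
    query_text: str,
    model_id: str,
    filters: Optional[list] = None,
    k: int = 5,
    num_candidates: int = 100,
) -> dict:
    """Return a kNN clause over the nested passage vectors."""
    if k > num_candidates:
        raise ValueError("k must not exceed num_candidates")
    knn = {
        "field": "passages.vector.predicted_value",
        "k": k,
        "num_candidates": num_candidates,
        "inner_hits": {"_source": False, "fields": ["passages.text"]},
        "query_vector_builder": {
            "text_embedding": {"model_id": model_id, "model_text": query_text}
        },
    }
    if filters:  # only attach a filter when one is given
        knn["filter"] = filters
    return {"knn": knn}


body = build_nested_knn(
    "Whats the work from home policy?",
    "sentence-transformers__all-minilm-l6-v2",
)
print(json.dumps(body, indent=2))
```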

The complete source code of the notebook can be downloaded from: github.com/liu-xiao-gu...
