Building Vector Search with Microsoft.Extensions.AI and Microsoft.Extensions.VectorData

Today I took a quick look at building vector search with Microsoft.Extensions.AI and Microsoft.Extensions.VectorData, and I'd like to share what I found.

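First, create a .NET console app, then work through the following development steps:

  • Create and populate a vector store by generating embeddings for a dataset.
  • Generate an embedding for the user prompt.
  • Query the vector store using the user prompt embedding.
  • Display the relevant results from the vector search.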

Create the console project:

dotnet new console -o VectorDataAIDemo

Switch to the VectorDataAIDemo directory created above and install the following NuGet packages:

dotnet add package Microsoft.Extensions.AI.OpenAI --prerelease
dotnet add package Microsoft.Extensions.VectorData.Abstractions
dotnet add package Microsoft.SemanticKernel.Connectors.InMemory --prerelease
dotnet add package Microsoft.Extensions.Configuration
dotnet add package Microsoft.Extensions.Configuration.UserSecrets
dotnet add package System.Linq.AsyncEnumerable

Create a new class, CloudServiceWiki:

using Microsoft.Extensions.VectorData;

namespace VectorDataAIDemo;

internal class CloudServiceWiki
{
    [VectorStoreKey]
    public int Key { get; set; }

    [VectorStoreData]
    public string Name { get; set; }

    [VectorStoreData]
    public string Description { get; set; }

    [VectorStoreVector(
        Dimensions: 384,
        DistanceFunction = DistanceFunction.CosineSimilarity)]
    public ReadOnlyMemory<float> Vector { get; set; }
}

The Microsoft.Extensions.VectorData attributes, such as VectorStoreKeyAttribute, control how each property is handled when the class is used with a vector store.

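The Vector property stores the generated embedding, which represents the semantic meaning of the Description value for vector search. The Dimensions value (384 here) should match the length of the vectors your embedding model returns.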
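
To make the DistanceFunction.CosineSimilarity setting more concrete, here is a minimal sketch of how cosine similarity compares two vectors. The three-dimensional vectors and the CosineSimilarity helper are made up for illustration; they are not part of Microsoft.Extensions.VectorData, and real embeddings have far more dimensions.

```csharp
using System;

// Hypothetical 3-dimensional "embeddings" (real models return hundreds or thousands of values).
float[] blobStorage = [0.9f, 0.1f, 0.0f];
float[] fileQuery = [0.8f, 0.2f, 0.1f];
float[] identityService = [0.0f, 0.2f, 0.9f];

// Vectors pointing in a similar direction score close to 1; unrelated ones score close to 0.
Console.WriteLine($"query vs blob storage:    {CosineSimilarity(fileQuery, blobStorage):F3}");
Console.WriteLine($"query vs identity service: {CosineSimilarity(fileQuery, identityService):F3}");

// Cosine similarity = dot(a, b) / (|a| * |b|).
float CosineSimilarity(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same number of dimensions.");

    float dot = 0f, normA = 0f, normB = 0f;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
}
```

With these sample values the query vector scores much higher against the blob storage vector than against the identity service vector, which is the property a vector search relies on when ranking records.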

In Program.cs, add the following code to create a dataset that describes a collection of cloud service knowledge-base entries:

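using System.ClientModel;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.VectorData;
using Microsoft.SemanticKernel.Connectors.InMemory;
using OpenAI;
using VectorDataAIDemo;

// Sample dataset: one record per cloud service, with a description to embed.
List<CloudServiceWiki> cloudServices =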
[
    new() {
            Key = 0,
            Name = "Azure App Service",
            Description = "Host .NET, Java, Node.js, and Python web applications and APIs in a fully managed Azure service. You only need to deploy your code to Azure. Azure takes care of all the infrastructure management like high availability, load balancing, and autoscaling."
    },
    new() {
            Key = 1,
            Name = "Azure Service Bus",
            Description = "A fully managed enterprise message broker supporting both point to point and publish-subscribe integrations. It's ideal for building decoupled applications, queue-based load leveling, or facilitating communication between microservices."
    },
    new() {
            Key = 2,
            Name = "Azure Blob Storage",
            Description = "Azure Blob Storage allows your applications to store and retrieve files in the cloud. Azure Storage is highly scalable to store massive amounts of data and data is stored redundantly to ensure high availability."
    },
    new() {
            Key = 3,
            Name = "Microsoft Entra ID",
            Description = "Manage user identities and control access to your apps, data, and resources."
    },
    new() {
            Key = 4,
            Name = "Azure Key Vault",
            Description = "Store and access application secrets like connection strings and API keys in an encrypted vault with restricted access to make sure your secrets and your application aren't compromised."
    },
    new() {
            Key = 5,
            Name = "Azure AI Search",
            Description = "Information retrieval at scale for traditional and conversational search applications, with security and options for AI enrichment and vectorization."
    }
];

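Create and configure an IEmbeddingGenerator implementation to send requests to an embedding AI model. The code reads ModelName and OpenAIKey from user secrets, so set them beforehand (for example, run dotnet user-secrets init in the project directory, then dotnet user-secrets set for each of the two keys):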

// Load the configuration values.
IConfigurationRoot config = new ConfigurationBuilder().AddUserSecrets<Program>().Build();
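string model = config["ModelName"]
    ?? throw new InvalidOperationException("ModelName is not set in user secrets.");
string key = config["OpenAIKey"]
    ?? throw new InvalidOperationException("OpenAIKey is not set in user secrets.");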

// Create the embedding generator.
IEmbeddingGenerator<string, Embedding<float>> generator =
    new OpenAIClient(new ApiKeyCredential(key))
      .GetEmbeddingClient(model: model)
      .AsIEmbeddingGenerator();

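Create and populate a vector store with the cloud service knowledge-base data. Use the IEmbeddingGenerator implementation to create an embedding vector for each record in the dataset and assign it to that record: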

// Create and populate the vector store.
var vectorStore = new InMemoryVectorStore();
VectorStoreCollection<int, CloudServiceWiki> cloudServicesStore =
    vectorStore.GetCollection<int, CloudServiceWiki>("cloudServices");
await cloudServicesStore.EnsureCollectionExistsAsync();

foreach (CloudServiceWiki service in cloudServices)
{
    service.Vector = await generator.GenerateVectorAsync(service.Description);
    await cloudServicesStore.UpsertAsync(service);
}

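An embedding is a numeric representation of the semantic meaning of a data record, which is what makes the records usable with vector search.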

Finally, run a search query against the vector store:

// Convert a search query to a vector
// and search the vector store.
string query = "Which Azure service should I use to store my Word documents?";
ReadOnlyMemory<float> queryEmbedding = await generator.GenerateVectorAsync(query);

IAsyncEnumerable<VectorSearchResult<CloudServiceWiki>> results =
    cloudServicesStore.SearchAsync(queryEmbedding, top: 1);

await foreach (VectorSearchResult<CloudServiceWiki> result in results)
{
    Console.WriteLine($"Name: {result.Record.Name}");
    Console.WriteLine($"Description: {result.Record.Description}");
    Console.WriteLine($"Vector match score: {result.Score}");
}
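
Conceptually, a top-k vector search scores every stored vector against the query vector and returns the best matches. The sketch below shows that idea with brute-force cosine similarity over made-up vectors. TopMatches and CosineSimilarity are hypothetical helpers written for this illustration, and this is not necessarily how InMemoryVectorStore implements SearchAsync.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical records with made-up 3-dimensional vectors (not real embeddings).
List<(string Name, float[] Vector)> records =
[
    ("Azure Blob Storage", new[] { 0.9f, 0.1f, 0.0f }),
    ("Azure Key Vault", new[] { 0.1f, 0.9f, 0.1f }),
    ("Microsoft Entra ID", new[] { 0.0f, 0.2f, 0.9f }),
];
float[] queryVector = [0.8f, 0.2f, 0.1f];

foreach (var (name, score) in TopMatches(records, queryVector, top: 2))
{
    Console.WriteLine($"{name}: {score:F3}");
}

// Scores every record against the query and keeps the 'top' highest-scoring ones.
List<(string Name, float Score)> TopMatches(
    List<(string Name, float[] Vector)> items, float[] query, int top)
{
    return items
        .Select(item => (item.Name, Score: CosineSimilarity(item.Vector, query)))
        .OrderByDescending(match => match.Score)
        .Take(top)
        .ToList();
}

// Cosine similarity = dot(a, b) / (|a| * |b|).
float CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same number of dimensions.");

    float dot = 0f, normA = 0f, normB = 0f;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
}
```

With these sample vectors, "Azure Blob Storage" ranks first because its vector points in nearly the same direction as the query vector. A brute-force scan like this is fine for a handful of records; larger stores typically rely on an index instead.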

周国庆

20260309