Trying out seekdb embedded on macOS

As of Nov 2025, seekdb cannot yet run natively on macOS. If you want to try seekdb's embedded mode on a Mac, you can do so through its Docker image.

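TL;DR

seekdb is, overall, at the concept-demo stage. Getting started is very smooth, and the SDK offers a rich set of interfaces that make development convenient.

On the AI side, it needs interfaces that support more custom models, a richer set of embedding functions, and ideally custom embeddings. On the data side, it needs examples for handling images, video, and other multimedia.

Conceptually, seekdb should aim to be the bridge between developers and AI, the entry point for AI operations. For example, to get a content summary from GPT or to analyze image features, the developer calls a seekdb interface and seekdb completes the task intelligently on the backend.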

Installing the seekdb Docker image

On Linux you do not need Docker at all. On macOS, for now, you have to make do with running seekdb inside a Docker container.

docker run -d --name seekdb -p 2881:2881 -v ./data:/var/lib/oceanbase oceanbase/seekdb:latest
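The command above publishes port 2881, which the server-mode connection shown later (commented out in test.py) would use. A minimal sketch for checking from the Mac host that something is listening on that port follows. The helper name port_is_open is made up for illustration, and a successful TCP connect only shows that a listener exists, not that seekdb has finished starting.

```python
import socket


def port_is_open(host: str = "127.0.0.1", port: int = 2881, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print("port 2881 reachable:", port_is_open())
```

Embedded mode, which the rest of this post uses, runs inside the container and does not depend on this port.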

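Installing dependencies

Next, open a shell inside the container (for example with docker exec -it seekdb sh; ssh is not needed).

Install Python inside the container: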

sh-5.1# yum install python3
Rocky Linux 9 - BaseOS                                                                    538 kB/s | 2.6 MB     00:04    
Rocky Linux 9 - AppStream                                                                 1.4 MB/s | 8.1 MB     00:05    
Rocky Linux 9 - Extras                                                                    5.2 kB/s |  18 kB     00:03    
Package python3-3.9.18-3.el9.aarch64 is already installed.
Dependencies resolved.
==========================================================================================================================
 Package                       Architecture             Version                             Repository               Size
==========================================================================================================================
Upgrading:
 python3                       aarch64                  3.9.21-2.el9_6.2                    baseos                   26 k
 python3-libs                  aarch64                  3.9.21-2.el9_6.2                    baseos                  7.5 M

Transaction Summary
==========================================================================================================================
Upgrade  2 Packages

Total download size: 7.5 M
Is this ok [y/N]: y

Install pip3:

sh-5.1# yum install python3-pip
Last metadata expiration check: 0:03:24 ago on Thu Nov 20 11:57:37 2025.
Dependencies resolved.
==========================================================================================================================
 Package                           Architecture          Version                           Repository                Size
==========================================================================================================================
Installing:
 python3-pip                       noarch                21.3.1-1.el9                      appstream                1.7 M
Installing weak dependencies:
 python3-setuptools                noarch                53.0.0-13.el9_6.1                 baseos                   837 k

Transaction Summary
==========================================================================================================================
Install  2 Packages

Total download size: 2.6 M
Installed size: 13 M
Is this ok [y/N]: y
Downloading Packages:
(1/2): python3-setuptools-53.0.0-13.el9_6.1.noarch.rpm                                    534 kB/s | 837 kB     00:01    
(2/2): python3-pip-21.3.1-1.el9.noarch.rpm                                                938 kB/s | 1.7 MB     00:01

Now pyseekdb can be installed:

sh-5.1# pip3 install pyseekdb
Collecting pyseekdb
  Downloading pyseekdb-1.0.0b2-py3-none-any.whl (96 kB)
     |████████████████████████████████| 96 kB 261 kB/s            
Collecting sentence-transformers
  Downloading sentence_transformers-5.1.2-py3-none-any.whl (488 kB)
     |████████████████████████████████| 488 kB 699 kB/s            
Collecting pylibseekdb
  Downloading pylibseekdb-1.0.0.post1-cp39-cp39-manylinux_2_28_aarch64.whl (59.9 MB)
     |████████████████████████████████| 59.9 MB 82 kB/s             
Collecting pymysql<2.0.0,>=1.1.1
  Downloading pymysql-1.1.2-py3-none-any.whl (45 kB)
     |████████████████████████████████| 45 kB 2.0 MB/s            
Collecting pylibseekdb_runtime==1.0.0.post1
  Downloading pylibseekdb_runtime-1.0.0.post1-cp39-cp39-manylinux_2_28_aarch64.whl (84.2 MB)
     |████████████████████████████████| 84.2 MB 126 kB/s            
Collecting scipy
  Downloading scipy-1.13.1-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (33.7 MB)
     |████████████████████████████████| 33.7 MB 4.1 MB/s            
Collecting transformers<5.0.0,>=4.41.0
  Downloading transformers-4.57.1-py3-none-any.whl (12.0 MB)
     |████████████████████████████████| 12.0 MB 1.4 MB/s            
Collecting typing_extensions>=4.5.0
  Downloading typing_extensions-4.15.0-py3-none-any.whl (44 kB)
     |████████████████████████████████| 44 kB 349 kB/s            
Collecting scikit-learn
  Downloading scikit_learn-1.6.1-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (12.7 MB)
     |████████████████████████████████| 12.7 MB 4.6 MB/s            
Collecting tqdm
  Downloading tqdm-4.67.1-py3-none-any.whl (78 kB)
     |████████████████████████████████| 78 kB 300 kB/s            
Collecting huggingface-hub>=0.20.0
  Downloading huggingface_hub-1.1.4-py3-none-any.whl (515 kB)
     |████████████████████████████████| 515 kB 401 kB/s            
Collecting Pillow
  Downloading pillow-11.3.0-cp39-cp39-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl (6.0 MB)
     |████████████████████████████████| 6.0 MB 2.2 MB/s            
Collecting torch>=1.11.0
  Downloading torch-2.8.0-cp39-cp39-manylinux_2_28_aarch64.whl (102.1 MB)
     |████████████████████████████████| 102.1 MB 3.1 MB/s            
Collecting shellingham
  Downloading shellingham-1.5.4-py2.py3-none-any.whl (9.8 kB)
Collecting filelock
  Downloading filelock-3.19.1-py3-none-any.whl (15 kB)
Collecting httpx<1,>=0.23.0
  Downloading httpx-0.28.1-py3-none-any.whl (73 kB)
     |████████████████████████████████| 73 kB 296 kB/s            
Collecting pyyaml>=5.1
  Downloading pyyaml-6.0.3-cp39-cp39-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl (737 kB)
     |████████████████████████████████| 737 kB 380 kB/s            
Collecting hf-xet<2.0.0,>=1.2.0
  Downloading hf_xet-1.2.0-cp37-abi3-manylinux_2_28_aarch64.whl (3.2 MB)
     |████████████████████████████████| 3.2 MB 2.6 MB/s            
Collecting typer-slim
  Downloading typer_slim-0.20.0-py3-none-any.whl (47 kB)
     |████████████████████████████████| 47 kB 3.3 MB/s            
Collecting fsspec>=2023.5.0
  Downloading fsspec-2025.10.0-py3-none-any.whl (200 kB)
     |████████████████████████████████| 200 kB 3.7 MB/s            
Collecting packaging>=20.9
  Downloading packaging-25.0-py3-none-any.whl (66 kB)
     |████████████████████████████████| 66 kB 3.0 MB/s            
Collecting sympy>=1.13.3
  Downloading sympy-1.14.0-py3-none-any.whl (6.3 MB)
     |████████████████████████████████| 6.3 MB 3.2 MB/s            
Collecting jinja2
  Downloading jinja2-3.1.6-py3-none-any.whl (134 kB)
     |████████████████████████████████| 134 kB 5.5 MB/s            
Collecting networkx
  Downloading networkx-3.2.1-py3-none-any.whl (1.6 MB)
     |████████████████████████████████| 1.6 MB 5.2 MB/s            
Collecting numpy>=1.17
  Downloading numpy-2.0.2-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (13.9 MB)
     |████████████████████████████████| 13.9 MB 4.1 MB/s            
Collecting safetensors>=0.4.3
  Downloading safetensors-0.7.0-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (491 kB)
     |████████████████████████████████| 491 kB 258 kB/s            
Collecting requests
  Downloading requests-2.32.5-py3-none-any.whl (64 kB)
     |████████████████████████████████| 64 kB 2.5 MB/s            
Collecting regex!=2019.12.17
  Downloading regex-2025.11.3-cp39-cp39-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl (781 kB)
     |████████████████████████████████| 781 kB 264 kB/s            
Collecting tokenizers<=0.23.0,>=0.22.0
  Downloading tokenizers-0.22.1-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (3.3 MB)
     |████████████████████████████████| 3.3 MB 729 kB/s            
Collecting huggingface-hub>=0.20.0
  Downloading huggingface_hub-0.36.0-py3-none-any.whl (566 kB)
     |████████████████████████████████| 566 kB 7.5 MB/s            
Collecting threadpoolctl>=3.1.0
  Downloading threadpoolctl-3.6.0-py3-none-any.whl (18 kB)
Collecting joblib>=1.2.0
  Downloading joblib-1.5.2-py3-none-any.whl (308 kB)
     |████████████████████████████████| 308 kB 2.8 MB/s            
Collecting anyio
  Downloading anyio-4.11.0-py3-none-any.whl (109 kB)
     |████████████████████████████████| 109 kB 4.1 MB/s            
Collecting certifi
  Downloading certifi-2025.11.12-py3-none-any.whl (159 kB)
     |████████████████████████████████| 159 kB 2.8 MB/s            
Collecting idna
  Downloading idna-3.11-py3-none-any.whl (71 kB)
     |████████████████████████████████| 71 kB 3.1 MB/s            
Collecting httpcore==1.*
  Downloading httpcore-1.0.9-py3-none-any.whl (78 kB)
     |████████████████████████████████| 78 kB 650 kB/s            
Collecting h11>=0.16
  Downloading h11-0.16.0-py3-none-any.whl (37 kB)
Collecting mpmath<1.4,>=1.1.0
  Downloading mpmath-1.3.0-py3-none-any.whl (536 kB)
     |████████████████████████████████| 536 kB 405 kB/s            
Collecting MarkupSafe>=2.0
  Downloading markupsafe-3.0.3-cp39-cp39-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl (21 kB)
Collecting charset_normalizer<4,>=2
  Downloading charset_normalizer-3.4.4-cp39-cp39-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl (149 kB)
     |████████████████████████████████| 149 kB 391 kB/s            
Collecting urllib3<3,>=1.21.1
  Downloading urllib3-2.5.0-py3-none-any.whl (129 kB)
     |████████████████████████████████| 129 kB 780 kB/s            
Collecting click>=8.0.0
  Downloading click-8.1.8-py3-none-any.whl (98 kB)
     |████████████████████████████████| 98 kB 2.3 MB/s            
Collecting exceptiongroup>=1.0.2
  Downloading exceptiongroup-1.3.0-py3-none-any.whl (16 kB)
Collecting sniffio>=1.1
  Downloading sniffio-1.3.1-py3-none-any.whl (10 kB)
Installing collected packages: urllib3, idna, charset-normalizer, certifi, typing-extensions, tqdm, requests, pyyaml, packaging, hf-xet, fsspec, filelock, numpy, mpmath, MarkupSafe, huggingface-hub, tokenizers, threadpoolctl, sympy, scipy, safetensors, regex, networkx, joblib, jinja2, transformers, torch, scikit-learn, pylibseekdb-runtime, Pillow, sentence-transformers, pymysql, pylibseekdb, pyseekdb
Successfully installed MarkupSafe-3.0.3 Pillow-11.3.0 certifi-2025.11.12 charset-normalizer-3.4.4 filelock-3.19.1 fsspec-2025.10.0 hf-xet-1.2.0 huggingface-hub-0.36.0 idna-3.11 jinja2-3.1.6 joblib-1.5.2 mpmath-1.3.0 networkx-3.2.1 numpy-2.0.2 packaging-25.0 pylibseekdb-1.0.0.post1 pylibseekdb-runtime-1.0.0.post1 pymysql-1.1.2 pyseekdb-1.0.0b2 pyyaml-6.0.3 regex-2025.11.3 requests-2.32.5 safetensors-0.7.0 scikit-learn-1.6.1 scipy-1.13.1 sentence-transformers-5.1.2 sympy-1.14.0 threadpoolctl-3.6.0 tokenizers-0.22.1 torch-2.8.0 tqdm-4.67.1 transformers-4.57.1 typing-extensions-4.15.0 urllib3-2.5.0

Experiment code

This is the example from the official site, run inside the container.

"""
Simple Example: Basic usage of SeekDBClient with Embedding Functions

This example demonstrates the most common operations with embedding functions:
1. Create a client connection
2. Create a collection with embedding function
3. Add data using documents (embeddings auto-generated)
4. Query using query texts (embeddings auto-generated)
5. Print query results

This is a minimal example to get you started quickly with embedding functions.
"""
import pyseekdb

# ==================== Step 1: Create Client Connection ====================
# You can use embedded mode, server mode, or OceanBase mode
# For this example, we'll use embedded mode (you can switch to server or OceanBase mode)

# Embedded mode (local SeekDB)
client = pyseekdb.Client()
# Alternative: Server mode (connecting to remote SeekDB server)
# client = pyseekdb.Client(
#     host="127.0.0.1",
#     port=2881,
#     database="test",
#     user="root",
#     password=""
# )

# Alternative: Remote server mode (OceanBase Server)
# client = pyseekdb.Client(
#     host="127.0.0.1",
#     port=2881,
#     tenant="test",  # OceanBase default tenant
#     database="test",
#     user="root",
#     password=""
# )

# ==================== Step 2: Create a Collection with Embedding Function ====================
# A collection is like a table that stores documents with vector embeddings
collection_name = "my_simple_collection"

# Create collection with default embedding function
# The embedding function will automatically convert documents to embeddings
collection = client.create_collection(
    name=collection_name,
)

print(f"Created collection '{collection_name}' with dimension: {collection.dimension}")
print(f"Embedding function: {collection.embedding_function}")

# ==================== Step 3: Add Data to Collection ====================
# With embedding function, you can add documents directly without providing embeddings
# The embedding function will automatically generate embeddings from documents

documents = [
    "Machine learning is a subset of artificial intelligence",
    "Python is a popular programming language",
    "Vector databases enable semantic search",
    "Neural networks are inspired by the human brain",
    "Natural language processing helps computers understand text"
]

ids = ["id1", "id2", "id3", "id4", "id5"]

# Add data with documents only - embeddings will be auto-generated by embedding function
collection.add(
    ids=ids,
    documents=documents,  # embeddings will be automatically generated
    metadatas=[
        {"category": "AI", "index": 0},
        {"category": "Programming", "index": 1},
        {"category": "Database", "index": 2},
        {"category": "AI", "index": 3},
        {"category": "NLP", "index": 4}
    ]
)

print(f"\nAdded {len(documents)} documents to collection")
print("Note: Embeddings were automatically generated from documents using the embedding function")

# ==================== Step 4: Query the Collection ====================
# With embedding function, you can query using text directly
# The embedding function will automatically convert query text to query vector

# Query using text - query vector will be auto-generated by embedding function
query_text = "artificial intelligence and machine learning"

results = collection.query(
    query_texts=query_text,  # Query text - will be embedded automatically
    n_results=3  # Return top 3 most similar documents
)

print(f"\nQuery: '{query_text}'")
print(f"Query results: {len(results['ids'][0])} items found")

# ==================== Step 5: Print Query Results ====================
for i in range(len(results['ids'][0])):
    print(f"\nResult {i+1}:")
    print(f"  ID: {results['ids'][0][i]}")
    print(f"  Distance: {results['distances'][0][i]:.4f}")
    if results.get('documents'):
        print(f"  Document: {results['documents'][0][i]}")
    if results.get('metadatas'):
        print(f"  Metadata: {results['metadatas'][0][i]}")

# ==================== Step 6: Cleanup ====================
# Delete the collection
client.delete_collection(collection_name)
print(f"\nDeleted collection '{collection_name}'")

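This code shows the basic operations of the SeekDB client (pyseekdb) with an embedding function. In short, it demonstrates how to handle text with a vector database: creating a collection, adding documents, querying, and cleaning up.

Key concepts and features in the code:

Automatic embedding generation:

The key point of this example is that documents and query texts are embedded (converted to vectors) automatically. You do not compute embedding vectors by hand; SeekDB generates them from the document content and stores them in the database. This makes a vector database very easy to use: you supply text and do not have to worry about producing vectors.

Embedding function:

The code uses the default embedding function, which converts both documents and query texts into vectors. Because SeekDB provides it, developers can focus on storing and querying data instead of implementing a text-embedding algorithm themselves.

Query:

A query is no longer traditional string matching. It is based on the distance between vectors (usually cosine distance or Euclidean distance). This makes queries smarter: they find related documents by semantic similarity rather than by keyword match alone.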
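To make the two distance measures concrete, here is a small pure-Python sketch. It is illustrative only and does not claim to reproduce how seekdb computes the Distance values it prints.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cos(angle between a and b); 0 means same direction, 1 means orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between the two points."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = [1.0, 0.0]
same_direction = [2.0, 0.0]   # longer, but points the same way as the query
orthogonal = [0.0, 3.0]       # unrelated direction

print(cosine_distance(query, same_direction))     # 0.0
print(cosine_distance(query, orthogonal))         # 1.0
print(euclidean_distance(query, same_direction))  # 1.0
```

Cosine distance ignores vector length, which is why the longer vector pointing the same way scores 0.0, while Euclidean distance still sees the two points as 1.0 apart.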

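Metadata:

Besides its text and embedding vector, each document can carry metadata. The documents in the example carry "category" and "index" fields, which attach extra context to the query results.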
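One way to use that metadata is to post-filter query hits on the client side. The sketch below assumes only the result layout that test.py reads (results['ids'][0][i], results['distances'][0][i], and so on); filter_hits is a hypothetical helper, not part of pyseekdb, and filtering inside the query itself is not covered here.

```python
from typing import Any, Dict, List


def filter_hits(results: Dict[str, Any], key: str, value: Any) -> List[Dict[str, Any]]:
    """Flatten the hits of the first query and keep those whose metadata[key] == value."""
    ids = results["ids"][0]
    distances = results["distances"][0]
    documents = results["documents"][0] if results.get("documents") else [None] * len(ids)
    metadatas = results["metadatas"][0] if results.get("metadatas") else [{}] * len(ids)

    hits = []
    for note_id, distance, document, metadata in zip(ids, distances, documents, metadatas):
        if (metadata or {}).get(key) == value:
            hits.append({"id": note_id, "distance": distance,
                         "document": document, "metadata": metadata})
    return hits


# Sample shaped like the query result of test.py (values taken from the run in this post).
sample = {
    "ids": [["id1", "id4", "id5"]],
    "distances": [[0.3008, 0.5983, 0.6856]],
    "documents": [[
        "Machine learning is a subset of artificial intelligence",
        "Neural networks are inspired by the human brain",
        "Natural language processing helps computers understand text",
    ]],
    "metadatas": [[
        {"index": 0, "category": "AI"},
        {"index": 3, "category": "AI"},
        {"index": 4, "category": "NLP"},
    ]],
}

for hit in filter_hits(sample, "category", "AI"):
    print(hit["id"], hit["distance"])
```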

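Running the experiment

Downloading the model file model.safetensors is the slow part: it took 52 seconds in my environment. Download plus execution took 1 minute 30 seconds in total.

The output is as follows: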

sh-5.1# mkdir embedded
sh-5.1# cd embedded/
sh-5.1# vi test.py
sh-5.1# python3 test.py
Preparing seekdb cache environment ...
Successfully cached to /root/.seekdb/cache/py3.9/1.0.0.post1/libseekdb_python.so
modules.json: 349B [00:00, 1.97MB/s]
config_sentence_transformers.json: 116B [00:00, 270kB/s]
README.md: 10.5kB [00:00, 11.9MB/s]
sentence_bert_config.json: 53.0B [00:00, 116kB/s]
config.json: 612B [00:00, 431kB/s]
model.safetensors: 100%|█████████████████████████████████████████████████████████████| 90.9M/90.9M [00:52<00:00, 1.73MB/s]
tokenizer_config.json: 350B [00:00, 680kB/s]
vocab.txt: 232kB [00:00, 557kB/s] 
tokenizer.json: 466kB [00:00, 1.30MB/s]
special_tokens_map.json: 112B [00:00, 231kB/s]
config.json: 190B [00:00, 489kB/s]
Created collection 'my_simple_collection' with dimension: 384
Embedding function: DefaultEmbeddingFunction(model_name='all-MiniLM-L6-v2')

Added 5 documents to collection
Note: Embeddings were automatically generated from documents using the embedding function

Query: 'artificial intelligence and machine learning'
Query results: 3 items found

Result 1:
  ID: id1
  Distance: 0.3008
  Document: Machine learning is a subset of artificial intelligence
  Metadata: {'index': 0, 'category': 'AI'}

Result 2:
  ID: id4
  Distance: 0.5983
  Document: Neural networks are inspired by the human brain
  Metadata: {'index': 3, 'category': 'AI'}

Result 3:
  ID: id5
  Distance: 0.6856
  Document: Natural language processing helps computers understand text
  Metadata: {'index': 4, 'category': 'NLP'}

Deleted collection 'my_simple_collection'

After the run finishes, the current directory contains a new seekdb.db file.

sh-5.1# ls
seekdb.db  test.py

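Running python3 test.py a second time looks a little different: there is no model download. The first run already cached seekdb's dependencies (the model, the shared library, and so on), so the second run is faster, taking 16 s.

There is clearly still a lot of room for optimization here; I trust the next release will take care of it.

Hands-on

Let's write a simple AI notepad that supports note entry, summaries, and smart search.

Note: real summarization would depend on an external AI model, so here it is simplified to taking the first two sentences.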

# python3 notebook.py 
import uuid
import random
import pyseekdb
import json
import sys

# Local text embedding generator - simplified version (random vector simulation)
def generate_embedding(text, dimension=384):
    """
    This function generates a random embedding of the specified dimension.
    In a real scenario, you would use an actual NLP model to generate embeddings.
    """
    return [random.random() for _ in range(dimension)]  # Use random vectors for simulation

# Simple summary generator - generates a summary based on the first two sentences
def generate_summary(text):
    """
    This function generates a simple summary by taking the first two sentences.
    You could use a more advanced summarization model here.
    """
    sentences = text.split(".")
    return ". ".join(sentences[:2]) + "."

# Create SeekDB client
client = pyseekdb.Client()

# Collection name
collection_name = "offline_notes_with_summary"

# Create the collection with dimension=384 for the embeddings
config = pyseekdb.HNSWConfiguration(dimension=384, distance='cosine')
collection = client.get_or_create_collection(name=collection_name, configuration=config)

# Function to read multi-line input until Ctrl + X is entered
def read_multiline_input():
    print("Start typing your note (press Ctrl + X, then Enter, to finish):")
    
    note_lines = []
    while True:
        try:
            # Read character by character
            char = sys.stdin.read(1)
            if char in ('\x18', ''):  # Ctrl + X (ASCII CAN) or end of input
                break
            elif char == '\n':  # Handle newlines
                note_lines.append('\n')
            else:
                note_lines.append(char)
        except KeyboardInterrupt:
            break

    return ''.join(note_lines)

# Function to add a new note to the database
def add_note_to_db(note_text):
    note_id = str(uuid.uuid4())
    embedding = generate_embedding(note_text)  # Generate embedding locally
    summary = generate_summary(note_text)  # Generate summary

    # Ensure proper JSON format for metadata
    metadata = {"summary": summary, "category": "general"}

    # Check that the metadata is JSON-serializable before storing it
    # (json.dumps raises TypeError or ValueError, not JSONDecodeError)
    try:
        json.dumps(metadata)
    except (TypeError, ValueError) as e:
        print(f"Error encoding metadata to JSON: {e}")
        return

    # Add the note to the collection
    try:
        # Ensure everything passed to collection is correctly serialized
        collection.add(
            ids=note_id,
            documents=note_text,  # Store the full note text
            embeddings=[embedding],
            metadatas=[metadata]  # Use the dictionary directly without JSON.loads
        )
        print(f"Added note with ID: {note_id}")
    except Exception as e:
        print(f"Error adding note to the collection: {e}")

# Search function
def search_notes(query_text):
    query_embedding = generate_embedding(query_text)
    results = collection.query(query_embeddings=query_embedding, n_results=3)

    if results["ids"] and results["ids"][0]:  # [[]] is truthy, so check the inner list too
        print(f"Search results for: {query_text}")
        for i in range(len(results["ids"][0])):
            note_id = results["ids"][0][i]
            summary = results["metadatas"][0][i]["summary"] if "summary" in results["metadatas"][0][i] else "No summary"
            print(f"ID: {note_id}, Summary: {summary}")
    else:
        print("No results found for your query.")

# List all notes
def list_notes():
    # Fetch all notes with a limit of 100
    results = collection.get(limit=100)
    
    # Debugging: print the structure of results to see its contents
    print("Results Structure:", results)

    # If no notes are found, inform the user
    if not results.get("ids"):
        print("No notes found.")
        return
    
    # Allow the user to choose a note ID to view the full text
    note_id = input("\nEnter note ID to view full text or press Enter to go back: ").strip()
    if note_id:
        display_note_by_id(note_id)

# Display full text of a note
def display_note_by_id(note_id):
    try:
        result = collection.get(ids=[note_id])
        if result["ids"]:
            full_text = result["documents"][0]
            print(f"\nFull Text of Note {note_id}:")
            print(full_text)
        else:
            print("Note ID not found.")
    except Exception as e:
        print(f"Error: {e}")

# Main loop for CLI
def main():
    while True:
        print("\n--- AI Notepad ---")
        print("1. New Note")
        print("2. Search Note")
        print("3. List Notes")
        print("4. Exit")

        choice = input("Enter your choice: ").strip()

        if choice == '1':
            note_text = read_multiline_input()  # Capture multi-line input for new note
            add_note_to_db(note_text)
        elif choice == '2':
            query_text = input("Enter your search query: ").strip()
            search_notes(query_text)
        elif choice == '3':
            list_notes()
        elif choice == '4':
            print("Exiting...")
            break
        else:
            print("Invalid choice. Please try again.")

if __name__ == "__main__":
    main()
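generate_summary above splits on every ".", so a decimal such as 8.5% is cut into "8. 5%". A possible alternative, sketched here under the assumption that a sentence ends with ".", "!" or "?" followed by whitespace, is to split with a regular expression. The name first_sentences is made up for illustration, and abbreviations such as "U.S. economy" would still be split incorrectly.

```python
import re

# A sentence boundary: whitespace that directly follows ".", "!" or "?".
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def first_sentences(text: str, count: int = 2) -> str:
    """Return the first `count` sentences of `text`, leaving decimals like 8.5 intact."""
    sentences = [s.strip() for s in _SENTENCE_BOUNDARY.split(text.strip()) if s.strip()]
    return " ".join(sentences[:count])


sample = (
    "The IMF reported that inflation rates have dropped from 8.5% in 2023 to around 5%. "
    "The outlook remains uneven. Emerging markets are still struggling."
)
print(first_sentences(sample))
```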

First, I added the following news items through New Note to check how search behaves.

1. Technology: World's First Quantum Computer Chip Achieved

On November 20, 2025, a research team from MIT announced the successful development of the world's first commercially viable quantum computer chip. This groundbreaking achievement is considered a significant milestone in quantum computing, bringing the potential for real-world applications closer than ever before. The chip can operate in extremely low temperatures and offers computational power far beyond the most advanced supercomputers. Experts predict that this technology could revolutionize fields such as cryptography, pharmaceuticals, and artificial intelligence in the coming years.

2. Environment: Arctic Ice Melting at Accelerated Rate

A recent climate report, released on November 20, 2025, reveals that global warming is causing the ice sheets in both the Arctic and Antarctic to melt at twice the rate compared to the last five decades. Scientists warn that this accelerated melting is not only contributing to rising sea levels but also significantly affecting climate patterns worldwide. Projections suggest that by the end of this century, global sea levels could rise by at least 1 meter, threatening coastal cities. Governments worldwide are urged to implement stronger measures to curb greenhouse gas emissions and invest in climate research.

3. Culture: Digital Art Market on the Rise

With the growth of NFTs (non-fungible tokens) and virtual reality technology, the digital art market is experiencing a surge in popularity. At the 2025 Paris International Art Expo, numerous digital artists showcased VR and blockchain-based works. These new forms of art, made possible by advances in technology, are garnering significant interest from both creators and collectors. Experts predict that digital art will become a mainstream sector of the art market, significantly transforming how art is created, traded, and consumed in the coming years.

4. Health: Breakthrough in New Vaccine Development

On November 20, 2025, Pfizer announced a breakthrough in the development of a new vaccine that has shown promising results in clinical trials. This vaccine is designed to combat multiple strains of flu and COVID-19, offering broad-spectrum immunity with minimal side effects. Experts believe that this vaccine could become a crucial tool in preventing future pandemics, as it targets a variety of viral mutations. This development is seen as a major step forward in the global public health fight against emerging infectious diseases.

5. Economy: Global Inflation Eases, But Economic Outlook Varies

On November 20, 2025, the International Monetary Fund (IMF) released its latest global economic report, revealing that inflation rates have dropped from 8.5% in 2023 to around 5%. While this indicates a recovery from the pandemic-induced economic crisis, the outlook remains uneven. Developed nations in Europe and North America are seeing strong economic growth, but emerging markets in Asia are still struggling with inflationary pressures and high unemployment. Experts recommend tailored monetary policies for each country to stabilize growth and prevent financial instability.

6. Technology: AI Makes Strides in the Legal Industry

On November 20, 2025, OpenAI announced the launch of a new AI-powered legal assistant developed in collaboration with several international law firms. The tool can analyze legal documents, predict case outcomes, and help lawyers develop litigation strategies. Early tests show that it significantly improves efficiency and reduces labor costs for legal teams. This development is expected to revolutionize the legal industry, enabling firms to handle larger caseloads while reducing reliance on human resources.

7. Education: Global Online Education Market Sees Growth

According to a report released on November 20, 2025, the global online education market is expected to reach $300 billion by 2027, marking a significant growth period. Advances in technology have allowed online education to expand beyond K-12 education, with adult learning and professional development becoming key growth areas. The shift to remote learning during the pandemic accelerated this trend, and experts predict that online education will continue to transform the global education system, offering more personalized and accessible learning experiences.

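Here are the search results. Frankly, they leave something to be desired. Note that generate_embedding in notebook.py returns random vectors, so this ranking is effectively arbitrary: it reflects the placeholder embedding rather than the built-in model.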

--- AI Notepad ---
1. New Note
2. Search Note
3. List Notes
4. Exit
Enter your choice: 2
Enter your search query: news about global warming
Search results for: news about global warming
ID: c98c5a4c-54ca-48c9-82c1-99897edb283b, Summary: Environment: Arctic Ice Melting at Accelerated Rate.
ID: a13329f3-e853-45f9-9205-1bfe980dbfaf, Summary: On November 20, 2025, a research team from MIT announced the successful development of the world's first commercially viable quantum computer chip.  This groundbreaking achievement is considered a significant milestone in quantum computing, bringing the potential for real-world applications closer than ever before.
ID: 67969422-c7c7-4d31-aa5b-69baeb53eecd, Summary: Technology: World's First Quantum Computer Chip Achieved.


--- AI Notepad ---
1. New Note
2. Search Note
3. List Notes
4. Exit
Enter your choice: 2
Enter your search query: what happened in Euro
Search results for: what happened in Euro
ID: 2be93dd2-0cf6-4abc-8ae5-b948a92a0603, Summary: On November 20, 2025, the International Monetary Fund (IMF) released its latest global economic report, revealing that inflation rates have dropped from 8. 5% in 2023 to around 5%.
ID: 142f8b2a-e29c-415e-b24a-476345847cb1, Summary: On November 20, 2025, Pfizer announced a breakthrough in the development of a new vaccine that has shown promising results in clinical trials.  This vaccine is designed to combat multiple strains of flu and COVID-19, offering broad-spectrum immunity with minimal side effects.
ID: c98c5a4c-54ca-48c9-82c1-99897edb283b, Summary: Environment: Arctic Ice Melting at Accelerated Rate.
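In notebook.py, generate_embedding draws every component from random.random(), so the same text gets a different vector on each call and the ranking cannot depend on meaning. A minimal sketch of a deterministic replacement is the hashing trick below. It is a lexical bag-of-words stand-in, not a semantic model, and hashed_embedding is a hypothetical name. For real semantic search, let the collection's default embedding function do the work (as test.py does by passing documents and query_texts) or plug in a real model.

```python
import hashlib
import math
import re
from typing import List, Sequence


def hashed_embedding(text: str, dimension: int = 384) -> List[float]:
    """Deterministic bag-of-words vector: hash each token to a slot, then L2-normalise."""
    vector = [0.0] * dimension
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dimension
        vector[index] += 1.0
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    # Inputs are unit length (or all zeros), so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))


query = hashed_embedding("news about global warming")
climate = hashed_embedding("global warming is causing the ice sheets to melt")
legal = hashed_embedding("OpenAI announced an AI-powered legal assistant")

print("query vs climate:", round(cosine_similarity(query, climate), 3))
print("query vs legal:  ", round(cosine_similarity(query, legal), 3))
```

Texts that share words are guaranteed a positive similarity, and the same text always maps to the same vector, which is the minimum a search demo needs. Synonyms and paraphrases still score poorly, which is where a trained embedding model comes in.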

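Impressions

Shortcomings and gaps, especially for the agent-development scenario:

Embedding generation and accuracy

When handling text, SeekDB relies on an embedding function to generate vectors automatically. How well that function works depends largely on the quality of its model and training data. If the vectors it produces are not accurate enough, search results become less accurate and less relevant, which affects an agent's decisions and actions.

Problem: if the vectors do not represent the meaning of the text well, the agent works from poor search results, which can lead to wrong decisions or inaccurate query answers.

Suggested improvement: if the agent's core task needs high-precision semantic understanding, it may need a stronger, customized embedding model (for example, one trained on domain-specific data), a hand-tuned way of computing embeddings, or a cloud-hosted model.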
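Before swapping models, it helps to measure retrieval quality with a small labelled set of queries. The sketch below computes hit rate at k: the fraction of queries whose expected note shows up in the top k results. The function name, the query strings, and the note ids are all hypothetical and only illustrate the calculation.

```python
from typing import Dict, List


def hit_rate_at_k(ranked_ids: Dict[str, List[str]],
                  expected_id: Dict[str, str],
                  k: int = 3) -> float:
    """Fraction of labelled queries whose expected id appears in the top-k ranked ids."""
    if not expected_id:
        raise ValueError("need at least one labelled query")
    hits = sum(
        1 for query, wanted in expected_id.items()
        if wanted in ranked_ids.get(query, [])[:k]
    )
    return hits / len(expected_id)


# Hypothetical search output: query -> note ids, best match first.
ranked = {
    "global warming": ["env", "quantum", "legal"],
    "europe economy": ["vaccine", "env", "economy"],
    "online courses": ["quantum", "legal", "art"],
}
# Hypothetical labels: the note each query should retrieve.
expected = {
    "global warming": "env",
    "europe economy": "economy",
    "online courses": "education",
}

print(hit_rate_at_k(ranked, expected, k=3))  # 2 of 3 queries hit
print(hit_rate_at_k(ranked, expected, k=1))  # 1 of 3 queries hit
```

Running the same labelled queries against two embedding setups gives a number to compare instead of an impression.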

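Reliance on a predefined embedding function

SeekDB ships a default embedding function, which is very useful for getting started quickly. For a specific agent, however, you sometimes need a customized embedding function that fits a particular domain or task.

Problem: if your agent handles highly specialized work (for example medical, legal, or financial knowledge), the default embedding function may not be suitable, because it cannot capture domain-specific differences in meaning precisely.

Suggested improvement: consider whether the embedding function can be customized, or combine SeekDB with an external pretrained model (for example GPT-3 or BERT) to strengthen its semantic representation.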
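One way to keep the model swappable is to compute vectors outside the database and pass them in explicitly, the way notebook.py passes embeddings= to collection.add and query_embeddings= to collection.query. The wrapper below is a sketch under that assumption. PluggableEmbedder and toy_encoder are hypothetical names, and the toy encoder is only a stand-in for a real model's encode function (for instance SentenceTransformer.encode from sentence-transformers).

```python
from typing import Callable, List, Sequence

# Any callable that maps a batch of texts to a batch of vectors.
Encoder = Callable[[Sequence[str]], Sequence[Sequence[float]]]


class PluggableEmbedder:
    """Wrap an arbitrary text encoder and validate the vectors it returns."""

    def __init__(self, encode: Encoder, dimension: int):
        self._encode = encode
        self.dimension = dimension

    def __call__(self, texts: Sequence[str]) -> List[List[float]]:
        vectors = [[float(x) for x in row] for row in self._encode(list(texts))]
        if len(vectors) != len(texts):
            raise ValueError("encoder returned a different number of vectors than texts")
        for row in vectors:
            if len(row) != self.dimension:
                raise ValueError(f"expected dimension {self.dimension}, got {len(row)}")
        return vectors


def toy_encoder(texts: Sequence[str]) -> List[List[int]]:
    """Stand-in encoder for the demo: [length, vowel count, space count] per text."""
    return [[len(t), sum(c in "aeiou" for c in t.lower()), t.count(" ")] for t in texts]


embed = PluggableEmbedder(toy_encoder, dimension=3)
print(embed(["seekdb on macOS"]))

# With a collection whose configured dimension matches embed.dimension, the vectors
# would be passed the same way notebook.py does:
#   collection.add(ids=[...], documents=texts, embeddings=embed(texts), metadatas=[...])
#   collection.query(query_embeddings=embed([query_text])[0], n_results=3)
```

The dimension check matters because the collection is created with a fixed dimension (384 in notebook.py), so a model with a different output size needs a collection configured to match.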