Deploying tokenizers on AWS Lambda

An optimized approach to deploying sentence-transformers on AWS Lambda

Problems with the original approach

At first I tried to run the full sentence-transformers library directly on AWS Lambda to compute text embeddings. The main problems with that approach were:

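  1. Package too large: sentence-transformers and its dependencies (such as PyTorch) total more than 250 MB, while Lambda limits the deployment package to 250 MB unzipped

  2. Slow cold starts: the large dependencies make initialization take a long time when Lambda cold-starts

  3. High memory consumption: the full model needs a lot of memory, which raises the cost of running the function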
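
Whether a set of dependencies fits under the limit can be checked locally before uploading. The following is a minimal sketch, assuming the dependencies were installed into a local directory; the helper names `unpacked_size_mb` and `fits_in_lambda` are illustrative and not part of any AWS tooling.

```python
import os

LAMBDA_UNZIPPED_LIMIT_MB = 250  # unzipped deployment package limit quoted above


def unpacked_size_mb(root: str) -> float:
    """Sum the sizes of all regular files under root, in MB (1 MB = 1024 * 1024 bytes)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):  # skip symlinks so targets are not counted twice
                total += os.path.getsize(path)
    return total / (1024 * 1024)


def fits_in_lambda(root: str, limit_mb: float = LAMBDA_UNZIPPED_LIMIT_MB) -> bool:
    """Return True when the directory is at or below the given size limit."""
    return unpacked_size_mb(root) <= limit_mb
```

Calling `fits_in_lambda("./package")` on the install directory gives a quick yes/no answer; `unpacked_size_mb` shows how much headroom is left for the model file.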

The optimized approach

Changes to the tech stack

I ended up with the following replacements:

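  1. ONNX Runtime: replaces PyTorch as the inference engine

    • Much smaller package (ONNX Runtime is about 50 MB)
    • Optimized inference performance
    • Supports several hardware acceleration options
  2. HuggingFace Tokenizers: the tokenizers library used on its own

    • Contains only the text preprocessing that is actually needed
    • Small and fast (about 10 MB)
  3. Model choice: the ONNX version of all-MiniLM-L6-v2

    • Lightweight model (about 90 MB)
    • ONNX format, optimized for production use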

Implementation details

  1. Lambda configuration

    • Python 3.12 runtime
    • Memory set to 512 MB (in my tests 256 MB also worked)
    • Timeout of 15 seconds (enough for batch requests)
  2. Deployment package optimization

    bash
    pip install --target ./package onnxruntime tokenizers
    # Manually add the pre-converted ONNX model files
    cd package && zip -r ../function.zip .
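  3. Code structure

    python
    import numpy as np
    import onnxruntime as ort
    from tokenizers import Tokenizer
    
    # Load the model and tokenizer once per container so warm invocations reuse them
    sess = ort.InferenceSession("model.onnx")
    tokenizer = Tokenizer.from_file("tokenizer.json")
    # Input names declared by this ONNX export (exports differ in which inputs they take)
    input_names = {i.name for i in sess.get_inputs()}
    
    def lambda_handler(event, context):
        # Text preprocessing; ONNX Runtime expects int64 NumPy arrays with a batch dimension
        enc = tokenizer.encode(event["text"])
        feeds = {
            "input_ids": np.array([enc.ids], dtype=np.int64),
            "attention_mask": np.array([enc.attention_mask], dtype=np.int64),
            "token_type_ids": np.array([enc.type_ids], dtype=np.int64),
        }
        # ONNX inference, passing only the inputs the model actually declares
        outputs = sess.run(None, {k: v for k, v in feeds.items() if k in input_names})
        return {"embedding": outputs[0].tolist()}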
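
The handler above returns the model's first output as-is. For all-MiniLM-L6-v2, sentence-transformers derives the sentence vector by mean-pooling the token embeddings under the attention mask and then L2-normalizing the result. The following is a minimal NumPy sketch of that step, assuming the first output has shape (batch, tokens, hidden); the helper name `mean_pool` is illustrative.

```python
import numpy as np


def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors over non-padding positions, then L2-normalize each row."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, tokens, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # (batch, hidden)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # guard against empty masks
    pooled = summed / counts
    norms = np.clip(np.linalg.norm(pooled, axis=1, keepdims=True), 1e-12, None)
    return pooled / norms
```

In the handler this would be called with the model's first output and the attention mask produced by the tokenizer, and the pooled row returned in place of the raw token embeddings.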

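Performance comparison

Metric                  Original approach   Optimized approach
Package size            ~250 MB             ~150 MB
Cold start time         3-5 s               1-2 s
Per-invocation latency  300-500 ms          150-300 ms
Memory usage            700 MB+             300-400 MB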
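
Numbers like these depend heavily on the memory setting and on input length, so they are worth re-measuring in your own environment. The following is a rough sketch of a timing helper that wraps any callable; the name `measure_latency_ms` and the stand-in workload are hypothetical.

```python
import statistics
import time


def measure_latency_ms(fn, runs: int = 20, warmup: int = 2) -> dict:
    """Call fn repeatedly and report latency statistics in milliseconds."""
    for _ in range(warmup):  # warm-up calls are not timed
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


# Stand-in workload; in practice this would call lambda_handler with a sample event
stats = measure_latency_ms(lambda: sum(range(10_000)))
print(stats)
```

This only captures warm invocations. Cold start time is reported separately by Lambda as Init Duration in the REPORT log line of the first invocation.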

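Where it fits

This optimized approach is a particularly good fit for:

  1. Services that need real-time text embeddings
  2. Semantic search applications on a serverless architecture
  3. Large-scale text similarity computation
  4. Cost-sensitive AI service deployments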
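
For the search and similarity use cases, the embeddings are typically compared by cosine similarity. The following is a small sketch under that assumption; `top_k_similar` is an illustrative name, and the corpus is assumed to fit in memory as a 2-D array with one embedding per row.

```python
import numpy as np


def top_k_similar(query: np.ndarray, corpus: np.ndarray, k: int = 3) -> list[tuple[int, float]]:
    """Rank corpus rows by cosine similarity to the query vector."""
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = c @ q                   # cosine similarity of each corpus row to the query
    order = np.argsort(-scores)[:k]  # indices of the k highest scores
    return [(int(i), float(scores[i])) for i in order]
```

If the embeddings are already L2-normalized, the two divisions are redundant and the score reduces to a plain dot product.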

Further optimization is possible later on, for example using a quantized ONNX model or trying an even smaller distilled model.
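
One way the quantization idea could be tried is ONNX Runtime's dynamic quantization, run once at build time. This is a sketch rather than something from the setup described here: the file names and the helpers `quantized_path` and `quantize_model` are assumptions, and the effect on embedding quality would need to be checked on real data.

```python
from pathlib import Path


def quantized_path(model_path: str) -> str:
    """Derive an output name such as model.int8.onnx from model.onnx."""
    p = Path(model_path)
    return str(p.with_name(p.stem + ".int8" + p.suffix))


def quantize_model(model_path: str = "model.onnx") -> str:
    """Write an int8-weight copy of the model next to the original and return its path."""
    # Imported lazily: the quantization tooling also needs the `onnx` package,
    # which is only required at build time, not inside the Lambda function.
    from onnxruntime.quantization import QuantType, quantize_dynamic

    out_path = quantized_path(model_path)
    # Dynamic quantization stores weights as 8-bit integers;
    # activations are quantized on the fly at inference time.
    quantize_dynamic(model_path, out_path, weight_type=QuantType.QInt8)
    return out_path
```

The quantized file is loaded with `ort.InferenceSession` in the same way as the original model.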

Installation log

bash
# 1. Create the layer directory on the host (required)
mkdir -p "$(pwd)/layer" && \

# 2. Install the dependencies only, with no cleanup yet (make sure the install succeeds first)
docker run --rm -v "$(pwd)/layer":/layer --entrypoint "" public.ecr.aws/lambda/python:3.12 \
  bash -c "
  # Make sure the target path exists inside the container
  mkdir -p /layer/python/lib/python3.12/site-packages/ && \
  
  # Install the core dependencies (NumPy 1.x + ONNX Runtime + tokenizers)
  pip install numpy==1.26.4 onnxruntime==1.17.0 tokenizers==0.15.2 \
    --no-cache-dir -t /layer/python/lib/python3.12/site-packages/ \
    --force-reinstall
  "


Collecting numpy==1.26.4
  Downloading numpy-1.26.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
Collecting onnxruntime==1.17.0
  Downloading onnxruntime-1.17.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (4.2 kB)
Collecting tokenizers==0.15.2
  Downloading tokenizers-0.15.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)
Collecting coloredlogs (from onnxruntime==1.17.0)
  Downloading coloredlogs-15.0.1-py2.py3-none-any.whl.metadata (12 kB)
Collecting flatbuffers (from onnxruntime==1.17.0)
  Downloading flatbuffers-25.12.19-py2.py3-none-any.whl.metadata (1.0 kB)
Collecting packaging (from onnxruntime==1.17.0)
  Downloading packaging-26.0-py3-none-any.whl.metadata (3.3 kB)
Collecting protobuf (from onnxruntime==1.17.0)
  Downloading protobuf-7.34.0-cp310-abi3-manylinux2014_x86_64.whl.metadata (595 bytes)
Collecting sympy (from onnxruntime==1.17.0)
  Downloading sympy-1.14.0-py3-none-any.whl.metadata (12 kB)
Collecting huggingface_hub<1.0,>=0.16.4 (from tokenizers==0.15.2)
  Downloading huggingface_hub-0.36.2-py3-none-any.whl.metadata (15 kB)
Collecting filelock (from huggingface_hub<1.0,>=0.16.4->tokenizers==0.15.2)
  Downloading filelock-3.25.0-py3-none-any.whl.metadata (2.0 kB)
Collecting fsspec>=2023.5.0 (from huggingface_hub<1.0,>=0.16.4->tokenizers==0.15.2)
  Downloading fsspec-2026.2.0-py3-none-any.whl.metadata (10 kB)
Collecting hf-xet<2.0.0,>=1.1.3 (from huggingface_hub<1.0,>=0.16.4->tokenizers==0.15.2)
  Downloading hf_xet-1.3.2-cp37-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (4.9 kB)
Collecting pyyaml>=5.1 (from huggingface_hub<1.0,>=0.16.4->tokenizers==0.15.2)
  Downloading pyyaml-6.0.3-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.metadata (2.4 kB)
Collecting requests (from huggingface_hub<1.0,>=0.16.4->tokenizers==0.15.2)
  Downloading requests-2.32.5-py3-none-any.whl.metadata (4.9 kB)
Collecting tqdm>=4.42.1 (from huggingface_hub<1.0,>=0.16.4->tokenizers==0.15.2)
  Downloading tqdm-4.67.3-py3-none-any.whl.metadata (57 kB)
Collecting typing-extensions>=3.7.4.3 (from huggingface_hub<1.0,>=0.16.4->tokenizers==0.15.2)
  Downloading typing_extensions-4.15.0-py3-none-any.whl.metadata (3.3 kB)
Collecting humanfriendly>=9.1 (from coloredlogs->onnxruntime==1.17.0)
  Downloading humanfriendly-10.0-py2.py3-none-any.whl.metadata (9.2 kB)
Collecting mpmath<1.4,>=1.1.0 (from sympy->onnxruntime==1.17.0)
  Downloading mpmath-1.3.0-py3-none-any.whl.metadata (8.6 kB)
Collecting charset_normalizer<4,>=2 (from requests->huggingface_hub<1.0,>=0.16.4->tokenizers==0.15.2)
  Downloading charset_normalizer-3.4.5-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.metadata (39 kB)
Collecting idna<4,>=2.5 (from requests->huggingface_hub<1.0,>=0.16.4->tokenizers==0.15.2)
  Downloading idna-3.11-py3-none-any.whl.metadata (8.4 kB)
Collecting urllib3<3,>=1.21.1 (from requests->huggingface_hub<1.0,>=0.16.4->tokenizers==0.15.2)
  Downloading urllib3-2.6.3-py3-none-any.whl.metadata (6.9 kB)
Collecting certifi>=2017.4.17 (from requests->huggingface_hub<1.0,>=0.16.4->tokenizers==0.15.2)
  Downloading certifi-2026.2.25-py3-none-any.whl.metadata (2.5 kB)
Downloading numpy-1.26.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.0 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.0/18.0 MB 8.6 MB/s eta 0:00:00
Downloading onnxruntime-1.17.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (6.8 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.8/6.8 MB 9.6 MB/s eta 0:00:00
Downloading tokenizers-0.15.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.6 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.6/3.6 MB 11.2 MB/s eta 0:00:00
Downloading huggingface_hub-0.36.2-py3-none-any.whl (566 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 566.4/566.4 kB 11.8 MB/s eta 0:00:00
Downloading packaging-26.0-py3-none-any.whl (74 kB)
Downloading coloredlogs-15.0.1-py2.py3-none-any.whl (46 kB)
Downloading flatbuffers-25.12.19-py2.py3-none-any.whl (26 kB)
Downloading protobuf-7.34.0-cp310-abi3-manylinux2014_x86_64.whl (324 kB)
Downloading sympy-1.14.0-py3-none-any.whl (6.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.3/6.3 MB 9.7 MB/s eta 0:00:00
Downloading fsspec-2026.2.0-py3-none-any.whl (202 kB)
Downloading hf_xet-1.3.2-cp37-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (4.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.2/4.2 MB 11.3 MB/s eta 0:00:00
Downloading humanfriendly-10.0-py2.py3-none-any.whl (86 kB)
Downloading mpmath-1.3.0-py3-none-any.whl (536 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 536.2/536.2 kB 11.3 MB/s eta 0:00:00
Downloading pyyaml-6.0.3-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (807 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 807.9/807.9 kB 13.0 MB/s eta 0:00:00
Downloading tqdm-4.67.3-py3-none-any.whl (78 kB)
Downloading typing_extensions-4.15.0-py3-none-any.whl (44 kB)
Downloading filelock-3.25.0-py3-none-any.whl (26 kB)
Downloading requests-2.32.5-py3-none-any.whl (64 kB)
Downloading certifi-2026.2.25-py3-none-any.whl (153 kB)
Downloading charset_normalizer-3.4.5-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (196 kB)
Downloading idna-3.11-py3-none-any.whl (71 kB)
Downloading urllib3-2.6.3-py3-none-any.whl (131 kB)
Installing collected packages: mpmath, flatbuffers, urllib3, typing-extensions, tqdm, sympy, pyyaml, protobuf, packaging, numpy, idna, humanfriendly, hf-xet, fsspec, filelock, charset_normalizer, certifi, requests, coloredlogs, onnxruntime, huggingface_hub, tokenizers
Successfully installed certifi-2026.2.25 charset_normalizer-3.4.5 coloredlogs-15.0.1 filelock-3.25.0 flatbuffers-25.12.19 fsspec-2026.2.0 hf-xet-1.3.2 huggingface_hub-0.36.2 humanfriendly-10.0 idna-3.11 mpmath-1.3.0 numpy-1.26.4 onnxruntime-1.17.0 packaging-26.0 protobuf-7.34.0 pyyaml-6.0.3 requests-2.32.5 sympy-1.14.0 tokenizers-0.15.2 tqdm-4.67.3 typing-extensions-4.15.0 urllib3-2.6.3

Slimming down

bash
# Enter the layer directory on the host
cd "$(pwd)/layer/python/lib/python3.12/site-packages/" && \

# Remove redundant files (run on the host, where the path is guaranteed to exist)
rm -rf tests test examples docs doc *.egg-info && \
find . -name '*.pyc' -delete && \
find . -type d -name '__pycache__' -exec rm -rf {} + && \
rm -rf onnxruntime/{tools,experimental,contrib} && \
rm -rf tokenizers/{bindings,scripts}
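
Once the directory has been slimmed down, it still has to be zipped so that the archive's top-level entry is `python/`, which is the layout Lambda expects for Python layers. The following is a minimal sketch using only the standard library; `zip_layer` is an illustrative helper, and the zip file should be written outside the layer directory so it does not end up inside itself.

```python
import os
import zipfile


def zip_layer(layer_dir: str, zip_path: str) -> int:
    """Zip layer_dir so archive paths start at its top-level folders (e.g. python/...)."""
    count = 0
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for dirpath, _dirnames, filenames in os.walk(layer_dir):
            for name in filenames:
                full = os.path.join(dirpath, name)
                # Store paths relative to layer_dir, always with forward slashes
                arcname = os.path.relpath(full, layer_dir).replace(os.sep, "/")
                zf.write(full, arcname)
                count += 1
    return count
```

For the directory built above, `zip_layer("layer", "layer.zip")` run from the parent of `layer` would produce an archive whose entries begin with `python/lib/python3.12/site-packages/`.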