Deploying tokenizers on AWS Lambda

An optimized approach to deploying sentence-transformers on AWS Lambda

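Problems with the original approach

At first I tried running the full sentence-transformers library directly on AWS Lambda to compute text embeddings. The main problems with this approach were:

  1. Package too large: sentence-transformers and its dependencies (PyTorch in particular) add up to more than 250 MB, while Lambda limits a deployment package, including all of its layers, to 250 MB unzipped

  2. Long cold starts: the heavy dependencies make Lambda slow to initialize on every cold start

  3. High memory consumption: the full PyTorch-based stack needs a lot of memory, which raises Lambda costs (Lambda bills by allocated memory multiplied by duration)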
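Because the 250 MB quota applies to the unzipped size of the function package plus every attached layer, the compressed size of the zip files is not a reliable guide. The stdlib sketch below checks this before deploying. The function names are hypothetical, and the limit is assumed to be 250 × 1024 × 1024 bytes.

```python
import zipfile

# Assumed Lambda quota: function code plus all layers, unzipped, at most 250 MB,
# taken here as 250 * 1024 * 1024 bytes.
LIMIT_BYTES = 250 * 1024 * 1024

def unzipped_size(zip_path: str) -> int:
    """Sum of the uncompressed sizes of every entry in a zip archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return sum(info.file_size for info in zf.infolist())

def check_deployment(function_zip: str, *layer_zips: str) -> tuple[int, bool]:
    """Return (total unzipped bytes, fits_within_limit) for a function zip and its layer zips."""
    total = sum(unzipped_size(path) for path in (function_zip, *layer_zips))
    return total, total <= LIMIT_BYTES
```

For example, `check_deployment("function.zip", "layer.zip")` reports the combined unzipped size of the function and one layer, and whether it fits.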

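The optimized approach

Tech stack changes

I ended up with the following replacements:

  1. ONNX Runtime: replaces PyTorch as the inference engine

    • Much smaller package (ONNX Runtime is about 50 MB)
    • Optimized inference performance
    • Supports multiple hardware acceleration options (execution providers)
  2. HuggingFace Tokenizers: the tokenizers library used on its own

    • Contains only the text preprocessing that is actually needed
    • Small and efficient (about 10 MB)
  3. Model: the ONNX version of all-MiniLM-L6-v2

    • Lightweight model (about 90 MB)
    • Exported to ONNX, a format built for optimized production inference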

Implementation details

  1. Lambda configuration

    • Python 3.12 runtime
    • Memory set to 512 MB (256 MB also worked in testing)
    • 15-second timeout (enough for batch requests)
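  2. Deployment package optimization

    ```bash
    pip install --target ./package onnxruntime tokenizers
    # Manually add the pre-converted ONNX model files plus the handler code
    # (lambda_function.py is an assumed handler file name)
    cp model.onnx tokenizer.json lambda_function.py ./package/
    cd package && zip -r ../function.zip .
    # With the ~90 MB model the zip will likely exceed the 50 MB direct-upload limit; upload it via S3
    ```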
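  3. Code structure

    ```python
    import numpy as np
    import onnxruntime as ort
    from tokenizers import Tokenizer

    # Initialize the model and tokenizer once, outside the handler, so warm invocations reuse them
    sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
    tokenizer = Tokenizer.from_file("tokenizer.json")
    tokenizer.enable_truncation(max_length=256)  # sentence-transformers truncates this model at 256 word pieces
    input_names = {i.name for i in sess.get_inputs()}

    def lambda_handler(event, context):
        # Text preprocessing: BERT-style exports usually also expect attention_mask and token_type_ids
        enc = tokenizer.encode(event["text"])
        feed = {
            "input_ids": np.array([enc.ids], dtype=np.int64),
            "attention_mask": np.array([enc.attention_mask], dtype=np.int64),
            "token_type_ids": np.array([enc.type_ids], dtype=np.int64),
        }
        # ONNX inference, passing only the inputs this particular export declares
        token_emb = sess.run(None, {k: v for k, v in feed.items() if k in input_names})[0]
        # Assumes outputs[0] is last_hidden_state (per-token vectors): mean-pool over real
        # tokens and L2-normalize to get the sentence embedding, as sentence-transformers does
        mask = feed["attention_mask"][..., None].astype(np.float32)
        emb = (token_emb * mask).sum(axis=1) / np.clip(mask.sum(axis=1), 1e-9, None)
        emb /= np.linalg.norm(emb, axis=1, keepdims=True)
        return {"embedding": emb[0].tolist()}
    ```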
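The configuration mentions batch requests, but the handler encodes one text per call. Batching several texts into one `sess.run` call requires padding them to a common length, together with an attention mask that marks the real tokens so padding can be kept out of pooling. The tokenizers library can do this itself via `Tokenizer.enable_padding()` and `encode_batch()`; the pure-Python sketch below only illustrates what that step produces. `pad_batch` is a hypothetical helper, and pad id 0 is an assumption (it is `[PAD]` in the BERT uncased vocabulary; check the model's `tokenizer.json`).

```python
def pad_batch(batch_ids: list[list[int]], pad_id: int = 0) -> tuple[list[list[int]], list[list[int]]]:
    """Right-pad token-id lists to a common length and build matching attention masks.

    pad_id=0 is an assumption (the [PAD] id in the BERT uncased vocabulary).
    """
    max_len = max((len(ids) for ids in batch_ids), default=0)
    input_ids, attention_mask = [], []
    for ids in batch_ids:
        pad = max_len - len(ids)
        input_ids.append(ids + [pad_id] * pad)             # fill the tail with the pad token
        attention_mask.append([1] * len(ids) + [0] * pad)  # 1 = real token, 0 = padding
    return input_ids, attention_mask
```

For example, `pad_batch([[101, 7592, 102], [101, 102]])` returns `([[101, 7592, 102], [101, 102, 0]], [[1, 1, 1], [1, 1, 0]])`; both lists can then be wrapped in `np.array(..., dtype=np.int64)` as model inputs.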

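Performance comparison

| Metric | Original approach | Optimized approach |
|---|---|---|
| Package size | ~250 MB | ~150 MB |
| Cold start time | 3-5 s | 1-2 s |
| Per-invocation latency | 300-500 ms | 150-300 ms |
| Memory usage | 700 MB+ | 300-400 MB |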
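The table does not say how the numbers were measured. One way to reproduce the per-invocation latency column locally is to time the handler in-process on warm calls, as sketched below (`measure_warm_latency` and `percentile` are hypothetical helpers). This excludes network and Lambda service overhead. Cold-start time is better read from the `Init Duration` field that Lambda adds to the `REPORT` line in CloudWatch Logs on cold starts.

```python
import math
import time
from typing import Any, Callable

def percentile(samples: list[float], q: float) -> float:
    """Nearest-rank percentile (q in 0..100) of a non-empty list."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]

def measure_warm_latency(handler: Callable[[dict, Any], Any], event: dict,
                         runs: int = 50) -> dict[str, float]:
    """Invoke handler in-process and report warm-call latency in milliseconds.

    One untimed warm-up call runs first, so the numbers exclude import and
    first-call costs. runs must be at least 1.
    """
    handler(event, None)  # warm-up; the handler does not use context, so None is passed
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        handler(event, None)
        samples.append((time.perf_counter() - start) * 1000.0)
    return {"p50_ms": percentile(samples, 50), "p95_ms": percentile(samples, 95),
            "max_ms": max(samples)}
```

Usage: `measure_warm_latency(lambda_handler, {"text": "hello world"}, runs=100)`. The nearest-rank percentile is used only for simplicity. The results depend heavily on the machine, so they will only approximate the latency of a Lambda instance with the configured memory.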

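Use cases

This optimized approach is particularly well suited to:

  1. Services that need real-time text embeddings
  2. Semantic search applications in a serverless architecture
  3. Large-scale text similarity computation
  4. Cost-sensitive AI service deployments

Possible further optimizations include using a quantized ONNX model or trying a smaller distilled model.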
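For the semantic search and similarity use cases, the embeddings returned by the function are typically compared with cosine similarity. The brute-force, pure-Python sketch below ranks a small corpus against a query embedding; `cosine`, `top_k`, and the document ids are illustrative. If the embeddings are already L2-normalized, cosine similarity reduces to a plain dot product. For large corpora, a vector index would replace the linear scan.

```python
import heapq
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query: list[float], corpus: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Return the k (doc_id, score) pairs most similar to query, best first."""
    scored = ((doc_id, cosine(query, vec)) for doc_id, vec in corpus.items())
    return heapq.nlargest(k, scored, key=lambda item: item[1])
```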
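This is a rough estimate of what quantization could save. Dynamic INT8 quantization (in ONNX Runtime, `onnxruntime.quantization.quantize_dynamic`) stores weights in 1 byte instead of the 4 bytes used by FP32, so weight storage shrinks by roughly a factor of 4. The calculation below applies that to the roughly 90 MB model, assuming nearly all of the file consists of quantized weights:

```latex
S_{\text{INT8}} \approx \frac{1\ \text{byte}}{4\ \text{bytes}} \cdot S_{\text{FP32}}
  \approx \frac{90\ \text{MB}}{4} \approx 22.5\ \text{MB}
```

Quantization scales, zero points, and any tensors left in FP32 add to this figure. Embedding quality should be re-checked after quantizing.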

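Installation log

```bash
# 1. Create the layer directory on the host (required)
mkdir -p "$(pwd)/layer"

# 2. Install dependencies only, with no cleanup yet (make sure the install succeeds first)
#    --platform linux/amd64 pins x86_64 wheels even on an ARM host; use linux/arm64 for an arm64 function
docker run --rm --platform linux/amd64 -v "$(pwd)/layer":/layer --entrypoint "" public.ecr.aws/lambda/python:3.12 \
  bash -c "
  # Create the target path inside the container
  mkdir -p /layer/python/lib/python3.12/site-packages/ &&

  # Install the core dependencies (NumPy 1.x + ONNX Runtime + tokenizers);
  # NumPy stays on 1.x because onnxruntime 1.17 was built against NumPy 1
  pip install numpy==1.26.4 onnxruntime==1.17.0 tokenizers==0.15.2 \
    --no-cache-dir -t /layer/python/lib/python3.12/site-packages/ \
    --force-reinstall
  "
```
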
```text
Collecting numpy==1.26.4
  Downloading numpy-1.26.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
Collecting onnxruntime==1.17.0
  Downloading onnxruntime-1.17.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (4.2 kB)
Collecting tokenizers==0.15.2
  Downloading tokenizers-0.15.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)
Collecting coloredlogs (from onnxruntime==1.17.0)
  Downloading coloredlogs-15.0.1-py2.py3-none-any.whl.metadata (12 kB)
Collecting flatbuffers (from onnxruntime==1.17.0)
  Downloading flatbuffers-25.12.19-py2.py3-none-any.whl.metadata (1.0 kB)
Collecting packaging (from onnxruntime==1.17.0)
  Downloading packaging-26.0-py3-none-any.whl.metadata (3.3 kB)
Collecting protobuf (from onnxruntime==1.17.0)
  Downloading protobuf-7.34.0-cp310-abi3-manylinux2014_x86_64.whl.metadata (595 bytes)
Collecting sympy (from onnxruntime==1.17.0)
  Downloading sympy-1.14.0-py3-none-any.whl.metadata (12 kB)
Collecting huggingface_hub<1.0,>=0.16.4 (from tokenizers==0.15.2)
  Downloading huggingface_hub-0.36.2-py3-none-any.whl.metadata (15 kB)
Collecting filelock (from huggingface_hub<1.0,>=0.16.4->tokenizers==0.15.2)
  Downloading filelock-3.25.0-py3-none-any.whl.metadata (2.0 kB)
Collecting fsspec>=2023.5.0 (from huggingface_hub<1.0,>=0.16.4->tokenizers==0.15.2)
  Downloading fsspec-2026.2.0-py3-none-any.whl.metadata (10 kB)
Collecting hf-xet<2.0.0,>=1.1.3 (from huggingface_hub<1.0,>=0.16.4->tokenizers==0.15.2)
  Downloading hf_xet-1.3.2-cp37-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (4.9 kB)
Collecting pyyaml>=5.1 (from huggingface_hub<1.0,>=0.16.4->tokenizers==0.15.2)
  Downloading pyyaml-6.0.3-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.metadata (2.4 kB)
Collecting requests (from huggingface_hub<1.0,>=0.16.4->tokenizers==0.15.2)
  Downloading requests-2.32.5-py3-none-any.whl.metadata (4.9 kB)
Collecting tqdm>=4.42.1 (from huggingface_hub<1.0,>=0.16.4->tokenizers==0.15.2)
  Downloading tqdm-4.67.3-py3-none-any.whl.metadata (57 kB)
Collecting typing-extensions>=3.7.4.3 (from huggingface_hub<1.0,>=0.16.4->tokenizers==0.15.2)
  Downloading typing_extensions-4.15.0-py3-none-any.whl.metadata (3.3 kB)
Collecting humanfriendly>=9.1 (from coloredlogs->onnxruntime==1.17.0)
  Downloading humanfriendly-10.0-py2.py3-none-any.whl.metadata (9.2 kB)
Collecting mpmath<1.4,>=1.1.0 (from sympy->onnxruntime==1.17.0)
  Downloading mpmath-1.3.0-py3-none-any.whl.metadata (8.6 kB)
Collecting charset_normalizer<4,>=2 (from requests->huggingface_hub<1.0,>=0.16.4->tokenizers==0.15.2)
  Downloading charset_normalizer-3.4.5-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.metadata (39 kB)
Collecting idna<4,>=2.5 (from requests->huggingface_hub<1.0,>=0.16.4->tokenizers==0.15.2)
  Downloading idna-3.11-py3-none-any.whl.metadata (8.4 kB)
Collecting urllib3<3,>=1.21.1 (from requests->huggingface_hub<1.0,>=0.16.4->tokenizers==0.15.2)
  Downloading urllib3-2.6.3-py3-none-any.whl.metadata (6.9 kB)
Collecting certifi>=2017.4.17 (from requests->huggingface_hub<1.0,>=0.16.4->tokenizers==0.15.2)
  Downloading certifi-2026.2.25-py3-none-any.whl.metadata (2.5 kB)
Downloading numpy-1.26.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.0 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.0/18.0 MB 8.6 MB/s eta 0:00:00
Downloading onnxruntime-1.17.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (6.8 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.8/6.8 MB 9.6 MB/s eta 0:00:00
Downloading tokenizers-0.15.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.6 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.6/3.6 MB 11.2 MB/s eta 0:00:00
Downloading huggingface_hub-0.36.2-py3-none-any.whl (566 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 566.4/566.4 kB 11.8 MB/s eta 0:00:00
Downloading packaging-26.0-py3-none-any.whl (74 kB)
Downloading coloredlogs-15.0.1-py2.py3-none-any.whl (46 kB)
Downloading flatbuffers-25.12.19-py2.py3-none-any.whl (26 kB)
Downloading protobuf-7.34.0-cp310-abi3-manylinux2014_x86_64.whl (324 kB)
Downloading sympy-1.14.0-py3-none-any.whl (6.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.3/6.3 MB 9.7 MB/s eta 0:00:00
Downloading fsspec-2026.2.0-py3-none-any.whl (202 kB)
Downloading hf_xet-1.3.2-cp37-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (4.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.2/4.2 MB 11.3 MB/s eta 0:00:00
Downloading humanfriendly-10.0-py2.py3-none-any.whl (86 kB)
Downloading mpmath-1.3.0-py3-none-any.whl (536 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 536.2/536.2 kB 11.3 MB/s eta 0:00:00
Downloading pyyaml-6.0.3-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (807 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 807.9/807.9 kB 13.0 MB/s eta 0:00:00
Downloading tqdm-4.67.3-py3-none-any.whl (78 kB)
Downloading typing_extensions-4.15.0-py3-none-any.whl (44 kB)
Downloading filelock-3.25.0-py3-none-any.whl (26 kB)
Downloading requests-2.32.5-py3-none-any.whl (64 kB)
Downloading certifi-2026.2.25-py3-none-any.whl (153 kB)
Downloading charset_normalizer-3.4.5-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (196 kB)
Downloading idna-3.11-py3-none-any.whl (71 kB)
Downloading urllib3-2.6.3-py3-none-any.whl (131 kB)
Installing collected packages: mpmath, flatbuffers, urllib3, typing-extensions, tqdm, sympy, pyyaml, protobuf, packaging, numpy, idna, humanfriendly, hf-xet, fsspec, filelock, charset_normalizer, certifi, requests, coloredlogs, onnxruntime, huggingface_hub, tokenizers
Successfully installed certifi-2026.2.25 charset_normalizer-3.4.5 coloredlogs-15.0.1 filelock-3.25.0 flatbuffers-25.12.19 fsspec-2026.2.0 hf-xet-1.3.2 huggingface_hub-0.36.2 humanfriendly-10.0 idna-3.11 mpmath-1.3.0 numpy-1.26.4 onnxruntime-1.17.0 packaging-26.0 protobuf-7.34.0 pyyaml-6.0.3 requests-2.32.5 sympy-1.14.0 tokenizers-0.15.2 tqdm-4.67.3 typing-extensions-4.15.0 urllib3-2.6.3
```
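The log shows that the three pinned packages pull in many transitive dependencies. tokenizers brings huggingface_hub together with requests, tqdm, fsspec, hf-xet and pyyaml. onnxruntime brings sympy (with mpmath), protobuf, coloredlogs and flatbuffers. Before trimming, it helps to see which top-level entries in site-packages take up the most space. The stdlib sketch below does this; the helper names are hypothetical.

```python
import os

def dir_size(path: str) -> int:
    """Total size in bytes of all regular files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total

def largest_entries(site_packages: str, top: int = 10) -> list[tuple[str, int]]:
    """Top-level entries (packages, modules, .dist-info dirs) sorted by size, largest first."""
    sizes = []
    for entry in os.scandir(site_packages):
        if entry.is_dir(follow_symlinks=False):
            size = dir_size(entry.path)
        else:
            size = entry.stat(follow_symlinks=False).st_size
        sizes.append((entry.name, size))
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]
```

For example, run `largest_entries("layer/python/lib/python3.12/site-packages")` from the directory that contains `layer/`. Removing a package that is imported at runtime, either directly or by onnxruntime or tokenizers, breaks the function. Re-run a test import after any deletion.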

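Slimming the layer

```bash
# Go into the layer directory on the host
cd "$(pwd)/layer/python/lib/python3.12/site-packages/" &&

# Remove redundant files (run on the host; the path exists after the install step)
# The container wrote these files as root, so on a Linux host this step may need sudo
rm -rf tests test examples docs doc *.egg-info &&
# Trade-off: /opt is read-only on Lambda, so without .pyc files every cold start recompiles imported modules
find . -name '*.pyc' -delete &&
find . -type d -name '__pycache__' -exec rm -rf {} + &&
rm -rf onnxruntime/{tools,experimental,contrib} &&
rm -rf tokenizers/{bindings,scripts}
```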
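After slimming, the layer still has to be zipped. Lambda extracts layers into `/opt`, and for Python runtimes the `python/` directory must sit at the root of the zip. This is equivalent to `cd layer && zip -r ../layer.zip python`. The stdlib sketch below does the same; `build_layer_zip` is a hypothetical helper.

```python
import os
import shutil

def build_layer_zip(layer_root: str, output_base: str) -> str:
    """Zip the contents of layer_root so that python/ is at the archive root.

    Returns the path of the created archive (shutil.make_archive appends ".zip").
    """
    if not os.path.isdir(os.path.join(layer_root, "python")):
        raise FileNotFoundError(f"{layer_root!r} has no python/ directory")
    # root_dir makes every archive path relative to layer_root, e.g. python/lib/python3.12/...
    return shutil.make_archive(output_base, "zip", root_dir=layer_root)
```

For example, `build_layer_zip("layer", "layer")` writes `layer.zip` next to the `layer/` directory. The archive can then be published with something like `aws lambda publish-layer-version --layer-name my-embedding-deps --zip-file fileb://layer.zip --compatible-runtimes python3.12`, where the layer name is a placeholder. A zip over 50 MB has to be uploaded to S3 first and passed with `--content` instead.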