OSError: Can't load tokenizer for 'bert-base-uncased'.

CoderXiu 2024-10-29 8:08

I. The error:

The full error message is:
OSError: Can't load tokenizer for 'bert-base-uncased'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'bert-base-uncased' is the correct path to a dir

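II. Cause:

When the model code loads BERT, it tries to fetch 'bert-base-uncased' from huggingface.co. The site is blocked by the firewall, so the files cannot be downloaded and loading fails.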

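III. Solutions:

1. Download through a mirror site

Run the script from the command line with the Hugging Face endpoint pointed at the mirror, so the BERT files are downloaded from there: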
HF_ENDPOINT=https://hf-mirror.com python your_model_script.py

Equivalently, you can set the environment variable first and then run the script as usual. On Linux, for example:
export HF_ENDPOINT=https://hf-mirror.com
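
The same variable can also be set from inside the Python script. As far as I know, the Hugging Face libraries read HF_ENDPOINT when they are imported, so it has to be set before `transformers` is imported. A minimal sketch under that assumption; the helper name `use_hf_mirror` is made up for illustration and is not part of any library:

```python
import os


def use_hf_mirror(endpoint: str = "https://hf-mirror.com") -> str:
    """Point the Hugging Face libraries at a mirror (hypothetical helper)."""
    os.environ["HF_ENDPOINT"] = endpoint
    return endpoint


# Call this at the very top of the script, before importing transformers.
use_hf_mirror()

# Only after the variable is set (assumes transformers is installed):
# from transformers import BertTokenizer
# tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
```

This keeps the setting inside the script, which can be handy when you cannot control the shell that launches it.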

2. Download the weights directly

Address: https://huggingface.co/google-bert/bert-base-uncased/tree/main
Mirror address: https://hf-mirror.com/google-bert/bert-base-uncased/tree/main
Files to download:

config.json
pytorch_model.bin
tokenizer.json
tokenizer_config.json
vocab.txt

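Put these files into one folder, for example bert-base-uncased. Then look at the traceback to find the file where the error is raised and the line where the model is referenced.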
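
Before changing the code, it can help to confirm that the folder really contains everything. A small sketch that checks for the files listed in this post; `EXPECTED_FILES` and `missing_files` are illustrative names, and the folder name is the example one used here:

```python
from pathlib import Path

# Files listed in this post; adjust if you downloaded a different set.
EXPECTED_FILES = [
    "config.json",
    "pytorch_model.bin",
    "tokenizer.json",
    "tokenizer_config.json",
    "vocab.txt",
]


def missing_files(model_dir):
    """Return the expected files that are not present in model_dir."""
    folder = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    # "bert-base-uncased" is the example folder name from this post.
    missing = missing_files("bert-base-uncased")
    print("missing:", missing if missing else "nothing")
```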

For example:
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

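Replace the argument of from_pretrained() with the path to the folder that holds the model files. An absolute path is recommended, because a relative path such as the one below only works when the script is run from the directory that contains the folder.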
tokenizer = BertTokenizer.from_pretrained('./bert-base-uncased')
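
A relative path like './bert-base-uncased' depends on the directory the script is launched from, so one option is to resolve it to an absolute path first. The sketch below assumes `transformers` is installed and that the folder exists; `resolve_model_dir` and `load_local_tokenizer` are hypothetical helper names. Passing `local_files_only=True` asks from_pretrained() not to attempt any download.

```python
from pathlib import Path


def resolve_model_dir(model_dir):
    """Turn a possibly relative folder name into an absolute path string."""
    return str(Path(model_dir).expanduser().resolve())


def load_local_tokenizer(model_dir):
    """Load the tokenizer strictly from disk (requires transformers)."""
    # Imported here so that the path helper works without transformers.
    from transformers import BertTokenizer

    return BertTokenizer.from_pretrained(
        resolve_model_dir(model_dir),
        local_files_only=True,  # never try to reach the network
    )


if __name__ == "__main__":
    tokenizer = load_local_tokenizer("./bert-base-uncased")
    print(tokenizer.tokenize("hello world"))
```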