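Sentiment Classification with the Transformers Pipeline

Hi everyone. Anyone who uses Hugging Face regularly knows that pipeline downloads its model automatically, and that the model comes from the Hugging Face website. Since that site is currently blocked and cannot be reached directly, the workaround is to download models through a mirror site. Text sentiment classification serves as the example below.

Transformers Pipeline Text Classification Example

This project shows how to use the pipeline feature of the Hugging Face Transformers library for simple text sentiment classification. The example uses a pretrained DistilBERT model to classify text quickly as positive or negative.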

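Requirements

Python 3.7+ (recent transformers releases no longer support Python 3.7, so a newer interpreter is the safer choice)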
transformers
torch (PyTorch)

Installation

Create and activate a virtual environment (recommended):

# Create the virtual environment
python -m venv venv

# Activate the virtual environment
# Windows
venv\Scripts\activate
# macOS/Linux
source venv/bin/activate

Install the dependencies:

pip install transformers torch
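
After installing, it can help to confirm which versions actually landed in the active environment. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is made up for this example, and `importlib.metadata` requires Python 3.8 or later.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Report the two dependencies this example needs
for name, version in installed_versions(["transformers", "torch"]).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

If either line reports NOT INSTALLED, the virtual environment is probably not the one that is currently activated.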

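Hugging Face Mirror Setup Tips

To download models and tokenizers when huggingface.co is slow or unreachable, point the Hugging Face libraries at a mirror using one of the following methods:

Method 1: Set an environment variable (recommended)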

# Linux/macOS
export HF_ENDPOINT=https://hf-mirror.com

# Windows (CMD)
set HF_ENDPOINT=https://hf-mirror.com

# Windows (PowerShell)
$env:HF_ENDPOINT = "https://hf-mirror.com"
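
To make the effect of HF_ENDPOINT concrete, the sketch below mimics how a download URL changes with the endpoint. It is an illustration, not the library's actual code: `resolve_endpoint` and `file_url` are hypothetical helpers, and the `/{repo_id}/resolve/{revision}/{filename}` layout is the Hub's public file URL scheme, which the mirror is assumed to follow.

```python
import os

DEFAULT_ENDPOINT = "https://huggingface.co"


def resolve_endpoint(env=None):
    """Return the Hub endpoint: HF_ENDPOINT if set and non-empty, else the default."""
    env = os.environ if env is None else env
    endpoint = env.get("HF_ENDPOINT") or DEFAULT_ENDPOINT
    return endpoint.rstrip("/")


def file_url(repo_id, filename, revision="main", env=None):
    """Build the URL a file would be fetched from under the chosen endpoint."""
    return f"{resolve_endpoint(env)}/{repo_id}/resolve/{revision}/{filename}"


model_id = "distilbert-base-uncased-finetuned-sst-2-english"
# Without the variable, requests go to the default site
print(file_url(model_id, "config.json", env={}))
# With the variable set, the same file is requested from the mirror
print(file_url(model_id, "config.json", env={"HF_ENDPOINT": "https://hf-mirror.com"}))
```

Only the host part changes, which is why no code changes are needed once the variable is set.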

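Method 2: Set it in code

import os
# Set this before importing transformers or huggingface_hub; the endpoint is read at import time
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"
from transformers import pipeline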

Method 3: Persist the setting in your shell profile (permanent)

# Append the export to your shell startup file (~/.bashrc here; use ~/.zshrc for zsh)
echo 'export HF_ENDPOINT=https://hf-mirror.com' >> ~/.bashrc
source ~/.bashrc

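Commonly used mirror sites:

hf-mirror.com (a community-run mirror of the Hugging Face Hub; it is not operated by Hugging Face)
huggingface.modelscope.cn (attributed here to Beijing Foreign Studies University, although the domain belongs to ModelScope; confirm that it is reachable before relying on it)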

Code Overview

The file text-classification.py contains a simple text classification example:

from transformers import pipeline

# Initialize the text-classification pipeline
classifier = pipeline("text-classification", 
                     model="distilbert-base-uncased-finetuned-sst-2-english")

# The text to classify
text = "I love using Hugging Face Transformers! It's amazing!"

# Run the prediction
results = classifier(text)

# Print the result
print(results)

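Code Walkthrough

Model:
- Uses the distilbert-base-uncased-finetuned-sst-2-english model
- DistilBERT is a distilled, lighter version of BERT: about 40% smaller and 60% faster, while retaining over 95% of BERT's performance on GLUE (the DistilBERT paper reports 97%)
- The model is fine-tuned on the SST-2 dataset for English sentiment analysis
Pipeline:
- pipeline() is the high-level API provided by the Transformers library
- It handles model loading, tokenization, and prediction automatically
- It supports many NLP tasks; this example uses the text-classification task
Output format:
- Returns a list of dictionaries, one per input text, each with a label and a score
- label: 'POSITIVE' or 'NEGATIVE'
- score: the model's probability for the returned label, between 0 and 1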
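
The output format can be reproduced by hand to see where label and score come from. Below is a minimal sketch of the final step for a single-label classifier: a softmax over the raw scores (logits), followed by picking the highest probability. The logits are made-up numbers for illustration, not real model output, and `to_prediction` is a hypothetical helper; the id-to-label order matches this model's two classes.

```python
import math

# Label order assumed for this two-class sentiment model
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_prediction(logits):
    """Return one {'label', 'score'} dict in the pipeline's output shape."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": ID2LABEL[best], "score": probs[best]}


# Made-up logits for one sentence: [negative score, positive score]
print([to_prediction([-4.0, 4.5])])
```

The score is therefore a probability for the winning class only; the losing class's probability is one minus that value in the two-class case.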

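Usage

Run the script:

python text-classification.py

Expected output (the exact score can vary slightly with library version and hardware):

[{'label': 'POSITIVE', 'score': 0.9998}]

Custom text:

# Change the text variable to whatever you want to analyze
text = "Your text here"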

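Troubleshooting

Slow model downloads
- Configure a mirror as described above
- Make sure the network connection is stable
- Download the model files manually, then either place them in the cache directory or pass the local folder path as the model argument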
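
When downloading manually or inspecting a stalled download, it helps to know where the cache lives. The sketch below computes the folder expected for a given model, assuming the default huggingface_hub cache layout (HF_HUB_CACHE if set, otherwise HF_HOME/hub, otherwise ~/.cache/huggingface/hub). It ignores XDG_CACHE_HOME for brevity, and `model_cache_dir` is a hypothetical helper name.

```python
import os
from pathlib import Path


def model_cache_dir(repo_id, env=None):
    """Return the folder where the Hub cache is expected to keep this model."""
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        hub_cache = Path(env["HF_HUB_CACHE"])
    elif env.get("HF_HOME"):
        hub_cache = Path(env["HF_HOME"]) / "hub"
    else:
        hub_cache = Path.home() / ".cache" / "huggingface" / "hub"
    # Cache folders are named models--<owner>--<name>, with "/" replaced by "--"
    return hub_cache / ("models--" + repo_id.replace("/", "--"))


path = model_cache_dir("distilbert-base-uncased-finetuned-sst-2-english")
print(path, "(exists)" if path.exists() else "(not downloaded yet)")
```

Inside that folder the cache keeps its own blobs and snapshots structure, so for hand-downloaded files it is usually simpler to keep them in a separate folder and pass that path as the model argument.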

CUDA errors

- Make sure the installed PyTorch build matches your system
- Check that your CUDA version is compatible with that build

# Check whether PyTorch can use CUDA
python -c "import torch; print(torch.cuda.is_available())"
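
The same check can drive device selection in the script itself. This is a small sketch, assuming the pipeline's integer device convention (-1 for CPU, 0 for the first GPU); `pick_device` is a hypothetical helper that falls back to CPU when PyTorch is missing or sees no GPU.

```python
def pick_device():
    """Return 0 (first GPU) if PyTorch sees CUDA, otherwise -1 (CPU)."""
    try:
        import torch
    except ImportError:
        return -1  # PyTorch missing: fall back to the CPU value
    return 0 if torch.cuda.is_available() else -1


device = pick_device()
print("Using", "GPU 0" if device == 0 else "CPU")

# With transformers installed, pass the result to the pipeline:
# classifier = pipeline("text-classification",
#                       model="distilbert-base-uncased-finetuned-sst-2-english",
#                       device=device)
```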

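Out of memory
- Use a smaller model (the DistilBERT model used here is already a small one)
- Reduce batch_size
- Run on the CPU, for example with a CPU-only PyTorch build, if GPU memory is the limit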
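
When memory is tight, one option is to feed the classifier only a few texts at a time. The sketch below assumes nothing about the model: `classify_in_chunks` is a hypothetical helper that accepts any callable taking a list of texts and returning one result per text, which is how the pipeline object behaves when called with a list.

```python
def classify_in_chunks(classifier, texts, chunk_size=8):
    """Call classifier on slices of texts so only chunk_size items are processed at once."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    results = []
    for start in range(0, len(texts), chunk_size):
        chunk = texts[start:start + chunk_size]
        results.extend(classifier(chunk))
    return results


# Usage with a pipeline object named classifier and a list named texts:
# results = classify_in_chunks(classifier, texts, chunk_size=4)
```

The pipeline also accepts a batch_size argument at call time, so manual chunking like this is mainly useful when the full input list itself is too large to hold or process in one go.
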
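Model loading errors
- Check the network connection
- Clear the cached model files and retry (pipeline has no cache_clear() method, so delete the cached model directory instead):
```
import shutil
from pathlib import Path
# Default cache location; adjust the path if HF_HOME or HF_HUB_CACHE is set
shutil.rmtree(Path.home() / ".cache/huggingface/hub/models--distilbert-base-uncased-finetuned-sst-2-english", ignore_errors=True)
```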

Advanced Usage

Batch processing:

texts = [
    "I love this!",
    "This is terrible.",
    "Not bad at all."
]
results = classifier(texts)
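
The batch call returns one result per input, in input order, so the two lists can be zipped back together. A minimal sketch follows; `pair_with_inputs` and `format_rows` are hypothetical helpers, and the column widths are an arbitrary choice.

```python
def pair_with_inputs(texts, results):
    """Return (text, label, score) tuples, keeping input order."""
    if len(texts) != len(results):
        raise ValueError("expected one result per input text")
    return [(t, r["label"], r["score"]) for t, r in zip(texts, results)]


def format_rows(rows):
    """Render each row as label, score (3 decimals), then the text."""
    return [f"{label:<9}{score:.3f}  {text}" for text, label, score in rows]


# Usage with the batch above:
# for line in format_rows(pair_with_inputs(texts, results)):
#     print(line)
```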

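Applying a confidence threshold:

# The text-classification pipeline has no threshold parameter, so filter the results yourself
# Keep only results with a confidence above 0.9
results = [r for r in classifier(text) if r["score"] > 0.9]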
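
A threshold can also be applied after prediction without dropping rows, which keeps the output aligned with the input list. This is one possible design, not something the library provides: `mark_uncertain` and the "UNCERTAIN" label are inventions for this sketch.

```python
def mark_uncertain(results, min_score=0.9, fallback="UNCERTAIN"):
    """Copy results, replacing the label of any prediction scored below min_score."""
    marked = []
    for r in results:
        label = r["label"] if r["score"] >= min_score else fallback
        marked.append({"label": label, "score": r["score"]})
    return marked


# Usage with pipeline output for a list of texts:
# confident = mark_uncertain(classifier(texts), min_score=0.9)
```

Relabeling instead of filtering means the i-th output still corresponds to the i-th input, which matters when results are written back next to the original texts.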

Using another pretrained model:

# Use a different sentiment analysis model (this one returns labels from '1 star' to '5 stars')
classifier = pipeline("text-classification", 
                     model="nlptown/bert-base-multilingual-uncased-sentiment")
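
This multilingual model labels text with star ratings ('1 star' to '5 stars') rather than POSITIVE or NEGATIVE, so code written for the DistilBERT labels needs a translation step. The sketch below shows one way to do that; `stars_to_sentiment` is a hypothetical helper, and the cut-offs (1 to 2 negative, 3 neutral, 4 to 5 positive) are an assumption chosen for illustration.

```python
def stars_to_sentiment(label):
    """Map '1 star' .. '5 stars' to a coarse sentiment (cut-offs are a choice, not part of the model)."""
    stars = int(label.split()[0])
    if not 1 <= stars <= 5:
        raise ValueError(f"unexpected star rating: {label!r}")
    if stars <= 2:
        return "NEGATIVE"
    if stars == 3:
        return "NEUTRAL"
    return "POSITIVE"


for label in ["1 star", "3 stars", "5 stars"]:
    print(label, "->", stars_to_sentiment(label))
```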
