GitHub上Transformers项目中推理函数pipeline的使用

Transformers是文本、计算机视觉、音频、视频和多模态模型(text, computer vision, audio, video, and multimodal model)领域最先进的机器学习模型的模型定义框架(model-definition framework),可用于推理和训练。源码:https://github.com/huggingface/transformers,最新发布版本v4.57.1,license为Apache-2.0。

Transformers集中了模型定义,以便整个生态系统都认可该定义。Transformers是跨框架的枢纽:如果支持某个模型定义,它将与大多数训练框架(Axolotl、Unsloth、DeepSpeed、FSDP、PyTorch-Lightning等)、推理引擎(vLLM、SGLang、TGI等)以及相关建模库(llama.cpp、mlx等)兼容,这些库都利用了Transformers 的模型定义。

Transformers优点

1.支持三个热门的深度学习库:Jax, PyTorch以及TensorFlow,并与之无缝整合

2.模型易于使用,统一的API可用于所有的预训练模型

3.共享已训练好的模型,无需从头开始训练

4.只需3行代码即可训练模型

5.模型文件可以独立于库使用

Transformers中的Pipeline API是一个高级推理类,支持文本、音频、视觉和多模态任务。它负责预处理输入并返回相应的输出。

函数pipeline的声明在src/transformers/pipelines/init.py文件中,声明如下:返回Pipeline类

python 复制代码
def pipeline(
    task: Optional[str] = None,
    model: Optional[Union[str, "PreTrainedModel", "TFPreTrainedModel"]] = None,
    config: Optional[Union[str, PretrainedConfig]] = None,
    tokenizer: Optional[Union[str, PreTrainedTokenizer, "PreTrainedTokenizerFast"]] = None,
    feature_extractor: Optional[Union[str, PreTrainedFeatureExtractor]] = None,
    image_processor: Optional[Union[str, BaseImageProcessor]] = None,
    processor: Optional[Union[str, ProcessorMixin]] = None,
    framework: Optional[str] = None,
    revision: Optional[str] = None,
    use_fast: bool = True,
    token: Optional[Union[str, bool]] = None,
    device: Optional[Union[int, str, "torch.device"]] = None,
    device_map: Optional[Union[str, dict[str, Union[int, str]]]] = None,
    dtype: Optional[Union[str, "torch.dtype"]] = "auto",
    trust_remote_code: Optional[bool] = None,
    model_kwargs: Optional[dict[str, Any]] = None,
    pipeline_class: Optional[Any] = None,
    **kwargs: Any,
) -> Pipeline

当前支持的task包括:除task参数必须指定外,其它大部分参数都有默认值

python 复制代码
"audio-classification": will return a [AudioClassificationPipeline]
"automatic-speech-recognition": will return a [AutomaticSpeechRecognitionPipeline]
"depth-estimation": will return a [DepthEstimationPipeline]
"document-question-answering": will return a [DocumentQuestionAnsweringPipeline]
"feature-extraction": will return a [FeatureExtractionPipeline]
"fill-mask": will return a [FillMaskPipeline]
"image-classification": will return a [ImageClassificationPipeline]
"image-feature-extraction": will return an [ImageFeatureExtractionPipeline]
"image-segmentation": will return a [ImageSegmentationPipeline]
"image-text-to-text": will return a [ImageTextToTextPipeline]
"image-to-image": will return a [ImageToImagePipeline]
"image-to-text": will return a [ImageToTextPipeline]
"keypoint-matching": will return a [KeypointMatchingPipeline]
"mask-generation": will return a [MaskGenerationPipeline]
"object-detection": will return a [ObjectDetectionPipeline]
"question-answering": will return a [QuestionAnsweringPipeline]
"summarization": will return a [SummarizationPipeline]
"table-question-answering": will return a [TableQuestionAnsweringPipeline]
"text2text-generation": will return a [Text2TextGenerationPipeline]
"text-classification" (alias "sentiment-analysis" available): will return a [TextClassificationPipeline]
"text-generation": will return a [TextGenerationPipeline]
"text-to-audio" (alias "text-to-speech" available): will return a [TextToAudioPipeline]
"token-classification" (alias "ner" available): will return a [TokenClassificationPipeline]
"translation": will return a [TranslationPipeline]
"translation_xx_to_yy": will return a [TranslationPipeline]
"video-classification": will return a [VideoClassificationPipeline]
"visual-question-answering": will return a [VisualQuestionAnsweringPipeline]
"zero-shot-classification": will return a [ZeroShotClassificationPipeline]
"zero-shot-image-classification": will return a [ZeroShotImageClassificationPipeline]
"zero-shot-audio-classification": will return a [ZeroShotAudioClassificationPipeline]
"zero-shot-object-detection": will return a [ZeroShotObjectDetectionPipeline]

通过Anaconda创建虚拟环境,依次执行如下命令:

bash 复制代码
conda create --name ollama python=3.10 -y
conda activate ollama
pip install ollama
pip install colorama chromadb tqdm
pip install sentence-transformers
pip install langchain==1.0.0 langchain-huggingface langchain-chroma langchain-ollama jq
pip install  torch==2.6.0 transformers==4.57.0 torchvision==0.21.0 opencv-python==4.10.0.84 timm sentencepiece scikit-image

以下为测试代码:

python 复制代码
from transformers import pipeline
from transformers.utils import logging
import argparse
import colorama
import torch
from pathlib import Path

def parse_args():
	parser = argparse.ArgumentParser(description="transformers test")
	parser.add_argument("--task", required=True, type=str, choices=["document-question-answering", "image-classification", "image-feature-extraction", "image-segmentation", "object-detection"], help="specify what kind of task")
	parser.add_argument("--model", type=str, help="model name, for example: naver-clova-ix/donut-base-finetuned-docvqa")
	parser.add_argument("--file_name", type=str, help="image or pdf file name")
	parser.add_argument("--text", type=str, help="text")

	args = parser.parse_args()
	return args

def document_question_answering(model, file_name, text):
	pipe = pipeline("document-question-answering", model=model)
	result = pipe(image=file_name, question=text)
	if result is not None and isinstance(result, list) and len(result) > 0:
		print(f'model name: {model}; file name {Path(file_name).name}; answer: {result[0]["answer"]}')

def image_classification(model, file_name):
	pipe = pipeline("image-classification", model=model)
	result = pipe(file_name)
	if result is not None and isinstance(result, list) and len(result) > 0:
		print(f'model name: {model}; file name: {Path(file_name).name}; label: {result[0]["label"]}; score: {result[0]["score"]:.4f}')

def image_feature_extraction(model, file_name):
	pipe = pipeline("image-feature-extraction", model=model)
	result = pipe(file_name)
	if result is not None and isinstance(result, list) and len(result) > 0:
		features = torch.tensor(result[0]).mean(dim=0)
		print(f"features length: {len(features)}; features[0:5]: {features[0:5]}")

def image_segmentation(model, file_name):
	pipe = pipeline("image-segmentation", model=model, trust_remote_code=True)
	result = pipe(file_name)
	if result is not None:
		result.save("result_image_segmentation.png")

def object_detection(model, file_name):
	pipe = pipeline("object-detection", model=model)
	result = pipe(file_name)
	if result is not None and isinstance(result, list) and len(result) > 0:
		print(f'label: {result[0]["label"]}, score: {result[0]["score"]:.4f}, box: {result[0]["box"]}')

if __name__ == "__main__":
	colorama.init(autoreset=True)
	args = parse_args()

	logging.set_verbosity_error()

	if args.task == "document-question-answering":
		document_question_answering(args.model, args.file_name, args.text)
	elif args.task == "image-classification":
		image_classification(args.model, args.file_name)
	elif args.task == "image-feature-extraction":
		image_feature_extraction(args.model, args.file_name)
	elif args.task == "image-segmentation":
		image_segmentation(args.model, args.file_name)
	elif args.task == "object-detection":
		object_detection(args.model, args.file_name)

	print(colorama.Fore.GREEN + "====== execution completed ======")

测试代码中包含的task:document-question-answering(naver-clova-ix/donut-base-finetuned-docvqa)、image-classification(timm/mobilenetv3_small_100.lamb_in1k)、image-feature-extraction(facebook/dinov2-base)、image-segmentation(briaai/RMBG-1.4)、object-detection(hustvl/yolos-small)

(1).并不是Hugging Face上的所有模型都支持pipeline。

(2).除了使用pipeline外,每个task或model都可以调用各自内部类,如:from transformers import YolosFeatureExtractor, YolosForObjectDetection。

部分执行结果如下图所示:

GitHubhttps://github.com/fengbingchun/NN_Test

相关推荐
Stara051112 天前
DeepSeek-OCR私有化部署—从零构建OCR服务环境
计算机视觉·docker·ocr·transformers·vllm·deepseek·光学符号识别
linmoo19864 个月前
Spring AI 系列之十五 - RAG-ETL之二
人工智能·spring·etl·transformers·rag·springai
shao9185167 个月前
Gradio全解20——Streaming:流式传输的多媒体应用(3)——实时语音识别技术
人工智能·ffmpeg·语音识别·transformers·gradio·asr
Jackilina_Stone8 个月前
transformers:打造的先进的自然语言处理
人工智能·自然语言处理·transformers
ai_lian_shuo9 个月前
Hugging Face的Transformers核心模块:Pipelines(参数说明,各种模型类型调用案例)
人工智能·pipeline·transformers·hugging face
Yongqiang Cheng10 个月前
Hugging Face Transformers and Meta Llama
transformers·hugging face·meta llama
诸神缄默不语10 个月前
Re78 读论文:GPT-4 Technical Report
chatgpt·llm·论文·openai·transformers·大规模预训练语言模型·gpt-4
养一只Trapped_beast10 个月前
pip install transformers教程
pip·transformers
NLP工程化1 年前
LLM模型的generate和chat函数区别
llm·chat·transformers·generate