Deploying Xinference with Docker

Introduction

Xorbits Inference (Xinference) is an open-source platform that simplifies running and integrating a wide range of AI models. With Xinference, you can run inference with any open-source LLM, embedding model, or multimodal model, in the cloud or on-premises, and build powerful AI applications on top of it.

Docker Installation

Pull the Xinference image:

docker pull xprobe/xinference
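
The latest tag tracks the newest release. To pin a specific version, or to run without a GPU, pull a tagged variant instead; the exact tag names below are assumptions, so check the xprobe/xinference page on Docker Hub for the tags your release actually provides:

docker pull xprobe/xinference:v1.2.2      # hypothetical version tag; pick a real one from Docker Hub
docker pull xprobe/xinference:latest-cpu  # CPU-only variant, if published for your release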

Run the container, replacing the host paths with your own:

docker run -d --name xinference --gpus all -v E:/docker/xinference/models:/root/models -v E:/docker/xinference/.xinference:/root/.xinference -v E:/docker/xinference/.cache/huggingface:/root/.cache/huggingface -e XINFERENCE_HOME=/root/models -p 9997:9997 xprobe/xinference:latest xinference-local -H 0.0.0.0
  • -d: run the container in the background (detached mode).
  • --name xinference: give the container a name, here xinference.
  • --gpus all: give the container access to all GPUs on the host, which is needed for compute-heavy work such as model inference.
  • -v E:/docker/xinference/models:/root/models, -v E:/docker/xinference/.xinference:/root/.xinference, -v E:/docker/xinference/.cache/huggingface:/root/.cache/huggingface: mount host directories into the container so downloaded models and configuration persist across container restarts. For example, the first mount maps the host directory E:/docker/xinference/models to /root/models inside the container.
  • -e XINFERENCE_HOME=/root/models: set the XINFERENCE_HOME environment variable to /root/models, so Xinference stores its models and data under the mounted directory.
  • -p 9997:9997: map port 9997 on the host to port 9997 in the container, so the service can be reached through the host port.
  • xprobe/xinference:latest: the image and tag to use, here the latest version of the xprobe/xinference image.
  • xinference-local -H 0.0.0.0: the command executed when the container starts; it runs Xinference in local (single-node) mode and binds to all network interfaces, so the service is reachable from outside the container. A quick health check follows below.
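
To run that check, confirm the container is up and watch the startup logs:

docker ps --filter name=xinference
docker logs -f xinference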

Access URL

http://127.0.0.1:9997/
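
The same port also serves an OpenAI-compatible REST API, so a quick curl verifies the deployment as well; an empty model list is the expected response before any model has been launched:

curl http://127.0.0.1:9997/v1/models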

Official Documentation

https://inference.readthedocs.io/zh-cn/latest/index.html

Adding the Xinference Container in Dify

When Dify itself runs in Docker, configure the model provider endpoint with the Docker-internal host address rather than 127.0.0.1:

http://host.docker.internal:9997
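
host.docker.internal resolves out of the box on Docker Desktop (Windows and macOS). On Linux it usually does not, so either add a host-gateway mapping to the Dify containers or attach the Xinference container to Dify's Docker network and address it by container name. The network name below is only a guess at the Dify compose default, so list your networks first:

docker network ls
docker network connect docker_default xinference

After that, Dify can reach the server at http://xinference:9997.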

Built-in Large Language Models

MODEL NAME ABILITIES CONTEXT_LENGTH DESCRIPTION
aquila2 generate 2048 Aquila2 series models are the base language models
aquila2-chat chat 2048 Aquila2-chat series models are the chat models
aquila2-chat-16k chat 16384 AquilaChat2-16k series models are the long-text chat models
baichuan-2 generate 4096 Baichuan2 is an open-source Transformer based LLM that is trained on both Chinese and English data.
baichuan-2-chat chat 4096 Baichuan2-chat is a fine-tuned version of the Baichuan LLM, specializing in chatting.
c4ai-command-r-v01 chat 131072 C4AI Command-R(+) is a research release of highly performant generative models with 35 and 104 billion parameters.
code-llama generate 100000 Code-Llama is an open-source LLM trained by fine-tuning LLaMA2 for generating and discussing code.
code-llama-instruct chat 100000 Code-Llama-Instruct is an instruct-tuned version of the Code-Llama LLM.
code-llama-python generate 100000 Code-Llama-Python is a fine-tuned version of the Code-Llama LLM, specializing in Python.
codegeex4 chat 131072 The open-source version of the latest CodeGeeX4 model series.
codeqwen1.5 generate 65536 CodeQwen1.5 is the Code-Specific version of Qwen1.5. It is a transformer-based decoder-only language model pretrained on a large amount of data of codes.
codeqwen1.5-chat chat 65536 CodeQwen1.5 is the Code-Specific version of Qwen1.5. It is a transformer-based decoder-only language model pretrained on a large amount of data of codes.
codeshell generate 8194 CodeShell is a multi-language code LLM developed by the Knowledge Computing Lab of Peking University.
codeshell-chat chat 8194 CodeShell is a multi-language code LLM developed by the Knowledge Computing Lab of Peking University.
codestral-v0.1 generate 32768 Codestral-22B-v0.1 is trained on a diverse dataset of 80+ programming languages, including the most popular ones, such as Python, Java, C, C++, JavaScript, and Bash.
cogagent chat, vision 4096 The CogAgent-9B-20241220 model is based on GLM-4V-9B, a bilingual open-source VLM base model. Through data collection and optimization, multi-stage training, and strategy improvements, CogAgent-9B-20241220 achieves significant advancements in GUI perception, inference prediction accuracy, action space completeness, and task generalizability.
cogvlm2 chat, vision 8192 CogVLM2 have achieved good results in many lists compared to the previous generation of CogVLM open source models. Its excellent performance can compete with some non-open source models.
cogvlm2-video-llama3-chat chat, vision 8192 CogVLM2-Video achieves state-of-the-art performance on multiple video question answering tasks.
csg-wukong-chat-v0.1 chat 32768 csg-wukong-1B is a 1 billion-parameter small language model (SLM) pretrained on 1T tokens.
deepseek generate 4096 DeepSeek LLM, trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese.
deepseek-chat chat 4096 DeepSeek LLM is an advanced language model comprising 67 billion parameters. It has been trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese.
deepseek-coder generate 16384 Deepseek Coder is composed of a series of code language models, each trained from scratch on 2T tokens, with a composition of 87% code and 13% natural language in both English and Chinese.
deepseek-coder-instruct chat 16384 deepseek-coder-instruct is a model initialized from deepseek-coder-base and fine-tuned on 2B tokens of instruction data.
deepseek-r1 chat 163840 DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.
deepseek-r1-distill-llama chat 131072 deepseek-r1-distill-llama is distilled from DeepSeek-R1 based on Llama
deepseek-r1-distill-qwen chat 131072 deepseek-r1-distill-qwen is distilled from DeepSeek-R1 based on Qwen
deepseek-v2 generate 128000 DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference.
deepseek-v2-chat chat 128000 DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference.
deepseek-v2-chat-0628 chat 128000 DeepSeek-V2-Chat-0628 is an improved version of DeepSeek-V2-Chat.
deepseek-v2.5 chat 128000 DeepSeek-V2.5 is an upgraded version that combines DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct. The new model integrates the general and coding abilities of the two previous versions.
deepseek-v3 chat 163840 DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token.
deepseek-vl-chat chat, vision 4096 DeepSeek-VL possesses general multimodal understanding capabilities, capable of processing logical diagrams, web pages, formula recognition, scientific literature, natural images, and embodied intelligence in complex scenarios.
gemma-2-it chat 8192 Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.
gemma-it chat 8192 Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.
glm-4v chat, vision 8192 GLM4 is the open source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI.
glm-edge-chat chat 8192 The GLM-Edge series targets on-device, real-world scenarios and consists of dialogue models and multimodal understanding models in two sizes each (GLM-Edge-1.5B-Chat, GLM-Edge-4B-Chat, GLM-Edge-V-2B, GLM-Edge-V-5B). The 1.5B / 2B models mainly target platforms such as mobile phones and cars, while the 4B / 5B models mainly target platforms such as PCs.
glm-edge-v chat, vision 8192 The GLM-Edge series targets on-device, real-world scenarios and consists of dialogue models and multimodal understanding models in two sizes each (GLM-Edge-1.5B-Chat, GLM-Edge-4B-Chat, GLM-Edge-V-2B, GLM-Edge-V-5B). The 1.5B / 2B models mainly target platforms such as mobile phones and cars, while the 4B / 5B models mainly target platforms such as PCs.
glm4-chat chat, tools 131072 GLM4 is the open source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI.
glm4-chat-1m chat, tools 1048576 GLM4 is the open source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI.
gorilla-openfunctions-v2 chat 4096 OpenFunctions is designed to extend the Large Language Model (LLM) chat completion feature to formulate executable API calls from natural language instructions and API context.
gpt-2 generate 1024 GPT-2 is a Transformer-based LLM that is trained on WebText, a 40 GB dataset of Reddit posts with 3+ upvotes.
internlm2-chat chat 32768 The second generation of the InternLM model, InternLM2.
internlm2.5-chat chat 32768 InternLM2.5 series of the InternLM model.
internlm2.5-chat-1m chat 262144 InternLM2.5 series of the InternLM model supports 1M long-context
internlm3-instruct chat, tools 32768 InternLM3 has open-sourced an 8-billion parameter instruction model, InternLM3-8B-Instruct, designed for general-purpose usage and advanced reasoning.
internvl-chat chat, vision 32768 InternVL 1.5 is an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding.
internvl2 chat, vision 32768 InternVL 2 is an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding.
llama-2 generate 4096 Llama-2 is the second generation of Llama, open-source and trained on a larger amount of data.
llama-2-chat chat 4096 Llama-2-Chat is a fine-tuned version of the Llama-2 LLM, specializing in chatting.
llama-3 generate 8192 Llama 3 is an auto-regressive language model that uses an optimized transformer architecture
llama-3-instruct chat 8192 The Llama 3 instruction tuned models are optimized for dialogue use cases and outperform many of the available open source chat models on common industry benchmarks.
llama-3.1 generate 131072 Llama 3.1 is an auto-regressive language model that uses an optimized transformer architecture
llama-3.1-instruct chat, tools 131072 The Llama 3.1 instruction tuned models are optimized for dialogue use cases and outperform many of the available open source chat models on common industry benchmarks.
llama-3.2-vision generate, vision 131072 The Llama 3.2-Vision instruction-tuned models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image.
llama-3.2-vision-instruct chat, vision 131072 Llama 3.2-Vision instruction-tuned models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image.
llama-3.3-instruct chat, tools 131072 The Llama 3.3 instruction tuned models are optimized for dialogue use cases and outperform many of the available open source chat models on common industry benchmarks.
marco-o1 chat, tools 32768 Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
minicpm-2b-dpo-bf16 chat 4096 MiniCPM is an end-side LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings.
minicpm-2b-dpo-fp16 chat 4096 MiniCPM is an end-side LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings.
minicpm-2b-dpo-fp32 chat 4096 MiniCPM is an end-side LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings.
minicpm-2b-sft-bf16 chat 4096 MiniCPM is an end-side LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings.
minicpm-2b-sft-fp32 chat 4096 MiniCPM is an end-side LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings.
minicpm-llama3-v-2_5 chat, vision 8192 MiniCPM-Llama3-V 2.5 is the latest model in the MiniCPM-V series. The model is built on SigLip-400M and Llama3-8B-Instruct with a total of 8B parameters.
minicpm-v-2.6 chat, vision 32768 MiniCPM-V 2.6 is the latest model in the MiniCPM-V series. The model is built on SigLip-400M and Qwen2-7B with a total of 8B parameters.
minicpm3-4b chat 32768 MiniCPM3-4B is the 3rd generation of MiniCPM series. The overall performance of MiniCPM3-4B surpasses Phi-3.5-mini-Instruct and GPT-3.5-Turbo-0125, being comparable with many recent 7B~9B models.
mistral-instruct-v0.1 chat 8192 Mistral-7B-Instruct is a fine-tuned version of the Mistral-7B LLM on public datasets, specializing in chatting.
mistral-instruct-v0.2 chat 8192 The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an improved instruct fine-tuned version of Mistral-7B-Instruct-v0.1.
mistral-instruct-v0.3 chat 32768 The Mistral-7B-Instruct-v0.3 Large Language Model (LLM) is an instruct fine-tuned version of Mistral-7B-v0.3.
mistral-large-instruct chat 131072 Mistral-Large-Instruct-2407 is an advanced dense Large Language Model (LLM) of 123B parameters with state-of-the-art reasoning, knowledge and coding capabilities.
mistral-nemo-instruct chat 1024000 The Mistral-Nemo-Instruct-2407 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-Nemo-Base-2407
mistral-v0.1 generate 8192 Mistral-7B is an unmoderated Transformer-based LLM claiming to outperform Llama2 on all benchmarks.
mixtral-8x22b-instruct-v0.1 chat 65536 The Mixtral-8x22B-Instruct-v0.1 Large Language Model (LLM) is an instruct fine-tuned version of the Mixtral-8x22B-v0.1, specializing in chatting.
mixtral-instruct-v0.1 chat 32768 Mixtral-8x7B-Instruct is a fine-tuned version of the Mixtral-8x7B LLM, specializing in chatting.
mixtral-v0.1 generate 32768 The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts.
omnilmm chat, vision 2048 OmniLMM is a family of open-source large multimodal models (LMMs) adept at vision & language modeling.
openhermes-2.5 chat 8192 Openhermes 2.5 is a fine-tuned version of Mistral-7B-v0.1 on primarily GPT-4 generated data.
opt generate 2048 Opt is an open-source, decoder-only, Transformer based LLM that was designed to replicate GPT-3.
orion-chat chat 4096 Orion-14B series models are open-source multilingual large language models trained from scratch by OrionStarAI.
orion-chat-rag chat 4096 Orion-14B series models are open-source multilingual large language models trained from scratch by OrionStarAI.
phi-2 generate 2048 Phi-2 is a 2.7B Transformer based LLM used for research on model safety, trained with data similar to Phi-1.5 but augmented with synthetic texts and curated websites.
phi-3-mini-128k-instruct chat 128000 The Phi-3-Mini-128K-Instruct is a 3.8 billion-parameter, lightweight, state-of-the-art open model trained using the Phi-3 datasets.
phi-3-mini-4k-instruct chat 4096 The Phi-3-Mini-4k-Instruct is a 3.8 billion-parameter, lightweight, state-of-the-art open model trained using the Phi-3 datasets.
platypus2-70b-instruct generate 4096 Platypus2-70B-instruct is a merge of garage-bAInd/Platypus2-70B and upstage/Llama-2-70b-instruct-v2.
qvq-72b-preview chat, vision 32768 QVQ-72B-Preview is an experimental research model developed by the Qwen team, focusing on enhancing visual reasoning capabilities.
qwen-chat chat 32768 Qwen-chat is a fine-tuned version of the Qwen LLM trained with alignment techniques, specializing in chatting.
qwen-vl-chat chat, vision 4096 Qwen-VL-Chat supports more flexible interaction, such as multiple image inputs, multi-round question answering, and creative capabilities.
qwen1.5-chat chat, tools 32768 Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data.
qwen1.5-moe-chat chat, tools 32768 Qwen1.5-MoE is a transformer-based MoE decoder-only language model pretrained on a large amount of data.
qwen2-audio generate, audio 32768 Qwen2-Audio: A large-scale audio-language model which is capable of accepting various audio signal inputs and performing audio analysis or direct textual responses with regard to speech instructions.
qwen2-audio-instruct chat, audio 32768 Qwen2-Audio: A large-scale audio-language model which is capable of accepting various audio signal inputs and performing audio analysis or direct textual responses with regard to speech instructions.
qwen2-instruct chat, tools 32768 Qwen2 is the new series of Qwen large language models
qwen2-moe-instruct chat, tools 32768 Qwen2 is the new series of Qwen large language models.
qwen2-vl-instruct chat, vision 32768 Qwen2-VL: To See the World More Clearly. Qwen2-VL is the latest version of the vision language models in the Qwen model families.
qwen2.5 generate 32768 Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters.
qwen2.5-coder generate 32768 Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen).
qwen2.5-coder-instruct chat, tools 32768 Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen).
qwen2.5-instruct chat, tools 32768 Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters.
qwen2.5-vl-instruct chat, vision 128000 Qwen2.5-VL: Qwen2.5-VL is the latest version of the vision language models in the Qwen model families.
qwq-32b-preview chat 32768 QwQ-32B-Preview is an experimental research model developed by the Qwen Team, focused on advancing AI reasoning capabilities.
seallm_v2 generate 8192 We introduce SeaLLM-7B-v2, the state-of-the-art multilingual LLM for Southeast Asian (SEA) languages
seallm_v2.5 generate 8192 We introduce SeaLLM-7B-v2.5, the state-of-the-art multilingual LLM for Southeast Asian (SEA) languages
skywork generate 4096 Skywork is a series of large models developed by the Kunlun Group · Skywork team.
skywork-math generate 4096 Skywork is a series of large models developed by the Kunlun Group · Skywork team.
starling-lm chat 4096 We introduce Starling-7B, an open large language model (LLM) trained by Reinforcement Learning from AI Feedback (RLAIF). The model harnesses the power of our new GPT-4 labeled ranking dataset.
telechat chat 8192 TeleChat is a large language model developed and trained by China Telecom Artificial Intelligence Technology Co., Ltd. The 7B and 12B model bases are trained on 1.5 trillion and 3 trillion tokens, respectively, of a high-quality Chinese corpus.
tiny-llama generate 2048 The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens.
wizardcoder-python-v1.0 chat 100000 WizardCoder-Python is an open-source LLM trained by fine-tuning Code-Llama with Evol-Instruct, specializing in Python code.
wizardmath-v1.0 chat 2048 WizardMath is an open-source LLM trained by fine-tuning Llama2 with Evol-Instruct, specializing in math.
xverse generate 2048 XVERSE is a multilingual large language model, independently developed by Shenzhen Yuanxiang Technology.
xverse-chat chat 2048 XVERSE-Chat is the aligned version of the XVERSE model.
yi generate 4096 The Yi series models are large language models trained from scratch by developers at 01.AI.
yi-1.5 generate 4096 Yi-1.5 is an upgraded version of Yi. It is continuously pre-trained on Yi with a high-quality corpus of 500B tokens and fine-tuned on 3M diverse fine-tuning samples.
yi-1.5-chat chat 4096 Yi-1.5 is an upgraded version of Yi. It is continuously pre-trained on Yi with a high-quality corpus of 500B tokens and fine-tuned on 3M diverse fine-tuning samples.
yi-1.5-chat-16k chat 16384 Yi-1.5 is an upgraded version of Yi. It is continuously pre-trained on Yi with a high-quality corpus of 500B tokens and fine-tuned on 3M diverse fine-tuning samples.
yi-200k generate 262144 The Yi series models are large language models trained from scratch by developers at 01.AI.
yi-chat chat 4096 The Yi series models are large language models trained from scratch by developers at 01.AI.
yi-coder generate 131072 Yi-Coder is a series of open-source code language models that delivers state-of-the-art coding performance with fewer than 10 billion parameters. It excels in long-context understanding with a maximum context length of 128K tokens and supports 52 major programming languages, including popular ones such as Java, Python, JavaScript, and C++.
yi-coder-chat chat 131072 Yi-Coder is a series of open-source code language models that delivers state-of-the-art coding performance with fewer than 10 billion parameters. It excels in long-context understanding with a maximum context length of 128K tokens and supports 52 major programming languages, including popular ones such as Java, Python, JavaScript, and C++.
yi-vl-chat chat, vision 4096 Yi Vision Language (Yi-VL) model is the open-source, multimodal version of the Yi Large Language Model (LLM) series, enabling content comprehension, recognition, and multi-round conversations about images.
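
To actually serve one of these models, launch it by name from inside the container, then call the OpenAI-compatible chat endpoint. The invocation below is a sketch: the format, size, and engine values are assumptions to be matched to your hardware, and depending on the Xinference version --model-engine (e.g. transformers or vllm) may be required or may be omitted:

docker exec xinference xinference launch --model-name qwen2.5-instruct --model-engine transformers --model-format pytorch --size-in-billions 7
curl http://127.0.0.1:9997/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "qwen2.5-instruct", "messages": [{"role": "user", "content": "Hello"}]}'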

Embedding Models
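
Embedding models follow the same launch pattern, selected with --model-type; bge-m3 below is one example of a built-in embedding model name, so substitute any model from the official list:

docker exec xinference xinference launch --model-name bge-m3 --model-type embedding
curl http://127.0.0.1:9997/v1/embeddings -H "Content-Type: application/json" -d '{"model": "bge-m3", "input": "What is Xinference?"}'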

Image Models

Audio Models

The audio models built into Xinference are listed in the official documentation.

Rerank Models

The rerank models built into Xinference are listed in the official documentation.

Video Models

The video models built into Xinference are listed in the official documentation.
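
These other families follow the same pattern as well: pass the model name together with the matching --model-type (rerank, image, audio, video). The angle-bracket names are placeholders, to be replaced with real names from the built-in lists in the documentation:

docker exec xinference xinference launch --model-name <rerank-model-name> --model-type rerank
docker exec xinference xinference launch --model-name <image-model-name> --model-type image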
