
Xinference Models in Detail

模型列表

You can list all models of a given type that can be launched in Xinference:

xinference registrations --model-type <MODEL_TYPE> \
                         [--endpoint "http://<XINFERENCE_HOST>:<XINFERENCE_PORT>"]
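
If you prefer the Python client mentioned later in this article, the same query can be made programmatically. Below is a minimal sketch; it assumes a local Xinference server on the default port 9997 and a client version that provides list_model_registrations:

from xinference.client import Client

# Connect to a running Xinference endpoint (adjust host/port as needed).
client = Client("http://127.0.0.1:9997")

# List all registered models of a given type, e.g. "LLM".
for registration in client.list_model_registrations(model_type="LLM"):
    # Each entry is expected to be a dict that includes at least the model name.
    print(registration["model_name"])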

Xinference supports the following values of MODEL_TYPE:

LLM

Text generation models, i.e. large language models. The following are the LLMs built into Xinference:

MODEL NAME ABILITIES CONTEXT_LENGTH DESCRIPTION
aquila2 generate 2048 Aquila2 series models are the base language models
aquila2-chat chat 2048 Aquila2-chat series models are the chat models
aquila2-chat-16k chat 16384 AquilaChat2-16k series models are the long-text chat models
baichuan generate 4096 Baichuan is an open-source Transformer based LLM that is trained on both Chinese and English data.
baichuan-2 generate 4096 Baichuan2 is an open-source Transformer based LLM that is trained on both Chinese and English data.
baichuan-2-chat chat 4096 Baichuan2-chat is a fine-tuned version of the Baichuan LLM, specializing in chatting.
baichuan-chat chat 4096 Baichuan-chat is a fine-tuned version of the Baichuan LLM, specializing in chatting.
c4ai-command-r-v01 chat 131072 C4AI Command-R(+) is a research release of a 35 and 104 billion parameter highly performant generative model.
c4ai-command-r-v01-4bit generate 131072 This model is 4bit quantized version of C4AI Command-R using bitsandbytes.
chatglm chat 2048 ChatGLM is an open-source General Language Model (GLM) based LLM trained on both Chinese and English data.
chatglm2 chat 8192 ChatGLM2 is the second generation of ChatGLM, still open-source and trained on Chinese and English data.
chatglm2-32k chat 32768 ChatGLM2-32k is a special version of ChatGLM2, with a context window of 32k tokens instead of 8k.
chatglm3 chat, tools 8192 ChatGLM3 is the third generation of ChatGLM, still open-source and trained on Chinese and English data.
chatglm3-128k chat 131072 ChatGLM3 is the third generation of ChatGLM, still open-source and trained on Chinese and English data.
chatglm3-32k chat 32768 ChatGLM3 is the third generation of ChatGLM, still open-source and trained on Chinese and English data.
code-llama generate 100000 Code-Llama is an open-source LLM trained by fine-tuning LLaMA2 for generating and discussing code.
code-llama-instruct chat 100000 Code-Llama-Instruct is an instruct-tuned version of the Code-Llama LLM.
code-llama-python generate 100000 Code-Llama-Python is a fine-tuned version of the Code-Llama LLM, specializing in Python.
codeqwen1.5 generate 65536 CodeQwen1.5 is the Code-Specific version of Qwen1.5. It is a transformer-based decoder-only language model pretrained on a large amount of data of codes.
codeqwen1.5-chat chat 65536 CodeQwen1.5 is the Code-Specific version of Qwen1.5. It is a transformer-based decoder-only language model pretrained on a large amount of data of codes.
codeshell generate 8194 CodeShell is a multi-language code LLM developed by the Knowledge Computing Lab of Peking University.
codeshell-chat chat 8194 CodeShell is a multi-language code LLM developed by the Knowledge Computing Lab of Peking University.
codestral-v0.1 generate 32768 Codestral-22B-v0.1 is trained on a diverse dataset of 80+ programming languages, including the most popular ones, such as Python, Java, C, C++, JavaScript, and Bash.
cogvlm2 chat, vision 8192 CogVLM2 has achieved good results on many benchmarks compared to the previous generation of open-source CogVLM models. Its excellent performance can compete with some non-open-source models.
deepseek generate 4096 DeepSeek LLM, trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese.
deepseek-chat chat 4096 DeepSeek LLM is an advanced language model comprising 67 billion parameters. It has been trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese.
deepseek-coder generate 16384 Deepseek Coder is composed of a series of code language models, each trained from scratch on 2T tokens, with a composition of 87% code and 13% natural language in both English and Chinese.
deepseek-coder-instruct chat 16384 deepseek-coder-instruct is a model initialized from deepseek-coder-base and fine-tuned on 2B tokens of instruction data.
deepseek-vl-chat chat, vision 4096 DeepSeek-VL possesses general multimodal understanding capabilities, capable of processing logical diagrams, web pages, formula recognition, scientific literature, natural images, and embodied intelligence in complex scenarios.
falcon generate 2048 Falcon is an open-source Transformer based LLM trained on the RefinedWeb dataset.
falcon-instruct chat 2048 Falcon-instruct is a fine-tuned version of the Falcon LLM, specializing in chatting.
gemma-it chat 8192 Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.
glaive-coder chat 16384 A code model trained on a dataset of ~140k programming related problems and solutions generated from Glaive’s synthetic data generation platform.
glm-4v chat, vision 8192 GLM4 is the open source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI.
glm4-chat chat, tools 131072 GLM4 is the open source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI.
glm4-chat-1m chat, tools 1048576 GLM4 is the open source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI.
gorilla-openfunctions-v1 chat 4096 OpenFunctions is designed to extend the Large Language Model (LLM) Chat Completion feature to formulate executable API calls given natural language instructions and API context.
gorilla-openfunctions-v2 chat 4096 OpenFunctions is designed to extend the Large Language Model (LLM) Chat Completion feature to formulate executable API calls given natural language instructions and API context.
gpt-2 generate 1024 GPT-2 is a Transformer-based LLM trained on WebText, a 40 GB dataset of web pages linked from Reddit posts with at least 3 upvotes.
internlm-20b generate 16384 Pre-trained on over 2.3T Tokens containing high-quality English, Chinese, and code data.
internlm-7b generate 8192 InternLM is a Transformer-based LLM that is trained on both Chinese and English data, focusing on practical scenarios.
internlm-chat-20b chat 16384 Pre-trained on over 2.3T Tokens containing high-quality English, Chinese, and code data. The Chat version has undergone SFT and RLHF training.
internlm-chat-7b chat 4096 Internlm-chat is a fine-tuned version of the Internlm LLM, specializing in chatting.
internlm2-chat chat 204800 The second generation of the InternLM model, InternLM2.
internvl-chat chat, vision 32768 InternVL 1.5 is an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding.
llama-2 generate 4096 Llama-2 is the second generation of Llama, open-source and trained on a larger amount of data.
llama-2-chat chat 4096 Llama-2-Chat is a fine-tuned version of the Llama-2 LLM, specializing in chatting.
llama-3 generate 8192 Llama 3 is an auto-regressive language model that uses an optimized transformer architecture
llama-3-instruct chat 8192 The Llama 3 instruction tuned models are optimized for dialogue use cases and outperform many of the available open source chat models on common industry benchmarks.
minicpm-2b-dpo-bf16 chat 4096 MiniCPM is an End-Size LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings.
minicpm-2b-dpo-fp16 chat 4096 MiniCPM is an End-Size LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings.
minicpm-2b-dpo-fp32 chat 4096 MiniCPM is an End-Size LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings.
minicpm-2b-sft-bf16 chat 4096 MiniCPM is an End-Size LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings.
minicpm-2b-sft-fp32 chat 4096 MiniCPM is an End-Size LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings.
minicpm-llama3-v-2_5 chat, vision 2048 MiniCPM-Llama3-V 2.5 is the latest model in the MiniCPM-V series. The model is built on SigLip-400M and Llama3-8B-Instruct with a total of 8B parameters.
mistral-instruct-v0.1 chat 8192 Mistral-7B-Instruct is a fine-tuned version of the Mistral-7B LLM on public datasets, specializing in chatting.
mistral-instruct-v0.2 chat 8192 The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an improved instruct fine-tuned version of Mistral-7B-Instruct-v0.1.
mistral-instruct-v0.3 chat 32768 The Mistral-7B-Instruct-v0.3 Large Language Model (LLM) is an instruct fine-tuned version of Mistral-7B-v0.3.
mistral-v0.1 generate 8192 Mistral-7B is an unmoderated Transformer based LLM claiming to outperform Llama2 on all benchmarks.
mixtral-8x22b-instruct-v0.1 chat 65536 The Mixtral-8x22B-Instruct-v0.1 Large Language Model (LLM) is an instruct fine-tuned version of the Mixtral-8x22B-v0.1, specializing in chatting.
mixtral-instruct-v0.1 chat 32768 Mixtral-8x7B-Instruct is a fine-tuned version of the Mixtral-8x7B LLM, specializing in chatting.
mixtral-v0.1 generate 32768 The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts.
omnilmm chat, vision 2048 OmniLMM is a family of open-source large multimodal models (LMMs) adept at vision & language modeling.
openbuddy chat 2048 OpenBuddy is a powerful open multilingual chatbot model aimed at global users.
openhermes-2.5 chat 8192 Openhermes 2.5 is a fine-tuned version of Mistral-7B-v0.1 on primarily GPT-4 generated data.
opt generate 2048 Opt is an open-source, decoder-only, Transformer based LLM that was designed to replicate GPT-3.
orca chat 2048 Orca is an LLM trained by fine-tuning LLaMA on explanation traces obtained from GPT-4.
orion-chat chat 4096 Orion-14B series models are open-source multilingual large language models trained from scratch by OrionStarAI.
orion-chat-rag chat 4096 Orion-14B series models are open-source multilingual large language models trained from scratch by OrionStarAI.
phi-2 generate 2048 Phi-2 is a 2.7B Transformer based LLM used for research on model safety, trained with data similar to Phi-1.5 but augmented with synthetic texts and curated websites.
phi-3-mini-128k-instruct chat 128000 The Phi-3-Mini-128K-Instruct is a 3.8 billion-parameter, lightweight, state-of-the-art open model trained using the Phi-3 datasets.
phi-3-mini-4k-instruct chat 4096 The Phi-3-Mini-4k-Instruct is a 3.8 billion-parameter, lightweight, state-of-the-art open model trained using the Phi-3 datasets.
platypus2-70b-instruct generate 4096 Platypus2-70B-instruct is a merge of garage-bAInd/Platypus2-70B and upstage/Llama-2-70b-instruct-v2.
qwen-chat chat, tools 32768 Qwen-chat is a fine-tuned version of the Qwen LLM trained with alignment techniques, specializing in chatting.
qwen-vl-chat chat, vision 4096 Qwen-VL-Chat supports more flexible interaction, such as multiple image inputs, multi-round question answering, and creative capabilities.
qwen1.5-chat chat, tools 32768 Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data.
qwen1.5-moe-chat chat, tools 32768 Qwen1.5-MoE is a transformer-based MoE decoder-only language model pretrained on a large amount of data.
qwen2-instruct chat, tools 32768 Qwen2 is the new series of Qwen large language models
qwen2-moe-instruct chat, tools 32768 Qwen2 is the new series of Qwen large language models.
seallm_v2 generate 8192 We introduce SeaLLM-7B-v2, the state-of-the-art multilingual LLM for Southeast Asian (SEA) languages
seallm_v2.5 generate 8192 We introduce SeaLLM-7B-v2.5, the state-of-the-art multilingual LLM for Southeast Asian (SEA) languages
skywork generate 4096 Skywork is a series of large models developed by the Kunlun Group · Skywork team.
skywork-math generate 4096 Skywork is a series of large models developed by the Kunlun Group · Skywork team.
starchat-beta chat 8192 Starchat-beta is a fine-tuned version of the Starcoderplus LLM, specializing in coding assistance.
starcoder generate 8192 Starcoder is an open-source Transformer based LLM that is trained on permissively licensed data from GitHub.
starcoderplus generate 8192 Starcoderplus is an open-source LLM trained by fine-tuning Starcoder on the RefinedWeb and StarCoderData datasets.
starling-lm chat 4096 We introduce Starling-7B, an open large language model (LLM) trained by Reinforcement Learning from AI Feedback (RLAIF). The model harnesses the power of our new GPT-4 labeled ranking dataset
telechat chat 8192 TeleChat is a large language model developed and trained by China Telecom Artificial Intelligence Technology Co., Ltd. The 7B and 12B model bases are trained on 1.5 trillion and 3 trillion tokens of high-quality Chinese and English corpus, respectively.
tiny-llama generate 2048 The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens.
vicuna-v1.3 chat 2048 Vicuna is an open-source LLM trained by fine-tuning LLaMA on data collected from ShareGPT.
vicuna-v1.5 chat 4096 Vicuna is an open-source LLM trained by fine-tuning LLaMA on data collected from ShareGPT.
vicuna-v1.5-16k chat 16384 Vicuna-v1.5-16k is a special version of Vicuna-v1.5, with a context window of 16k tokens instead of 4k.
wizardcoder-python-v1.0 chat 100000
wizardlm-v1.0 chat 2048 WizardLM is an open-source LLM trained by fine-tuning LLaMA with Evol-Instruct.
wizardmath-v1.0 chat 2048 WizardMath is an open-source LLM trained by fine-tuning Llama2 with Evol-Instruct, specializing in math.
xverse generate 2048 XVERSE is a multilingual large language model, independently developed by Shenzhen Yuanxiang Technology.
xverse-chat chat 2048 XVERSE-Chat is the aligned version of the XVERSE model.
yi generate 4096 The Yi series models are large language models trained from scratch by developers at 01.AI.
yi-1.5 generate 4096 Yi-1.5 is an upgraded version of Yi. It is continuously pre-trained on Yi with a high-quality corpus of 500B tokens and fine-tuned on 3M diverse fine-tuning samples.
yi-1.5-chat chat 4096 Yi-1.5 is an upgraded version of Yi. It is continuously pre-trained on Yi with a high-quality corpus of 500B tokens and fine-tuned on 3M diverse fine-tuning samples.
yi-1.5-chat-16k chat 16384 Yi-1.5 is an upgraded version of Yi. It is continuously pre-trained on Yi with a high-quality corpus of 500B tokens and fine-tuned on 3M diverse fine-tuning samples.
yi-200k generate 262144 The Yi series models are large language models trained from scratch by developers at 01.AI.
yi-chat chat 4096 The Yi series models are large language models trained from scratch by developers at 01.AI.
yi-vl-chat chat, vision 4096 Yi Vision Language (Yi-VL) model is the open-source, multimodal version of the Yi Large Language Model (LLM) series, enabling content comprehension, recognition, and multi-round conversations about images.
zephyr-7b-alpha chat 8192 Zephyr-7B-α is the first model in the series, and is a fine-tuned version of mistralai/Mistral-7B-v0.1.
zephyr-7b-beta chat 8192 Zephyr-7B-β is the second model in the series, and is a fine-tuned version of mistralai/Mistral-7B-v0.1

Embedding

Text embedding models. The following are the embedding models built into Xinference:

Image

Image generation or processing models. The following are the image models built into Xinference:

Audio

Audio models. The following are the audio models built into Xinference:

Launching and Stopping Models

Each running model instance is assigned a unique model uid. By default, the model uid is equal to the model name. This ID is the handle for all subsequent use of the model instance; you can set it manually with the --model-uid option of the launch command.

You can launch a model either from the command line or through Xinference's Python client.

xinference launch --model-name <MODEL_NAME> \
                  [--model-engine <MODEL_ENGINE>] \
                  [--model-type <MODEL_TYPE>] \
                  [--model-uid <MODEL_UID>] \
                  [--endpoint "http://<XINFERENCE_HOST>:<XINFERENCE_PORT>"]

For models of type LLM, launching a model requires specifying not only the model name, but also the parameter size, the model format, and the model engine.
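
As an illustration, here is a minimal Python-client sketch for launching a built-in LLM. The model name, engine, format, size, and quantization values below are placeholders chosen for the example; consult the registration list above for the combinations your installation actually supports:

from xinference.client import Client

client = Client("http://127.0.0.1:9997")  # default local endpoint; adjust as needed

# For the LLM model type you must also choose an engine, a model format,
# and a parameter size; the values here are examples, not the only options.
model_uid = client.launch_model(
    model_name="qwen1.5-chat",
    model_engine="transformers",
    model_format="pytorch",
    model_size_in_billions=7,
    quantization="none",
)
print(model_uid)  # the handle used for all later calls against this instance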

The following command lists the models currently running in Xinference:

xinference list [--endpoint "http://<XINFERENCE_HOST>:<XINFERENCE_PORT>"]
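
The Python client exposes the same view of running models; a minimal sketch, assuming the same local endpoint as above:

from xinference.client import Client

client = Client("http://127.0.0.1:9997")
# Returns the currently running model instances, keyed by model uid.
print(client.list_models())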

When you no longer need a running model, you can release the resources it occupies as follows:

xinference terminate --model-uid "<MODEL_UID>" [--endpoint "http://<XINFERENCE_HOST>:<XINFERENCE_PORT>"]
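
The Python-client counterpart is a single call; the uid is the value returned by launch_model or the one passed via --model-uid:

from xinference.client import Client

client = Client("http://127.0.0.1:9997")
# Stop the instance and free the resources it occupies.
client.terminate_model(model_uid="<MODEL_UID>")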
   