You can list all models of a given type that can be launched in Xinference:
xinference registrations --model-type <MODEL_TYPE> \
  [--endpoint "http://<XINFERENCE_HOST>:<XINFERENCE_PORT>"]
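The same listing is also available from the Xinference Python client. Below is a minimal sketch, assuming the server runs at http://127.0.0.1:9997 and that the installed client version exposes `list_model_registrations` (adjust the endpoint to your deployment):

```python
# Minimal sketch: list registered models of one type through the Python client.
# Assumptions: the Xinference server is reachable at http://127.0.0.1:9997 and
# this client version provides list_model_registrations(); the exact fields of
# each entry may vary between versions.
from xinference.client import Client

client = Client("http://127.0.0.1:9997")
for reg in client.list_model_registrations(model_type="LLM"):
    print(reg.get("model_name"), "(built-in)" if reg.get("is_builtin") else "(custom)")
```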
Xinference supports the following MODEL_TYPE values:
Text generation models, i.e. large language models (LLMs). The following is the list of LLMs built into Xinference:
MODEL NAME | ABILITIES | CONTEXT_LENGTH | DESCRIPTION |
---|---|---|---|
aquila2 | generate | 2048 | Aquila2 series models are the base language models |
aquila2-chat | chat | 2048 | Aquila2-chat series models are the chat models |
aquila2-chat-16k | chat | 16384 | AquilaChat2-16k series models are the long-text chat models |
baichuan | generate | 4096 | Baichuan is an open-source Transformer based LLM that is trained on both Chinese and English data. |
baichuan-2 | generate | 4096 | Baichuan2 is an open-source Transformer based LLM that is trained on both Chinese and English data. |
baichuan-2-chat | chat | 4096 | Baichuan2-chat is a fine-tuned version of the Baichuan LLM, specializing in chatting. |
baichuan-chat | chat | 4096 | Baichuan-chat is a fine-tuned version of the Baichuan LLM, specializing in chatting. |
c4ai-command-r-v01 | chat | 131072 | C4AI Command-R(+) is a research release of highly performant generative models with 35 and 104 billion parameters. |
c4ai-command-r-v01-4bit | generate | 131072 | This model is a 4-bit quantized version of C4AI Command-R using bitsandbytes. |
chatglm | chat | 2048 | ChatGLM is an open-source General Language Model (GLM) based LLM trained on both Chinese and English data. |
chatglm2 | chat | 8192 | ChatGLM2 is the second generation of ChatGLM, still open-source and trained on Chinese and English data. |
chatglm2-32k | chat | 32768 | ChatGLM2-32k is a special version of ChatGLM2, with a context window of 32k tokens instead of 8k. |
chatglm3 | chat, tools | 8192 | ChatGLM3 is the third generation of ChatGLM, still open-source and trained on Chinese and English data. |
chatglm3-128k | chat | 131072 | ChatGLM3 is the third generation of ChatGLM, still open-source and trained on Chinese and English data. |
chatglm3-32k | chat | 32768 | ChatGLM3 is the third generation of ChatGLM, still open-source and trained on Chinese and English data. |
code-llama | generate | 100000 | Code-Llama is an open-source LLM trained by fine-tuning LLaMA2 for generating and discussing code. |
code-llama-instruct | chat | 100000 | Code-Llama-Instruct is an instruct-tuned version of the Code-Llama LLM. |
code-llama-python | generate | 100000 | Code-Llama-Python is a fine-tuned version of the Code-Llama LLM, specializing in Python. |
codeqwen1.5 | generate | 65536 | CodeQwen1.5 is the Code-Specific version of Qwen1.5. It is a transformer-based decoder-only language model pretrained on a large amount of data of codes. |
codeqwen1.5-chat | chat | 65536 | CodeQwen1.5 is the Code-Specific version of Qwen1.5. It is a transformer-based decoder-only language model pretrained on a large amount of data of codes. |
codeshell | generate | 8194 | CodeShell is a multi-language code LLM developed by the Knowledge Computing Lab of Peking University. |
codeshell-chat | chat | 8194 | CodeShell is a multi-language code LLM developed by the Knowledge Computing Lab of Peking University. |
codestral-v0.1 | generate | 32768 | Codestral-22B-v0.1 is trained on a diverse dataset of 80+ programming languages, including the most popular ones, such as Python, Java, C, C++, JavaScript, and Bash. |
cogvlm2 | chat, vision | 8192 | CogVLM2 has achieved good results on many leaderboards compared to the previous generation of open-source CogVLM models. Its excellent performance can compete with some non-open-source models. |
deepseek | generate | 4096 | DeepSeek LLM, trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese. |
deepseek-chat | chat | 4096 | DeepSeek LLM is an advanced language model comprising 67 billion parameters. It has been trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese. |
deepseek-coder | generate | 16384 | Deepseek Coder is composed of a series of code language models, each trained from scratch on 2T tokens, with a composition of 87% code and 13% natural language in both English and Chinese. |
deepseek-coder-instruct | chat | 16384 | deepseek-coder-instruct is a model initialized from deepseek-coder-base and fine-tuned on 2B tokens of instruction data. |
deepseek-vl-chat | chat, vision | 4096 | DeepSeek-VL possesses general multimodal understanding capabilities, capable of processing logical diagrams, web pages, formula recognition, scientific literature, natural images, and embodied intelligence in complex scenarios. |
falcon | generate | 2048 | Falcon is an open-source Transformer based LLM trained on the RefinedWeb dataset. |
falcon-instruct | chat | 2048 | Falcon-instruct is a fine-tuned version of the Falcon LLM, specializing in chatting. |
gemma-it | chat | 8192 | Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. |
glaive-coder | chat | 16384 | A code model trained on a dataset of ~140k programming related problems and solutions generated from Glaive’s synthetic data generation platform. |
glm-4v | chat, vision | 8192 | GLM4 is the open source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI. |
glm4-chat | chat, tools | 131072 | GLM4 is the open source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI. |
glm4-chat-1m | chat, tools | 1048576 | GLM4 is the open source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI. |
gorilla-openfunctions-v1 | chat | 4096 | OpenFunctions is designed to extend the Large Language Model (LLM) chat completion feature to formulate executable API calls given natural language instructions and API context. |
gorilla-openfunctions-v2 | chat | 4096 | OpenFunctions is designed to extend the Large Language Model (LLM) chat completion feature to formulate executable API calls given natural language instructions and API context. |
gpt-2 | generate | 1024 | GPT-2 is a Transformer-based LLM that is trained on WebText, a 40 GB dataset of Reddit posts with 3+ upvotes. |
internlm-20b | generate | 16384 | Pre-trained on over 2.3T Tokens containing high-quality English, Chinese, and code data. |
internlm-7b | generate | 8192 | InternLM is a Transformer-based LLM that is trained on both Chinese and English data, focusing on practical scenarios. |
internlm-chat-20b | chat | 16384 | Pre-trained on over 2.3T Tokens containing high-quality English, Chinese, and code data. The Chat version has undergone SFT and RLHF training. |
internlm-chat-7b | chat | 4096 | Internlm-chat is a fine-tuned version of the Internlm LLM, specializing in chatting. |
internlm2-chat | chat | 204800 | The second generation of the InternLM model, InternLM2. |
internvl-chat | chat, vision | 32768 | InternVL 1.5 is an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. |
llama-2 | generate | 4096 | Llama-2 is the second generation of Llama, open-source and trained on a larger amount of data. |
llama-2-chat | chat | 4096 | Llama-2-Chat is a fine-tuned version of the Llama-2 LLM, specializing in chatting. |
llama-3 | generate | 8192 | Llama 3 is an auto-regressive language model that uses an optimized transformer architecture. |
llama-3-instruct | chat | 8192 | The Llama 3 instruction tuned models are optimized for dialogue use cases and outperform many of the available open source chat models on common industry benchmarks. |
minicpm-2b-dpo-bf16 | chat | 4096 | MiniCPM is an end-side LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings. |
minicpm-2b-dpo-fp16 | chat | 4096 | MiniCPM is an end-side LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings. |
minicpm-2b-dpo-fp32 | chat | 4096 | MiniCPM is an end-side LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings. |
minicpm-2b-sft-bf16 | chat | 4096 | MiniCPM is an end-side LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings. |
minicpm-2b-sft-fp32 | chat | 4096 | MiniCPM is an end-side LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings. |
minicpm-llama3-v-2_5 | chat, vision | 2048 | MiniCPM-Llama3-V 2.5 is the latest model in the MiniCPM-V series. The model is built on SigLip-400M and Llama3-8B-Instruct with a total of 8B parameters. |
mistral-instruct-v0.1 | chat | 8192 | Mistral-7B-Instruct is a fine-tuned version of the Mistral-7B LLM on public datasets, specializing in chatting. |
mistral-instruct-v0.2 | chat | 8192 | The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an improved instruct fine-tuned version of Mistral-7B-Instruct-v0.1. |
mistral-instruct-v0.3 | chat | 32768 | The Mistral-7B-Instruct-v0.3 Large Language Model (LLM) is an improved instruct fine-tuned version of Mistral-7B-Instruct-v0.2. |
mistral-v0.1 | generate | 8192 | Mistral-7B is an unmoderated Transformer based LLM claiming to outperform Llama2 on all benchmarks. |
mixtral-8x22b-instruct-v0.1 | chat | 65536 | The Mixtral-8x22B-Instruct-v0.1 Large Language Model (LLM) is an instruct fine-tuned version of the Mixtral-8x22B-v0.1, specializing in chatting. |
mixtral-instruct-v0.1 | chat | 32768 | Mixtral-8x7B-Instruct is a fine-tuned version of the Mixtral-8x7B LLM, specializing in chatting. |
mixtral-v0.1 | generate | 32768 | The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. |
omnilmm | chat, vision | 2048 | OmniLMM is a family of open-source large multimodal models (LMMs) adept at vision & language modeling. |
openbuddy | chat | 2048 | OpenBuddy is a powerful open multilingual chatbot model aimed at global users. |
openhermes-2.5 | chat | 8192 | Openhermes 2.5 is a fine-tuned version of Mistral-7B-v0.1 on primarily GPT-4 generated data. |
opt | generate | 2048 | Opt is an open-source, decoder-only, Transformer based LLM that was designed to replicate GPT-3. |
orca | chat | 2048 | Orca is an LLM trained by fine-tuning LLaMA on explanation traces obtained from GPT-4. |
orion-chat | chat | 4096 | Orion-14B series models are open-source multilingual large language models trained from scratch by OrionStarAI. |
orion-chat-rag | chat | 4096 | Orion-14B series models are open-source multilingual large language models trained from scratch by OrionStarAI. |
phi-2 | generate | 2048 | Phi-2 is a 2.7B Transformer based LLM used for research on model safety, trained with data similar to Phi-1.5 but augmented with synthetic texts and curated websites. |
phi-3-mini-128k-instruct | chat | 128000 | The Phi-3-Mini-128K-Instruct is a 3.8 billion-parameter, lightweight, state-of-the-art open model trained using the Phi-3 datasets. |
phi-3-mini-4k-instruct | chat | 4096 | The Phi-3-Mini-4k-Instruct is a 3.8 billion-parameter, lightweight, state-of-the-art open model trained using the Phi-3 datasets. |
platypus2-70b-instruct | generate | 4096 | Platypus-70B-instruct is a merge of garage-bAInd/Platypus2-70B and upstage/Llama-2-70b-instruct-v2. |
qwen-chat | chat, tools | 32768 | Qwen-chat is a fine-tuned version of the Qwen LLM trained with alignment techniques, specializing in chatting. |
qwen-vl-chat | chat, vision | 4096 | Qwen-VL-Chat supports more flexible interaction, such as multiple image inputs, multi-round question answering, and creative capabilities. |
qwen1.5-chat | chat, tools | 32768 | Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. |
qwen1.5-moe-chat | chat, tools | 32768 | Qwen1.5-MoE is a transformer-based MoE decoder-only language model pretrained on a large amount of data. |
qwen2-instruct | chat, tools | 32768 | Qwen2 is the new series of Qwen large language models. |
qwen2-moe-instruct | chat, tools | 32768 | Qwen2 is the new series of Qwen large language models. |
seallm_v2 | generate | 8192 | SeaLLM-7B-v2 is a state-of-the-art multilingual LLM for Southeast Asian (SEA) languages. |
seallm_v2.5 | generate | 8192 | SeaLLM-7B-v2.5 is a state-of-the-art multilingual LLM for Southeast Asian (SEA) languages. |
skywork | generate | 4096 | Skywork is a series of large models developed by the Kunlun Group · Skywork team. |
skywork-math | generate | 4096 | Skywork is a series of large models developed by the Kunlun Group · Skywork team. |
starchat-beta | chat | 8192 | Starchat-beta is a fine-tuned version of the Starcoderplus LLM, specializing in coding assistance. |
starcoder | generate | 8192 | Starcoder is an open-source Transformer based LLM that is trained on permissively licensed data from GitHub. |
starcoderplus | generate | 8192 | Starcoderplus is an open-source LLM trained by fine-tuning Starcoder on the RefinedWeb and StarCoderData datasets. |
starling-lm | chat | 4096 | Starling-7B is an open large language model (LLM) trained by Reinforcement Learning from AI Feedback (RLAIF), harnessing a new GPT-4 labeled ranking dataset. |
telechat | chat | 8192 | TeleChat is a large language model developed and trained by China Telecom Artificial Intelligence Technology Co., Ltd. The model bases are trained on 1.5 trillion and 3 trillion tokens of high-quality Chinese corpus, respectively. |
tiny-llama | generate | 2048 | The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens. |
vicuna-v1.3 | chat | 2048 | Vicuna is an open-source LLM trained by fine-tuning LLaMA on data collected from ShareGPT. |
vicuna-v1.5 | chat | 4096 | Vicuna is an open-source LLM trained by fine-tuning LLaMA on data collected from ShareGPT. |
vicuna-v1.5-16k | chat | 16384 | Vicuna-v1.5-16k is a special version of Vicuna-v1.5, with a context window of 16k tokens instead of 4k. |
wizardcoder-python-v1.0 | chat | 100000 | |
wizardlm-v1.0 | chat | 2048 | WizardLM is an open-source LLM trained by fine-tuning LLaMA with Evol-Instruct. |
wizardmath-v1.0 | chat | 2048 | WizardMath is an open-source LLM trained by fine-tuning Llama2 with Evol-Instruct, specializing in math. |
xverse | generate | 2048 | XVERSE is a multilingual large language model, independently developed by Shenzhen Yuanxiang Technology. |
xverse-chat | chat | 2048 | XVERSE-Chat is the aligned version of the XVERSE model. |
yi | generate | 4096 | The Yi series models are large language models trained from scratch by developers at 01.AI. |
yi-1.5 | generate | 4096 | Yi-1.5 is an upgraded version of Yi. It is continuously pre-trained on Yi with a high-quality corpus of 500B tokens and fine-tuned on 3M diverse fine-tuning samples. |
yi-1.5-chat | chat | 4096 | Yi-1.5 is an upgraded version of Yi. It is continuously pre-trained on Yi with a high-quality corpus of 500B tokens and fine-tuned on 3M diverse fine-tuning samples. |
yi-1.5-chat-16k | chat | 16384 | Yi-1.5 is an upgraded version of Yi. It is continuously pre-trained on Yi with a high-quality corpus of 500B tokens and fine-tuned on 3M diverse fine-tuning samples. |
yi-200k | generate | 262144 | The Yi series models are large language models trained from scratch by developers at 01.AI. |
yi-chat | chat | 4096 | The Yi series models are large language models trained from scratch by developers at 01.AI. |
yi-vl-chat | chat, vision | 4096 | Yi Vision Language (Yi-VL) model is the open-source, multimodal version of the Yi Large Language Model (LLM) series, enabling content comprehension, recognition, and multi-round conversations about images. |
zephyr-7b-alpha | chat | 8192 | Zephyr-7B-α is the first model in the series, and is a fine-tuned version of mistralai/Mistral-7B-v0.1. |
zephyr-7b-beta | chat | 8192 | Zephyr-7B-β is the second model in the series, and is a fine-tuned version of mistralai/Mistral-7B-v0.1. |
Text embedding models. The following is the list of embedding models built into Xinference:
Image generation or processing models. The following is the list of image models built into Xinference:
Audio models. The following is the list of audio models built into Xinference:
Each running model instance is assigned a unique model uid. By default, the model uid equals the model name. This ID is the handle you use to refer to the model instance later on; you can set it manually with the --model-uid option of the launch command.
You can launch a model via the command line or via Xinference's Python client.
xinference launch --model-name <MODEL_NAME> \
  [--model-engine <MODEL_ENGINE>] \
  [--model-type <MODEL_TYPE>] \
  [--model-uid <MODEL_UID>] \
  [--size-in-billions <SIZE_IN_BILLIONS>] \
  [--model-format <MODEL_FORMAT>] \
  [--endpoint "http://<XINFERENCE_HOST>:<XINFERENCE_PORT>"]
For models of type LLM, launching a model requires specifying not only the model name but also the parameter size, the model format, and the model engine.
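Putting these pieces together, here is a minimal Python-client sketch for launching one of the LLMs listed above. The endpoint, the chosen model (qwen2-instruct), and the engine/size/format/quantization values are illustrative assumptions, not defaults; pick a combination that `xinference registrations --model-type LLM` actually reports for your installation.

```python
# Minimal sketch: launch an LLM through the Python client.
# Assumptions: server at http://127.0.0.1:9997; the qwen2-instruct spec below
# (engine/size/format/quantization) is available in your installation.
from xinference.client import Client

client = Client("http://127.0.0.1:9997")

model_uid = client.launch_model(
    model_name="qwen2-instruct",     # MODEL NAME from the LLM table above
    model_type="LLM",
    model_engine="transformers",     # assumed engine; vLLM, llama.cpp, etc. also work if installed
    model_size_in_billions=7,        # parameter size
    model_format="pytorch",          # model format
    quantization="none",
    model_uid="my-qwen2",            # optional; defaults to the model name
)
print("launched model uid:", model_uid)
```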
The following command lists the models currently running in Xinference:
xinference list [--endpoint "http://<XINFERENCE_HOST>:<XINFERENCE_PORT>"]
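The Python client offers the same view of running instances; a small sketch under the same endpoint assumption:

```python
# Minimal sketch: list running model instances through the Python client.
# Assumes the server is reachable at http://127.0.0.1:9997; the exact fields
# of each entry may differ between Xinference versions.
from xinference.client import Client

client = Client("http://127.0.0.1:9997")
for uid, spec in client.list_models().items():
    print(uid, spec.get("model_name"), spec.get("model_format"))
```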
When you no longer need a running model, release the resources it occupies as follows:
xinference terminate --model-uid "<MODEL_UID>" [--endpoint "http://<XINFERENCE_HOST>:<XINFERENCE_PORT>"]
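The equivalent call from the Python client, where the endpoint and the model uid are placeholder assumptions:

```python
# Minimal sketch: terminate a running model and free its resources via the Python client.
# Assumes the server is reachable at http://127.0.0.1:9997 and "my-qwen2" is the
# uid of a running instance (as reported by `xinference list`).
from xinference.client import Client

client = Client("http://127.0.0.1:9997")
client.terminate_model(model_uid="my-qwen2")
```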