How to Import Models into Ollama the Right Way

Today we'll walk through how to import GGUF, PyTorch, or Safetensors models into your local Ollama.

Importing GGUF

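First, create a Modelfile. As covered yesterday, the Modelfile is the blueprint for a model: it specifies the weights, parameters, prompt template, and so on.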

FROM ./mistral-7b-v0.1.Q4_0.gguf

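Many chat models need a prompt template in order to answer correctly. You can set a default prompt template with the TEMPLATE instruction in the Modelfile: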

FROM ./mistral-7b-v0.1.Q4_0.gguf
TEMPLATE "[INST] {{ .Prompt }} [/INST]"
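
If you would rather generate the Modelfile from a script than write it by hand, a quoted heredoc is one way to do it. This is only a sketch: the function name write_modelfile and its arguments are made up for illustration and are not part of Ollama.

```shell
#!/bin/sh
# write_modelfile is a hypothetical helper, not an Ollama command.
# Usage: write_modelfile <path-to-weights> [output-file]
write_modelfile() {
    weights=$1
    out=${2:-Modelfile}
    # The FROM line points at the weights file.
    printf 'FROM %s\n' "$weights" > "$out"
    # The quoted 'EOF' delimiter turns off shell expansion in the heredoc body,
    # so a template containing $ or backticks is written verbatim.
    cat >> "$out" <<'EOF'
TEMPLATE "[INST] {{ .Prompt }} [/INST]"
EOF
}
```

Calling write_modelfile ./mistral-7b-v0.1.Q4_0.gguf writes the same two-line Modelfile as the one shown above.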

Create a model from the Modelfile:

ollama create example -f Modelfile

Run your newly created model

Next, test the model with ollama run:

ollama run example "What is your favourite condiment?"
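
A wrong path on the FROM line is an easy mistake to make, so it can help to check the Modelfile before building. A minimal sketch, assuming the FROM line holds a local file path (FROM can also name an existing model, which this check does not handle) and assuming a relative path is resolved against the Modelfile's directory. modelfile_weights and check_modelfile are hypothetical helper names.

```shell
#!/bin/sh
# Print the value of the first FROM line in a Modelfile.
modelfile_weights() {
    sed -n 's/^FROM[[:space:]][[:space:]]*//p' "$1" | head -n 1
}

# Return 0 if the file named on the FROM line exists.
# Assumption: relative paths are taken relative to the Modelfile's directory.
check_modelfile() {
    mf=$1
    weights=$(modelfile_weights "$mf")
    [ -n "$weights" ] || { echo "no FROM line in $mf" >&2; return 1; }
    case $weights in
        /*) path=$weights ;;
        *)  path=$(dirname "$mf")/$weights ;;
    esac
    [ -f "$path" ] || { echo "weights not found: $path" >&2; return 1; }
}
```

For example, check_modelfile Modelfile && ollama create example -f Modelfile only runs the build when the weights file is actually there.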

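Importing PyTorch and Safetensors

Importing from PyTorch and Safetensors takes more steps than importing from GGUF. The Ollama team is working on making these formats easier to import and use, and I'll share the improvements here as soon as they land.

First, clone the ollama/ollama repository: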

git clone git@github.com:ollama/ollama.git ollama
cd ollama

Then fetch its llama.cpp submodule:

git submodule init
git submodule update llm/llama.cpp

Next, install the Python dependencies in a virtual environment:

python3 -m venv llm/llama.cpp/.venv
source llm/llama.cpp/.venv/bin/activate
pip install -r llm/llama.cpp/requirements.txt

Then build the quantize tool:

make -C llm/llama.cpp quantize

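If the model is hosted in a HuggingFace repository, clone that repository first to download the original model. Install Git LFS, confirm that it is installed, then clone the model's repository: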

git lfs install
git clone https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1 model
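
If Git LFS is not set up when you clone, the large weight files arrive as small text pointers instead of the real data. Git LFS pointer files begin with the line "version https://git-lfs.github.com/spec/v1", so a quick check is possible. The sketch below shows one way to do it; is_lfs_pointer and find_lfs_pointers are hypothetical helper names.

```shell
#!/bin/sh
# Return 0 if the file looks like a Git LFS pointer rather than real data.
is_lfs_pointer() {
    # Only the first bytes matter; tr drops NUL bytes before grep sees them.
    head -c 64 "$1" | tr -d '\0' | grep -q '^version https://git-lfs\.github\.com/spec/v1'
}

# Print every pointer file found directly inside a model directory.
find_lfs_pointers() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        if is_lfs_pointer "$f"; then
            echo "$f"
        fi
    done
}
```

Running find_lfs_pointers ./model should print nothing when the weights downloaded properly. If it lists any files, running git lfs pull inside the cloned repository is the usual way to fetch the real content.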

Next, convert the model. Note: some model architectures require a specific conversion script. For example, Qwen models need convert-hf-to-gguf.py instead of convert.py.

python llm/llama.cpp/convert.py ./model --outtype f16 --outfile converted.bin
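
The choice of conversion script can be automated in a rough way. The sketch below is an assumption-heavy illustration: it treats any model whose config.json mentions "qwen" as needing convert-hf-to-gguf.py and falls back to convert.py otherwise. Only the Qwen case comes from this post; the detection rule and the helper name pick_convert_script are hypothetical.

```shell
#!/bin/sh
# Echo the name of the conversion script to use for a model directory.
# Assumption: a case-insensitive "qwen" match in config.json identifies Qwen models.
pick_convert_script() {
    config=$1/config.json
    if [ -f "$config" ] && grep -qi 'qwen' "$config"; then
        echo convert-hf-to-gguf.py
    else
        echo convert.py
    fi
}
```

It could then be used as python "llm/llama.cpp/$(pick_convert_script ./model)" ./model --outtype f16 --outfile converted.bin, though it is worth checking the chosen script's own help output for the options it accepts.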

Then quantize the converted model:

llm/llama.cpp/quantize converted.bin quantized.bin q4_0
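
Although the files here use a .bin extension, the output should be in GGUF format, and a GGUF file starts with the four ASCII bytes GGUF. A quick sanity check before handing a file to Ollama might look like this; is_gguf is a hypothetical helper name.

```shell
#!/bin/sh
# Return 0 if the file starts with the GGUF magic bytes.
is_gguf() {
    # tr strips NUL bytes so the command substitution stays clean for other binary files.
    magic=$(head -c 4 "$1" | tr -d '\0')
    [ "$magic" = "GGUF" ]
}
```

For example: is_gguf quantized.bin || echo "quantized.bin does not look like a GGUF file" >&2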

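Once you have the quantized model, you can build it by following the same steps as above: write a Modelfile, create the model, and run it.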

FROM quantized.bin
TEMPLATE "[INST] {{ .Prompt }} [/INST]"

ollama create example -f Modelfile

ollama run example "What is your favourite condiment?"
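
Putting the PyTorch/Safetensors steps together, the conversion, quantization and build can be wrapped in one script. This is only a sketch: the run helper, the DRY_RUN variable and the import_model function are hypothetical conveniences, while the commands inside are the ones from this post. With DRY_RUN=1 it prints each command instead of executing it, which is handy for checking the plan first.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: import_model <model-dir> <ollama-model-name>
# Assumes it is called from the root of the ollama checkout and that a
# Modelfile pointing at quantized.bin already exists in the current directory.
import_model() {
    model_dir=$1
    name=$2
    # Each step only runs if the previous one succeeded.
    run python llm/llama.cpp/convert.py "$model_dir" --outtype f16 --outfile converted.bin &&
    run llm/llama.cpp/quantize converted.bin quantized.bin q4_0 &&
    run ollama create "$name" -f Modelfile
}
```

Setting DRY_RUN=1 and calling import_model ./model example lists the three commands without touching anything; unset DRY_RUN to run them for real.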