开发喵星球

Ollama releases v0.1.46



What's changed

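- Faster model loading with `ollama run` when the model is already loaded
- Improved `/api/show` performance, including for large models
- Fixed an error caused by the `--quantize` flag of `ollama create`
- Improved model loading time on Linux when a model does not fit completely in system memory
- Fixed an issue where Modelfile parameters were not all parsed correctly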

Faster `ollama run`

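Beyond the speedup, this change removes a redundant request: model information was fetched once before calling generateInteractive and then fetched again as the first step inside generateInteractive.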

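This positively impacts the performance of the command. In the runs below, total wall-clock time for `./before` is about 1.2 s, while `./after` finishes in about 0.5 s: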

    ; time ./before run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./before run llama3 'hi'  0.02s user 0.01s system 2% cpu 1.168 total
    ; time ./before run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./before run llama3 'hi'  0.02s user 0.01s system 2% cpu 1.220 total
    ; time ./before run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./before run llama3 'hi'  0.02s user 0.01s system 2% cpu 1.217 total
    ; time ./after run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./after run llama3 'hi'  0.02s user 0.01s system 4% cpu 0.652 total
    ; time ./after run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./after run llama3 'hi'  0.01s user 0.01s system 5% cpu 0.498 total
    ; time ./after run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with or would you like to chat?

    ./after run llama3 'hi'  0.01s user 0.01s system 3% cpu 0.479 total
    ; time ./after run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./after run llama3 'hi'  0.02s user 0.01s system 5% cpu 0.507 total
    ; time ./after run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./after run llama3 'hi'  0.02s user 0.01s system 5% cpu 0.507 total
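The redundant lookup described above can be illustrated with a small sketch. This is not Ollama's Go code: `FakeClient`, `run_before`, `run_after`, and the two `generate_interactive_*` functions are hypothetical stand-ins, and the sketch assumes only that each model-info fetch costs one round trip to the server.

```python
class FakeClient:
    """Hypothetical stand-in for an API client; counts model-info requests."""

    def __init__(self):
        self.show_calls = 0

    def show(self, model):
        # Each call models one round trip to the server.
        self.show_calls += 1
        return {"model": model}


def generate_interactive_before(client, model):
    # Old shape (assumed): the first step fetches model info again.
    return client.show(model)


def run_before(client, model):
    client.show(model)  # fetched by the caller, then fetched again below
    return generate_interactive_before(client, model)


def generate_interactive_after(info):
    # New shape (assumed): reuse the info the caller already has.
    return info


def run_after(client, model):
    info = client.show(model)  # fetched exactly once
    return generate_interactive_after(info)


before, after = FakeClient(), FakeClient()
run_before(before, "llama3")
run_after(after, "llama3")
print(before.show_calls, after.show_calls)  # prints "2 1"
```

Passing the already-fetched result down the call chain saves one request per invocation, which matters most for a short command whose total runtime is dominated by such round trips.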

Faster GGUF decoding

Previously, several expensive operations made loading a GGUF file and its metadata very slow:

- Decoding strings allocated too much memory.
- Every key and value read went to disk, resulting in an excessive number of I/O system calls.

Now, on a MacBook Pro M3, the show API for llama3 takes 33 ms, down from more than 800 ms.
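One common way to cut the number of reads that reach the disk is to put a buffer in front of the file. The sketch below is an illustration under assumptions, not Ollama's implementation: `CountingRaw` is a hypothetical in-memory stream that counts each read as a simulated system call, and the keys are made up. The string layout (a little-endian uint64 length followed by UTF-8 bytes) follows the GGUF format.

```python
import io
import struct


class CountingRaw(io.RawIOBase):
    """Hypothetical in-memory raw stream that counts reads (simulated syscalls)."""

    def __init__(self, data):
        super().__init__()
        self._src = io.BytesIO(data)
        self.reads = 0

    def readable(self):
        return True

    def readinto(self, buf):
        self.reads += 1  # one simulated trip to the disk
        return self._src.readinto(buf)


def encode_string(text):
    # GGUF-style string: little-endian uint64 length, then UTF-8 bytes.
    raw = text.encode("utf-8")
    return struct.pack("<Q", len(raw)) + raw


def read_string(stream):
    # Two reads per string: the 8-byte length prefix, then the payload.
    (length,) = struct.unpack("<Q", stream.read(8))
    return stream.read(length).decode("utf-8")


keys = [f"example.key.{i}" for i in range(100)]  # made-up metadata keys
payload = b"".join(encode_string(k) for k in keys)

# Unbuffered: every small read hits the raw stream.
unbuffered = CountingRaw(payload)
plain = [read_string(unbuffered) for _ in keys]

# Buffered: small reads are served from a 64 KiB in-memory buffer.
raw = CountingRaw(payload)
buffered = io.BufferedReader(raw, buffer_size=64 * 1024)
fast = [read_string(buffered) for _ in keys]

print("raw reads without buffer:", unbuffered.reads)
print("raw reads with buffer:", raw.reads)
```

Both variants decode the same strings; only the number of reads that reach the underlying stream differs.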

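The change also avoids collecting large arrays of values while decoding GGUF files. When keys holding such arrays are encountered, their values are set to null and encoded as null in JSON.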
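The effect on the JSON output can be sketched as follows. The function name `collect_metadata` and the `max_array_len` threshold are hypothetical, chosen only for illustration. A real decoder would skip the array bytes instead of building the list first, which this simplified version does not model.

```python
import json


def collect_metadata(pairs, max_array_len):
    """Keep scalar values and small arrays; replace oversized arrays with None."""
    metadata = {}
    for key, value in pairs:
        if isinstance(value, list) and len(value) > max_array_len:
            metadata[key] = None  # the key is kept, its large value is dropped
        else:
            metadata[key] = value
    return metadata


pairs = [
    ("general.architecture", "llama"),
    ("tokenizer.ggml.tokens", ["tok"] * 5000),  # stands in for a large vocabulary
]
encoded = json.dumps(collect_metadata(pairs, max_array_len=1024))
print(encoded)
# prints {"general.architecture": "llama", "tokenizer.ggml.tokens": null}
```

Keeping the key with a null value lets a client see that the field exists without paying to transfer or allocate its contents.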

Category: 玩技术 (Tech) | Authors: 荡荡, 浩浩 | Published: 2024-06-26 10:44:26 | Views: 140

