Kaifamiao Planet (开发喵星球)

Ollama releases v0.1.46

What's new

Faster ollama run

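Along with the speed-up, this change removes a redundant step: the model info used to be fetched once before generateInteractive was called, and then fetched again as the first step inside generateInteractive.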
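
The fetch-once idea can be sketched in Go, the language Ollama is written in. This is a minimal illustration, not Ollama's code: ModelInfo, fetchModelInfo, runCommand and the fetchCount counter are hypothetical, and only the name generateInteractive comes from the text above. The caller fetches the model info once and passes it in, so generateInteractive no longer repeats the round trip.

```go
package main

import "fmt"

// ModelInfo stands in for the metadata the CLI needs before starting an
// interactive session (hypothetical type, not Ollama's).
type ModelInfo struct {
	Name     string
	Template string
}

// fetchCount records how many times the simulated server was queried.
var fetchCount int

// fetchModelInfo simulates one round trip to the server.
func fetchModelInfo(name string) ModelInfo {
	fetchCount++
	return ModelInfo{Name: name, Template: "{{ .Prompt }}"}
}

// generateInteractive receives the already-fetched info instead of
// querying the server again as its first step.
func generateInteractive(info ModelInfo) string {
	return "session started for " + info.Name
}

// runCommand fetches the model info exactly once and hands it down.
func runCommand(name string) string {
	info := fetchModelInfo(name)
	return generateInteractive(info)
}

func main() {
	fmt.Println(runCommand("llama3"))
	fmt.Println("fetches:", fetchCount)
}
```

Passing the value down, rather than letting each layer refetch it, is what saves one server round trip per invocation in this sketch.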

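With the redundant fetch gone, the command runs roughly twice as fast. The timings below compare a build from before the change (./before) with one from after it (./after); total wall-clock time drops from about 1.2 s to about 0.5 s: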

    ; time ./before run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./before run llama3 'hi'  0.02s user 0.01s system 2% cpu 1.168 total
    ; time ./before run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./before run llama3 'hi'  0.02s user 0.01s system 2% cpu 1.220 total
    ; time ./before run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./before run llama3 'hi'  0.02s user 0.01s system 2% cpu 1.217 total
    ; time ./after run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./after run llama3 'hi'  0.02s user 0.01s system 4% cpu 0.652 total
    ; time ./after run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./after run llama3 'hi'  0.01s user 0.01s system 5% cpu 0.498 total
    ; time ./after run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with or would you like to chat?

    ./after run llama3 'hi'  0.01s user 0.01s system 3% cpu 0.479 total
    ; time ./after run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./after run llama3 'hi'  0.02s user 0.01s system 5% cpu 0.507 total
    ; time ./after run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./after run llama3 'hi'  0.02s user 0.01s system 5% cpu 0.507 total
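
To put a number on "roughly twice as fast", the short Go program below averages the "total" column from the runs above. The figures are copied directly from the timing output; mean is a hypothetical helper, and a handful of runs on one machine is only an indicative sample.

```go
package main

import "fmt"

// Total wall-clock times in seconds, copied from the runs above.
var before = []float64{1.168, 1.220, 1.217}
var after = []float64{0.652, 0.498, 0.479, 0.507, 0.507}

// mean returns the arithmetic mean of xs (xs must be non-empty).
func mean(xs []float64) float64 {
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

func main() {
	b, a := mean(before), mean(after)
	// prints: before: 1.202 s, after: 0.529 s, speed-up: 2.27x
	fmt.Printf("before: %.3f s, after: %.3f s, speed-up: %.2fx\n", b, a, b/a)
}
```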

Faster GGUF decoding

Previously, some costly operations made loading GGUF files and their metadata very slow.

Now, on a MacBook Pro M3, the show API call for llama3 takes 33 ms, down from just over 800 ms.
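
To try a similar measurement on your own machine, you could time a request to the show endpoint of a running server. This is a minimal Go sketch that assumes a local Ollama server at http://localhost:11434 with llama3 already pulled. timeShow is a hypothetical helper, and the request field name is an assumption noted in the code. Results will depend on hardware and Ollama version.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// timeShow POSTs a show request for model to baseURL and returns the
// elapsed wall-clock time (hypothetical helper).
func timeShow(baseURL, model string) (time.Duration, error) {
	// Assumption: older API docs name the field "name", newer ones "model";
	// both are sent, relying on the server ignoring the one it doesn't use.
	body, err := json.Marshal(map[string]string{"name": model, "model": model})
	if err != nil {
		return 0, err
	}
	start := time.Now()
	resp, err := http.Post(baseURL+"/api/show", "application/json", bytes.NewReader(body))
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	// Drain the body so the measurement covers the complete response.
	if _, err := io.Copy(io.Discard, resp.Body); err != nil {
		return 0, err
	}
	if resp.StatusCode != http.StatusOK {
		return 0, fmt.Errorf("show failed: %s", resp.Status)
	}
	return time.Since(start), nil
}

func main() {
	d, err := timeShow("http://localhost:11434", "llama3")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("show API took", d)
}
```

Running it a few times and ignoring the first result gives a steadier figure, since the first call may include one-off startup costs.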

It also avoids collecting large arrays while decoding GGUF. When a large array is present, its value is set to null and encoded as null in the JSON output.

   
分类:玩技术 作者:荡荡, 浩浩 发表于:2024-06-26 10:44:26 阅读量:81