git clone https://github.com/ictnlp/LLaMA-Omni
cd LLaMA-Omni
conda create -n llama-omni python=3.10
conda activate llama-omni
pip install pip==24.0
pip install -e .
git clone https://github.com/pytorch/fairseq
cd fairseq
pip install -e . --no-build-isolation
Download the Llama-3.1-8B-Omni model from Huggingface, then download the Whisper-large-v3 speech encoder:

import whisper
model = whisper.load_model("large-v3", download_root="models/speech_encoder/")
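For reference, openai-whisper caches the downloaded checkpoint as `<name>.pt` inside `download_root`. A minimal sketch (the helper name is ours, not part of the repo) to locate where the file above ends up:

```python
import os

def whisper_checkpoint_path(download_root="models/speech_encoder/", name="large-v3"):
    # openai-whisper saves the downloaded checkpoint as "<name>.pt"
    # inside download_root, so this is where large-v3 lands.
    return os.path.join(download_root, f"{name}.pt")

print(whisper_checkpoint_path())  # models/speech_encoder/large-v3.pt
```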
Download the unit-based HiFi-GAN vocoder:

wget https://dl.fbaipublicfiles.com/fairseq/speech_to_speech/vocoder/code_hifigan/mhubert_vp_en_es_fr_it3_400k_layer11_km1000_lj/g_00500000 -P vocoder/
wget https://dl.fbaipublicfiles.com/fairseq/speech_to_speech/vocoder/code_hifigan/mhubert_vp_en_es_fr_it3_400k_layer11_km1000_lj/config.json -P vocoder/
Launch the controller, the Gradio web server, and the model worker, in that order:

python -m omni_speech.serve.controller --host 0.0.0.0 --port 10000
python -m omni_speech.serve.gradio_web_server --controller http://localhost:10000 --port 8000 --model-list-mode reload --vocoder vocoder/g_00500000 --vocoder-cfg vocoder/config.json
python -m omni_speech.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path Llama-3.1-8B-Omni --model-name Llama-3.1-8B-Omni --s2s
Visit http://localhost:8000/ and interact with LLaMA-3.1-8B-Omni!
Note: because streaming audio playback in Gradio is unstable, only streaming audio synthesis is implemented; autoplay is not enabled.