git clone https://github.com/ictnlp/LLaMA-Omni
cd LLaMA-Omni
conda create -n llama-omni python=3.10
conda activate llama-omni
pip install pip==24.0
pip install -e .
git clone https://github.com/pytorch/fairseq
cd fairseq
pip install -e . --no-build-isolation
Download the Llama-3.1-8B-Omni model from Huggingface, then download the Whisper-large-v3 speech encoder:

import whisper
model = whisper.load_model("large-v3", download_root="models/speech_encoder/")
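For reference, openai-whisper caches the downloaded checkpoint as `<name>.pt` inside `download_root`. A minimal sketch (the helper name is ours, not part of the repo) to locate where the file above ends up:

```python
import os

def whisper_checkpoint_path(download_root="models/speech_encoder/", name="large-v3"):
    # openai-whisper saves the downloaded checkpoint as "<name>.pt"
    # inside download_root, so this is where large-v3 lands.
    return os.path.join(download_root, f"{name}.pt")

print(whisper_checkpoint_path())  # models/speech_encoder/large-v3.pt
```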
Download the unit-based HiFi-GAN vocoder:

wget https://dl.fbaipublicfiles.com/fairseq/speech_to_speech/vocoder/code_hifigan/mhubert_vp_en_es_fr_it3_400k_layer11_km1000_lj/g_00500000 -P vocoder/
wget https://dl.fbaipublicfiles.com/fairseq/speech_to_speech/vocoder/code_hifigan/mhubert_vp_en_es_fr_it3_400k_layer11_km1000_lj/config.json -P vocoder/
Launch the controller, the Gradio web server, and the model worker, in that order:

python -m omni_speech.serve.controller --host 0.0.0.0 --port 10000
python -m omni_speech.serve.gradio_web_server --controller http://localhost:10000 --port 8000 --model-list-mode reload --vocoder vocoder/g_00500000 --vocoder-cfg vocoder/config.json
python -m omni_speech.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path Llama-3.1-8B-Omni --model-name Llama-3.1-8B-Omni --s2s
Visit http://localhost:8000/ and interact with LLaMA-3.1-8B-Omni!
Note: because streaming audio playback in Gradio is unstable, only streaming audio synthesis is implemented; autoplay is not enabled.