Music Generation Plugin: YuE

Official documentation

https://github.com/multimodal-art-projection/YuE

https://github.com/smthemex/ComfyUI_YuE

Features

Text-to-music; music-to-music (generating music from reference music).

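Installation

Install ComfyUI_YuE with the plugin manager.

Download ckpt_00360000.pth, decoder_131000.pth, and decoder_151000.pth and place them in /ComfyUI/custom-nodes/yue/
Download pytorch_model.bin and place it in /ComfyUI/custom_nodes/ComfyUI_YuE/inference/xcodec_mini_infer/semantic_ckpts/hf_1_325000/
On an RTX 4090 or better, the StageARepo and StageBRepo models can be downloaded automatically at run time; for other GPUs, see the download links here
- StageARepo: YuE-s1-7B-anneal-${en/zh/...}-cot or YuE-s1-7B-anneal-${en/zh/...}-icl. en is for English songs and zh is for Chinese songs; other languages are supported as well. Use cot for text-to-music and icl for generating music from reference music
- StageBRepo: YuE-s2-1B-general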
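The naming pattern above can be sketched as a small helper. This is only an illustration: the function name `stage_a_repo` is hypothetical, the `m-a-p` organization prefix is taken from the model ID example given for the loader node, and not every language/mode combination is guaranteed to exist on Hugging Face.

```python
# Hypothetical helper that assembles a Stage A model ID from the naming
# pattern YuE-s1-7B-anneal-{language}-{mode}. It does not check that the
# repository actually exists.
STAGE_A_MODES = ("cot", "icl")  # cot: text-to-music, icl: reference music
STAGE_B_REPO = "m-a-p/YuE-s2-1B-general"


def stage_a_repo(language: str, mode: str, org: str = "m-a-p") -> str:
    """Return a Stage A repo ID such as m-a-p/YuE-s1-7B-anneal-en-cot."""
    if mode not in STAGE_A_MODES:
        raise ValueError(f"mode must be one of {STAGE_A_MODES}, got {mode!r}")
    return f"{org}/YuE-s1-7B-anneal-{language}-{mode}"


if __name__ == "__main__":
    print(stage_a_repo("en", "cot"))
    print(stage_a_repo("zh", "icl"))
```

The resulting string is what would be typed into the stage_A_repo field when the model is not stored locally.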

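Getting started

Prompt engineering

A YuE prompt consists of three parts:

genre tags: tags that steer the musical style
lyrics: the song lyrics
ref audio: reference audio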

Genre Tagging Prompt

A stable tagging prompt usually consists of five components: genre, instrument, mood, gender, and timbre, separated by spaces.
Although the tag vocabulary is open, we provide the 200 most commonly used tags. Choosing tags from that list is recommended for more stable results.
The order of the tags is flexible. For example, a stable genre tagging prompt might look like: "inspiring female uplifting pop airy vocal electronic bright vocal vocal."
We have introduced the "Mandarin" and "Cantonese" tags to distinguish between Mandarin and Cantonese, since their lyrics often share similarities.

text

inspiring female uplifting pop airy vocal electronic bright vocal vocal
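As a minimal sketch of the five-component structure described above, the helper below joins the components with spaces. The function name `build_genre_prompt` is hypothetical and is not part of YuE or ComfyUI_YuE; since the tag order is flexible, the order used here is just one choice.

```python
# Hypothetical helper: join the five tag components with single spaces.
# Empty components are skipped so the result never contains double spaces.
def build_genre_prompt(genre: str, instrument: str, mood: str,
                       gender: str, timbre: str) -> str:
    parts = [genre, instrument, mood, gender, timbre]
    return " ".join(p.strip() for p in parts if p and p.strip())


if __name__ == "__main__":
    print(build_genre_prompt("pop", "electronic", "uplifting",
                             "female", "airy vocal"))
```

For an instrumental track, the gender and timbre arguments can be passed as empty strings so that no vocal-related tags end up in the prompt.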

Lyrics Prompt

Multiple languages are supported, including but not limited to English, Mandarin, Cantonese, Japanese, and Korean.
The lyrics prompt should be divided into segments, each with a structure label prepended (e.g., [verse], [chorus], [bridge], [outro]). Segments should be separated by two newline characters ("\n\n").
Do not put too many words in a single segment, since each segment covers around 30 s (--max_new_tokens 3000 by default).
We find that the [intro] label is less stable, so we recommend starting with [verse] or [chorus].
To generate music with no vocals (instrumental only), see the second example below.

Example with vocals:

text

[verse]
Staring at the sunset, colors paint the sky
Thoughts of you keep swirling, can't deny
I know I let you down, I made mistakes
But I'm here to mend the heart I didn't break

[chorus]
Every road you take, I'll be one step behind
Every dream you chase, I'm reaching for the light
You can't fight this feeling now
I won't back down
You know you can't deny it now
I won't back down

[verse]
They might say I'm foolish, chasing after you
But they don't feel this love the way we do
My heart beats only for you, can't you see?
I won't let you slip away from me

[chorus]
Every road you take, I'll be one step behind
Every dream you chase, I'm reaching for the light
You can't fight this feeling now
I won't back down
You know you can't deny it now
I won't back down

[bridge]
No, I won't back down, won't turn around
Until you're back where you belong
I'll cross the oceans wide, stand by your side
Together we are strong

[outro]
Every road you take, I'll be one step behind
Every dream you chase, love's the tie that binds
You can't fight this feeling now
I won't back down

Example without vocals: replacing the lyrics with several "\n" newlines produces a non-vocal result. In genre.txt, also remove the tags related to vocals.

text

[verse]




 
[chorus]




[chorus]




[outro]
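The layout rules above (a structure label on its own line, segments separated by two newlines, blank lines instead of lyrics for instrumental output) can be sketched as follows. The function name `build_lyrics_prompt` is hypothetical, and using four empty lines per instrumental segment is an assumption taken from the example above rather than a documented requirement.

```python
# Hypothetical helper: build a lyrics prompt from (label, lines) pairs.
# Each segment becomes "[label]\n<lines>", and segments are joined by a
# blank line ("\n\n"), as the lyrics prompt format requires.
def build_lyrics_prompt(segments):
    blocks = []
    for label, lines in segments:
        body = "\n".join(lines)
        blocks.append(f"[{label}]\n{body}")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    vocal = build_lyrics_prompt([
        ("verse", ["Staring at the sunset, colors paint the sky",
                   "Thoughts of you keep swirling, can't deny"]),
        ("chorus", ["I won't back down"]),
    ])
    print(vocal)

    # Instrumental: empty strings stand in for the lyric lines.
    instrumental = build_lyrics_prompt([("verse", [""] * 4),
                                        ("outro", [""] * 4)])
    print(repr(instrumental))
```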

Audio Prompt

The audio prompt is optional. Providing reference audio for ICL usually increases the rate of good results, but it also reduces diversity. CoT only (no reference) produces more diverse output.
We find that dual-track ICL mode gives the best musicality and prompt following.
Using the chorus of a song as the prompt results in better musicality.
Around 30 s of audio is recommended for ICL.
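To pick a roughly 30-second excerpt that starts at the chorus, a small calculation like the one below may help. This is a sketch only: `reference_window` is a hypothetical helper, and the chorus start time has to be found by ear or by other means.

```python
# Hypothetical helper: choose a window of about `target` seconds that
# starts at the chorus and stays inside the audio file.
def reference_window(chorus_start: float, audio_length: float,
                     target: float = 30.0):
    if audio_length <= 0:
        raise ValueError("audio_length must be positive")
    # Shift the start earlier if the chorus is too close to the end.
    start = max(0.0, min(float(chorus_start), audio_length - target))
    end = min(float(audio_length), start + target)
    return start, end


if __name__ == "__main__":
    print(reference_window(60, 200))   # chorus well inside the track
    print(reference_window(190, 200))  # chorus near the end
    print(reference_window(5, 20))     # track shorter than 30 s
```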

Text-to-music

Core node: YUE_Stage_A_Loader. Parameters:

stage_A_repo: a model ID on Hugging Face (for example, m-a-p/YuE-s1-7B-anneal-en-cot) or the local path of a downloaded model
use_mmgp: not needed when VRAM is plentiful (for example, on an RTX 4090); enable it when VRAM is limited, in which case mmgp must also be installed: pip install mmgp
stage1_cache_size: only takes effect when running llmav2, and changing it is not recommended. It controls token caching and is intended for low-memory users

Core node: YUE_Stage_A_Sampler. Parameters:

genres_prompt: the genre tags
lyrics_prompt: the lyrics
prompt_start and prompt_end: the duration of the song
use_dual_tracks_prompt: set to true to generate music from reference music; with use_dual_tracks_prompt enabled, generation draws on 'pop.00001.Instrumental.mp3'
use_audio_prompt: set to true to generate music from reference music; with use_audio_prompt enabled and use_dual_tracks_prompt disabled, generation draws on "pop.00001.mp3"
run_n_segment: the number of segments to process during generation; higher values are faster but more likely to run out of memory (OOM)
repetition_penalty: use the default or experiment. Values range from 1.0 to 2.0 (or higher in some cases). It controls the diversity and coherence of the generated audio tokens: the higher the value, the more strongly repetition is discouraged, and 1.0 means no penalty
rescale: keep the default (rescales the output to avoid clipping)
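The interaction between the two reference switches can be summarized in a few lines. This is a sketch of the behavior as described in the parameter list above, not the plugin's actual code; `reference_track` is a hypothetical name, and treating "both switches off" as plain text-to-music (CoT only) is an assumption.

```python
# Hypothetical summary of which bundled reference file is used, based
# solely on the parameter descriptions above.
def reference_track(use_dual_tracks_prompt: bool, use_audio_prompt: bool):
    if use_dual_tracks_prompt:
        # Dual-track mode takes priority in this sketch.
        return "pop.00001.Instrumental.mp3"
    if use_audio_prompt:
        return "pop.00001.mp3"
    # Assumed: no reference audio, text-to-music only.
    return None


if __name__ == "__main__":
    for dual, audio in [(True, False), (False, True), (False, False)]:
        print(dual, audio, reference_track(dual, audio))
```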

Core node: YUE_Stage_B_Loader

stage_A_repo: a model ID on Hugging Face (for example, m-a-p/YuE-s2-1B-general) or the local path of a downloaded model
stage2_cache_size: only takes effect when running llmav2, and changing it is not recommended. It controls token caching and is intended for low-memory users
stage2_batch_size: higher values are faster but more likely to run out of memory (OOM)

Core node: YUE_Stage_B_Sampler

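Generating music from reference music

Side note

protobuf version conflict

ComfyUI-YuE depends on descript-audiotools 0.7.2, which requires protobuf<3.20,>=3.9.2. However, the dependencies of some other plugins, such as comfyui_hellomeme, comfyui_custom_nodes_alekpet, and comfyui_pulid_flux_ll, have the following protobuf requirements: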

text

mediapipe 0.10.20 requires protobuf<5,>=4.25.3, but you have protobuf 3.19.6 which is incompatible.
onnx 1.17.0 requires protobuf>=3.20.2, but you have protobuf 3.19.6 which is incompatible.
streamlit 1.42.2 requires protobuf<6,>=3.20, but you have protobuf 3.19.6 which is incompatible.
tensorboardx 2.6.2.2 requires protobuf>=3.20, but you have protobuf 3.19.6 which is incompatible.

Installing ComfyUI-YuE therefore breaks the three plugins above, so after installing ComfyUI-YuE I reinstalled protobuf:

text

.\python_embeded\python.exe -m pip install protobuf==4.25.3

After this installation, pip reports the following error:

text

descript-audiotools 0.7.2 requires protobuf<3.20,>=3.9.2, but you have protobuf 4.25.3 which is incompatible.

In practice, however, testing showed no noticeable impact.
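The conflict can be checked by hand with a simple range comparison. The sketch below uses only the version bounds quoted in the pip messages above; the helper names are hypothetical, and the parsing handles plain dotted numeric versions only, not the full version-specifier syntax.

```python
# Minimal range check for dotted numeric versions such as "3.19.6".
def parse(version: str):
    return tuple(int(part) for part in version.split("."))


def in_range(version: str, lower: str, upper: str) -> bool:
    """True if lower <= version < upper."""
    return parse(lower) <= parse(version) < parse(upper)


# (lower bound inclusive, upper bound exclusive), from the pip messages.
REQUIREMENTS = {
    "descript-audiotools": ("3.9.2", "3.20"),
    "mediapipe": ("4.25.3", "5"),
    "streamlit": ("3.20", "6"),
}

if __name__ == "__main__":
    for candidate in ("3.19.6", "4.25.3"):
        for package, (lower, upper) in REQUIREMENTS.items():
            ok = in_range(candidate, lower, upper)
            print(f"protobuf {candidate} vs {package}: {'ok' if ok else 'conflict'}")
```

Because the descript-audiotools range ends below 3.20 and the mediapipe range starts at 4.25.3, no single protobuf version satisfies both, which is why pip warns whichever version is installed.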

