
Music Generation Plugin: YuE

Features

Text-to-music; music-to-music.

Installation

Install ComfyUI_YuE with the plugin manager.

Getting Started

Prompt Engineering

A YuE prompt consists of three parts:

  • genre tags: tags that guide the genre and style
  • lyrics: the lyrics
  • ref audio: reference audio

Genre Tagging Prompt

  • A stable tagging prompt usually consists of five components: genre, instrument, mood, gender, and timbre, separated by spaces.
  • Although the tags have an open vocabulary, we provide the 200 most commonly used tags. Choosing tags from that list is recommended for more stable results.
  • The order of the tags is flexible. For example, a stable genre tagging prompt might look like: "inspiring female uplifting pop airy vocal electronic bright vocal vocal".
  • We have introduced the "Mandarin" and "Cantonese" tags to distinguish between Mandarin and Cantonese, as their lyrics often share similarities.
text
inspiring female uplifting pop airy vocal electronic bright vocal vocal
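The five-component structure described above can be sketched as a small helper. This is only an illustration: `build_genre_prompt` is a hypothetical name, not part of YuE or ComfyUI_YuE, and the tag values are placeholders rather than entries confirmed to be in the top-200 list.

```python
def build_genre_prompt(genre, instrument, mood, gender, timbre):
    """Join the five tag components into one space-separated prompt.

    Hypothetical helper for illustration; YuE only needs the final string.
    """
    parts = [genre, instrument, mood, gender, timbre]
    # Drop empty components and strip stray whitespace before joining
    cleaned = [p.strip() for p in parts if p and p.strip()]
    return " ".join(cleaned)


# Placeholder tag values; the order of tags is flexible
prompt = build_genre_prompt(
    genre="pop",
    instrument="piano",
    mood="uplifting",
    gender="female",
    timbre="airy vocal",
)
print(prompt)
```

The resulting string can be pasted into the genre prompt field as-is.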

Lyrics Prompt

  • We support multiple languages, including but not limited to English, Mandarin, Cantonese, Japanese, and Korean.
  • The lyrics prompt should be divided into sessions, each prefixed with a structure label (e.g., [verse], [chorus], [bridge], [outro]). Sessions should be separated by two newline characters ("\n\n").
  • Do not put too many words in a single session, since each session is around 30s (--max_new_tokens 3000 by default).
  • We find that the [intro] label is less stable, so we recommend starting with [verse] or [chorus].
  • To generate music with no vocals (instrumental only), see the example further below.
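The session layout described in this list can be assembled programmatically. The sketch below is a minimal illustration under stated assumptions: `build_lyrics_prompt` is a hypothetical helper, not a YuE API, and it only reproduces the formatting rules above (a label line per session, sessions separated by "\n\n").

```python
def build_lyrics_prompt(sessions):
    """Format (label, lines) pairs into a lyrics prompt.

    Each session starts with its structure label in square brackets,
    and sessions are separated by two newline characters.
    Hypothetical helper for illustration only.
    """
    blocks = []
    for label, lines in sessions:
        blocks.append("\n".join([f"[{label}]", *lines]))
    return "\n\n".join(blocks)


lyrics = build_lyrics_prompt([
    ("verse", ["Staring at the sunset, colors paint the sky",
               "Thoughts of you keep swirling, can't deny"]),
    ("chorus", ["Every road you take, I'll be one step behind",
                "Every dream you chase, I'm reaching for the light"]),
])
print(lyrics)
```

Starting the list with a verse or chorus session follows the recommendation about the less stable [intro] label.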

Example with vocals:

text
[verse]
Staring at the sunset, colors paint the sky
Thoughts of you keep swirling, can't deny
I know I let you down, I made mistakes
But I'm here to mend the heart I didn't break

[chorus]
Every road you take, I'll be one step behind
Every dream you chase, I'm reaching for the light
You can't fight this feeling now
I won't back down
You know you can't deny it now
I won't back down

[verse]
They might say I'm foolish, chasing after you
But they don't feel this love the way we do
My heart beats only for you, can't you see?
I won't let you slip away from me

[chorus]
Every road you take, I'll be one step behind
Every dream you chase, I'm reaching for the light
You can't fight this feeling now
I won't back down
You know you can't deny it now
I won't back down

[bridge]
No, I won't back down, won't turn around
Until you're back where you belong
I'll cross the oceans wide, stand by your side
Together we are strong

[outro]
Every road you take, I'll be one step behind
Every dream you chase, love's the tie that binds
You can't fight this feeling now
I won't back down

Example without vocals (replacing the lyrics with several \n characters produces a non-vocal result; in genre.txt, also remove the tags related to vocals):

text
[verse]




 
[chorus]




[chorus]




[outro]
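
An instrumental lyrics prompt like the one above can also be generated rather than typed by hand. This is a sketch under assumptions: `build_instrumental_lyrics` is a hypothetical helper, and the number of newline characters per session is a guess taken from the example, not a value documented by YuE.

```python
def build_instrumental_lyrics(labels, newlines=5):
    """Return structure labels separated only by newline characters.

    `newlines` is the count of "\n" characters placed between labels;
    the default is an assumption taken from the example, not a YuE setting.
    """
    separator = "\n" * newlines
    return separator.join(f"[{label}]" for label in labels)


instrumental = build_instrumental_lyrics(["verse", "chorus", "chorus", "outro"])
print(repr(instrumental))
```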

Audio Prompt

  • The audio prompt is optional. Providing reference audio for ICL usually increases the good-case rate, but it also reduces diversity. CoT only (no reference audio) produces more diversity.
  • We find that dual-track ICL mode gives the best musicality and prompt following.
  • Using the chorus of a song as the prompt results in better musicality.
  • Around 30s of audio is recommended for ICL.
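To prepare a reference clip of roughly 30s, the chorus can be cut out of a longer recording beforehand. The sketch below uses only Python's standard `wave` module and therefore assumes an uncompressed WAV file; the reference files mentioned elsewhere in this article are MP3s, which would need converting first. `trim_wav` and the file names are hypothetical.

```python
import wave


def trim_wav(src_path, dst_path, start_s=0.0, duration_s=30.0):
    """Copy `duration_s` seconds starting at `start_s` into a new WAV file.

    Returns the length of the written clip in seconds, which is shorter
    than requested if the source ends early.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        total = src.getnframes()
        start_frame = min(int(start_s * rate), total)
        n_frames = min(int(duration_s * rate), total - start_frame)
        src.setpos(start_frame)
        frames = src.readframes(n_frames)

    with wave.open(dst_path, "wb") as dst:
        # Reuse channel count, sample width, and sample rate from the source;
        # the frame count in the header is corrected when the file is closed
        dst.setparams(params)
        dst.writeframes(frames)

    return n_frames / rate


# Example call (hypothetical file names, chorus assumed to start at 45s):
# trim_wav("song.wav", "chorus_30s.wav", start_s=45.0, duration_s=30.0)
```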

Text-to-Music


Core node: YUE_Stage_A_Loader. Parameters:

  • stage_A_repo: the model ID on Hugging Face (e.g., m-a-p/YuE-s1-7B-anneal-en-cot) or the local path of a downloaded model
  • use_mmgp: not needed when VRAM is sufficient (e.g., on an RTX 4090); enable it when VRAM is limited, in which case mmgp must also be installed (pip install mmgp)
  • stage1_cache_size: only takes effect when running llmav2, and modifying it is not recommended. It involves caching tokens and is intended for low-memory users

Core node: YUE_Stage_A_Sampler. Parameters:

  • genres_prompt: the style tags
  • lyrics_prompt: the lyrics
  • prompt_start and prompt_end: the duration of the song
  • use_dual_tracks_prompt: set to true to generate music from reference music; when enabled, 'pop.00001.Instrumental.mp3' is used as the reference
  • use_audio_prompt: set to true to generate music from reference music; with "use_audio_prompt" enabled and "use_dual_tracks_prompt" disabled, "pop.00001.mp3" is used as the reference
  • run_n_segment: the number of segments to process during generation; higher values are faster but more likely to run out of memory (OOM)
  • repetition_penalty: use the default, or experiment. Values range from 1.0 to 2.0 (or higher in some cases). It controls the diversity and coherence of the generated audio tokens: the higher the value, the more strongly repetition is discouraged. A value of 1.0 means no penalty
  • rescale: use the default (rescales the output to avoid clipping)
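To make the effect of repetition_penalty concrete, here is a minimal sketch of the formulation commonly used in language-model sampling: logits of tokens that were already generated are divided by the penalty when positive and multiplied by it when negative. This is an assumption about the general technique, not a copy of the node's code, and `apply_repetition_penalty` and the default of 1.1 are illustrative only.

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.1):
    """Return a copy of `logits` with already-generated tokens discouraged.

    Common formulation (assumed here, not taken from YuE's source):
    positive scores are divided by the penalty, negative scores are
    multiplied by it, so both move toward "less likely".
    """
    adjusted = list(logits)
    for token_id in set(generated_ids):
        score = adjusted[token_id]
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


logits = [2.0, -1.0, 0.5]
print(apply_repetition_penalty(logits, generated_ids=[0, 1], penalty=2.0))
print(apply_repetition_penalty(logits, generated_ids=[0, 1], penalty=1.0))
```

With a penalty of 1.0 the scores come back unchanged, which matches the "no penalty" behavior described above.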

Core node: YUE_Stage_B_Loader

  • stage_A_repo: the model ID on Hugging Face (e.g., m-a-p/YuE-s2-1B-general) or the local path of a downloaded model
  • stage2_cache_size: only takes effect when running llmav2, and modifying it is not recommended. It involves caching tokens and is intended for low-memory users
  • stage2_batch_size: higher values are faster but more likely to run out of memory (OOM)

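Core node: YUE_Stage_B_Sampler

Generating Music from Reference Music

A Side Note

protobuf Version Conflict

ComfyUI-YuE depends on descript-audiotools 0.7.2, which requires protobuf<3.20,>=3.9.2. However, the dependencies of some other plugins, such as comfyui_hellomeme, comfyui_custom_nodes_alekpet, and comfyui_pulid_flux_ll, have the following protobuf requirements: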

text
mediapipe 0.10.20 requires protobuf<5,>=4.25.3, but you have protobuf 3.19.6 which is incompatible.
onnx 1.17.0 requires protobuf>=3.20.2, but you have protobuf 3.19.6 which is incompatible.
streamlit 1.42.2 requires protobuf<6,>=3.20, but you have protobuf 3.19.6 which is incompatible.
tensorboardx 2.6.2.2 requires protobuf>=3.20, but you have protobuf 3.19.6 which is incompatible.

Installing ComfyUI-YuE therefore makes the three plugins above unusable, so after installing ComfyUI-YuE I reinstalled protobuf.

text
.\python_embeded\python.exe -m pip install protobuf==4.25.3

After this installation, pip reports the following error:

text
descript-audiotools 0.7.2 requires protobuf<3.20,>=3.9.2, but you have protobuf 4.25.3 which is incompatible.

In actual testing, however, this had no noticeable effect.
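The conflict can be checked by hand with a small script. The sketch below is a simplified, hypothetical checker (`satisfies` is not pip's resolver and handles only plain numeric versions with the operators <, <=, >, >=, ==); the requirement strings are copied from the pip messages above.

```python
import operator

_OPS = {"<=": operator.le, ">=": operator.ge, "==": operator.eq,
        "<": operator.lt, ">": operator.gt}


def _parse(version):
    """Turn '3.19.6' into [3, 19, 6]; only plain numeric versions are handled."""
    return [int(part) for part in version.split(".")]


def satisfies(version, spec):
    """Return True if `version` meets every comma-separated clause in `spec`."""
    v = _parse(version)
    for clause in spec.split(","):
        clause = clause.strip()
        # Try two-character operators before single-character ones
        for symbol in ("<=", ">=", "==", "<", ">"):
            if clause.startswith(symbol):
                bound = _parse(clause[len(symbol):])
                width = max(len(v), len(bound))
                left = v + [0] * (width - len(v))
                right = bound + [0] * (width - len(bound))
                if not _OPS[symbol](left, right):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


# Requirement strings taken from the pip output quoted in this section
requirements = {
    "descript-audiotools 0.7.2": "<3.20,>=3.9.2",
    "mediapipe 0.10.20": "<5,>=4.25.3",
    "onnx 1.17.0": ">=3.20.2",
    "streamlit 1.42.2": "<6,>=3.20",
    "tensorboardx 2.6.2.2": ">=3.20",
}

for candidate in ("3.19.6", "4.25.3"):
    failing = [name for name, spec in requirements.items()
               if not satisfies(candidate, spec)]
    print(candidate, "violates:", failing)
```

No protobuf version can satisfy both descript-audiotools (<3.20) and mediapipe (>=4.25.3), so one of the warnings is unavoidable whichever version is installed.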
