nunchaku: A High-Speed Inference Engine

Official documentation

https://github.com/mit-han-lab/ComfyUI-nunchaku

Features

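Nunchaku is a high-performance inference engine that implements the SVDQuant quantization technique. It reduces memory consumption and speeds up inference while largely preserving visual fidelity. On the 12B FLUX.1-dev model it cuts memory use by 3.6x compared with the BF16 model, and by eliminating CPU offloading it runs 8.7x faster than the 16-bit model on a 16GB laptop RTX 4090. Since the release of Nunchaku 0.2, generating a FLUX image on an RTX 4090 takes about 3 seconds (20 steps).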
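To put the memory figure in perspective, here is a rough back-of-the-envelope sketch. It assumes the 12B parameter count of FLUX.1-dev and counts only the transformer weights, ignoring activations, text encoders, the VAE and any quantization metadata.

```python
# Back-of-the-envelope weight memory for a 12B-parameter diffusion transformer.
# Assumption: only the transformer weights are counted.
PARAMS = 12e9  # FLUX.1-dev parameter count


def weight_gib(num_params: float, bits_per_param: float) -> float:
    """Storage needed for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


bf16 = weight_gib(PARAMS, 16)
int4 = weight_gib(PARAMS, 4)
ideal_ratio = bf16 / int4

# 20 denoising steps in about 3 seconds (the RTX 4090 figure quoted above).
seconds_per_step = 3.0 / 20

print(f"BF16 weights: {bf16:.1f} GiB")
print(f"INT4 weights: {int4:.1f} GiB")
print(f"Ideal reduction: {ideal_ratio:.1f}x")
print(f"About {seconds_per_step:.2f} s per step")
```

This gives roughly 22.4 GiB for BF16 weights versus 5.6 GiB for INT4, an ideal 4x reduction. The reported 3.6x is lower, plausibly because SVDQuant keeps a 16-bit low-rank branch and quantization scales alongside the 4-bit weights.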

Installation

First, install nunchaku by following this article.
Finally, install ComfyUI-nunchaku with the plugin manager and restart the application.

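Model Download

svdq-int4-flux.1-dev: download the entire folder and place the svdq-int4-flux.1-dev folder under models/diffusion_models/
svdq-int4-flux.1-fill-dev: download the entire folder and place the svdq-int4-flux.1-fill-dev folder under models/diffusion_models/
(Optional) svdq-int4-flux.1-canny-dev: download the entire folder and place the svdq-int4-flux.1-canny-dev folder under models/diffusion_models/
(Optional) svdq-int4-flux.1-depth-dev: download the entire folder and place the svdq-int4-flux.1-depth-dev folder under models/diffusion_models/
(Optional) svdq-flux.1-t5: download the entire folder and place the svdq-flux.1-t5 folder under models/text_encoders/. The two standard FLUX text encoders are fine by default; use this encoder only if you need to reduce memory usage further.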
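If a loader cannot find a model, the folder layout is the first thing to check. The sketch below is a hypothetical helper (not part of ComfyUI or ComfyUI-nunchaku) that reports which of the folders listed above are missing. It assumes the standard ComfyUI layout, with models/ directly under the ComfyUI root; `COMFY_ROOT` is a placeholder path.

```python
from pathlib import Path

# Hypothetical helper, not part of ComfyUI-nunchaku. Folder names follow the
# download list above; COMFY_ROOT is a placeholder for your own install path.
COMFY_ROOT = "ComfyUI"

REQUIRED = {
    "diffusion_models": ["svdq-int4-flux.1-dev", "svdq-int4-flux.1-fill-dev"],
}
OPTIONAL = {
    "diffusion_models": ["svdq-int4-flux.1-canny-dev", "svdq-int4-flux.1-depth-dev"],
    "text_encoders": ["svdq-flux.1-t5"],
}


def missing_folders(comfy_root: str, layout: dict[str, list[str]]) -> list[str]:
    """Return the model folders from `layout` that do not exist under models/."""
    models_dir = Path(comfy_root) / "models"
    missing = []
    for subdir, names in layout.items():
        for name in names:
            # Each model is a whole folder, not a single file.
            if not (models_dir / subdir / name).is_dir():
                missing.append(f"models/{subdir}/{name}")
    return missing


if __name__ == "__main__":
    print("Missing required:", missing_folders(COMFY_ROOT, REQUIRED) or "none")
    print("Missing optional:", missing_folders(COMFY_ROOT, OPTIONAL) or "none")
```

Checking for directories rather than files matches the instruction to download each model as a whole folder.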

Getting Started

Four core nodes:

Nunchaku Flux DiT Loader: a node for loading the FLUX diffusion model.
- model_path: the folder containing the model, e.g. svdq-int4-flux.1-dev.
- cache_threshold: controls the First-Block Cache tolerance, similar to residual_diff_threshold in WaveSpeed. Increasing it speeds up inference but may reduce quality. A typical value is 0.12; setting it to 0 disables caching.
- attention: the attention implementation, either flash-attention2 or nunchaku-fp16. nunchaku-fp16 is about 1.2x faster than flash-attention2 with no loss of precision.
- cpu_offload: enables CPU offloading. This reduces GPU memory usage but may slow down inference.
- device_id: the ID of the GPU that runs the model.
- data_type: use float16 on RTX 20-series GPUs and bfloat16 on all others.
- i2f_mode: designed for RTX 20-series GPUs; it has no effect on other GPUs.
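The cache_threshold and data_type options can be illustrated with a toy sketch. The cache logic follows the general First-Block Cache idea used by WaveSpeed: run only the first transformer block, compare its residual with the one from the last fully computed step, and if the relative change is below the threshold, reuse the cached residual of the remaining blocks instead of running them. This is a simplified model on plain Python lists, not Nunchaku's actual implementation. `FirstBlockCache` and `pick_data_type` are hypothetical names, and the data type rule assumes RTX 20-series (Turing) GPUs report compute capability 7.5.

```python
def rel_l1(curr: list[float], prev: list[float]) -> float:
    """Relative L1 change: sum |curr - prev| / sum |prev|."""
    diff = sum(abs(c - p) for c, p in zip(curr, prev))
    scale = sum(abs(p) for p in prev)
    return diff / scale if scale else float("inf")


class FirstBlockCache:
    """Toy First-Block Cache for a model split into first_block + rest_blocks."""

    def __init__(self, threshold: float):
        self.threshold = threshold  # 0 disables caching: rel_l1 is never < 0
        self.ref_first_residual = None  # first-block residual at last full step
        self.rest_residual = None  # output of rest_blocks minus its input

    def step(self, x, first_block, rest_blocks):
        h = first_block(x)
        first_residual = [a - b for a, b in zip(h, x)]
        if (self.ref_first_residual is not None
                and rel_l1(first_residual, self.ref_first_residual) < self.threshold):
            # Cache hit: skip the remaining blocks and reuse their old residual.
            return [a + r for a, r in zip(h, self.rest_residual)], True
        # Cache miss: run everything and refresh both cached residuals.
        out = rest_blocks(h)
        self.ref_first_residual = first_residual
        self.rest_residual = [o - a for o, a in zip(out, h)]
        return out, False


def pick_data_type(compute_capability: tuple[int, int]) -> str:
    """float16 on RTX 20-series (Turing, 7.5), which lacks bfloat16; else bfloat16.

    Assumes the GPU is one Nunchaku supports (20-series or newer).
    """
    return "float16" if compute_capability == (7, 5) else "bfloat16"


if __name__ == "__main__":
    def first(x):  # stand-in for the first transformer block
        return [v + 1.0 for v in x]

    def rest(h):  # stand-in for all remaining blocks
        return [v * 2.0 for v in h]

    cache = FirstBlockCache(threshold=0.12)
    for step in range(3):
        out, hit = cache.step([1.0, 2.0], first, rest)
        print(step, out, "cache hit" if hit else "full compute")
    print(pick_data_type((7, 5)), pick_data_type((8, 9)))
```

A higher threshold lets more steps count as "close enough" and skip the remaining blocks, which is why it trades quality for speed. On a real machine the capability tuple can come from `torch.cuda.get_device_capability()`.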
Nunchaku FLUX LoRA Loader: a node for loading LoRA modules into an SVDQuant FLUX model.
- lora_name: the LoRA to load.
- lora_strength: controls the strength of the LoRA module.
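lora_strength scales the LoRA update. For an ordinary (unquantized) linear layer, the usual way to apply a LoRA is W' = W + strength * (up @ down), and the sketch below shows just that arithmetic on small lists. It illustrates the standard formula only, not how Nunchaku applies LoRA to 4-bit SVDQuant weights; many LoRA files also carry an alpha/rank scale that is omitted here, and `apply_lora` is a hypothetical name.

```python
Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def apply_lora(weight: Matrix, lora_down: Matrix, lora_up: Matrix,
               strength: float) -> Matrix:
    """Return weight + strength * (lora_up @ lora_down)."""
    delta = matmul(lora_up, lora_down)  # low-rank update, same shape as weight
    return [[w + strength * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    down = [[1.0, 1.0]]  # rank 1: shape (1, in_features)
    up = [[1.0], [2.0]]  # shape (out_features, 1)
    for s in (0.0, 0.5, 1.0):
        print(s, apply_lora(W, down, up, s))
```

A strength of 0 leaves the base weights untouched, and 1.0 applies the full update, so the slider interpolates linearly between the base model and the LoRA-adapted one.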
Nunchaku Text Encoder Loader: a node for loading the text encoders.
- For FLUX, use t5xxl_fp16.safetensors and clip_l.safetensors.
- t5_min_length: sets the minimum sequence length of the T5 text embeddings. DualCLIPLoader hardcodes this to 256; for better image quality, use 512 here.
- use_4bit_t5: whether to use Nunchaku's quantized 4-bit T5 to save GPU memory.
- int4_model: the name of the folder containing the INT4 T5 model, e.g. svdq-flux.1-t5. This option is only used when use_4bit_t5 is enabled.
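t5_min_length acts as a padding floor: a shorter prompt is padded with pad tokens up to that length before T5 encodes it, so the model always sees at least that many embedding positions. A minimal sketch of that rule, assuming T5's pad token id of 0; `pad_to_min_length` is a hypothetical helper, and it simply leaves longer sequences unchanged rather than modelling how ComfyUI handles long prompts.

```python
T5_PAD_ID = 0  # T5 tokenizers use id 0 for <pad>


def pad_to_min_length(token_ids: list[int], min_length: int,
                      pad_id: int = T5_PAD_ID) -> list[int]:
    """Right-pad token_ids with pad_id until it is at least min_length long."""
    missing = max(0, min_length - len(token_ids))
    return list(token_ids) + [pad_id] * missing


if __name__ == "__main__":
    prompt_ids = [363, 1782, 13, 1]  # placeholder ids for a short prompt
    print(len(pad_to_min_length(prompt_ids, 256)))  # DualCLIPLoader's fixed value
    print(len(pad_to_min_length(prompt_ids, 512)))  # value suggested above
```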

