Commits
50 commits
59c482f
Implement mixed precision operations with a registry design and metad…
contentis Oct 16, 2025
bc0ad9b
fix(api-nodes): remove "veo2" model from Veo3 node (#10372)
bigcat88 Oct 16, 2025
19b4661
Workaround for nvidia issue where VAE uses 3x more memory on torch 2.…
comfyanonymous Oct 16, 2025
b1293d5
workaround also works on cudnn 91200 (#10375)
comfyanonymous Oct 16, 2025
d8d60b5
Do batch_slice in EasyCache's apply_cache_diff (#10376)
Kosinkadink Oct 17, 2025
b1467da
execution: fold in dependency aware caching / Fix --cache-none with l…
rattus128 Oct 17, 2025
99ce2a1
convert nodes_controlnet.py to V3 schema (#10202)
bigcat88 Oct 17, 2025
92d9738
Update Python 3.14 installation instructions (#10385)
comfyanonymous Oct 17, 2025
9da397e
Disable torch compiler for cast_bias_weight function (#10384)
comfyanonymous Oct 18, 2025
5b80add
Turn off cuda malloc by default when --fast autotune is turned on. (#…
comfyanonymous Oct 19, 2025
0cf3395
Fix batch size above 1 giving bad output in chroma radiance. (#10394)
comfyanonymous Oct 19, 2025
dad076a
Speed up chroma radiance. (#10395)
comfyanonymous Oct 19, 2025
b4f30bd
Pytorch is stupid. (#10398)
comfyanonymous Oct 19, 2025
b5c59b7
Deprecation warning on unused files (#10387)
christian-byrne Oct 19, 2025
a4787ac
Update template to 0.2.1 (#10413)
comfyui-wiki Oct 20, 2025
2c2aa40
Log message for cudnn disable on AMD. (#10418)
comfyanonymous Oct 20, 2025
b7992f8
Revert "execution: fold in dependency aware caching / Fix --cache-non…
comfyanonymous Oct 20, 2025
560b1bd
ComfyUI version v0.3.66
comfyanonymous Oct 20, 2025
9cdc649
Only disable cudnn on newer AMD GPUs. (#10437)
comfyanonymous Oct 21, 2025
f13cff0
Add custom node published subgraphs endpoint (#10438)
Kosinkadink Oct 22, 2025
ad6c14c
Updated design using Tensor Subclasses
contentis Oct 22, 2025
7ea731e
Fix FP8 MM
contentis Oct 22, 2025
4739d77
execution: fold in dependency aware caching / Fix --cache-none with l…
rattus128 Oct 22, 2025
a1864c0
Small readme improvement. (#10442)
comfyanonymous Oct 22, 2025
1bcda6d
WIP way to support multi multi dimensional latents. (#10456)
comfyanonymous Oct 24, 2025
24188b3
Update template to 0.2.2 (#10461)
comfyui-wiki Oct 24, 2025
388b306
feat(api-nodes): network client v2: async ops, cancellation, download…
bigcat88 Oct 24, 2025
5e9f335
An actually functional POC
contentis Oct 24, 2025
dd5af0c
convert Tripo API nodes to V3 schema (#10469)
bigcat88 Oct 24, 2025
426cde3
Remove useless function (#10472)
comfyanonymous Oct 24, 2025
e86b79a
convert Gemini API nodes to V3 schema (#10476)
bigcat88 Oct 25, 2025
098a352
Add warning for torch-directml usage (#10482)
comfyanonymous Oct 26, 2025
f6bbc1a
Fix mistake. (#10484)
comfyanonymous Oct 26, 2025
9d529e5
fix(api-nodes): random issues on Windows by capturing general OSError…
bigcat88 Oct 26, 2025
c170fd2
Bump portable deps workflow to torch cu130 python 3.13.9 (#10493)
comfyanonymous Oct 27, 2025
efb3503
Remove CK reference and ensure correct compute dtype
contentis Oct 27, 2025
a7216e1
Update unit tests
contentis Oct 27, 2025
2a8b826
ruff lint
contentis Oct 27, 2025
70acf79
Implement mixed precision operations with a registry design and metad…
contentis Oct 16, 2025
3882946
Updated design using Tensor Subclasses
contentis Oct 22, 2025
19ce6b0
Fix FP8 MM
contentis Oct 22, 2025
b6e0a53
An actually functional POC
contentis Oct 24, 2025
0d20154
Remove CK reference and ensure correct compute dtype
contentis Oct 27, 2025
77d3070
Update unit tests
contentis Oct 27, 2025
e8d267b
ruff lint
contentis Oct 27, 2025
218ef4c
Merge remote-tracking branch 'origin/operator_registry_design' into o…
contentis Oct 27, 2025
f287d02
Fix missing keys
contentis Oct 27, 2025
59a2e8c
Rename quant dtype parameter
contentis Oct 28, 2025
135d302
Rename quant dtype parameter
contentis Oct 28, 2025
9d9f98c
Fix unittests for CPU build
contentis Oct 28, 2025
4 changes: 2 additions & 2 deletions .github/workflows/windows_release_dependencies.yml
@@ -17,7 +17,7 @@ on:
description: 'cuda version'
required: true
type: string
default: "129"
default: "130"

python_minor:
description: 'python minor version'
@@ -29,7 +29,7 @@ on:
description: 'python patch version'
required: true
type: string
default: "6"
default: "9"
# push:
# branches:
# - master
4 changes: 3 additions & 1 deletion README.md
@@ -197,10 +197,12 @@ comfy install

## Manual Install (Windows, Linux)

Python 3.14 will work if you comment out the `kornia` dependency in the requirements.txt file (breaks the canny node) and install pytorch nightly but it is not recommended.
Python 3.14 will work if you comment out the `kornia` dependency in the requirements.txt file (breaks the canny node) but it is not recommended.

Python 3.13 is very well supported. If you have trouble with some custom node dependencies on 3.13 you can try 3.12

### Instructions:

Git clone this repo.

Put your SD checkpoints (the huge ckpt/safetensors files) in: models/checkpoints
112 changes: 112 additions & 0 deletions app/subgraph_manager.py
@@ -0,0 +1,112 @@
from __future__ import annotations

from typing import TypedDict
import os
import folder_paths
import glob
from aiohttp import web
import hashlib


class Source:
custom_node = "custom_node"

class SubgraphEntry(TypedDict):
source: str
"""
Source of subgraph - custom_nodes vs templates.
"""
path: str
"""
Relative path of the subgraph file.
For custom nodes, will be the relative directory like <custom_node_dir>/subgraphs/<name>.json
"""
name: str
"""
Name of subgraph file.
"""
info: CustomNodeSubgraphEntryInfo
"""
Additional info about subgraph; in the case of custom_nodes, will contain nodepack name
"""
data: str

class CustomNodeSubgraphEntryInfo(TypedDict):
node_pack: str
"""Node pack name."""

class SubgraphManager:
def __init__(self):
self.cached_custom_node_subgraphs: dict[SubgraphEntry] | None = None

async def load_entry_data(self, entry: SubgraphEntry):
with open(entry['path'], 'r') as f:
entry['data'] = f.read()
return entry

async def sanitize_entry(self, entry: SubgraphEntry | None, remove_data=False) -> SubgraphEntry | None:
if entry is None:
return None
entry = entry.copy()
entry.pop('path', None)
if remove_data:
entry.pop('data', None)
return entry

async def sanitize_entries(self, entries: dict[str, SubgraphEntry], remove_data=False) -> dict[str, SubgraphEntry]:
entries = entries.copy()
for key in list(entries.keys()):
entries[key] = await self.sanitize_entry(entries[key], remove_data)
return entries

async def get_custom_node_subgraphs(self, loadedModules, force_reload=False):
# if not forced to reload and cached, return cache
if not force_reload and self.cached_custom_node_subgraphs is not None:
return self.cached_custom_node_subgraphs
# Load subgraphs from custom nodes
subfolder = "subgraphs"
subgraphs_dict: dict[SubgraphEntry] = {}

for folder in folder_paths.get_folder_paths("custom_nodes"):
pattern = os.path.join(folder, f"*/{subfolder}/*.json")
matched_files = glob.glob(pattern)
for file in matched_files:
# replace backslashes with forward slashes
file = file.replace('\\', '/')
info: CustomNodeSubgraphEntryInfo = {
"node_pack": "custom_nodes." + file.split('/')[-3]
}
source = Source.custom_node
# hash source + path to make sure id will be as unique as possible, but
# reproducible across backend reloads
id = hashlib.sha256(f"{source}{file}".encode()).hexdigest()
entry: SubgraphEntry = {
"source": Source.custom_node,
"name": os.path.splitext(os.path.basename(file))[0],
"path": file,
"info": info,
}
subgraphs_dict[id] = entry
self.cached_custom_node_subgraphs = subgraphs_dict
return subgraphs_dict

async def get_custom_node_subgraph(self, id: str, loadedModules):
subgraphs = await self.get_custom_node_subgraphs(loadedModules)
entry: SubgraphEntry = subgraphs.get(id, None)
if entry is not None and entry.get('data', None) is None:
await self.load_entry_data(entry)
return entry

def add_routes(self, routes, loadedModules):
@routes.get("/global_subgraphs")
async def get_global_subgraphs(request):
subgraphs_dict = await self.get_custom_node_subgraphs(loadedModules)
# NOTE: we may want to include other sources of global subgraphs such as templates in the future;
# that's the reasoning for the current implementation
return web.json_response(await self.sanitize_entries(subgraphs_dict, remove_data=True))

@routes.get("/global_subgraphs/{id}")
async def get_global_subgraph(request):
id = request.match_info.get("id", None)
subgraph = await self.get_custom_node_subgraph(id, loadedModules)
return web.json_response(await self.sanitize_entry(subgraph))
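
Note (not part of this PR): a minimal client sketch showing how the two routes registered above could be exercised against a running ComfyUI instance. The base URL and port are assumptions about a default local install; everything else follows the handlers in subgraph_manager.py.

# Hypothetical client for the new /global_subgraphs endpoints; illustrative only.
import asyncio
import aiohttp

BASE_URL = "http://127.0.0.1:8188"  # assumed default local ComfyUI address

async def list_and_fetch_subgraphs():
    async with aiohttp.ClientSession() as session:
        # /global_subgraphs returns entries keyed by a sha256-derived id; 'path' is
        # always stripped and 'data' is stripped because the route passes
        # remove_data=True to sanitize_entries().
        async with session.get(f"{BASE_URL}/global_subgraphs") as resp:
            entries = await resp.json()
        for subgraph_id, entry in entries.items():
            print(subgraph_id, entry["name"], entry["info"]["node_pack"])

        # /global_subgraphs/{id} loads the subgraph JSON from disk on first access
        # and returns the entry with 'data' populated (only 'path' is stripped).
        if entries:
            first_id = next(iter(entries))
            async with session.get(f"{BASE_URL}/global_subgraphs/{first_id}") as resp:
                full_entry = await resp.json()
            print(full_entry["name"], len(full_entry["data"]), "bytes of subgraph JSON")

if __name__ == "__main__":
    asyncio.run(list_and_fetch_subgraphs())
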
23 changes: 7 additions & 16 deletions comfy/ldm/chroma_radiance/model.py
@@ -189,15 +189,15 @@ def forward_nerf(
nerf_pixels = nn.functional.unfold(img_orig, kernel_size=patch_size, stride=patch_size)
nerf_pixels = nerf_pixels.transpose(1, 2) # -> [B, NumPatches, C * P * P]

# Reshape for per-patch processing
nerf_hidden = img_out.reshape(B * num_patches, params.hidden_size)
nerf_pixels = nerf_pixels.reshape(B * num_patches, C, patch_size**2).transpose(1, 2)

if params.nerf_tile_size > 0 and num_patches > params.nerf_tile_size:
# Enable tiling if nerf_tile_size isn't 0 and we actually have more patches than
# the tile size.
img_dct = self.forward_tiled_nerf(img_out, nerf_pixels, B, C, num_patches, patch_size, params)
img_dct = self.forward_tiled_nerf(nerf_hidden, nerf_pixels, B, C, num_patches, patch_size, params)
else:
# Reshape for per-patch processing
nerf_hidden = img_out.reshape(B * num_patches, params.hidden_size)
nerf_pixels = nerf_pixels.reshape(B * num_patches, C, patch_size**2).transpose(1, 2)

# Get DCT-encoded pixel embeddings [pixel-dct]
img_dct = self.nerf_image_embedder(nerf_pixels)

@@ -240,17 +240,8 @@ def forward_tiled_nerf(
end = min(i + tile_size, num_patches)

# Slice the current tile from the input tensors
nerf_hidden_tile = nerf_hidden[:, i:end, :]
nerf_pixels_tile = nerf_pixels[:, i:end, :]

# Get the actual number of patches in this tile (can be smaller for the last tile)
num_patches_tile = nerf_hidden_tile.shape[1]

# Reshape the tile for per-patch processing
# [B, NumPatches_tile, D] -> [B * NumPatches_tile, D]
nerf_hidden_tile = nerf_hidden_tile.reshape(batch * num_patches_tile, params.hidden_size)
# [B, NumPatches_tile, C*P*P] -> [B*NumPatches_tile, C, P*P] -> [B*NumPatches_tile, P*P, C]
nerf_pixels_tile = nerf_pixels_tile.reshape(batch * num_patches_tile, channels, patch_size**2).transpose(1, 2)
nerf_hidden_tile = nerf_hidden[i * batch:end * batch]
nerf_pixels_tile = nerf_pixels[i * batch:end * batch]

# get DCT-encoded pixel embeddings [pixel-dct]
img_dct_tile = self.nerf_image_embedder(nerf_pixels_tile)
20 changes: 17 additions & 3 deletions comfy/model_base.py
@@ -134,7 +134,7 @@ def __init__(self, model_config, model_type=ModelType.EPS, device=None, unet_mod
if not unet_config.get("disable_unet_model_creation", False):
if model_config.custom_operations is None:
fp8 = model_config.optimizations.get("fp8", False)
operations = comfy.ops.pick_operations(unet_config.get("dtype", None), self.manual_cast_dtype, fp8_optimizations=fp8, scaled_fp8=model_config.scaled_fp8)
operations = comfy.ops.pick_operations(unet_config.get("dtype", None), self.manual_cast_dtype, fp8_optimizations=fp8, scaled_fp8=model_config.scaled_fp8, model_config=model_config)
else:
operations = model_config.custom_operations
self.diffusion_model = unet_model(**unet_config, device=device, operations=operations)
@@ -197,8 +197,14 @@ def _apply_model(self, x, t, c_concat=None, c_crossattn=None, control=None, tran
extra_conds[o] = extra

t = self.process_timestep(t, x=x, **extra_conds)
model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds).float()
return self.model_sampling.calculate_denoised(sigma, model_output, x)
if "latent_shapes" in extra_conds:
xc = utils.unpack_latents(xc, extra_conds.pop("latent_shapes"))

model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds)
if len(model_output) > 1 and not torch.is_tensor(model_output):
model_output, _ = utils.pack_latents(model_output)

return self.model_sampling.calculate_denoised(sigma, model_output.float(), x)

def process_timestep(self, timestep, **kwargs):
return timestep
@@ -327,6 +333,14 @@ def state_dict_for_saving(self, clip_state_dict=None, vae_state_dict=None, clip_
if self.model_config.scaled_fp8 is not None:
unet_state_dict["scaled_fp8"] = torch.tensor([], dtype=self.model_config.scaled_fp8)

# Save mixed precision metadata
if hasattr(self.model_config, 'layer_quant_config') and self.model_config.layer_quant_config:
metadata = {
"format_version": "1.0",
"layers": self.model_config.layer_quant_config
}
unet_state_dict["_quantization_metadata"] = metadata

unet_state_dict = self.model_config.process_unet_state_dict_for_saving(unet_state_dict)

if self.model_type == ModelType.V_PREDICTION:
22 changes: 21 additions & 1 deletion comfy/model_detection.py
@@ -6,6 +6,20 @@
import logging
import torch


def detect_layer_quantization(metadata):
quant_key = "_quantization_metadata"
if metadata is not None and quant_key in metadata:
quant_metadata = metadata.pop(quant_key)
quant_metadata = json.loads(quant_metadata)
if isinstance(quant_metadata, dict) and "layers" in quant_metadata:
logging.info(f"Found quantization metadata (version {quant_metadata.get('format_version', 'unknown')})")
return quant_metadata["layers"]
else:
raise ValueError("Invalid quantization metadata format")
return None


def count_blocks(state_dict_keys, prefix_string):
count = 0
while True:
@@ -213,7 +227,7 @@ def detect_unet_config(state_dict, key_prefix, metadata=None):
dit_config["nerf_mlp_ratio"] = 4
dit_config["nerf_depth"] = 4
dit_config["nerf_max_freqs"] = 8
dit_config["nerf_tile_size"] = 32
dit_config["nerf_tile_size"] = 512
dit_config["nerf_final_head_type"] = "conv" if f"{key_prefix}nerf_final_layer_conv.norm.scale" in state_dict_keys else "linear"
dit_config["nerf_embedder_dtype"] = torch.float32
else:
@@ -701,6 +715,12 @@ def model_config_from_unet(state_dict, unet_key_prefix, use_base_if_no_match=Fal
else:
model_config.optimizations["fp8"] = True

# Detect per-layer quantization (mixed precision)
layer_quant_config = detect_layer_quantization(metadata)
if layer_quant_config:
model_config.layer_quant_config = layer_quant_config
logging.info(f"Detected mixed precision quantization: {len(layer_quant_config)} layers quantized")

return model_config

def unet_prefix_from_state_dict(state_dict):
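
For context, a sketch (not from the PR) of what a "_quantization_metadata" payload compatible with detect_layer_quantization() above and the save path in model_base.py might look like. The layer names and per-layer fields are invented placeholders; the exact layer_quant_config schema is defined elsewhere in this PR.

# Illustrative only: hypothetical metadata matching the "format_version" / "layers"
# structure written by state_dict_for_saving() and expected by detect_layer_quantization().
import json

example_metadata = {
    "format_version": "1.0",
    "layers": {
        # keys and values below are made-up placeholders for per-layer quantization settings
        "double_blocks.0.img_attn.qkv": {"dtype": "float8_e4m3fn"},
        "double_blocks.0.img_mlp.0": {"dtype": "float8_e4m3fn"},
    },
}

# detect_layer_quantization() calls json.loads() on the stored value, so on the save
# side the dict would be serialized to a string (e.g. for safetensors metadata) first.
serialized = json.dumps(example_metadata)
parsed = json.loads(serialized)
assert isinstance(parsed, dict) and "layers" in parsed  # the check the detector performs
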
23 changes: 14 additions & 9 deletions comfy/model_management.py
@@ -89,6 +89,7 @@ def get_supported_float8_types():

directml_enabled = False
if args.directml is not None:
logging.warning("WARNING: torch-directml barely works, is very slow, has not been updated in over 1 year and might be removed soon, please don't use it, there are better options.")
import torch_directml
directml_enabled = True
device_index = args.directml
@@ -330,14 +331,21 @@ def amd_min_version(device=None, min_rdna_version=0):


SUPPORT_FP8_OPS = args.supports_fp8_compute

AMD_RDNA2_AND_OLDER_ARCH = ["gfx1030", "gfx1031", "gfx1010", "gfx1011", "gfx1012", "gfx906", "gfx900", "gfx803"]

try:
if is_amd():
torch.backends.cudnn.enabled = False # Seems to improve things a lot on AMD
arch = torch.cuda.get_device_properties(get_torch_device()).gcnArchName
if not (any((a in arch) for a in AMD_RDNA2_AND_OLDER_ARCH)):
torch.backends.cudnn.enabled = False # Seems to improve things a lot on AMD
logging.info("Set: torch.backends.cudnn.enabled = False for better AMD performance.")

try:
rocm_version = tuple(map(int, str(torch.version.hip).split(".")[:2]))
except:
rocm_version = (6, -1)
arch = torch.cuda.get_device_properties(get_torch_device()).gcnArchName

logging.info("AMD arch: {}".format(arch))
logging.info("ROCm version: {}".format(rocm_version))
if args.use_split_cross_attention == False and args.use_quad_cross_attention == False:
@@ -371,6 +379,9 @@ def amd_min_version(device=None, min_rdna_version=0):
except:
pass

if torch.cuda.is_available() and torch.backends.cudnn.is_available() and PerformanceFeature.AutoTune in args.fast:
torch.backends.cudnn.benchmark = True

try:
if torch_version_numeric >= (2, 5):
torch.backends.cuda.allow_fp16_bf16_reduction_math_sdp(True)
@@ -988,12 +999,6 @@ def device_supports_non_blocking(device):
return False
return True

def device_should_use_non_blocking(device):
if not device_supports_non_blocking(device):
return False
return False
# return True #TODO: figure out why this causes memory issues on Nvidia and possibly others

def force_channels_last():
if args.force_channels_last:
return True
@@ -1327,7 +1332,7 @@ def should_use_bf16(device=None, model_params=0, prioritize_performance=True, ma

if is_amd():
arch = torch.cuda.get_device_properties(device).gcnArchName
if any((a in arch) for a in ["gfx1030", "gfx1031", "gfx1010", "gfx1011", "gfx1012", "gfx906", "gfx900", "gfx803"]): # RDNA2 and older don't support bf16
if any((a in arch) for a in AMD_RDNA2_AND_OLDER_ARCH): # RDNA2 and older don't support bf16
if manual_cast:
return True
return False
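
As a small illustration (not part of the diff) of the arch gating used above: the check is a plain substring match against gcnArchName, so RDNA2-and-older parts keep cudnn enabled and are treated as lacking bf16, while newer AMD GPUs get cudnn disabled. The example arch strings are assumptions about what torch reports.

# Minimal sketch of the substring check used in model_management.py.
AMD_RDNA2_AND_OLDER_ARCH = ["gfx1030", "gfx1031", "gfx1010", "gfx1011", "gfx1012", "gfx906", "gfx900", "gfx803"]

def is_rdna2_or_older(arch: str) -> bool:
    # mirrors the `any((a in arch) for a in AMD_RDNA2_AND_OLDER_ARCH)` pattern in the diff
    return any(a in arch for a in AMD_RDNA2_AND_OLDER_ARCH)

print(is_rdna2_or_older("gfx1030"))  # True  -> cudnn stays enabled, bf16 treated as unsupported
print(is_rdna2_or_older("gfx1100"))  # False -> newer GPU, cudnn.enabled is set to False
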