Fresh Installed training not working #3458

@xxl2005


Hello,

Training does not work on a fresh install.

Installed with: uv (./gui-uv.sh)

System: Linux (CachyOS)

GPU: NVIDIA

`❯ ./gui-uv.sh
2025-11-15 17:33:28.928086: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1763224408.938454 12465 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1763224408.941816 12465 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1763224408.950823 12465 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1763224408.950838 12465 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1763224408.950840 12465 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1763224408.950842 12465 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
2025-11-15 17:33:28.953455: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
17:33:32-223151 WARNING Skipping requirements verification.
17:33:32-225060 INFO headless: False
17:33:32-225861 INFO Using shell=True when running external commands...

  • Running on local URL: http://127.0.0.1:7860
  • To create a public link, set share=True in launch().
    17:34:58-001817 INFO Copy /home/tobias/Schreibtisch/Training_images/ to
    /home/tobias/Schreibtisch/Destination_training/img/200_h53d man...
    17:34:58-003747 INFO Regularization images directory is missing... not copying regularisation images...
    17:34:58-004437 INFO Done creating kohya_ss training folder structure at
    /home/tobias/Schreibtisch/Destination_training/...
    17:35:12-824582 INFO Start training Dreambooth...
    17:35:12-825262 INFO Validating lr scheduler arguments...
    17:35:12-825823 INFO Validating optimizer arguments...
    17:35:12-826262 INFO Validating /home/tobias/Schreibtisch/Destination_training/log existence and writability...
    SUCCESS
    17:35:12-826749 INFO Validating /home/tobias/Schreibtisch/Destination_training/model existence and writability...
    SUCCESS
    17:35:12-827290 INFO Validating /home/tobias/AI/ComfyUI/models/checkpoints/lustifySDXLNSFW_endgame.safetensors
    existence... SUCCESS
    17:35:12-827766 INFO Validating /home/tobias/Schreibtisch/Destination_training/img existence... SUCCESS
    17:35:12-828235 INFO Folder 200_h53d man: 200 repeats found
    17:35:12-828681 INFO Folder 200_h53d man: 2 images found
    17:35:12-829091 INFO Folder 200_h53d man: 2 * 200 = 400 steps
    17:35:12-829520 INFO Regularization factor: 1
    17:35:12-829912 INFO Total steps: 400
    17:35:12-830302 INFO Train batch size: 6
    17:35:12-830693 INFO Gradient accumulation steps: 1
    17:35:12-831087 INFO Epoch: 1
    17:35:12-831457 INFO Max train steps: 1600
    17:35:12-831844 INFO lr_warmup_steps = 0.1
    17:35:12-832520 INFO Saving training config to
    /home/tobias/Schreibtisch/Destination_training/model/TEST_232_20251115-173512.json...
    17:35:12-833205 INFO Executing command: /home/tobias/AI/kohya_ss/.venv/bin/accelerate launch --dynamo_backend no
    --dynamo_mode default --mixed_precision fp16 --num_processes 1 --num_machines 1
    --num_cpu_threads_per_process 2 /home/tobias/AI/kohya_ss/sd-scripts/sdxl_train.py --config_file
    /home/tobias/Schreibtisch/Destination_training/model/config_dreambooth-20251115-173512.toml
    ipex flag is deprecated, will be removed in Accelerate v1.10. From 2.7.0, PyTorch has all needed optimizations for Intel CPU and XPU.
    /home/tobias/AI/kohya_ss/sd-scripts/library/deepspeed_utils.py:131: SyntaxWarning: "is not" with a literal. Did you mean "!="?
    wrap_model_forward_with_torch_autocast = args.mixed_precision is not "no"
    Traceback (most recent call last):
    File "/home/tobias/AI/kohya_ss/.venv/lib/python3.11/site-packages/diffusers/utils/import_utils.py", line 920, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/home/tobias/.local/share/uv/python/cpython-3.11.13-linux-x86_64-gnu/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
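    File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
    File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
    File "<frozen importlib._bootstrap>", line 1126, in _find_and_load_unlocked
    File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
    File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
    File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
    File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
    File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
    File "<frozen importlib._bootstrap_external>", line 940, in exec_module
    File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed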
    File "/home/tobias/AI/kohya_ss/.venv/lib/python3.11/site-packages/diffusers/models/autoencoders/__init__.py", line 1, in <module>
    from .autoencoder_asym_kl import AsymmetricAutoencoderKL
    File "/home/tobias/AI/kohya_ss/.venv/lib/python3.11/site-packages/diffusers/models/autoencoders/autoencoder_asym_kl.py", line 22, in <module>
    from ..modeling_utils import ModelMixin
    File "/home/tobias/AI/kohya_ss/.venv/lib/python3.11/site-packages/diffusers/models/modeling_utils.py", line 35, in <module>
    from ..quantizers import DiffusersAutoQuantizer, DiffusersQuantizer
    File "/home/tobias/AI/kohya_ss/.venv/lib/python3.11/site-packages/diffusers/quantizers/__init__.py", line 15, in <module>
    from .auto import DiffusersAutoQuantizer
    File "/home/tobias/AI/kohya_ss/.venv/lib/python3.11/site-packages/diffusers/quantizers/auto.py", line 22, in <module>
    from .bitsandbytes import BnB4BitDiffusersQuantizer, BnB8BitDiffusersQuantizer
    File "/home/tobias/AI/kohya_ss/.venv/lib/python3.11/site-packages/diffusers/quantizers/bitsandbytes/__init__.py", line 2, in <module>
    from .utils import dequantize_and_replace, dequantize_bnb_weight, replace_with_bnb_linear
    File "/home/tobias/AI/kohya_ss/.venv/lib/python3.11/site-packages/diffusers/quantizers/bitsandbytes/utils.py", line 32, in <module>
    import bitsandbytes as bnb
    File "/home/tobias/AI/kohya_ss/.venv/lib/python3.11/site-packages/bitsandbytes/__init__.py", line 19, in <module>
    from .nn import modules
    File "/home/tobias/AI/kohya_ss/.venv/lib/python3.11/site-packages/bitsandbytes/nn/__init__.py", line 21, in <module>
    from .triton_based_modules import (
    File "/home/tobias/AI/kohya_ss/.venv/lib/python3.11/site-packages/bitsandbytes/nn/triton_based_modules.py", line 6, in <module>
    from bitsandbytes.triton.dequantize_rowwise import dequantize_rowwise
    File "/home/tobias/AI/kohya_ss/.venv/lib/python3.11/site-packages/bitsandbytes/triton/dequantize_rowwise.py", line 18, in <module>
    @triton.autotune(
    ^^^^^^^^^^^^^^^^
    File "/home/tobias/AI/kohya_ss/.venv/lib/python3.11/site-packages/triton/runtime/autotuner.py", line 378, in decorator
    return Autotuner(fn, fn.arg_names, configs, key, reset_to_zero, restore_value, pre_hook=pre_hook,
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/home/tobias/AI/kohya_ss/.venv/lib/python3.11/site-packages/triton/runtime/autotuner.py", line 130, in __init__
    self.do_bench = driver.active.get_benchmarker()
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/home/tobias/AI/kohya_ss/.venv/lib/python3.11/site-packages/triton/runtime/driver.py", line 23, in __getattr__
    self._initialize_obj()
    File "/home/tobias/AI/kohya_ss/.venv/lib/python3.11/site-packages/triton/runtime/driver.py", line 20, in _initialize_obj
    self._obj = self._init_fn()
    ^^^^^^^^^^^^^^^
    File "/home/tobias/AI/kohya_ss/.venv/lib/python3.11/site-packages/triton/runtime/driver.py", line 9, in _create_driver
    return actives[0]()
    ^^^^^^^^^^^^
    File "/home/tobias/AI/kohya_ss/.venv/lib/python3.11/site-packages/triton/backends/nvidia/driver.py", line 535, in __init__
    self.utils = CudaUtils() # TODO: make static
    ^^^^^^^^^^^
    File "/home/tobias/AI/kohya_ss/.venv/lib/python3.11/site-packages/triton/backends/nvidia/driver.py", line 89, in __init__
    mod = compile_module_from_src(Path(os.path.join(dirname, "driver.c")).read_text(), "cuda_utils")
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/home/tobias/AI/kohya_ss/.venv/lib/python3.11/site-packages/triton/backends/nvidia/driver.py", line 71, in compile_module_from_src
    mod = importlib.util.module_from_spec(spec)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ImportError: /home/tobias/.triton/cache/QLAEYTJR4KV5WSBGJKRUAKVP475DE47NW7P4XMI2RFXBOIE5TZ4Q/cuda_utils.so: undefined symbol: cuModuleGetFunction

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/tobias/AI/kohya_ss/sd-scripts/sdxl_train.py", line 20, in <module>
from library import deepspeed_utils, sdxl_model_util, strategy_base, strategy_sd, strategy_sdxl
File "/home/tobias/AI/kohya_ss/sd-scripts/library/sdxl_model_util.py", line 8, in <module>
from diffusers import AutoencoderKL, EulerDiscreteScheduler, UNet2DConditionModel
File "<frozen importlib._bootstrap>", line 1229, in _handle_fromlist
File "/home/tobias/AI/kohya_ss/.venv/lib/python3.11/site-packages/diffusers/utils/import_utils.py", line 911, in __getattr__
value = getattr(module, name)
^^^^^^^^^^^^^^^^^^^^^
File "/home/tobias/AI/kohya_ss/.venv/lib/python3.11/site-packages/diffusers/utils/import_utils.py", line 910, in __getattr__
module = self._get_module(self._class_to_module[name])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tobias/AI/kohya_ss/.venv/lib/python3.11/site-packages/diffusers/utils/import_utils.py", line 922, in _get_module
raise RuntimeError(
RuntimeError: Failed to import diffusers.models.autoencoders.autoencoder_kl because of the following error (look up to see its traceback):
/home/tobias/.triton/cache/QLAEYTJR4KV5WSBGJKRUAKVP475DE47NW7P4XMI2RFXBOIE5TZ4Q/cuda_utils.so: undefined symbol: cuModuleGetFunction
Traceback (most recent call last):
File "/home/tobias/AI/kohya_ss/.venv/bin/accelerate", line 10, in <module>
sys.exit(main())
^^^^^^
File "/home/tobias/AI/kohya_ss/.venv/lib/python3.11/site-packages/accelerate/commands/accelerate_cli.py", line 50, in main
args.func(args)
File "/home/tobias/AI/kohya_ss/.venv/lib/python3.11/site-packages/accelerate/commands/launch.py", line 1199, in launch_command
simple_launcher(args)
File "/home/tobias/AI/kohya_ss/.venv/lib/python3.11/site-packages/accelerate/commands/launch.py", line 785, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/tobias/AI/kohya_ss/.venv/bin/python', '/home/tobias/AI/kohya_ss/sd-scripts/sdxl_train.py', '--config_file', '/home/tobias/Schreibtisch/Destination_training/model/config_dreambooth-20251115-173512.toml']' returned non-zero exit status 1.
17:35:18-079399 INFO Training has ended. `
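
One way to narrow down the `undefined symbol: cuModuleGetFunction` error might be to check whether the NVIDIA driver library that the dynamic loader finds actually exports that symbol. The sketch below is only a diagnostic idea, not a fix. It assumes the driver library is reachable as `libcuda.so.1` (the usual name on Linux, but an assumption for this system), and `has_symbol` is a hypothetical helper written for this check, not part of kohya_ss, Triton, or bitsandbytes.

```python
import ctypes
from typing import Optional


def has_symbol(library: Optional[str], symbol: str) -> Optional[bool]:
    """Report whether `library` exports `symbol`.

    Returns True or False when the library loads, and None when the
    library itself cannot be loaded by the dynamic loader.
    """
    try:
        lib = ctypes.CDLL(library)
    except OSError:
        return None
    # ctypes resolves symbols lazily on attribute access and raises
    # AttributeError when the lookup fails, so hasattr() is enough here.
    return hasattr(lib, symbol)


if __name__ == "__main__":
    # Assumption: the driver library is installed under its usual Linux name.
    result = has_symbol("libcuda.so.1", "cuModuleGetFunction")
    if result is None:
        print("libcuda.so.1 could not be loaded at all")
    elif result:
        print("libcuda.so.1 exports cuModuleGetFunction")
    else:
        print("libcuda.so.1 loaded, but cuModuleGetFunction is missing")
```

Running this with the same interpreter as the training job (`/home/tobias/AI/kohya_ss/.venv/bin/python`) keeps the loader environment comparable. If the library fails to load or the symbol is missing, that would hint at the driver installation or library search path rather than at the training script. If the symbol is present, the cached `cuda_utils.so` under `/home/tobias/.triton/cache/` may have been built against a different library.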
