val_check_interval check not bypassed when limit_val_batches=0 #21553

@taha-yassine

Bug description

When limit_val_batches=0 (validation disabled), the val_check_interval sanity check in FitLoop.setup_data() still raises a ValueError if val_check_interval > limit_train_batches. The check should be skipped entirely when validation is disabled.

Root cause

In this part of the code, setup_data() unconditionally validates val_check_interval against the training batch count:

    if isinstance(trainer.val_check_interval, int):
        trainer.val_check_batch = trainer.val_check_interval
        if trainer.val_check_batch > self.max_batches and trainer.check_val_every_n_epoch is not None:
            raise ValueError(
                f" `val_check_interval` ({trainer.val_check_interval}) must be less than or equal"
                f" to the number of the training batches ({self.max_batches})."
                " If you want to disable validation set `limit_val_batches` to 0.0 instead."
                " If you want to validate based on the total training batches, set `check_val_every_n_epoch=None`."
            )

However, Trainer.enable_validation (which correctly checks self.limit_val_batches > 0) is only consulted later, in _should_check_val_epoch(). Validation would therefore never actually run, but the early sanity check is unaware of that.
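The inconsistency can be modeled in a few lines of plain Python (function names here are illustrative, not Lightning's actual API; the real logic lives in the Trainer.enable_validation property and FitLoop.setup_data()):

```python
def enable_validation(limit_val_batches) -> bool:
    # Mirrors the property described above: validation only runs when > 0.
    return limit_val_batches > 0

def sanity_check_raises(val_check_interval, max_batches, check_val_every_n_epoch) -> bool:
    # Mirrors the early check quoted above: limit_val_batches is never consulted.
    return val_check_interval > max_batches and check_val_every_n_epoch is not None

# Validation is disabled for the repro configuration...
assert not enable_validation(0)
# ...yet the early sanity check still fires for that same configuration.
assert sanity_check_raises(2000, 100, 1)
```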

Expected behavior

When limit_val_batches=0, the val_check_interval check should be skipped. A minimal fix would be to guard the check:

    if isinstance(trainer.val_check_interval, int):
        trainer.val_check_batch = trainer.val_check_interval
        if (
            trainer.val_check_batch > self.max_batches
            and trainer.check_val_every_n_epoch is not None
            and trainer.limit_val_batches > 0
        ):
            raise ValueError(...)
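As a standalone sketch of the guarded condition (should_raise is a hypothetical function for illustration, not Lightning code), the check keeps its current behavior when validation is enabled and stays silent when it is disabled:

```python
def should_raise(val_check_interval, max_batches, check_val_every_n_epoch, limit_val_batches) -> bool:
    return (
        val_check_interval > max_batches
        and check_val_every_n_epoch is not None
        and limit_val_batches > 0  # the proposed guard
    )

# Repro configuration with validation disabled: would no longer raise.
assert not should_raise(2000, 100, 1, 0)
# Validation enabled with a too-large interval: original behavior preserved.
assert should_raise(2000, 100, 1, 1.0)
```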

Disclosure: This issue was drafted with the help of an AI assistant. The debugging process was guided by me and I validated the analysis and the final result.

What version are you seeing the problem on?

v2.5


How to reproduce the bug

    from pytorch_lightning import Trainer
    from pytorch_lightning.demos.boring_classes import BoringModel

    model = BoringModel()
    trainer = Trainer(
        max_epochs=2,
        limit_train_batches=100,
        limit_val_batches=0,       # validation disabled
        val_check_interval=2000,   # > limit_train_batches, triggers the check
    )
    trainer.fit(model)  # raises ValueError before training starts

Error messages and logs

ValueError: `val_check_interval` (2000) must be less than or equal to the number of the training batches (100).
   If you want to disable validation set `limit_val_batches` to 0.0 instead.
   If you want to validate based on the total training batches, set `check_val_every_n_epoch=None`.

Environment

Current environment
  • CUDA:
    - GPU:
    - NVIDIA GeForce RTX 3090
    - available: True
    - version: 12.8
  • Lightning:
    - lightning-utilities: 0.15.2
    - pytorch-lightning: 2.5.4
    - torch: 2.8.0
    - torchmetrics: 1.8.1
  • Packages:
    - GitPython: 3.1.46
    - Jinja2: 3.1.6
    - MarkupSafe: 3.0.3
    - PyYAML: 6.0.3
    - Pygments: 2.19.2
    - aiohappyeyeballs: 2.6.1
    - aiohttp: 3.13.3
    - aiosignal: 1.4.0
    - annotated-doc: 0.0.4
    - annotated-types: 0.7.0
    - antlr4-python3-runtime: 4.9.3
    - anyio: 4.12.1
    - asttokens: 3.0.1
    - attrs: 25.4.0
    - certifi: 2026.1.4
    - cffi: 2.0.0
    - charset-normalizer: 3.4.4
    - click: 8.3.1
    - cloudpickle: 3.1.2
    - comet_ml: 3.53.0
    - comm: 0.2.3
    - configobj: 5.0.9
    - contourpy: 1.3.3
    - cryptography: 46.0.5
    - cycler: 0.12.1
    - datasets: 4.1.1
    - debugpy: 1.8.20
    - decorator: 5.2.1
    - dill: 0.4.0
    - distro: 1.9.0
    - docstring_parser: 0.17.0
    - dulwich: 1.0.0
    - everett: 3.1.0
    - executing: 2.2.1
    - filelock: 3.24.2
    - fonttools: 4.61.1
    - frozenlist: 1.8.0
    - fsspec: 2025.9.0
    - gcsfs: 2025.9.0
    - gitdb: 4.0.12
    - google-api-core: 2.25.2
    - google-auth: 2.48.0
    - google-auth-oauthlib: 1.2.4
    - google-cloud-aiplatform: 1.119.0
    - google-cloud-bigquery: 3.13.0
    - google-cloud-bigquery-storage: 2.33.1
    - google-cloud-core: 2.5.0
    - google-cloud-resource-manager: 1.16.0
    - google-cloud-storage: 2.19.0
    - google-crc32c: 1.8.0
    - google-genai: 1.63.0
    - google-resumable-media: 2.8.0
    - googleapis-common-protos: 1.72.0
    - grpc-google-iam-v1: 0.14.3
    - grpcio: 1.78.0
    - grpcio-status: 1.62.3
    - h11: 0.16.0
    - hf-xet: 1.2.0
    - httpcore: 1.0.9
    - httpx: 0.28.1
    - huggingface_hub: 1.4.1
    - hydra-core: 1.3.2
    - idna: 3.11
    - importlib_metadata: 8.7.1
    - ipykernel: 7.2.0
    - ipython: 9.10.0
    - ipython_pygments_lexers: 1.1.1
    - jedi: 0.19.2
    - jsonschema: 4.26.0
    - jsonschema-specifications: 2025.9.1
    - jupyter_client: 8.8.0
    - jupyter_core: 5.9.1
    - kiwisolver: 1.4.9
    - lightning-utilities: 0.15.2
    - markdown-it-py: 4.0.0
    - matplotlib: 3.10.8
    - matplotlib-inline: 0.2.1
    - mdurl: 0.1.2
    - mpmath: 1.3.0
    - multidict: 6.7.1
    - multiprocess: 0.70.16
    - nest-asyncio: 1.6.0
    - networkx: 3.6.1
    - numpy: 2.4.2
    - nvidia-cublas-cu12: 12.8.4.1
    - nvidia-cuda-cupti-cu12: 12.8.90
    - nvidia-cuda-nvrtc-cu12: 12.8.93
    - nvidia-cuda-runtime-cu12: 12.8.90
    - nvidia-cudnn-cu12: 9.10.2.21
    - nvidia-cufft-cu12: 11.3.3.83
    - nvidia-cufile-cu12: 1.13.1.3
    - nvidia-curand-cu12: 10.3.9.90
    - nvidia-cusolver-cu12: 11.7.3.90
    - nvidia-cusparse-cu12: 12.5.8.93
    - nvidia-cusparselt-cu12: 0.7.1
    - nvidia-nccl-cu12: 2.27.3
    - nvidia-nvjitlink-cu12: 12.8.93
    - nvidia-nvtx-cu12: 12.8.90
    - oauthlib: 3.3.1
    - omegaconf: 2.3.0
    - packaging: 25.0
    - pandas: 3.0.0
    - parso: 0.8.6
    - pexpect: 4.9.0
    - pillow: 12.1.1
    - pip: 26.0.1
    - platformdirs: 4.9.2
    - prompt_toolkit: 3.0.52
    - propcache: 0.4.1
    - proto-plus: 1.27.1
    - protobuf: 4.25.8
    - psutil: 7.2.2
    - ptyprocess: 0.7.0
    - pure_eval: 0.2.3
    - pyarrow: 21.0.0
    - pyasn1: 0.6.2
    - pyasn1_modules: 0.4.2
    - pycparser: 3.0
    - pydantic: 2.12.5
    - pydantic_core: 2.41.5
    - pyparsing: 3.3.2
    - python-box: 6.1.0
    - python-dateutil: 2.9.0.post0
    - pytorch-lightning: 2.5.4
    - pyvers: 0.1.0
    - pyzmq: 27.1.0
    - referencing: 0.37.0
    - requests: 2.32.5
    - requests-oauthlib: 2.0.0
    - requests-toolbelt: 1.0.0
    - rich: 14.3.2
    - rpds-py: 0.30.0
    - rsa: 4.9.1
    - scipy: 1.17.0
    - seaborn: 0.13.2
    - semantic-version: 2.10.0
    - sentry-sdk: 2.53.0
    - setuptools: 82.0.0
    - shapely: 2.1.2
    - shellingham: 1.5.4
    - simplejson: 3.20.2
    - six: 1.17.0
    - smmap: 5.0.2
    - sniffio: 1.3.1
    - stack-data: 0.6.3
    - sympy: 1.14.0
    - tenacity: 9.1.4
    - tensorboardX: 2.6.4
    - tensordict: 0.9.1
    - torch: 2.8.0
    - torchmetrics: 1.8.1
    - tornado: 6.5.4
    - tqdm: 4.67.3
    - traitlets: 5.14.3
    - triton: 3.4.0
    - typer: 0.23.1
    - typer-slim: 0.23.1
    - typing-inspection: 0.4.2
    - typing_extensions: 4.15.0
    - urllib3: 2.6.3
    - wandb: 0.25.0
    - wcwidth: 0.6.0
    - websockets: 15.0.1
    - wrapt: 2.1.1
    - wurlitzer: 3.1.1
    - xxhash: 3.6.0
    - yarl: 1.22.0
    - zipp: 3.23.0
  • System:
    - OS: Linux
    - architecture:
    - 64bit
    - ELF
    - processor:
    - python: 3.13.7
    - release: 6.18.9
    - version: #1-NixOS SMP PREEMPT_DYNAMIC Fri Feb 6 15:57:45 UTC 2026

More info

The error message itself suggests setting limit_val_batches to 0.0, implying that doing so should resolve the issue, but it doesn't.
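Until the check is guarded, a practical workaround implied by the quoted condition is to not pass an integer val_check_interval at all when validation is disabled. A small hypothetical helper (not part of Lightning) for assembling the Trainer kwargs:

```python
def build_trainer_kwargs(limit_val_batches, val_check_interval=None, **extra):
    # Only forward val_check_interval when validation is actually enabled,
    # sidestepping the over-eager sanity check described above.
    kwargs = dict(limit_val_batches=limit_val_batches, **extra)
    if limit_val_batches and val_check_interval is not None:
        kwargs["val_check_interval"] = val_check_interval
    return kwargs

# With validation disabled, the problematic argument is dropped entirely.
assert "val_check_interval" not in build_trainer_kwargs(0, 2000, max_epochs=2)
# With validation enabled, it is forwarded unchanged.
assert build_trainer_kwargs(1.0, 2000)["val_check_interval"] == 2000
```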

cc @ethanwharris @lantiga @justusschock
