Description
Bug description
When `limit_val_batches=0` (validation disabled), the `val_check_interval` sanity check in `FitLoop.setup_data()` still raises a ValueError if `val_check_interval > limit_train_batches`. The check should be skipped entirely when validation is disabled.
Root cause
In this part of the code, `setup_data()` unconditionally validates `val_check_interval` against the training batch count:
pytorch-lightning/src/lightning/pytorch/loops/fit_loop.py
Lines 286 to 294 in 58b89ed
```python
if isinstance(trainer.val_check_interval, int):
    trainer.val_check_batch = trainer.val_check_interval
    if trainer.val_check_batch > self.max_batches and trainer.check_val_every_n_epoch is not None:
        raise ValueError(
            f" `val_check_interval` ({trainer.val_check_interval}) must be less than or equal"
            f" to the number of the training batches ({self.max_batches})."
            " If you want to disable validation set `limit_val_batches` to 0.0 instead."
            " If you want to validate based on the total training batches, set `check_val_every_n_epoch=None`."
        )
```
However, `Trainer.enable_validation` (which correctly checks `self.limit_val_batches > 0`) is only used later, in `_should_check_val_epoch()`. So validation would never actually run, but the early sanity check doesn't know that.
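To make the mismatch concrete, here is a plain-Python sketch of the two code paths side by side. These helpers only mimic the logic described above; the function names and signatures are illustrative, not Lightning's actual internals.

```python
def early_sanity_check(val_check_interval, max_batches, check_val_every_n_epoch):
    # Mimics the unguarded check in FitLoop.setup_data(): it never
    # consults limit_val_batches.
    if val_check_interval > max_batches and check_val_every_n_epoch is not None:
        raise ValueError(
            f"`val_check_interval` ({val_check_interval}) must be less than or"
            f" equal to the number of the training batches ({max_batches})."
        )


def validation_enabled(limit_val_batches):
    # Mimics the `limit_val_batches > 0` test behind Trainer.enable_validation.
    return limit_val_batches > 0


# With the repro settings, validation would never run ...
assert not validation_enabled(0)

# ... but the early check still raises anyway:
try:
    early_sanity_check(val_check_interval=2000, max_batches=100,
                       check_val_every_n_epoch=1)
except ValueError as exc:
    print(f"raised anyway: {exc}")
```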
Expected behavior
When `limit_val_batches=0`, the `val_check_interval` check should be skipped. A minimal fix would be to guard the check:
```python
if isinstance(trainer.val_check_interval, int):
    trainer.val_check_batch = trainer.val_check_interval
    if (
        trainer.val_check_batch > self.max_batches
        and trainer.check_val_every_n_epoch is not None
        and trainer.limit_val_batches > 0
    ):
        raise ValueError(...)
```

Disclosure: This issue was drafted with the help of an AI assistant. The debugging process was guided by me and I validated the analysis and the final result.
What version are you seeing the problem on?
v2.5
Reproduced in studio
No response
How to reproduce the bug
```python
from pytorch_lightning import Trainer

# `model` is any LightningModule; `train_dl` is any training DataLoader.
trainer = Trainer(
    max_epochs=2,
    limit_train_batches=100,
    limit_val_batches=0,
    val_check_interval=2000,
)
trainer.fit(model, train_dataloaders=train_dl)
```

Error messages and logs
ValueError: `val_check_interval` (2000) must be less than or equal to the number of the training batches (100).
If you want to disable validation set `limit_val_batches` to 0.0 instead.
If you want to validate based on the total training batches, set `check_val_every_n_epoch=None`.
Environment
Current environment
- CUDA:
- GPU:
- NVIDIA GeForce RTX 3090
- available: True
- version: 12.8
- Lightning:
- lightning-utilities: 0.15.2
- pytorch-lightning: 2.5.4
- torch: 2.8.0
- torchmetrics: 1.8.1
- Packages:
- GitPython: 3.1.46
- Jinja2: 3.1.6
- MarkupSafe: 3.0.3
- PyYAML: 6.0.3
- Pygments: 2.19.2
- aiohappyeyeballs: 2.6.1
- aiohttp: 3.13.3
- aiosignal: 1.4.0
- annotated-doc: 0.0.4
- annotated-types: 0.7.0
- antlr4-python3-runtime: 4.9.3
- anyio: 4.12.1
- asttokens: 3.0.1
- attrs: 25.4.0
- certifi: 2026.1.4
- cffi: 2.0.0
- charset-normalizer: 3.4.4
- click: 8.3.1
- cloudpickle: 3.1.2
- comet_ml: 3.53.0
- comm: 0.2.3
- configobj: 5.0.9
- contourpy: 1.3.3
- cryptography: 46.0.5
- cycler: 0.12.1
- datasets: 4.1.1
- debugpy: 1.8.20
- decorator: 5.2.1
- dill: 0.4.0
- distro: 1.9.0
- docstring_parser: 0.17.0
- dulwich: 1.0.0
- everett: 3.1.0
- executing: 2.2.1
- filelock: 3.24.2
- fonttools: 4.61.1
- frozenlist: 1.8.0
- fsspec: 2025.9.0
- gcsfs: 2025.9.0
- gitdb: 4.0.12
- google-api-core: 2.25.2
- google-auth: 2.48.0
- google-auth-oauthlib: 1.2.4
- google-cloud-aiplatform: 1.119.0
- google-cloud-bigquery: 3.13.0
- google-cloud-bigquery-storage: 2.33.1
- google-cloud-core: 2.5.0
- google-cloud-resource-manager: 1.16.0
- google-cloud-storage: 2.19.0
- google-crc32c: 1.8.0
- google-genai: 1.63.0
- google-resumable-media: 2.8.0
- googleapis-common-protos: 1.72.0
- grpc-google-iam-v1: 0.14.3
- grpcio: 1.78.0
- grpcio-status: 1.62.3
- h11: 0.16.0
- hf-xet: 1.2.0
- httpcore: 1.0.9
- httpx: 0.28.1
- huggingface_hub: 1.4.1
- hydra-core: 1.3.2
- idna: 3.11
- importlib_metadata: 8.7.1
- ipykernel: 7.2.0
- ipython: 9.10.0
- ipython_pygments_lexers: 1.1.1
- jedi: 0.19.2
- jsonschema: 4.26.0
- jsonschema-specifications: 2025.9.1
- jupyter_client: 8.8.0
- jupyter_core: 5.9.1
- kiwisolver: 1.4.9
- lightning-utilities: 0.15.2
- markdown-it-py: 4.0.0
- matplotlib: 3.10.8
- matplotlib-inline: 0.2.1
- mdurl: 0.1.2
- mpmath: 1.3.0
- multidict: 6.7.1
- multiprocess: 0.70.16
- nest-asyncio: 1.6.0
- networkx: 3.6.1
- numpy: 2.4.2
- nvidia-cublas-cu12: 12.8.4.1
- nvidia-cuda-cupti-cu12: 12.8.90
- nvidia-cuda-nvrtc-cu12: 12.8.93
- nvidia-cuda-runtime-cu12: 12.8.90
- nvidia-cudnn-cu12: 9.10.2.21
- nvidia-cufft-cu12: 11.3.3.83
- nvidia-cufile-cu12: 1.13.1.3
- nvidia-curand-cu12: 10.3.9.90
- nvidia-cusolver-cu12: 11.7.3.90
- nvidia-cusparse-cu12: 12.5.8.93
- nvidia-cusparselt-cu12: 0.7.1
- nvidia-nccl-cu12: 2.27.3
- nvidia-nvjitlink-cu12: 12.8.93
- nvidia-nvtx-cu12: 12.8.90
- oauthlib: 3.3.1
- omegaconf: 2.3.0
- packaging: 25.0
- pandas: 3.0.0
- parso: 0.8.6
- pexpect: 4.9.0
- pillow: 12.1.1
- pip: 26.0.1
- platformdirs: 4.9.2
- prompt_toolkit: 3.0.52
- propcache: 0.4.1
- proto-plus: 1.27.1
- protobuf: 4.25.8
- psutil: 7.2.2
- ptyprocess: 0.7.0
- pure_eval: 0.2.3
- pyarrow: 21.0.0
- pyasn1: 0.6.2
- pyasn1_modules: 0.4.2
- pycparser: 3.0
- pydantic: 2.12.5
- pydantic_core: 2.41.5
- pyparsing: 3.3.2
- python-box: 6.1.0
- python-dateutil: 2.9.0.post0
- pytorch-lightning: 2.5.4
- pyvers: 0.1.0
- pyzmq: 27.1.0
- referencing: 0.37.0
- requests: 2.32.5
- requests-oauthlib: 2.0.0
- requests-toolbelt: 1.0.0
- rich: 14.3.2
- rpds-py: 0.30.0
- rsa: 4.9.1
- scipy: 1.17.0
- seaborn: 0.13.2
- semantic-version: 2.10.0
- sentry-sdk: 2.53.0
- setuptools: 82.0.0
- shapely: 2.1.2
- shellingham: 1.5.4
- simplejson: 3.20.2
- six: 1.17.0
- smmap: 5.0.2
- sniffio: 1.3.1
- stack-data: 0.6.3
- sympy: 1.14.0
- tenacity: 9.1.4
- tensorboardX: 2.6.4
- tensordict: 0.9.1
- torch: 2.8.0
- torchmetrics: 1.8.1
- tornado: 6.5.4
- tqdm: 4.67.3
- traitlets: 5.14.3
- triton: 3.4.0
- typer: 0.23.1
- typer-slim: 0.23.1
- typing-inspection: 0.4.2
- typing_extensions: 4.15.0
- urllib3: 2.6.3
- wandb: 0.25.0
- wcwidth: 0.6.0
- websockets: 15.0.1
- wrapt: 2.1.1
- wurlitzer: 3.1.1
- xxhash: 3.6.0
- yarl: 1.22.0
- zipp: 3.23.0
- System:
- OS: Linux
- architecture:
- 64bit
- ELF
- processor:
- python: 3.13.7
- release: 6.18.9
- version: #1-NixOS SMP PREEMPT_DYNAMIC Fri Feb 6 15:57:45 UTC 2026
More info
The error message itself suggests setting `limit_val_batches` to 0.0, implying that doing so should resolve the issue, but it doesn't.