From df928dbc3ec8692daa66f7840c03fb8d2e13acf8 Mon Sep 17 00:00:00 2001
From: Kavyansh Tyagi <142140238+KAVYANSHTYAGI@users.noreply.github.com>
Date: Wed, 28 May 2025 15:02:59 +0530
Subject: [PATCH 1/5] Update profiler.rst

---
 docs/source-pytorch/tuning/profiler.rst | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/docs/source-pytorch/tuning/profiler.rst b/docs/source-pytorch/tuning/profiler.rst
index 1ff7c24ff7dbb..73ba5364dc8f5 100644
--- a/docs/source-pytorch/tuning/profiler.rst
+++ b/docs/source-pytorch/tuning/profiler.rst
@@ -11,6 +11,19 @@ Find bottlenecks in your code
 
 .. Add callout items below this line
 
+.. warning::
+
+    Do **not** wrap ``Trainer.fit()``, ``Trainer.validate()``, or similar Trainer methods inside a manual ``torch.profiler.profile`` context manager.
+    This will cause unexpected crashes and cryptic errors due to incompatibility between PyTorch Profiler's context and Lightning's training loop.
+    Instead, use the ``profiler`` argument of the ``Trainer``:
+
+    .. code-block:: python
+
+        trainer = pl.Trainer(
+            profiler="pytorch",  # This is the correct and supported way
+            ...
+        )
+
 .. displayitem::
    :header: Basic
    :description: Learn to find bottlenecks in the training loop.

From 82d7717e63a69ca1c020caf7f19bf87282aa1546 Mon Sep 17 00:00:00 2001
From: Kavyansh Tyagi <142140238+KAVYANSHTYAGI@users.noreply.github.com>
Date: Wed, 28 May 2025 15:37:47 +0530
Subject: [PATCH 2/5] Update profiler.rst

---
 docs/source-pytorch/tuning/profiler.rst | 33 ++++++++++++++++++++++-----------
 1 file changed, 22 insertions(+), 11 deletions(-)

diff --git a/docs/source-pytorch/tuning/profiler.rst b/docs/source-pytorch/tuning/profiler.rst
index 73ba5364dc8f5..ddb3704b4c79b 100644
--- a/docs/source-pytorch/tuning/profiler.rst
+++ b/docs/source-pytorch/tuning/profiler.rst
@@ -4,25 +4,36 @@
 Find bottlenecks in your code
 #############################
 
-.. raw:: html
-
-
-
-
-.. Add callout items below this line
-
 .. warning::
 
-    Do **not** wrap ``Trainer.fit()``, ``Trainer.validate()``, or similar Trainer methods inside a manual ``torch.profiler.profile`` context manager.
-    This will cause unexpected crashes and cryptic errors due to incompatibility between PyTorch Profiler's context and Lightning's training loop.
-    Instead, use the ``profiler`` argument of the ``Trainer``:
+    **Do not wrap** ``Trainer.fit()``, ``Trainer.validate()``, or other Trainer methods
+    inside a manual ``torch.profiler.profile`` context manager. 
+    This will cause unexpected crashes and cryptic errors due to incompatibility between
+    PyTorch Profiler's context management and Lightning's internal training loop.
+    Instead, always use the ``profiler`` argument in the ``Trainer`` constructor.
+
+    Example (correct usage):
 
     .. code-block:: python
 
+        import pytorch_lightning as pl
+
         trainer = pl.Trainer(
-            profiler="pytorch",  # This is the correct and supported way
+            profiler="pytorch",  # <- This enables built-in profiling safely!
             ...
         )
+        trainer.fit(model, train_dataloaders=...)
+
+    **References:**
+    - https://github.com/pytorch/pytorch/issues/88472
+    - https://github.com/Lightning-AI/lightning/issues/16958
+
+.. raw:: html
+
+
+
+
+.. Add callout items below this line
 
 .. displayitem::
    :header: Basic

From c1c2f869d856fffe5bd940be2f96e2a557c38a89 Mon Sep 17 00:00:00 2001
From: Kavyansh Tyagi <142140238+KAVYANSHTYAGI@users.noreply.github.com>
Date: Wed, 28 May 2025 16:26:15 +0530
Subject: [PATCH 3/5] Update trainer.py

---
 src/lightning/pytorch/trainer/trainer.py | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/src/lightning/pytorch/trainer/trainer.py b/src/lightning/pytorch/trainer/trainer.py
index 8e4e2de97fd6a..26ef2c1ccc164 100644
--- a/src/lightning/pytorch/trainer/trainer.py
+++ b/src/lightning/pytorch/trainer/trainer.py
@@ -264,6 +264,14 @@ def __init__(
             profiler: To profile individual steps during training and assist in identifying bottlenecks.
                 Default: ``None``.
 
+                .. note::
+                    Do **not** use a manual ``torch.profiler.profile`` context manager around
+                    ``Trainer.fit()``, ``Trainer.validate()``, etc.
+                    This will lead to internal errors and cryptic crashes due to incompatibility between
+                    PyTorch Profiler and Lightning's training loop.
+                    Always use this ``profiler`` argument to enable profiling in Lightning.
+
+
             detect_anomaly: Enable anomaly detection for the autograd engine.
                 Default: ``False``.
 

From c6c8d0ec2274c3d763c9692e77c3429b264d73c1 Mon Sep 17 00:00:00 2001
From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date: Wed, 28 May 2025 11:03:43 +0000
Subject: [PATCH 4/5] [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci
---
 docs/source-pytorch/tuning/profiler.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source-pytorch/tuning/profiler.rst b/docs/source-pytorch/tuning/profiler.rst
index ddb3704b4c79b..b4aa4e9e122cd 100644
--- a/docs/source-pytorch/tuning/profiler.rst
+++ b/docs/source-pytorch/tuning/profiler.rst
@@ -7,7 +7,7 @@ Find bottlenecks in your code
 .. warning::
 
     **Do not wrap** ``Trainer.fit()``, ``Trainer.validate()``, or other Trainer methods
-    inside a manual ``torch.profiler.profile`` context manager. 
+    inside a manual ``torch.profiler.profile`` context manager.
     This will cause unexpected crashes and cryptic errors due to incompatibility between
     PyTorch Profiler's context management and Lightning's internal training loop.
     Instead, always use the ``profiler`` argument in the ``Trainer`` constructor.

From bb40babce437fc3c8dd1d765112b4995c3b1e7fc Mon Sep 17 00:00:00 2001
From: Kavyansh Tyagi <142140238+KAVYANSHTYAGI@users.noreply.github.com>
Date: Mon, 2 Jun 2025 13:13:22 +0530
Subject: [PATCH 5/5] Update profiler.rst

---
 docs/source-pytorch/tuning/profiler.rst | 1 -
 1 file changed, 1 deletion(-)

diff --git a/docs/source-pytorch/tuning/profiler.rst b/docs/source-pytorch/tuning/profiler.rst
index b4aa4e9e122cd..792386e4846a2 100644
--- a/docs/source-pytorch/tuning/profiler.rst
+++ b/docs/source-pytorch/tuning/profiler.rst
@@ -26,7 +26,6 @@ Find bottlenecks in your code
 
     **References:**
     - https://github.com/pytorch/pytorch/issues/88472
-    - https://github.com/Lightning-AI/lightning/issues/16958
 
 .. raw:: html
 