
Add xDeepONet family to experimental models#1576

Open
wdyab wants to merge 8 commits into NVIDIA:main from wdyab:pr/xdeeponet

Conversation


@wdyab wdyab commented Apr 17, 2026

Summary

Introduces physicsnemo.experimental.models.xdeeponet — a config-driven,
unified implementation of eight DeepONet-based operator-learning
architectures for both 2D and 3D spatial domains:

  • deeponet, u_deeponet, fourier_deeponet, conv_deeponet,
    hybrid_deeponet — single-branch variants
  • mionet, fourier_mionet — two-branch multi-input variants
  • tno — Temporal Neural Operator (branch2 = previous solution)

This is the first of several PRs restructuring the Neural Operator
Factory per discussion with code owners. Subsequent PRs will upstream
xFNO (experimental) and refactor the reservoir-simulation NOF example
to consume these library models.

Closes Issue #1575

Key features

  • Composable spatial branches (Fourier, UNet, Conv in any combination)
  • Three decoder types: mlp, conv, temporal_projection
  • Automatic spatial padding
  • Automatic trunk coordinate extraction (time or grid)
  • Optional adaptive pooling for resolution-agnostic training
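To make the config-driven design concrete, here is a minimal, hypothetical sketch of the kind of validation such a factory performs. The key names (`variant`, `decoder_type`, `branch2_config`) mirror the terms used in this PR description, but the exact config schema in `physicsnemo.experimental.models.xdeeponet` may differ.

```python
# Hypothetical sketch of config-driven variant selection; key names follow
# the PR text, not necessarily the final library API.

SINGLE_BRANCH = {"deeponet", "u_deeponet", "fourier_deeponet",
                 "conv_deeponet", "hybrid_deeponet"}
DUAL_BRANCH = {"mionet", "fourier_mionet", "tno"}
DECODERS = {"mlp", "conv", "temporal_projection"}

def validate_config(cfg: dict) -> dict:
    """Check a minimal xDeepONet-style config before model construction."""
    variant = cfg["variant"].lower()
    if variant not in SINGLE_BRANCH | DUAL_BRANCH:
        raise ValueError(f"Unknown variant: {variant}")
    if cfg.get("decoder_type", "mlp").lower() not in DECODERS:
        raise ValueError(f"Unknown decoder_type: {cfg['decoder_type']}")
    # Two-branch variants require a second branch config.
    if variant in DUAL_BRANCH and cfg.get("branch2_config") is None:
        raise ValueError(f"{variant} requires branch2_config")
    return cfg

cfg = validate_config({"variant": "fourier_mionet",
                       "decoder_type": "mlp",
                       "branch2_config": {"type": "conv"}})
```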

Design decisions (per @coreyjadams guidance)

  • Placed under experimental/ per the convention for new models.
  • Custom UNet dropped.
  • Tests live at test/experimental/models/ for CI coverage.

Checklist

  • I am familiar with the Contributing Guidelines
  • New tests cover these changes (29 tests under
    test/experimental/models/test_xdeeponet.py)
  • The documentation is up to date (package README, docstrings)
  • The CHANGELOG.md is up to date
  • An issue (Add xDeepONet family to experimental models #1575) is linked to this pull request
  • I have followed the Models Implementation Coding Standards

Test plan

  • 29 unit tests pass locally (branch shapes, wrappers, variants,
    decoder types, temporal projection, target_times override,
    gradient flow, adaptive pooling)
  • ruff check, ruff format, interrogate, markdownlint,
    license pre-commit hooks pass
  • No modifications to existing code — pure addition under
    physicsnemo/experimental/ and test/experimental/
  • All commits signed off per DCO

Related

Introduces physicsnemo.experimental.models.xdeeponet — a config-driven,
unified implementation of eight DeepONet-based operator-learning
architectures for both 2D and 3D spatial domains:

- deeponet, u_deeponet, fourier_deeponet, conv_deeponet, hybrid_deeponet
  (single-branch variants)
- mionet, fourier_mionet (two-branch multi-input variants)
- tno (Temporal Neural Operator; branch2 = previous solution)

Features:
- Composable spatial branches (Fourier, UNet, Conv in any combination)
- Three decoder types: mlp, conv, temporal_projection
- Automatic spatial padding to multiples of 8
- Automatic trunk coordinate extraction (time or grid)
- Optional adaptive pooling (internal_resolution) for
  resolution-agnostic training and inference
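The "automatic spatial padding to multiples of 8" amounts to computing a right-side pad per spatial dimension. The actual helper lives in `xdeeponet/padding.py`; this pure-Python sketch only illustrates the pad-amount arithmetic, not the tensor operation itself.

```python
# Illustrative sketch of padding each spatial dimension up to a multiple of 8.
# The real implementation (xdeeponet/padding.py) applies torch F.pad in
# replicate or constant mode; here we only compute the amounts.

def right_pad_amounts(shape, multiple=8):
    """Right-side padding needed so each spatial dim divides `multiple`."""
    return tuple((-s) % multiple for s in shape)

def padded_shape(shape, multiple=8):
    return tuple(s + p for s, p in zip(shape, right_pad_amounts(shape, multiple)))

# A 100x37 field pads to 104x40; an already-aligned 64x64 field is untouched.
```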

Uses physicsnemo.models.unet.UNet as the UNet sub-module; a small
internal adapter tiles a short time axis to reuse the library's 3D UNet
for 2D spatial branches.  Imports spectral, convolutional, and MLP
layers from physicsnemo.nn and physicsnemo.models.mlp.

Includes 29 unit tests covering all variants (2D/3D), decoder types,
temporal projection, target_times override, gradient flow, and
adaptive pooling.

Related discussion with code owners:
- Placed under experimental/ per PhysicsNeMo convention for new models.
- Custom UNet dropped in favour of library UNet.
- Tests under test/experimental/models/ for CI coverage.

Signed-off-by: wdyab <wdyab@nvidia.com>
Made-with: Cursor
Contributor

greptile-apps Bot commented Apr 17, 2026

Greptile Summary

This PR adds the xdeeponet experimental package — a config-driven implementation of eight DeepONet-based architectures for 2D and 3D operator learning — with all previously identified issues now addressed: physicsnemo.Module inheritance, dual-branch construction guards, deterministic state_dict via output_window, MLPBranch num_layers guard, jaxtyping annotations with shape validation, case-insensitive decoder_type, and _SinActivation wrapper. Two minor P2 style items remain in deeponet.py: a shared activation instance in _build_conv_encoder and an implicit b2_out assignment pattern that static analysis tools will flag as possibly-undefined.

Important Files Changed

  • physicsnemo/experimental/models/xdeeponet/deeponet.py — Core 2D/3D
    DeepONet architectures with comprehensive construction-time guards,
    case-insensitive config handling, and jaxtyping annotations; two minor
    style issues: shared activation instance in _build_conv_encoder and
    b2_out possibly-undefined for static analysis.
  • physicsnemo/experimental/models/xdeeponet/branches.py — TrunkNet,
    MLPBranch, SpatialBranch, SpatialBranch3D building blocks with correct
    lazy-init, jaxtyping annotations, shape guards, and the num_layers >= 2
    guard for MLPBranch.
  • physicsnemo/experimental/models/xdeeponet/wrappers.py — DeepONetWrapper
    and DeepONet3DWrapper inherit from physicsnemo.Module, perform correct
    spatial padding/crop and trunk coordinate extraction; no issues found.
  • physicsnemo/experimental/models/xdeeponet/padding.py —
    Dimension-agnostic right-side padding helpers with correct replicate and
    constant modes; edge cases handled properly.
  • test/experimental/models/test_xdeeponet.py — 29 unit tests covering
    branch shapes, all 8 variants, both 2D and 3D wrappers, temporal
    projection, target_times override, gradient flow, and error cases.
  • physicsnemo/experimental/models/xdeeponet/__init__.py — Clean package
    init exporting all public symbols; no issues.
  • physicsnemo/experimental/models/xdeeponet/README.md — Comprehensive
    module README with variant table, quick-start examples, config schema,
    and references.
  • CHANGELOG.md — CHANGELOG entry for xDeepONet addition; correctly placed
    under Added.

wdyab added 3 commits April 17, 2026 11:35
Fix six issues flagged by the Greptile review:

- Make DeepONetWrapper / DeepONet3DWrapper inherit from
  physicsnemo.core.module.Module (MOD-001). Core DeepONet / DeepONet3D
  also pass proper MetaData dataclasses.
- Raise ValueError at __init__ when mionet / fourier_mionet / tno are
  constructed without branch2_config (prevents silent degradation to a
  single-branch model).
- Add optional output_window constructor parameter so the
  temporal_projection decoder registers temporal_head at __init__,
  producing a deterministic state_dict that round-trips cleanly.
  set_output_window is retained for backwards compatibility.
- Raise ValueError from MLPBranch when num_layers < 2.
- Convert public docstrings to r-prefixed raw strings with
  Parameters / Forward / Outputs sections and LaTeX shape notation
  per MOD-003.
- Add jaxtyping.Float annotations and torch.compiler.is_compiling()
  guarded shape validation to all public forward methods
  (MOD-005, MOD-006).

Signed-off-by: wdyab <wdyab@nvidia.com>
Made-with: Cursor
Resolves CHANGELOG.md conflict under "### Added" by keeping the
xDeepONet entry alongside the updated GLOBE bullet from main.

Signed-off-by: wdyab <wdyab@nvidia.com>
Made-with: Cursor
Fix the new P1 issue flagged in the second Greptile review and close
two secondary gaps the summary called out:

- DeepONet.forward / DeepONet3D.forward: raise RuntimeError when
  decoder_type='temporal_projection' is used but temporal_head is
  still None (i.e. the user neither passed output_window at
  construction nor called set_output_window before forward).
  Previously the silent ``if temporal_head is not None`` skip
  returned (B, H, W, width) instead of (B, H, W, K).
- Deduplicate the VALID_VARIANTS list: pulled to a module-level
  _VALID_VARIANTS tuple; both DeepONet and DeepONet3D still expose it
  as the VALID_VARIANTS class attribute for a stable public API.
- Extend the parametrized test lists to cover fourier_deeponet,
  hybrid_deeponet, and fourier_mionet, and add a dedicated
  TestFourierBranchPaths class with num_fourier_layers > 0 so the
  spectral-conv code path in SpatialBranch / SpatialBranch3D is
  actually exercised in CI.
- Add a TestTemporalProjectionGuard::test_forward_without_output_window_raises
  regression test for the new RuntimeError.

Signed-off-by: wdyab <wdyab@nvidia.com>
Made-with: Cursor
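The fail-loud guard described in this commit can be sketched in plain Python (class and attribute names here are illustrative stand-ins, not the library API): a temporal-projection decoder must have its head configured, either via `output_window` at construction or `set_output_window` before the first forward call.

```python
# Illustrative stand-in for the RuntimeError guard described above; the real
# temporal_head is an nn.Linear projection registered on the model.

class TemporalProjectionSketch:
    def __init__(self, output_window=None):
        self.temporal_head = None
        if output_window is not None:
            self.set_output_window(output_window)

    def set_output_window(self, k: int) -> None:
        # Stand-in for registering an nn.Linear(width, k) projection head.
        self.temporal_head = k

    def forward(self, features):
        # Fail loudly instead of silently returning the un-projected features.
        if self.temporal_head is None:
            raise RuntimeError(
                "decoder_type='temporal_projection' requires output_window at "
                "construction or a set_output_window() call before forward()"
            )
        return features[: self.temporal_head]
```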
Two new P1 issues flagged on 85076f6:

- Case-sensitive decoder_type check: __init__ lowered ``decoder_type``
  into ``self.decoder_type`` but then branched on the raw argument
  (``if decoder_type == "temporal_projection":``) and forwarded the
  raw value to ``_build_decoder``.  A user passing
  ``decoder_type="MLP"`` or ``"Temporal_Projection"`` ended up with
  ``Unknown decoder_type: MLP`` bubbling out of ``_build_decoder``.
  Both branches of the check now use ``self.decoder_type``; same fix
  in ``DeepONet3D.__init__``.
- MLP branch + decoder_type='temporal_projection' silently returned
  (B, T, width) instead of (B, K) because the MLP-branch path in
  ``forward`` never consulted ``self._temporal_projection``.  The
  incompatibility is static, so reject it at __init__ with a
  descriptive ``ValueError`` rather than at forward.  Same guard in
  ``DeepONet3D.__init__``.

Regression tests: ``TestDecoderTypeNormalization`` (mixed-case
``"MLP"`` / ``"Temporal_Projection"`` accepted) and
``TestMLPBranchTemporalProjectionGuard`` (2D and 3D both reject the
invalid combination).

Signed-off-by: wdyab <wdyab@nvidia.com>
Made-with: Cursor
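The normalize-once pattern from this fix can be sketched as follows (names are assumed for illustration): lowercase `decoder_type` a single time in `__init__`, then branch only on the stored value, never the raw argument.

```python
# Sketch of the case-insensitivity fix: normalize decoder_type once and
# branch on the normalized value only. Names are illustrative.

_VALID_DECODER_TYPES = {"mlp", "conv", "temporal_projection"}

class DecoderConfigSketch:
    def __init__(self, decoder_type: str = "mlp"):
        self.decoder_type = decoder_type.lower()   # normalize once...
        if self.decoder_type not in _VALID_DECODER_TYPES:
            raise ValueError(f"Unknown decoder_type: {decoder_type}")
        # ...and branch on the stored value, never the raw argument.
        self.is_temporal = self.decoder_type == "temporal_projection"

assert DecoderConfigSketch("Temporal_Projection").is_temporal
assert DecoderConfigSketch("MLP").decoder_type == "mlp"
```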
Proactive audit on top of Greptile's round-4 findings.  All
plausible silent-degradation combinations at the config boundary
now fail loudly at __init__ instead of producing wrong shapes or
cryptic PyTorch errors at forward time.

Construction-time guards added to both DeepONet and DeepONet3D:

- Unknown decoder_type is rejected up front against a new
  module-level ``_VALID_DECODER_TYPES`` set (previously deferred
  to ``_build_decoder`` and only surfaced on the non-temporal
  branch).
- MLPBranch branch1 paired with decoder_type='conv' is rejected
  (would otherwise crash inside ``Conv2d`` with a generic
  "Expected 3D or 4D input" message).  Unified with the existing
  temporal_projection guard into a single check.
- MLPBranch branch1 paired with a non-MLPBranch branch2 is
  rejected (element-wise product assumed matching ranks;
  previously broadcast nonsensically or raised a cryptic dim
  mismatch at forward).

Regression tests:

- ``TestMLPBranchConvDecoderGuard`` -- 2D/3D
- ``TestMixedBranchTypeGuard`` -- 2D/3D
- ``TestInvalidDecoderTypeGuard`` -- 2D/3D

Full suite: 47 passed.

Signed-off-by: wdyab <wdyab@nvidia.com>
Made-with: Cursor
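The branch/decoder compatibility guards from this commit reduce to a small static check at construction time. A hedged sketch (function and argument names are illustrative, not the library API):

```python
# Illustrative version of the construction-time compatibility checks
# described above; real guards live in DeepONet.__init__ / DeepONet3D.__init__.

def check_branch_decoder(branch1_type, branch2_type, decoder_type):
    """Reject statically-incompatible branch/decoder combinations at init."""
    # An MLP branch1 produces rank-2 output, so conv and temporal_projection
    # decoders cannot consume it.
    if branch1_type == "mlp" and decoder_type in ("conv", "temporal_projection"):
        raise ValueError(
            f"MLP branch1 is incompatible with decoder_type={decoder_type!r}"
        )
    # The element-wise branch product assumes matching ranks.
    if branch1_type == "mlp" and branch2_type not in (None, "mlp"):
        raise ValueError("MLP branch1 requires an MLP (or absent) branch2")
```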
``_build_conv_encoder`` called ``get_activation`` directly without
the ``sin`` special-case handling used in every other activation
site in ``branches.py``.  Passing
``{"encoder": {"type": "conv", "activation_fn": "sin"}}`` therefore
raised ``KeyError: Activation function sin not found``.

``torch.sin`` is a bare callable and cannot be placed inside an
``nn.Sequential`` (which requires ``nn.Module`` instances), so the
fix introduces a small ``_SinActivation`` wrapper module alongside
``_build_conv_encoder``.  The helper is module-level and is called
from both ``DeepONet`` and ``DeepONet3D``; only one fix site exists
despite the function being invoked from both classes.

Regression test ``TestConvEncoderSinActivation`` constructs a
multi-layer conv encoder with ``activation_fn="sin"`` and runs a
forward pass to confirm neither the ``KeyError`` nor a
``nn.Sequential`` ``TypeError`` resurface.

Signed-off-by: wdyab <wdyab@nvidia.com>
Made-with: Cursor
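The wrapper concept is standard PyTorch: `nn.Sequential` only accepts `nn.Module` instances, so a bare callable like `torch.sin` must be wrapped. A minimal sketch, with the class name borrowed from the commit message (the library's actual `_SinActivation` may differ in detail):

```python
import torch
import torch.nn as nn

# Minimal sketch of wrapping torch.sin (a bare function) in an nn.Module so
# it can be placed inside an nn.Sequential encoder stack.

class SinActivation(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sin(x)

encoder = nn.Sequential(nn.Linear(4, 8), SinActivation(), nn.Linear(8, 2))
out = encoder(torch.zeros(3, 4))  # forward pass succeeds
```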
Contributor

greptile-apps Bot commented Apr 22, 2026

Want your agent to iterate on Greptile's feedback? Try greploops.

Author

wdyab commented Apr 22, 2026

All Greptile feedback addressed across five rounds. Main fixes: wrappers inherit from physicsnemo.Module, construction-time validation for all the config combinations that were silently producing wrong shapes, deterministic state_dict for the temporal-projection decoder, case-insensitive decoder_type, and a few smaller bits. All tests are green, pre-commit clean.
@coreyjadams @ram-cherukuri @peterdsharpe

Collaborator

Is it standard practice to have documentation shipped in some markdown files? It is not a critical issue for this PR, but that's a significant shift from docs in docstrings + rst files... I am worried this will lead to more fragmentation of the API docs, which would make information more difficult to discover.

Collaborator

IMO it's ok to ship some informative docs right here with the code like this. It's especially helpful in the days of agents and such.

That said, it's no substitute for the API docs we ship as rst on the physicsnemo docs, and those should remain the source of truth.

Collaborator

tl;dr it's ok to drop hints here but not fully expect docs here, what do you think of that strategy @CharlelieLrt? We have some stuff like that with mesh, and I put some .md in the datapipes too. It's not about "how to use it" and more about "this is why it's designed this way in the code"

Collaborator

Examples section missing in nearly all docstrings (at least those that are user-facing)

Collaborator

Most of these tests are kinda worthless. It should be:

  • test for the constructor (both default and custom parameters) -> check
    all public attribute values
  • for all public methods, including forward:
    • test instantiation + method non-regression vs. golden files
    • test loading from checkpoint (.mdlus) + method non-regression vs. the
      same golden files
  • test gradient flow
  • test compile

Model and output should be as small and minimal as possible. No need to test internal submodules, layers and such, unless those are meant to be upstreamed to physicsnemo/nn.
