
Conversation


@bvandermoon (Collaborator) commented on Aug 14, 2025

Description

Migrate LlamaDecoderLayer to NNX instead of Linen

  • Change module type to NNX
  • Initialize submodules in __init__ instead of __call__
  • Pass self.model_mode to module constructors instead of the model_mode passed to __call__ (see the sketch after this list)
  • Update to_linen_class to work for pipelining (thanks @cgarciae for this)
  • New init/apply functions in Transformer module to ensure the proper model_mode value is passed around
  • Add a new config flag enable_nnx to enable/disable NNX for this and future models (off by default)
    • We will remove this flag and have everything in NNX when the inference memory issue is resolved
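
A minimal sketch of the NNX pattern described above (the class and submodule names here are illustrative, not the actual MaxText modules):

from flax import nnx


class LlamaDecoderLayerSketch(nnx.Module):

  def __init__(self, config, model_mode: str, *, rngs: nnx.Rngs):
    # Submodules are created at construction time rather than inside __call__,
    # and they see self.model_mode instead of a call-time model_mode argument.
    self.model_mode = model_mode
    self.pre_norm = nnx.RMSNorm(config.emb_dim, rngs=rngs)
    self.mlp = nnx.Linear(config.emb_dim, config.emb_dim, rngs=rngs)

  def __call__(self, inputs):
    return inputs + self.mlp(self.pre_norm(inputs))

In the actual change the NNX layer is wrapped back into Linen (the to_linen_class path above) so it still composes with the existing Linen Transformer and pipelining code, and the whole path sits behind the enable_nnx flag.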

New PR due to some git issues. Addressed comments from #2155

Note: Continuing to investigate increased KVCache memory during inference. Considering whether to merge this PR anyway to unblock others waiting on this migration, and continue the investigation in parallel.

Tests

  • Llama2-7B training gives the same memory/perf before/after on a TPU VM. Can also continue training from a locally-generated checkpoint (with load_full_state_path) and from an existing checkpoint with load_parameters_path:
python3 -m MaxText.train MaxText/configs/base.yml \
    run_name=<run_name> \
    base_output_directory=gs://<gcs_bucket> \
    dataset_type=synthetic \
    steps=10 \
    model_name=llama2-7b
  • Exact same peak memory allocation for jit_train_step in the memory viewer before/after this change.
  • Successfully ran Llama3.1-8B decode on TPU VM. Can also run this from a checkpoint:
python3 -m MaxText.decode MaxText/configs/base.yml \
    model_name=llama2-7b \
    tokenizer_path=assets/tokenizer_llama3.tiktoken \
    tokenizer_type=tiktoken \
    scan_layers=false \
    per_device_batch_size=1 \
    ici_fsdp_parallelism=1 \
    ici_autoregressive_parallelism=-1 \
    max_prefill_predict_length=128 \
    max_target_length=256 \
    prompt="I love to" \
    attention=dot_product

Checklist

Before submitting this PR, please make sure:

  • I have performed a self-review of my code.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed.

@bvandermoon force-pushed the bvandermoon-llama branch 5 times, most recently from 6756c3e to 49d6cb0 on August 15, 2025 at 01:13
@gagika (Collaborator) left a comment

Thanks, a few minor comments.

Comment on lines 93 to 96
if self.model_mode == MODEL_MODE_PREFILL:
  inputs = nn.with_logical_constraint(inputs, logical_axis_names)
else:
  inputs = nn.with_logical_constraint(inputs, logical_axis_names)
Collaborator

Since you are touching this part, could you also fix it?

The if/else blocks are doing the same thing, so we can just call (without the if/else):
inputs = nn.with_logical_constraint(inputs, logical_axis_names)

Collaborator Author

Updated, thanks @gagika

Comment on lines 108 to 111
if self.model_mode == MODEL_MODE_PREFILL:
  lnx = nn.with_logical_constraint(lnx, logical_axis_names)
else:
  lnx = nn.with_logical_constraint(lnx, logical_axis_names)
Collaborator

Same as above, no need for the if/else.

Collaborator Author

Updated, thanks @gagika

Comment on lines 57 to 65
def init(self, *args, model_mode: str = MODEL_MODE_TRAIN, **kwargs):
  """Initializes the model."""
  module = self.clone(model_mode=model_mode)
  return nn.Module.init(module, *args, **kwargs)

def apply(self, *args, model_mode: str = MODEL_MODE_TRAIN, **kwargs):
  """Applies the model."""
  module = self.clone(model_mode=model_mode)
  return nn.Module.apply(module, *args, **kwargs)
Collaborator

Could you add a comment explaining why these functions with clones are needed?

Collaborator Author

Added a code comment for these. They are needed to ensure the same model_mode is passed to __init__ and __call__.
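
For illustration, a self-contained toy version of the clone-based init/apply pattern (ToyTransformer and its arguments are made up; only the cloning mechanics mirror the snippet above):

import jax
import jax.numpy as jnp
from flax import linen as nn

MODEL_MODE_TRAIN = "train"
MODEL_MODE_PREFILL = "prefill"


class ToyTransformer(nn.Module):
  model_mode: str = MODEL_MODE_TRAIN

  def init(self, *args, model_mode: str = MODEL_MODE_TRAIN, **kwargs):
    # Clone with the requested mode so the attribute seen during __call__
    # matches the model_mode passed to this init call.
    return nn.Module.init(self.clone(model_mode=model_mode), *args, **kwargs)

  def apply(self, *args, model_mode: str = MODEL_MODE_TRAIN, **kwargs):
    return nn.Module.apply(self.clone(model_mode=model_mode), *args, **kwargs)

  @nn.compact
  def __call__(self, x):
    # self.model_mode here agrees with the model_mode given to init/apply.
    return nn.Dense(4)(x)


model = ToyTransformer()
params = model.init(jax.random.PRNGKey(0), jnp.ones((1, 4)), model_mode=MODEL_MODE_PREFILL)
out = model.apply(params, jnp.ones((1, 4)), model_mode=MODEL_MODE_PREFILL)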

@@ -61,7 +143,6 @@ def __call__(
      previous_chunk=None,
  ):
    cfg = self.config
    mesh = self.mesh

    if model_mode == MODEL_MODE_PREFILL:
Collaborator

Since your PR made model_mode a static argument, passing it in both __init__ and __call__ can be confusing. You could move the activation_axis_names initialization into __init__ (or derive model_mode from the config).

if model_mode == MODEL_MODE_PREFILL:
  activation_axis_names = ("activation_batch", "prefill_activation_norm_length", "activation_embed")
else:
  activation_axis_names = ("activation_batch", "activation_norm_length", "activation_embed")

Collaborator

Feel free to keep it as is if we expect model_mode to change during runtime, e.g. going from prefill to autoregressive decode.

cc @cgarciae

Collaborator Author

Thanks @gagika. Updated so that activation_axis_names is now set in __init__. As a P1, we will want to remove the model_mode passed to __call__ later.
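
For reference, a minimal sketch of what moving this into __init__ can look like (a plain illustrative class, not the real MaxText layer):

MODEL_MODE_TRAIN = "train"
MODEL_MODE_PREFILL = "prefill"


class DecoderLayerAxisNamesSketch:

  def __init__(self, model_mode: str = MODEL_MODE_TRAIN):
    # model_mode is fixed at construction, so the axis names are chosen once
    # here instead of on every __call__.
    if model_mode == MODEL_MODE_PREFILL:
      self.activation_axis_names = ("activation_batch", "prefill_activation_norm_length", "activation_embed")
    else:
      self.activation_axis_names = ("activation_batch", "activation_norm_length", "activation_embed")

  def __call__(self, inputs):
    # The real layer constrains inputs with
    # nn.with_logical_constraint(inputs, self.activation_axis_names) here.
    return inputs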

@bvandermoon mentioned this pull request on Aug 19, 2025
else:
  seq_len = config.max_target_length

dummy_inputs_shape = (batch_size, seq_len, config.emb_dim)

@dubstack

I believe you want to use a shape that is sharded along the batch dim?

Collaborator

@dubstack can you clarify what you mean about "sharded shape"?
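
One possible reading of the suggestion, as a hedged sketch (the mesh axes, sizes, and dtype below are assumptions, not taken from this PR): build the dummy input as an abstract value whose sharding splits the batch dimension across the data mesh axis, so the shapes used for initialization match the layout of real inputs.

import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Assumed 1-D mesh over all local devices, named "data".
mesh = Mesh(np.array(jax.devices()), axis_names=("data",))

batch_size, seq_len, emb_dim = 8, 128, 4096  # placeholder sizes
dummy_inputs = jax.ShapeDtypeStruct(
    (batch_size, seq_len, emb_dim),
    jnp.float32,
    sharding=NamedSharding(mesh, P("data", None, None)),  # shard only the batch dim
)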

@bvandermoon force-pushed the bvandermoon-llama branch 2 times, most recently from 642930d to d3c5290 on August 20, 2025 at 22:30