
Migrate DotProductAttention to NNX #2198


Open · wants to merge 1 commit into main from feat/Migrate-DotProductAttention-to-NNX

Conversation

hsuan-lun-chiang (Collaborator)

Description

Migrate DotProductAttention to NNX.

  • Use nnx_wrappers.ToNNX to bridge DotProductAttention, which is imported from NVIDIA's Transformer Engine library, to NNX (a minimal sketch of the bridging pattern is shown below).
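
As a rough illustration of the bridging pattern (not the exact MaxText code), the sketch below wraps a toy Linen module into NNX using the upstream flax.nnx bridge API; the LinenDotProductAttention class and its shapes are placeholders standing in for Transformer Engine's DotProductAttention, and the PR itself uses MaxText's nnx_wrappers.ToNNX instead.

```python
# Minimal sketch, assuming the flax.nnx bridge API; the real PR wraps
# Transformer Engine's DotProductAttention via MaxText's nnx_wrappers.ToNNX.
import jax
import jax.numpy as jnp
from flax import linen as nn
from flax import nnx
from flax.nnx import bridge


class LinenDotProductAttention(nn.Module):
  """Toy Linen stand-in for the bridged attention module."""

  @nn.compact
  def __call__(self, query, key, value):
    scale = 1.0 / jnp.sqrt(query.shape[-1])
    scores = jnp.einsum("bqhd,bkhd->bhqk", query, key) * scale
    weights = jax.nn.softmax(scores, axis=-1)
    return jnp.einsum("bhqk,bkhd->bqhd", weights, value)


# Wrap the Linen module as an NNX module and initialize it lazily with
# dummy zero inputs, mirroring the dummy_*_prefill arrays in this PR.
attn = bridge.ToNNX(LinenDotProductAttention(), rngs=nnx.Rngs(0))
q = k = v = jnp.zeros((1, 128, 8, 64))  # (batch, seq, num_heads, head_dim)
bridge.lazy_init(attn, q, k, v)
out = attn(q, k, v)  # shape (1, 128, 8, 64)
```

In the actual change, the zero-filled prefill tensors quoted in the review below play the role of the dummy inputs used here for initialization.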

Tests

Train Gemma-2b with attention=cudnn_flash_te, which uses DotProductAttention:

python3 -m MaxText.train MaxText/configs/base.yml run_name=gpt3-train-run base_output_directory=gs://maxtext-test/gemma-train/18/ max_target_length=128 model_name=gemma-2b dataset_type=synthetic steps=10 hardware=gpu attention=cudnn_flash_te

Logs - After Migration
Logs - Before Migration

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed.

@bvandermoon (Collaborator) left a comment


Thank you @hsuan-lun-chiang. Could you check why the GPU integration test is failing?

Comment on lines +551 to +563
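# Zero-filled dummy prefill inputs; they appear to be used only to
# initialize/trace the bridged module, not as real activations (assumption).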
dummy_query_prefill = jnp.zeros((1, self.max_target_length, self.num_query_heads, config.head_dim), dtype=self.dtype)
dummy_key_prefill = jnp.zeros((1, self.max_target_length, self.num_kv_heads, config.head_dim), dtype=self.dtype)
dummy_value_prefill = jnp.zeros((1, self.max_target_length, self.num_kv_heads, config.head_dim), dtype=self.dtype)
Collaborator


@cgarciae are zeros the right value here?

@hsuan-lun-chiang force-pushed the feat/Migrate-DotProductAttention-to-NNX branch from bd06740 to 16ad76e on August 19, 2025 at 05:24
@hsuan-lun-chiang (Collaborator, Author)

Thank you @hsuan-lun-chiang. Could you check why the GPU integration test is failing?

Sure! It was caused by None being cast to uint8; I fixed it. Thank you.
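
For context, a hedged illustration of that failure mode follows; the actual call site is not shown in this thread, and maybe_to_uint8 is a hypothetical helper, not code from the PR.

```python
import jax.numpy as jnp

def maybe_to_uint8(x):
  # Hypothetical guard: casting None directly (jnp.asarray(None, dtype=jnp.uint8))
  # raises an error, so an optional (None) input is passed through untouched.
  return None if x is None else jnp.asarray(x, dtype=jnp.uint8)

print(maybe_to_uint8(None))                       # None, no error
print(maybe_to_uint8(jnp.ones((2,), jnp.int32)))  # uint8 array [1 1]
```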

@bvandermoon (Collaborator) left a comment


LGTM. Thank you @hsuan-lun-chiang. Did you run the description test on a GPU VM? Let's get @cgarciae's thoughts as well

@hsuan-lun-chiang (Collaborator, Author)

LGTM. Thank you @hsuan-lun-chiang. Did you run the description test on a GPU VM? Let's get @cgarciae's thoughts as well

Happy to help! Yes, I ran the test on a GPU VM with A100 80GB.
