ycchenzheng
Collaborator

@ycchenzheng ycchenzheng commented Aug 15, 2025

Description

Add a new flag, --enable_rich_metrics, that writes a JSON file with a more comprehensive set of metrics:

{
  "avg_step_time": "20s",
  "min_step_time": "19s on step 15",  # This can be split for better parsing
  "max_step_time": "25s on step 35",
  "goodput": "85%",
  "MFU": "100 Tflops/s",
  "badput_breakdown": {
    "checkpoint_loading": "2%",
    "data_loading": "5%",
    "failure": "5%",
    "reshard": "0.5%"
  },
  "links": {
    "cloud_logging": "...",
    "goodput_monitor": "...",
    "disruption_dashboard": "..."
  }
}

This makes it much easier to parse the key metrics needed in tests.
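As a rough illustration of how a test might consume this file, here is a minimal Python sketch. It assumes the field names and string formats shown in the sample above ("19s on step 15", "85%"). The helper names (parse_step_time, parse_percent, summarize) are hypothetical and not part of this PR. It also shows one way the min/max step-time strings could be split into a value and a step number.

```python
import json
import re

# Matches values such as "20s" or "19s on step 15" (format assumed from the PR description).
_STEP_TIME_RE = re.compile(r"^(\d+(?:\.\d+)?)s(?: on step (\d+))?$")


def parse_step_time(value):
    """Return (seconds, step); step is None when the string carries no step number."""
    match = _STEP_TIME_RE.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognized step time: {value!r}")
    seconds, step = match.groups()
    return float(seconds), (int(step) if step is not None else None)


def parse_percent(value):
    """Convert a string such as "85%" into a fraction such as 0.85."""
    return float(value.strip().rstrip("%")) / 100.0


def summarize(metrics):
    """Flatten the rich-metrics dict into numeric fields (hypothetical helper)."""
    avg_s, _ = parse_step_time(metrics["avg_step_time"])
    min_s, min_step = parse_step_time(metrics["min_step_time"])
    max_s, max_step = parse_step_time(metrics["max_step_time"])
    badput = {k: parse_percent(v) for k, v in metrics["badput_breakdown"].items()}
    return {
        "avg_step_time_s": avg_s,
        "min_step_time_s": min_s,
        "min_step": min_step,
        "max_step_time_s": max_s,
        "max_step": max_step,
        "goodput": parse_percent(metrics["goodput"]),
        "badput_total": sum(badput.values()),
    }


# Sample payload copied from the description above (links omitted for brevity).
SAMPLE = json.loads("""
{
  "avg_step_time": "20s",
  "min_step_time": "19s on step 15",
  "max_step_time": "25s on step 35",
  "goodput": "85%",
  "MFU": "100 Tflops/s",
  "badput_breakdown": {
    "checkpoint_loading": "2%",
    "data_loading": "5%",
    "failure": "5%",
    "reshard": "0.5%"
  }
}
""")

summary = summarize(SAMPLE)
```

If the output file instead stored numeric values and step indices as separate keys, most of this parsing would be unnecessary. That is the trade-off hinted at by the "can be split for better parsing" note in the sample.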

FIXES: b/409799213

Notice 1: Once all tests pass, the "pull ready" label will automatically be assigned.
This label is used for administrative purposes. Please do not add it manually.

Notice 2: For external contributions, our settings currently require an approval from a MaxText maintainer to trigger CI tests.

Tests

Please describe how you tested this change, and include any instructions and/or
commands to reproduce.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed.

@ycchenzheng ycchenzheng force-pushed the chzheng/modularizing_recipes branch 4 times, most recently from d2d3f26 to 61bd09d Compare August 20, 2025 17:15
@ycchenzheng ycchenzheng changed the title from "Add data to metrics_for_gcs" to "Add new flag --enable_rich_metrics" Aug 20, 2025
@ycchenzheng ycchenzheng force-pushed the chzheng/modularizing_recipes branch 2 times, most recently from 26f0c5e to 9301a31 Compare August 25, 2025 21:33
@ycchenzheng ycchenzheng force-pushed the chzheng/modularizing_recipes branch from 9301a31 to 89523e4 Compare August 25, 2025 21:42