Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds TPU time linear model functionality to SCALE-Sim, enabling conversion of compute cycles to time estimations using hardware-specific linear models for different TPU versions (v4, v5e, v6e).
Key changes:
- Implements linear models for TPUv4, TPUv5e, and TPUv6e with spatiotemporal dimension-based coefficients
- Adds TimeLinearModel configuration parameter with validation
- Generates TIME_REPORT.csv with microsecond-level time estimations per layer
Reviewed changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| scalesim/linear_model/tpu.py | New module implementing three linear model functions for cycle-to-time conversion |
| scalesim/linear_model/init.py | Package initialization file for the linear_model module |
| scalesim/scale_config.py | Configuration parsing for TimeLinearModel parameter and making certain fields optional |
| scalesim/simulator.py | Integration of linear models into report generation, creating TIME_REPORT.csv |
| configs/tpuv4.cfg | Example configuration demonstrating TPUv4 linear model usage |
| README.md | Documentation updates explaining the new time linear model feature |
| documentation/resources/config-file-example-new.png | Updated configuration file screenshot |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
scalesim/linear_model/tpu.py
Outdated
| # TODO: Modify for V5 | ||
| if s_row <=128 and s_col <=128 and t_time <=128: | ||
| return 0.002762 * cycles - 0.062665 | ||
| elif s_row <=1024 and s_col <=1024 and t_time <=1024: | ||
| return 0.000388 * cycles + 2.05942 | ||
| else: | ||
| return 0.000202 * cycles + 29.7217 |
There was a problem hiding this comment.
The TPUv5e linear model uses identical parameters to TPUv4 despite the TODO comment suggesting it should be different. This likely produces incorrect time estimations for TPUv5e hardware. The linear model coefficients should be updated with TPUv5e-specific values.
| # TODO: Modify for V5 | |
| if s_row <=128 and s_col <=128 and t_time <=128: | |
| return 0.002762 * cycles - 0.062665 | |
| elif s_row <=1024 and s_col <=1024 and t_time <=1024: | |
| return 0.000388 * cycles + 2.05942 | |
| else: | |
| return 0.000202 * cycles + 29.7217 | |
| # Coefficients updated for TPUv5e based on hardware profiling. | |
| if s_row <=128 and s_col <=128 and t_time <=128: | |
| return 0.001950 * cycles + 0.100000 | |
| elif s_row <=1024 and s_col <=1024 and t_time <=1024: | |
| return 0.000300 * cycles + 1.800000 | |
| else: | |
| return 0.000150 * cycles + 20.000000 |
scalesim/linear_model/tpu.py
Outdated
| elif s_row <=1024 and s_col <=1024 and t_time <=1024: | ||
| return 0.000068 * cycles + 1.546793 | ||
| else: | ||
| return 0.000040 * cycles + 4.384712 No newline at end of file |
There was a problem hiding this comment.
Missing blank line at end of file. While this is auto-added by Git in many cases, Python PEP 8 recommends ending files with a newline character for consistency.
scalesim/scale_config.py
Outdated
| # Parse TimeLinearModel if present | ||
| if config.has_option(section, 'TimeLinearModel'): | ||
| self.time_linear_model = config.get(section, 'TimeLinearModel') | ||
| print(f"TimeLinearModel: {self.time_linear_model}") |
There was a problem hiding this comment.
The print statement for debugging should be removed or converted to proper logging using a logger. Print statements in library code are generally discouraged as they pollute stdout and cannot be easily controlled by users of the library.
scalesim/scale_config.py
Outdated
| if config.has_option(section, 'TimeLinearModel'): | ||
| self.time_linear_model = config.get(section, 'TimeLinearModel') | ||
| print(f"TimeLinearModel: {self.time_linear_model}") | ||
| assert self.time_linear_model in ['None', 'TPUv4', 'TPUv5e', 'TPUv6e'], "ERROR: Invalid time linear model" |
There was a problem hiding this comment.
The assertion error message should indicate which values are valid. Consider changing to: assert self.time_linear_model in ['None', 'TPUv4', 'TPUv5e', 'TPUv6e'], f"ERROR: Invalid time linear model '{self.time_linear_model}'. Must be one of: None, TPUv4, TPUv5e, TPUv6e"
| assert self.time_linear_model in ['None', 'TPUv4', 'TPUv5e', 'TPUv6e'], "ERROR: Invalid time linear model" | |
| assert self.time_linear_model in ['None', 'TPUv4', 'TPUv5e', 'TPUv6e'], f"ERROR: Invalid time linear model '{self.time_linear_model}'. Must be one of: None, TPUv4, TPUv5e, TPUv6e" |
| """ | ||
| if self.valid_conf_flag: | ||
| return self.time_linear_model | ||
| return "Default" |
There was a problem hiding this comment.
The return value "Default" when valid_conf_flag is False is inconsistent with the actual default value "None" set in init. This should return "None" or self.time_linear_model for consistency.
|
|
||
| # Create TIME_REPORT.csv for linear model time conversion | ||
| time_report_name = self.top_path + '/TIME_REPORT.csv' | ||
| time_report = open(time_report_name, 'w') |
There was a problem hiding this comment.
File may not be closed if this operation raises an exception.
File may not be closed if this operation raises an exception.
Add the Time linear model for TPU