-
Notifications
You must be signed in to change notification settings - Fork 19.7k
Support tpu tests allowing tpu precision for matmul #21887
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Summary of ChangesHello @wenyi-guo, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly improves the robustness of Keras's test suite when running on Tensor Processing Units (TPUs). It achieves this by re-integrating a large number of tests that were previously skipped on TPUs and by introducing a mechanism to define and apply specific numerical tolerances for assertions when executing on TPU hardware. This ensures that Keras functionalities are thoroughly validated against the unique precision characteristics of TPUs, leading to more reliable and performant models on these accelerators. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request enables a large number of TPU tests by introducing TPU-specific precision tolerances. The core changes in test_case.py to support tpu_atol and tpu_rtol are well-implemented. However, I've identified a critical issue in the TensorFlow TPU detection logic that could break tests in non-TPU environments. Additionally, there's a bug in a numpy test and several instances of extremely loose tolerances that could mask real issues on TPUs. Addressing these points will improve the robustness and reliability of the test suite.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #21887 +/- ##
==========================================
+ Coverage 82.57% 82.61% +0.03%
==========================================
Files 577 578 +1
Lines 59650 59819 +169
Branches 9356 9391 +35
==========================================
+ Hits 49254 49417 +163
+ Misses 7984 7979 -5
- Partials 2412 2423 +11
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
hertschuh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the changes!
keras/src/ops/linalg_test.py
Outdated
| if testing.jax_uses_tpu(): | ||
| self.skipTest("Skipping test with JAX + TPU as it's not supported") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh weird, what's the error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NotImplementedError: MLIR translation rule for primitive 'eig' not found for platform tpu
hertschuh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving to run the tests.
Removing review to re-trigger TPU tests
hertschuh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving to re-trigger TPU tests
support more tpu tests