-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Add Crashlytics agent evals #9516
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
69c0fd3 to
8321bf4
Compare
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request makes significant improvements by adding e2e tests for Crashlytics, introducing memory interaction and assertion negation for tests, and refactoring agent-evals to a CommonJS library. The use of a shared renderTemplate for mocks is a great change to prevent test drift. My review has identified a few issues. There are some incorrect paths in the test configuration (.mocharc.yml and package.json) and a critical path issue in the GeminiCliRunner that will likely cause tests to fail. I've also noted some debug console.log statements that should be removed and some copy-paste errors in the new template README files. Addressing these points will improve the robustness and clarity of the new testing infrastructure.
8321bf4 to
e3e7221
Compare
|
|
||
| export const getEnvironmentWithIosApp = { | ||
| firebase_get_environment: toMockContent( | ||
| renderTemplate({ ...BASE_ENVIRONMENT_CONFIG, detectedAppIds: { [IOS_APP_ID]: IOS_BUNDLE_ID } }), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
templates are 🔥
samedson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have some questions in there, but excited to get this in!
samedson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Just have that one nit on the dirs variable
286aed7 to
f752ee4
Compare
…ional test coverage across supported platforms
8872903 to
c210cdc
Compare
joehan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with some nits and small q's
| await run.type("/crashlytics:connect"); | ||
| await run.expectToolCalls(["firebase_get_environment"]); | ||
|
|
||
| await run.expectText("prioritize"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly for my own understanding - why do we look for the word 'prioritize' here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The end of the crashlytics:connect prompt asks the agent to ask the user whether they would like to take either of the following actions --
- Prioritize the most impactful stability issues
- Diagnose and propose a fix for a crash
I'm just making sure that it asks that question.
Description
Note: the bulk of the changes are in the template apps which are just small skeleton apps for the purpose of a smoke test for auto detection using the most common cases. The content of those apps is largely irrelevant but leads to quite a few file changes.
Scenarios Tested
Sample Commands
cd scripts/agent-evals
npm run test:dev