Add Crashlytics agent evals #9516

schnecle · 2025-11-24T17:34:27Z

Description

Adds e2e tests with Gemini CLI for Crashlytics
Adds functionality to interact with memories
Adds expectation negation. Chose "dont" as the keyword to keep readability high (e.g. run.dont.expectText)
Converts agent-evals into a CommonJS library so that we can pull in dependencies from core firebase-tools. This is demonstrated with the firebase_get_environment tool, which now has an exported render function that can be used in mocks, preventing test drift.
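
For readers who have not opened the diff, here is a minimal sketch of how a `dont` negation like the one described above could work. It is not the PR's implementation: `SketchRun` and `ExpectationError` are hypothetical names, and the real runner checks live agent output rather than a fixed string.

```typescript
// Hypothetical sketch: none of these names are taken from the agent-evals source.
class ExpectationError extends Error {
  constructor(message: string) {
    super(message);
    // Keeps `instanceof` working when compiling to older targets.
    Object.setPrototypeOf(this, ExpectationError.prototype);
  }
}

interface Expectations {
  expectText(text: string): Promise<void>;
}

class SketchRun implements Expectations {
  private readonly output: string;

  constructor(output: string) {
    this.output = output;
  }

  // Positive form: fails unless the captured output contains the text.
  async expectText(text: string): Promise<void> {
    if (!this.output.includes(text)) {
      throw new ExpectationError(`expected output to contain "${text}"`);
    }
  }

  // Negated form: each expectation passes only when its positive form fails.
  get dont(): Expectations {
    const negate =
      <A extends unknown[]>(name: string, fn: (...args: A) => Promise<void>) =>
      async (...args: A): Promise<void> => {
        try {
          await fn(...args);
        } catch (err) {
          if (err instanceof ExpectationError) return; // positive form failed, so the negation holds
          throw err; // unrelated errors still propagate
        }
        throw new ExpectationError(`expected ${name} to fail, but it passed`);
      };
    return {
      expectText: negate("expectText", (text: string) => this.expectText(text)),
    };
  }
}
```

With this shape, `await run.dont.expectText("foo")` reads close to plain English, and only expectation failures are inverted, so an unexpected runtime error still fails the test.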

Note: the bulk of the changes are in the template apps, which are small skeleton apps used to smoke-test auto-detection in the most common cases. The content of those apps is largely irrelevant, but it accounts for quite a few file changes.

Scenarios Tested

See scripts/agent-evals/src/tests/crashlytics/connect.spec.ts

Sample Commands

cd scripts/agent-evals
npm run test:dev

schnecle · 2025-11-24T17:38:11Z

/gemini review

gemini-code-assist

Code Review

This pull request makes significant improvements by adding e2e tests for Crashlytics, introducing memory interaction and assertion negation for tests, and refactoring agent-evals to a CommonJS library. The use of a shared renderTemplate for mocks is a great change to prevent test drift. My review has identified a few issues. There are some incorrect paths in the test configuration (.mocharc.yml and package.json) and a critical path issue in the GeminiCliRunner that will likely cause tests to fail. I've also noted some debug console.log statements that should be removed and some copy-paste errors in the new template README files. Addressing these points will improve the robustness and clarity of the new testing infrastructure.

scripts/agent-evals/src/runner/gemini-cli-runner.ts

scripts/agent-evals/.mocharc.yml

scripts/agent-evals/package.json

scripts/agent-evals/src/runner/gemini-cli-runner.ts

scripts/agent-evals/templates/crashlytics-flutter/README.md

scripts/agent-evals/templates/crashlytics-ios/README.md

samedson · 2025-11-24T21:05:34Z

scripts/agent-evals/src/mock/mocks/get-environment-mock.ts

+
+export const getEnvironmentWithIosApp = {
+  firebase_get_environment: toMockContent(
+    renderTemplate({ ...BASE_ENVIRONMENT_CONFIG, detectedAppIds: { [IOS_APP_ID]: IOS_BUNDLE_ID } }),


templates are 🔥
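
A rough sketch of the pattern in the snippet above, for anyone skimming the thread: the tool and its mocks share one render function, so a change to the output format reaches both. Everything here is an assumption made for illustration. `EnvironmentConfig`, `renderEnvironment`, the config fields other than `detectedAppIds`, the IDs, and the output format are hypothetical and do not reflect the real `renderTemplate` signature.

```typescript
// Hypothetical sketch: the config shape and output format are assumptions, not the real tool's.
interface EnvironmentConfig {
  projectId: string;
  detectedAppIds: Record<string, string>; // app ID -> bundle or package ID
}

// Single source of truth for formatting; in this sketch the tool module would own it.
function renderEnvironment(config: EnvironmentConfig): string {
  const apps = Object.keys(config.detectedAppIds)
    .map((appId) => `- ${appId} (${config.detectedAppIds[appId]})`)
    .join("\n");
  return [`Project: ${config.projectId}`, "Detected apps:", apps || "- <none>"].join("\n");
}

// Mocks reuse the same function instead of hand-copying the tool's output text.
const BASE_CONFIG: EnvironmentConfig = { projectId: "demo-project", detectedAppIds: {} };

const mockWithIosApp = renderEnvironment({
  ...BASE_CONFIG,
  detectedAppIds: { "1:123:ios:abc": "com.example.app" },
});
```

The design choice is that a mock built this way cannot silently fall out of sync with the tool: if the rendered format changes, the mock content changes with it.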

scripts/agent-evals/src/mock/mock-tools-main.ts

scripts/agent-evals/src/runner/gemini-cli-runner.ts

src/mcp/tools/core/get_environment.ts

package.json

samedson

Have some questions in there, but excited to get this in!

scripts/agent-evals/src/runner/gemini-cli-runner.ts

samedson

LGTM! Just have that one nit on the dirs variable

…ional test coverage across supported platforms

joehan

LGTM with some nits and small q's

scripts/agent-evals/src/runner/gemini-cli-runner.ts

joehan · 2025-11-25T20:56:54Z

scripts/agent-evals/src/tests/crashlytics/connect.spec.ts

+    await run.type("/crashlytics:connect");
+    await run.expectToolCalls(["firebase_get_environment"]);
+
+    await run.expectText("prioritize");


Mostly for my own understanding - why do we look for the word 'prioritize' here?

The end of the crashlytics:connect prompt instructs the agent to ask the user whether they would like to take one of the following actions:

Prioritize the most impactful stability issues

Diagnose and propose a fix for a crash

I'm just making sure that it asks that question.

src/mcp/tools/core/get_environment.ts

…in gemini cli

github-project-automation bot added this to [Cloud] Extensions + Functions Nov 24, 2025

schnecle force-pushed the schnecle/add-agent-evals branch from 69c0fd3 to 8321bf4 November 24, 2025 17:35

gemini-code-assist bot reviewed Nov 24, 2025

schnecle force-pushed the schnecle/add-agent-evals branch from 8321bf4 to e3e7221 November 24, 2025 20:53

schnecle requested review from joehan and samedson November 24, 2025 20:53

schnecle marked this pull request as ready for review November 24, 2025 20:59