✨(qa-agent): Use GPT-5-Codex for testcase generation #3634

hoshinotsuyoshi · 2025-09-29T04:06:43Z

Issue

resolve: route06/liam-internal#5746

Why is this change needed?

Replace GPT-5-Nano with the GPT-5-Codex model in the QA Agent's testcase generation to take advantage of its stronger code generation when creating SQL test cases. GPT-5-Codex is expected to understand code structure better and to produce more accurate SQL DML operations for testing database schemas.

Summary by CodeRabbit

Refactor
- Upgraded the AI engine used for automatic test-case generation to a newer model, improving consistency and reliability of generated tests.
- Simplified model behavior by removing certain internal verbosity and reasoning toggles, resulting in clearer, less variable outputs while preserving existing workflows and integrations.

Replace GPT-5-Nano with GPT-5-Codex model to leverage enhanced code generation capabilities for creating SQL test cases. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

changeset-bot · 2025-09-29T04:06:47Z

⚠️ No Changeset found

Latest commit: feb4822

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


vercel · 2025-09-29T04:06:49Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Preview	Comments	Updated (UTC)
liam-app	Ready	Preview	Comment	Sep 29, 2025 4:36am
liam-assets	Ready	Preview	Comment	Sep 29, 2025 4:36am
liam-storybook	Ready	Preview	Comment	Sep 29, 2025 4:36am

2 Skipped Deployments

Project	Deployment	Preview	Comments	Updated (UTC)
liam-docs	Ignored	Preview		Sep 29, 2025 4:36am
liam-erd-sample	Skipped			Sep 29, 2025 4:36am

coderabbitai · 2025-09-29T04:06:50Z

Walkthrough

Replaced the ChatOpenAI model from gpt-5-nano to gpt-5-codex in the test-case generation node and removed the reasoning and verbosity options; useResponsesApi remains true. No control-flow or exported API changes.

Changes

Cohort / File(s)	Summary
Model & options update `frontend/internal-packages/agent/src/qa-agent/testcaseGeneration/generateTestcaseNode.ts`	Switched ChatOpenAI model from `gpt-5-nano` to `gpt-5-codex`; removed `reasoning` and `verbosity` options; `useResponsesApi: true` retained; no control-flow or public API changes.
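The option change summarized above can be sketched with plain objects, so it runs without `@langchain/openai`. This is only an illustration: the PR confirms `model` and `useResponsesApi: true`, while the `reasoning` and `verbosity` values in the "before" object are placeholders, and `toCodexOptions` is a hypothetical helper that does not exist in the repository.

```typescript
// Shape of the options discussed in the walkthrough. Only `model` and
// `useResponsesApi` are confirmed by this PR; the other fields are illustrative.
type TestcaseModelOptions = {
  model: string;
  useResponsesApi: boolean;
  reasoning?: { effort: string }; // placeholder shape (assumption)
  verbosity?: string; // placeholder shape (assumption)
};

// Hypothetical "before" configuration; the reasoning/verbosity values are assumptions.
const nanoOptions: TestcaseModelOptions = {
  model: 'gpt-5-nano',
  useResponsesApi: true,
  reasoning: { effort: 'minimal' },
  verbosity: 'low',
};

// Build the "after" configuration: new model name, `useResponsesApi` carried
// over, and the `reasoning` / `verbosity` options left out entirely.
function toCodexOptions(options: TestcaseModelOptions): TestcaseModelOptions {
  return { model: 'gpt-5-codex', useResponsesApi: options.useResponsesApi };
}

const codexOptions = toCodexOptions(nanoOptions);
console.log(codexOptions);
```

Building the new object from scratch, rather than deleting keys from the old one, keeps the result free of any option that was not explicitly carried over.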

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Enable strict mode for QA agent test case generation #3536 — Modifies the ChatOpenAI binding in the same file (adds strict:true and schema-driven changes); strongly related to model/options edits.
feat: Update QA Agent to generate id and dmlOperations fields #2901 — Addresses removal/handling of reasoning in AIMessages for QA workflows; directly related to removing reasoning here.

Suggested reviewers

FunamaYukina
junkisai
NoritakaIkeda
MH4GF

Poem

A rabbit hops in code so sleek,
Swapped nano for codex this week.
Options trimmed, the trail runs clear,
Tests will hum and logs will cheer.
Thump-thump, I nap — the merge is near. 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The title clearly summarizes the main change by indicating the QA Agent will now use GPT-5-Codex for testcase generation, following conventional commit conventions without extraneous file lists or vague wording.
Description Check	✅ Passed	The pull request description matches the repository’s template by including an Issue section with the resolve link and a Why is this change needed? section that succinctly explains the rationale for switching to GPT-5-Codex in the QA Agent.
Docstring Coverage	✅ Passed	No functions found in the changes. Docstring coverage check skipped.


📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 992eb1f and feb4822.

📒 Files selected for processing (1)

frontend/internal-packages/agent/src/qa-agent/testcaseGeneration/generateTestcaseNode.ts (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

frontend/internal-packages/agent/src/qa-agent/testcaseGeneration/generateTestcaseNode.ts

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)

GitHub Check: Supabase Preview
GitHub Check: frontend-ci
GitHub Check: frontend-lint
GitHub Check: codeql / languages (javascript) / Perform CodeQL for javascript
GitHub Check: agent-deep-modeling
GitHub Check: Supabase Preview


supabase · 2025-09-29T04:06:59Z

Updates to Preview Branch (qa-gpt-5-codex) ↗︎

Deployments	Status	Updated
Database	✅	Mon, 29 Sep 2025 04:33:42 UTC
Services	✅	Mon, 29 Sep 2025 04:33:42 UTC
APIs	✅	Mon, 29 Sep 2025 04:33:42 UTC

Tasks are run on every commit but only new migration files are pushed.
Close and reopen this PR if you want to apply changes from existing seed or migration files.

Tasks	Status	Updated
Configurations	✅	Mon, 29 Sep 2025 04:33:42 UTC
Migrations	✅	Mon, 29 Sep 2025 04:33:42 UTC
Seeding	✅	Mon, 29 Sep 2025 04:33:42 UTC
Edge Functions	✅	Mon, 29 Sep 2025 04:33:42 UTC

View logs for this Workflow Run ↗︎.
Learn more about Supabase for Git ↗︎.

github-actions · 2025-09-29T04:29:37Z

🤖 Agent Deep Modeling Execution

Started at: 2025-09-29 04:36:19 UTC

Command Output

@liam-hq/[email protected] execute-deep-modeling /home/runner/work/liam/liam/frontend/internal-packages/agent
pnpm test:integration src/createGraph.integration.test.ts

@liam-hq/[email protected] test:integration /home/runner/work/liam/liam/frontend/internal-packages/agent
vitest --watch=false --passWithNoTests --config vitest.config.integration.ts src/createGraph.integration.test.ts

RUN v3.2.4 /home/runner/work/liam/liam/frontend/internal-packages/agent

(node:7661) ExperimentalWarning: WASI is an experimental feature and might change at any time
(Use node --trace-warnings ... to show where the warning was created)

✅ [INFO] 2025-09-29T04:36:22.376Z
LangSmith Trace URL: https://smith.langchain.com/o/eed4d2d8-0bd8-4ca4-a452-4da88ef63fd6/projects/p/99d63e81-58f9-475a-8985-409b591e26b0?searchModel=%7B%22filter%22%3A%22and(eq(is_root%2C%20true)%2C%20and(eq(metadata_key%2C%20%5C%22thread_id%5C%22)%2C%20eq(metadata_value%2C%20%5C%226cb96e5f-5c78-4969-835f-5bec96bf959d%5C%22)))%22%7D

⏺ (pm):

Checklist:

Define scope, objectives, actors, and assumptions for a simple user management system.
Structure functional requirements by feature area with clear, testable “must” statements.
Identify core data entities to guide database design.
Specify acceptance criteria and success metrics for QA verification.
Document constraints, out-of-scope items, and risks.
Save the finalized requirements artifact for downstream DB and QA use.

Business Requirements Document (BRD)

Overview and Objectives

Build a simple user management system that lets end users register, sign in, and manage their profile, and lets administrators manage users, roles, and access, with auditable actions and basic email notifications.
Primary goals: secure account lifecycle, minimal friction for end users, and clear administrative oversight.

Scope

In scope: registration, email verification, login/logout, password reset, profile management, role/permission-based access control, admin user CRUD, search/filter, audit logging, and email notifications.
Out of scope (Phase 2+): SSO/social login, multi-tenant orgs, fine-grained resource authorization beyond users, mobile apps, MFA/2FA, SCIM provisioning, webhooks.

Actors and Roles

End User: registers, verifies email, signs in/out, resets password, updates profile.
Administrator: manages users, roles, and permissions; views audits; deactivates/reactivates accounts.
System: sends emails, enforces policies, logs audit events.

Key Assumptions

Single-tenant, US-based deployment; email is the unique identifier for accounts.
Web UI and REST API are both available.
Timestamps stored in UTC; display in user’s local time where applicable.
Passwords stored using industry-standard one-way hashing; emails delivered via SMTP-compatible provider.

Core Data Entities (for DB design guidance)

User: id, email (unique), email_verified, password, first_name, last_name, phone (opt), status [pending, active, suspended, deactivated], last_login_at, created_at, updated_at.
Role: id, name (unique), description, created_at, updated_at.
Permission: id, key (unique), description.
UserRole (junction): user_id, role_id, assigned_by, assigned_at.
RolePermission (junction): role_id, permission_id, created_at.
AuditEvent: id, event_type, actor_user_id (nullable), target_user_id (nullable), timestamp, ip_address (nullable), user_agent (nullable), details (json/text).
AuthToken: id, user_id, type [password_reset, email_verify, session_revocation], token, expires_at, used_at (nullable), created_at.
EmailQueue/Notification (optional): id, to, subject, template_key, payload (json), status, created_at, sent_at.

Functional Requirements
A. User Registration & Verification

The system must allow a new user to register with email, password, and optional name fields.
The system must require acceptance of Terms of Service and Privacy Policy at registration.
The system must send a verification email containing a single-use link valid for 72 hours.
The system must mark accounts as “pending” until email is verified; unverified users cannot access authenticated features.
The system must support resending verification emails and expiring prior tokens.

B. Authentication & Session Management

The system must allow users to sign in with email and password.
The system must prevent sign-in for suspended or deactivated accounts and display an appropriate message.
The system must lock an account for 15 minutes after 5 consecutive failed login attempts.
The system must support logout and invalidate the active session.
The system must record last successful login timestamp.
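As an illustration of the lockout rule in section B (5 consecutive failures, 15-minute lock), here is a minimal sketch. The `LoginState` shape and function names are hypothetical, and two behaviors are assumptions not stated in the requirements: attempts made while locked are rejected without being counted, and the failure counter resets once a lock is applied.

```typescript
const MAX_FAILED_ATTEMPTS = 5; // from requirement B
const LOCK_DURATION_MS = 15 * 60 * 1000; // 15 minutes, from requirement B

// Hypothetical per-account state; timestamps are epoch milliseconds.
type LoginState = { failedAttempts: number; lockedUntil: number | null };

function isLocked(state: LoginState, now: number): boolean {
  return state.lockedUntil !== null && now < state.lockedUntil;
}

// Returns the next state after a login attempt at time `now`.
function recordAttempt(state: LoginState, success: boolean, now: number): LoginState {
  // Assumption: attempts during a lock are rejected and do not change the state.
  if (isLocked(state, now)) return state;
  // A successful login clears the consecutive-failure counter.
  if (success) return { failedAttempts: 0, lockedUntil: null };
  const failedAttempts = state.failedAttempts + 1;
  if (failedAttempts >= MAX_FAILED_ATTEMPTS) {
    // Assumption: the counter restarts from zero once the lock is applied.
    return { failedAttempts: 0, lockedUntil: now + LOCK_DURATION_MS };
  }
  return { failedAttempts, lockedUntil: null };
}
```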

C. Password & Credential Management

The system must allow users to change their password when signed in (old + new password flow).
The system must allow password reset via email using a single-use token valid for 60 minutes.
The system must invalidate a reset token immediately after use or expiration.
The system must enforce a configurable password policy (e.g., minimum length, complexity).
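The single-use, 60-minute reset token from section C could be checked along these lines. This is a sketch, not the generated schema or test case: the `AuthToken` fields loosely follow the entity list above (`expires_at`, `used_at`), the function names are hypothetical, and treating a token as expired at exactly 60 minutes is an assumption.

```typescript
const RESET_TOKEN_TTL_MS = 60 * 60 * 1000; // 60 minutes, from requirement C

// Hypothetical in-memory view of an AuthToken row; timestamps are epoch milliseconds.
type AuthToken = { token: string; expiresAt: number; usedAt: number | null };

function issueResetToken(token: string, now: number): AuthToken {
  return { token, expiresAt: now + RESET_TOKEN_TTL_MS, usedAt: null };
}

// Accept the token only if it is unexpired and unused; on acceptance, return a
// copy marked as used so a second redemption is rejected.
function redeemResetToken(
  stored: AuthToken,
  now: number,
): { accepted: boolean; stored: AuthToken } {
  const expired = now >= stored.expiresAt; // boundary handling is an assumption
  const alreadyUsed = stored.usedAt !== null;
  if (expired || alreadyUsed) return { accepted: false, stored };
  return { accepted: true, stored: { ...stored, usedAt: now } };
}
```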

D. Profile Management

The system must allow users to view and update profile fields: first_name, last_name, phone (optional).
The system must allow users to view account status and email verification state.
The system must restrict email changes to verified users and require re-verification on email change.

E. Authorization, Roles, and Permissions

The system must support role-based access control (RBAC) with at least two default roles: Admin and User.
The system must allow Admins to create, read, update, and delete roles.
The system must allow Admins to assign and revoke roles for any user.
The system must support permissions that can be attached to roles; Admins can manage role-permission mappings.
The system must enforce permissions on administrative features and protected endpoints.

F. Administrative User Management

The system must allow Admins to create a user (invite), update user attributes, deactivate/reactivate, and soft-delete users.
The system must allow Admins to reset a user’s password (send reset link) without viewing or setting the password directly.
The system must display a paginated, sortable, and filterable user list with columns: email, name, roles, status, created_at, last_login_at.
The system must support search by email and name, and filters by role, status, and verification state.
The system must support bulk actions for resend verification and deactivate/reactivate (up to 100 users per action).

G. Notifications & Email

The system must send transactional emails for registration verification, password reset, user invitation, and account status changes.
The system must template emails with variables (e.g., user name, verification link) for localization support.
The system must record send status and enable retry on transient failures.

H. Audit & Compliance

The system must capture audit events for: user created/updated/deactivated/reactivated/deleted, role assigned/revoked, login success/failure, password changed/reset, email verified/changed, and bulk actions.
The system must record actor, target, timestamp, and contextual details (IP and user agent when available).
The system must allow Admins to search and filter audit events by event type, actor, target, and date range.
The system must retain audit events for at least 12 months and support export (CSV).

I. Data Governance & Privacy

The system must allow Admins to permanently delete deactivated users and associated personal data, excluding records required for legal or audit retention.
The system must provide users the ability to download their profile data (JSON or CSV).
The system must mask sensitive data in admin views (e.g., never display full password or reset tokens).
The system must log and disclose privacy policy acceptance with timestamp.

J. API & Integration

The system must expose authenticated REST APIs for all user and admin operations with role/permission enforcement.
The system must provide API keys or token-based access for service-to-service calls with revocation capability.
The system must provide API documentation describing endpoints, request/response schemas, and error codes.

K. Reporting & Metrics

The system must provide Admin dashboard metrics: total users, active users, unverified users, suspended users, new registrations by day (last 30 days).
The system must support exporting user lists and metrics to CSV.

L. Operational Constraints (Testable)

The system must return the user list within 2 seconds for a page of up to 50 users under normal load.
The system must support at least 10,000 total users without degradation of basic CRUD operations.
The system must provide 99.5% monthly uptime for the user management features.

Acceptance Criteria (samples for QA)

Registration: Given a new email and valid password, when the user submits registration, then a pending account is created and a verification email is sent within 30 seconds; sign-in is blocked until verification occurs.
Login lockout: After 5 failed attempts within 15 minutes, then the account is locked for 15 minutes and further attempts are rejected.
Admin deactivate: When an Admin deactivates a user, then the user cannot sign in, existing sessions are invalidated, and an audit event is recorded.
Role assignment: When an Admin assigns a role to a user, the role appears in the user’s detail view, permissions are effective immediately, and an audit event is recorded.
Password reset: When a user completes a reset with a valid token within 60 minutes, the token becomes unusable thereafter and last_password_change timestamp is updated.

Risks and Mitigations

Email deliverability issues: provide resend and delivery status tracking; support fallback provider later.
Account enumeration risk: use consistent messaging on login and reset flows.
Admin misuse: enforce RBAC and comprehensive audit logging.

Dependencies

Email service provider; persistent data store; authentication/session framework.

Purpose: Save analyzed requirements as an artifact; Inputs: businessRequirement summary and functionalRequirements object.

🔧 saveRequirementsToArtifactTool(businessRequirement: "Deliver a simple ...", functionalRequirements: {12 keys}) ⎿ unknown: Requirements saved successfully to artifact

🔧 schemaDesignTool(operations: [16 items])

🔧 saveTestcase(testcaseWithDml: {6 keys})

🔧 saveTestcase(testcaseWithDml: {6 keys})x

⎯⎯⎯⎯⎯⎯⎯ Failed Tests 1 ⎯⎯⎯⎯⎯⎯⎯

FAIL src/createGraph.integration.test.ts > createGraph Integration > should execute complete workflow
Error: Failed to generate test case for Audit & Compliance: Request timed out.
❯ RunnableCallable.generateTestcaseNode [as func] src/qa-agent/testcaseGeneration/generateTestcaseNode.ts:69:11
67| if (streamResult.isErr()) {
68| // eslint-disable-next-line no-throw-error/no-throw-error -- Requi…
69| throw new Error(
| ^
70| `Failed to generate test case for ${currentRequirement.category}…
71| )
❯ RunnableCallable.invoke ../../../node_modules/.pnpm/@langchain+langgraph@0.4.9_@langchain+core@0.3.75_@opentelemetry [email protected]_@opentelemet_8cec001996aaf22af354b21d9734e3f3/node_modules/@langchain/langgraph/src/utils.ts:85:21
❯ RunnableSequence.invoke ../../../node_modules/.pnpm/@langchain+core@0.3.75_@opentelemetry+api@1.9.0_@opentelemetry+sdk-trace-base@2.1.0_@op_ff334bb79525a10a995a9831d3da877b/node_modules/@langchain/core/dist/runnables/base.js:1308:33
❯ runWithRetry ../../../node_modules/.pnpm/@langchain+langgraph@0.4.9_@langchain+core@0.3.75_@opentelemetry [email protected]_@opentelemet_8cec001996aaf22af354b21d9734e3f3/node_modules/@langchain/langgraph/src/pregel/retry.ts:103:16
❯ PregelRunner.executeTasksWithRetry ../../../node_modules/.pnpm/@langchain+langgraph@0.4.9_@langchain+core@0.3.75_@opentelemetry [email protected]_@opentelemet_8cec001996aaf22af354b21d9734e3f3/node_modules/@langchain/langgraph/src/pregel/runner.ts:330:27
❯ PregelRunner.tick ../../../node_modules/.pnpm/@langchain+langgraph@0.4.9_@langchain+core@0.3.75_@opentelemetry [email protected]_@opentelemet_8cec001996aaf22af354b21d9734e3f3/node_modules/@langchain/langgraph/src/pregel/runner.ts:138:50
❯ CompiledStateGraph.runLoop ../../../node_modules/.pnpm/@langchain+langgraph@0.4.9_@langchain+core@0.3.75_@opentelemetry [email protected]_@opentelemet_8cec001996aaf22af354b21d9734e3f3/node_modules/@langchain/langgraph/src/pregel/index.ts:2233:9
❯ createAndRunLoop ../../../node_modules/.pnpm/@langchain+langgraph@0.4.9_@langchain+core@0.3.75_@opentelemetry [email protected]_@opentelemet_8cec001996aaf22af354b21d9734e3f3/node_modules/@langchain/langgraph/src/pregel/index.ts:2092:9

⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯[1/1]⎯

Test Files 1 failed (1)
Tests 1 failed (1)
Start at 04:36:20
Duration 1252.66s (transform 444ms, setup 0ms, collect 1.39s, tests 1250.98s, environment 0ms, prepare 67ms)

ELIFECYCLE Command failed with exit code 1.
/home/runner/work/liam/liam/frontend/internal-packages/agent:
ERR_PNPM_RECURSIVE_RUN_FIRST_FAIL @liam-hq/[email protected] execute-deep-modeling: pnpm test:integration src/createGraph.integration.test.ts
Exit status 1
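For readers following the failure above, this is a rough sketch of how a timed-out request can surface as the error reported at `generateTestcaseNode.ts:69`. It is not the repository's code: the `Result` type stands in for whatever result library the node actually uses, `withTimeout` and `generateOrThrow` are hypothetical names, and the message format is inferred from the logged error line because the template in the trace is truncated.

```typescript
// Minimal Result type standing in for the real result library (assumption).
type Result<T> = { ok: true; value: T } | { ok: false; error: Error };

// Resolve to an error Result if `work` does not settle within `timeoutMs`.
function withTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<Result<T>> {
  return new Promise((resolve) => {
    const timer = setTimeout(
      () => resolve({ ok: false, error: new Error('Request timed out.') }),
      timeoutMs,
    );
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve({ ok: true, value });
      },
      (error: unknown) => {
        clearTimeout(timer);
        resolve({ ok: false, error: error instanceof Error ? error : new Error(String(error)) });
      },
    );
  });
}

// Mirrors the check-and-throw pattern visible in the trace: an error Result
// becomes a thrown Error that names the requirement category.
async function generateOrThrow<T>(
  category: string,
  work: Promise<T>,
  timeoutMs: number,
): Promise<T> {
  const result = await withTimeout(work, timeoutMs);
  if (!result.ok) {
    throw new Error(`Failed to generate test case for ${category}: ${result.error.message}`);
  }
  return result.value;
}
```

Because the node throws instead of returning the error, a single slow category (here "Audit & Compliance") fails the whole integration run.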

Remove reasoning and verbosity parameters that are not applicable to GPT-5-Codex model. These parameters were specific to GPT-5-Nano and are not needed for the Codex variant. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

✨ Use GPT-5-Codex for QA Agent testcase generation

992eb1f

Replace GPT-5-Nano with GPT-5-Codex model to leverage enhanced code generation capabilities for creating SQL test cases. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

github-actions bot assigned hoshinotsuyoshi Sep 29, 2025

vercel bot deployed to Preview – liam-app September 29, 2025 04:11 View deployment

vercel bot temporarily deployed to Preview – liam-erd-sample September 29, 2025 04:32 Inactive

vercel bot deployed to Preview – liam-storybook September 29, 2025 04:33 View deployment

vercel bot deployed to Preview – liam-assets September 29, 2025 04:33 View deployment

vercel bot deployed to Preview – liam-app September 29, 2025 04:36 View deployment

hoshinotsuyoshi closed this Sep 29, 2025
