
hoshinotsuyoshi (Member) commented Sep 29, 2025

Issue

  • resolve: route06/liam-internal#5746

Why is this change needed?

Replace GPT-5-Nano with GPT-5-Codex in the QA Agent's testcase generation to take advantage of GPT-5-Codex's stronger code generation when creating SQL test cases. GPT-5-Codex has a better understanding of code structure and produces more accurate SQL DML operations for testing database schemas.

Summary by CodeRabbit

  • Refactor
    • Upgraded the AI engine used for automatic test-case generation to a newer model, improving consistency and reliability of generated tests.
    • Simplified model behavior by removing certain internal verbosity and reasoning toggles, resulting in clearer, less variable outputs while preserving existing workflows and integrations.

Replace GPT-5-Nano with GPT-5-Codex model to leverage enhanced code generation capabilities for creating SQL test cases.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

changeset-bot bot commented Sep 29, 2025

⚠️ No Changeset found

Latest commit: feb4822

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


vercel bot commented Sep 29, 2025

The latest updates on your projects.

  • liam-app: Ready (updated Sep 29, 2025 4:36am UTC)
  • liam-assets: Ready (updated Sep 29, 2025 4:36am UTC)
  • liam-storybook: Ready (updated Sep 29, 2025 4:36am UTC)

2 Skipped Deployments

  • liam-docs: Ignored (updated Sep 29, 2025 4:36am UTC)
  • liam-erd-sample: Skipped (updated Sep 29, 2025 4:36am UTC)

coderabbitai bot (Contributor) commented Sep 29, 2025

Walkthrough

Switched the ChatOpenAI model from gpt-5-nano to gpt-5-codex in the test-case generation node and removed the reasoning and verbosity options; useResponsesApi remains true. No control-flow or exported API changes.
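The option change can be pictured with a small before/after sketch. It deliberately does not import `@langchain/openai`: `ChatModelOptions` is a hypothetical local type that mirrors only the four fields this PR mentions, and the `reasoning` and `verbosity` values in the "before" object are placeholders, because the PR text does not say what they were.

```typescript
// Hypothetical mirror of the ChatOpenAI options mentioned in this PR.
// This is NOT the @langchain/openai type; it only makes the diff concrete.
type ChatModelOptions = {
  model: string
  useResponsesApi?: boolean
  reasoning?: { effort: string } // placeholder shape (assumption)
  verbosity?: string // placeholder (assumption)
}

// Before: gpt-5-nano with reasoning/verbosity tuning (values are placeholders).
const before: ChatModelOptions = {
  model: 'gpt-5-nano',
  reasoning: { effort: 'minimal' },
  verbosity: 'low',
  useResponsesApi: true,
}

// After: gpt-5-codex; reasoning and verbosity removed, useResponsesApi kept.
const after: ChatModelOptions = {
  model: 'gpt-5-codex',
  useResponsesApi: true,
}

// List option keys present before but absent after, i.e. what was dropped.
function removedKeys(a: ChatModelOptions, b: ChatModelOptions): string[] {
  return Object.keys(a).filter((key) => !(key in b))
}

console.log(removedKeys(before, after)) // [ 'reasoning', 'verbosity' ]
```

In the actual node, objects like these would be passed to the `ChatOpenAI` constructor; the sketch only makes the removed keys explicit.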

Changes

  • Model & options update (frontend/internal-packages/agent/src/qa-agent/testcaseGeneration/generateTestcaseNode.ts): Switched ChatOpenAI model from gpt-5-nano to gpt-5-codex; removed reasoning and verbosity options; useResponsesApi: true retained; no control-flow or public API changes.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

  • FunamaYukina
  • junkisai
  • NoritakaIkeda
  • MH4GF

Poem

A rabbit hops in code so sleek,
Swapped nano for codex this week.
Options trimmed, the trail runs clear,
Tests will hum and logs will cheer.
Thump-thump, I nap — the merge is near. 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

  • Title Check: ✅ Passed. The title clearly summarizes the main change by indicating the QA Agent will now use GPT-5-Codex for testcase generation, following conventional commit conventions without extraneous file lists or vague wording.
  • Description Check: ✅ Passed. The pull request description matches the repository’s template by including an “Issue” section with the resolve link and a “Why is this change needed?” section that succinctly explains the rationale for switching to GPT-5-Codex in the QA Agent.
  • Docstring Coverage: ✅ Passed. No functions found in the changes. Docstring coverage check skipped.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 992eb1f and feb4822.

📒 Files selected for processing (1)
  • frontend/internal-packages/agent/src/qa-agent/testcaseGeneration/generateTestcaseNode.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • frontend/internal-packages/agent/src/qa-agent/testcaseGeneration/generateTestcaseNode.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Supabase Preview
  • GitHub Check: frontend-ci
  • GitHub Check: frontend-lint
  • GitHub Check: codeql / languages (javascript) / Perform CodeQL for javascript
  • GitHub Check: agent-deep-modeling
  • GitHub Check: Supabase Preview


supabase bot commented Sep 29, 2025

Updates to Preview Branch (qa-gpt-5-codex)

Deployments (updated):
  • Database: Mon, 29 Sep 2025 04:33:42 UTC
  • Services: Mon, 29 Sep 2025 04:33:42 UTC
  • APIs: Mon, 29 Sep 2025 04:33:42 UTC

Tasks are run on every commit but only new migration files are pushed.
Close and reopen this PR if you want to apply changes from existing seed or migration files.

Tasks (updated):
  • Configurations: Mon, 29 Sep 2025 04:33:42 UTC
  • Migrations: Mon, 29 Sep 2025 04:33:42 UTC
  • Seeding: Mon, 29 Sep 2025 04:33:42 UTC
  • Edge Functions: Mon, 29 Sep 2025 04:33:42 UTC

github-actions bot (Contributor) commented Sep 29, 2025

🤖 Agent Deep Modeling Execution

Started at: 2025-09-29 04:36:19 UTC

Command Output

@liam-hq/[email protected] execute-deep-modeling /home/runner/work/liam/liam/frontend/internal-packages/agent
pnpm test:integration src/createGraph.integration.test.ts

@liam-hq/[email protected] test:integration /home/runner/work/liam/liam/frontend/internal-packages/agent
vitest --watch=false --passWithNoTests --config vitest.config.integration.ts src/createGraph.integration.test.ts

RUN v3.2.4 /home/runner/work/liam/liam/frontend/internal-packages/agent

(node:7661) ExperimentalWarning: WASI is an experimental feature and might change at any time
(Use node --trace-warnings ... to show where the warning was created)

✅ [INFO] 2025-09-29T04:36:22.376Z
LangSmith Trace URL: https://smith.langchain.com/o/eed4d2d8-0bd8-4ca4-a452-4da88ef63fd6/projects/p/99d63e81-58f9-475a-8985-409b591e26b0?searchModel=%7B%22filter%22%3A%22and(eq(is_root%2C%20true)%2C%20and(eq(metadata_key%2C%20%5C%22thread_id%5C%22)%2C%20eq(metadata_value%2C%20%5C%226cb96e5f-5c78-4969-835f-5bec96bf959d%5C%22)))%22%7D

⏺ (pm):

Checklist:

  • Define scope, objectives, actors, and assumptions for a simple user management system.
  • Structure functional requirements by feature area with clear, testable “must” statements.
  • Identify core data entities to guide database design.
  • Specify acceptance criteria and success metrics for QA verification.
  • Document constraints, out-of-scope items, and risks.
  • Save the finalized requirements artifact for downstream DB and QA use.

Business Requirements Document (BRD)

  1. Overview and Objectives
  • Build a simple user management system that lets end users register, sign in, and manage their profile, and lets administrators manage users, roles, and access, with auditable actions and basic email notifications.
  • Primary goals: secure account lifecycle, minimal friction for end users, and clear administrative oversight.
  1. Scope
  • In scope: registration, email verification, login/logout, password reset, profile management, role/permission-based access control, admin user CRUD, search/filter, audit logging, and email notifications.
  • Out of scope (Phase 2+): SSO/social login, multi-tenant orgs, fine-grained resource authorization beyond users, mobile apps, MFA/2FA, SCIM provisioning, webhooks.
  1. Actors and Roles
  • End User: registers, verifies email, signs in/out, resets password, updates profile.
  • Administrator: manages users, roles, and permissions; views audits; deactivates/reactivates accounts.
  • System: sends emails, enforces policies, logs audit events.
  1. Key Assumptions
  • Single-tenant, US-based deployment; email is the unique identifier for accounts.
  • Web UI and REST API are both available.
  • Timestamps stored in UTC; display in user’s local time where applicable.
  • Passwords stored using industry-standard one-way hashing; emails delivered via SMTP-compatible provider.
  1. Core Data Entities (for DB design guidance)
  • User: id, email (unique), email_verified, password, first_name, last_name, phone (opt), status [pending, active, suspended, deactivated], last_login_at, created_at, updated_at.
  • Role: id, name (unique), description, created_at, updated_at.
  • Permission: id, key (unique), description.
  • UserRole (junction): user_id, role_id, assigned_by, assigned_at.
  • RolePermission (junction): role_id, permission_id, created_at.
  • AuditEvent: id, event_type, actor_user_id (nullable), target_user_id (nullable), timestamp, ip_address (nullable), user_agent (nullable), details (json/text).
  • AuthToken: id, user_id, type [password_reset, email_verify, session_revocation], token, expires_at, used_at (nullable), created_at.
  • EmailQueue/Notification (optional): id, to, subject, template_key, payload (json), status, created_at, sent_at.
  1. Functional Requirements

A. User Registration & Verification

  • The system must allow a new user to register with email, password, and optional name fields.
  • The system must require acceptance of Terms of Service and Privacy Policy at registration.
  • The system must send a verification email containing a single-use link valid for 72 hours.
  • The system must mark accounts as “pending” until email is verified; unverified users cannot access authenticated features.
  • The system must support resending verification emails and expiring prior tokens.

B. Authentication & Session Management

  • The system must allow users to sign in with email and password.
  • The system must prevent sign-in for suspended or deactivated accounts and display an appropriate message.
  • The system must lock an account for 15 minutes after 5 consecutive failed login attempts.
  • The system must support logout and invalidate the active session.
  • The system must record last successful login timestamp.
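A minimal sketch of the lockout rule in B, assuming time is passed in as epoch milliseconds and that a successful sign-in resets the failure streak (the requirement says only “consecutive”). It follows the “5 consecutive failed login attempts” wording here rather than the “within 15 minutes” phrasing in the acceptance criteria. `LoginState`, `recordFailure`, `recordSuccess`, and `isLocked` are hypothetical names, not part of the generated schema or the agent code.

```typescript
// Hypothetical sketch of "lock for 15 minutes after 5 consecutive failures".
const MAX_CONSECUTIVE_FAILURES = 5
const LOCK_DURATION_MS = 15 * 60 * 1000

type LoginState = {
  consecutiveFailures: number
  lockedUntil: number | null // epoch ms; null when never locked or lock cleared
}

function isLocked(state: LoginState, now: number): boolean {
  return state.lockedUntil !== null && now < state.lockedUntil
}

// Record a failed attempt; the 5th consecutive failure starts a 15-minute lock.
function recordFailure(state: LoginState, now: number): LoginState {
  // Attempts during an active lock are rejected and do not change the state.
  if (isLocked(state, now)) return state
  const failures = state.consecutiveFailures + 1
  if (failures >= MAX_CONSECUTIVE_FAILURES) {
    return { consecutiveFailures: 0, lockedUntil: now + LOCK_DURATION_MS }
  }
  return { consecutiveFailures: failures, lockedUntil: null }
}

// A successful sign-in (only possible when not locked) breaks the streak.
function recordSuccess(_state: LoginState): LoginState {
  return { consecutiveFailures: 0, lockedUntil: null }
}

// Usage: five failures one second apart, starting at t = 0.
let s: LoginState = { consecutiveFailures: 0, lockedUntil: null }
for (let i = 0; i < 5; i++) s = recordFailure(s, i * 1000)
console.log(isLocked(s, 5000)) // true
console.log(isLocked(s, 4000 + LOCK_DURATION_MS)) // false
```

A caller would check `isLocked` before verifying the password, so attempts during a lock are rejected without touching the counter.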

C. Password & Credential Management

  • The system must allow users to change their password when signed in (old + new password flow).
  • The system must allow password reset via email using a single-use token valid for 60 minutes.
  • The system must invalidate a reset token immediately after use or expiration.
  • The system must enforce a configurable password policy (e.g., minimum length, complexity).
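The single-use, 60-minute reset token rule in C can be sketched against the AuthToken entity listed under Core Data Entities. Field names follow that entity; representing timestamps as epoch milliseconds, the `ConsumeResult` type, and the helper names are assumptions for illustration.

```typescript
// Hypothetical shape based on the AuthToken entity (snake_case kept).
type AuthToken = {
  id: string
  user_id: string
  type: 'password_reset' | 'email_verify' | 'session_revocation'
  token: string
  expires_at: number // epoch ms
  used_at: number | null
  created_at: number
}

const RESET_TOKEN_TTL_MS = 60 * 60 * 1000 // 60 minutes

function issueResetToken(id: string, userId: string, token: string, now: number): AuthToken {
  return {
    id,
    user_id: userId,
    type: 'password_reset',
    token,
    expires_at: now + RESET_TOKEN_TTL_MS,
    used_at: null,
    created_at: now,
  }
}

type ConsumeResult =
  | { ok: true; token: AuthToken }
  | { ok: false; reason: 'wrong_type' | 'already_used' | 'expired' }

// Single use: reject used or expired tokens; otherwise return a copy with used_at set.
function consumeResetToken(t: AuthToken, now: number): ConsumeResult {
  if (t.type !== 'password_reset') return { ok: false, reason: 'wrong_type' }
  if (t.used_at !== null) return { ok: false, reason: 'already_used' }
  if (now >= t.expires_at) return { ok: false, reason: 'expired' }
  return { ok: true, token: { ...t, used_at: now } }
}

const issued = issueResetToken('t1', 'u1', 'opaque-token', 0)
const firstUse = consumeResetToken(issued, 1000)
console.log(firstUse.ok) // true
```

Against a real database, the check and the `used_at` update would need to happen atomically (for example, a single conditional `UPDATE` that requires `used_at IS NULL`); otherwise two concurrent requests could both consume the same token.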

D. Profile Management

  • The system must allow users to view and update profile fields: first_name, last_name, phone (optional).
  • The system must allow users to view account status and email verification state.
  • The system must restrict email changes to verified users and require re-verification on email change.

E. Authorization, Roles, and Permissions

  • The system must support role-based access control (RBAC) with at least two default roles: Admin and User.
  • The system must allow Admins to create, read, update, and delete roles.
  • The system must allow Admins to assign and revoke roles for any user.
  • The system must support permissions that can be attached to roles; Admins can manage role-permission mappings.
  • The system must enforce permissions on administrative features and protected endpoints.
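The RBAC requirements in E reduce to a join from user to roles to permissions, followed by a check at each protected feature. The sketch below uses in-memory arrays in place of the UserRole, RolePermission, and Permission entities; the permission keys (`users.deactivate`, `profile.update`), role ids, and user ids are hypothetical sample data.

```typescript
// Hypothetical in-memory stand-ins for the Permission, UserRole, and RolePermission entities.
type Permission = { id: string; key: string }
type UserRole = { user_id: string; role_id: string }
type RolePermission = { role_id: string; permission_id: string }

// Collect the permission keys granted to a user through all of their roles.
function permissionsFor(
  userId: string,
  userRoles: UserRole[],
  rolePerms: RolePermission[],
  perms: Permission[],
): Set<string> {
  const roleIds = new Set(userRoles.filter((ur) => ur.user_id === userId).map((ur) => ur.role_id))
  const permIds = new Set(rolePerms.filter((rp) => roleIds.has(rp.role_id)).map((rp) => rp.permission_id))
  return new Set(perms.filter((p) => permIds.has(p.id)).map((p) => p.key))
}

// Guard for an admin feature or protected endpoint: deny unless the key is granted.
function requirePermission(granted: Set<string>, key: string): void {
  if (!granted.has(key)) throw new Error(`Forbidden: missing permission ${key}`)
}

// Sample data: Admin gets both permissions, User only profile.update.
const permissions: Permission[] = [
  { id: 'p1', key: 'users.deactivate' },
  { id: 'p2', key: 'profile.update' },
]
const rolePermissions: RolePermission[] = [
  { role_id: 'admin', permission_id: 'p1' },
  { role_id: 'admin', permission_id: 'p2' },
  { role_id: 'user', permission_id: 'p2' },
]
const userRoles: UserRole[] = [
  { user_id: 'alice', role_id: 'admin' },
  { user_id: 'bob', role_id: 'user' },
]

const bobPerms = permissionsFor('bob', userRoles, rolePermissions, permissions)
console.log(bobPerms.has('users.deactivate')) // false
```

Recomputing the set on each request (rather than caching it in a session) is one simple way to make role changes effective immediately, as the role-assignment acceptance criterion asks.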

F. Administrative User Management

  • The system must allow Admins to create a user (invite), update user attributes, deactivate/reactivate, and soft-delete users.
  • The system must allow Admins to reset a user’s password (send reset link) without viewing or setting the password directly.
  • The system must display a paginated, sortable, and filterable user list with columns: email, name, roles, status, created_at, last_login_at.
  • The system must support search by email and name, and filters by role, status, and verification state.
  • The system must support bulk actions for resend verification and deactivate/reactivate (up to 100 users per action).

G. Notifications & Email

  • The system must send transactional emails for registration verification, password reset, user invitation, and account status changes.
  • The system must template emails with variables (e.g., user name, verification link) for localization support.
  • The system must record send status and enable retry on transient failures.

H. Audit & Compliance

  • The system must capture audit events for: user created/updated/deactivated/reactivated/deleted, role assigned/revoked, login success/failure, password changed/reset, email verified/changed, and bulk actions.
  • The system must record actor, target, timestamp, and contextual details (IP and user agent when available).
  • The system must allow Admins to search and filter audit events by event type, actor, target, and date range.
  • The system must retain audit events for at least 12 months and support export (CSV).

I. Data Governance & Privacy

  • The system must allow Admins to permanently delete deactivated users and associated personal data, excluding records required for legal or audit retention.
  • The system must provide users the ability to download their profile data (JSON or CSV).
  • The system must mask sensitive data in admin views (e.g., never display full password or reset tokens).
  • The system must log and disclose privacy policy acceptance with timestamp.

J. API & Integration

  • The system must expose authenticated REST APIs for all user and admin operations with role/permission enforcement.
  • The system must provide API keys or token-based access for service-to-service calls with revocation capability.
  • The system must provide API documentation describing endpoints, request/response schemas, and error codes.

K. Reporting & Metrics

  • The system must provide Admin dashboard metrics: total users, active users, unverified users, suspended users, new registrations by day (last 30 days).
  • The system must support exporting user lists and metrics to CSV.

L. Operational Constraints (Testable)

  • The system must return the user list within 2 seconds for a page of up to 50 users under normal load.
  • The system must support at least 10,000 total users without degradation of basic CRUD operations.
  • The system must provide 99.5% monthly uptime for the user management features.
  1. Acceptance Criteria (samples for QA)
  • Registration: Given a new email and valid password, when the user submits registration, then a pending account is created and a verification email is sent within 30 seconds; sign-in is blocked until verification occurs.
  • Login lockout: After 5 failed attempts within 15 minutes, then the account is locked for 15 minutes and further attempts are rejected.
  • Admin deactivate: When an Admin deactivates a user, then the user cannot sign in, existing sessions are invalidated, and an audit event is recorded.
  • Role assignment: When an Admin assigns a role to a user, the role appears in the user’s detail view, permissions are effective immediately, and an audit event is recorded.
  • Password reset: When a user completes a reset with a valid token within 60 minutes, the token becomes unusable thereafter and last_password_change timestamp is updated.
  1. Risks and Mitigations
  • Email deliverability issues: provide resend and delivery status tracking; support fallback provider later.
  • Account enumeration risk: use consistent messaging on login and reset flows.
  • Admin misuse: enforce RBAC and comprehensive audit logging.
  1. Dependencies
  • Email service provider; persistent data store; authentication/session framework.

Purpose: Save analyzed requirements as an artifact; Inputs: businessRequirement summary and functionalRequirements object.

🔧 saveRequirementsToArtifactTool(businessRequirement: "Deliver a simple ...", functionalRequirements: {12 keys}) ⎿ unknown: Requirements saved successfully to artifact

🔧 schemaDesignTool(operations: [16 items])

🔧 schemaDesignTool(operations: [16 items])

🔧 saveTestcase(testcaseWithDml: {6 keys})

🔧 saveTestcase(testcaseWithDml: {6 keys})

🔧 saveTestcase(testcaseWithDml: {6 keys})

🔧 saveTestcase(testcaseWithDml: {6 keys})

🔧 saveTestcase(testcaseWithDml: {6 keys})

🔧 saveTestcase(testcaseWithDml: {6 keys})

🔧 saveTestcase(testcaseWithDml: {6 keys})

🔧 saveTestcase(testcaseWithDml: {6 keys})

🔧 saveTestcase(testcaseWithDml: {6 keys})

🔧 saveTestcase(testcaseWithDml: {6 keys})

🔧 saveTestcase(testcaseWithDml: {6 keys})

🔧 saveTestcase(testcaseWithDml: {6 keys})

🔧 saveTestcase(testcaseWithDml: {6 keys})

🔧 saveTestcase(testcaseWithDml: {6 keys})

🔧 saveTestcase(testcaseWithDml: {6 keys})

🔧 saveTestcase(testcaseWithDml: {6 keys})

🔧 saveTestcase(testcaseWithDml: {6 keys})

🔧 saveTestcase(testcaseWithDml: {6 keys})

🔧 saveTestcase(testcaseWithDml: {6 keys})

🔧 saveTestcase(testcaseWithDml: {6 keys})

🔧 saveTestcase(testcaseWithDml: {6 keys})

🔧 saveTestcase(testcaseWithDml: {6 keys})

🔧 saveTestcase(testcaseWithDml: {6 keys})

🔧 saveTestcase(testcaseWithDml: {6 keys})

🔧 saveTestcase(testcaseWithDml: {6 keys})

🔧 saveTestcase(testcaseWithDml: {6 keys})

🔧 saveTestcase(testcaseWithDml: {6 keys})

🔧 saveTestcase(testcaseWithDml: {6 keys})

🔧 saveTestcase(testcaseWithDml: {6 keys})

🔧 saveTestcase(testcaseWithDml: {6 keys})

🔧 saveTestcase(testcaseWithDml: {6 keys})

🔧 saveTestcase(testcaseWithDml: {6 keys})

🔧 saveTestcase(testcaseWithDml: {6 keys})

🔧 saveTestcase(testcaseWithDml: {6 keys})

🔧 saveTestcase(testcaseWithDml: {6 keys})

🔧 saveTestcase(testcaseWithDml: {6 keys})

🔧 saveTestcase(testcaseWithDml: {6 keys})

🔧 saveTestcase(testcaseWithDml: {6 keys})

🔧 saveTestcase(testcaseWithDml: {6 keys})

🔧 saveTestcase(testcaseWithDml: {6 keys})

🔧 saveTestcase(testcaseWithDml: {6 keys})

🔧 saveTestcase(testcaseWithDml: {6 keys})x

⎯⎯⎯⎯⎯⎯⎯ Failed Tests 1 ⎯⎯⎯⎯⎯⎯⎯

FAIL src/createGraph.integration.test.ts > createGraph Integration > should execute complete workflow
Error: Failed to generate test case for Audit & Compliance: Request timed out.
❯ RunnableCallable.generateTestcaseNode [as func] src/qa-agent/testcaseGeneration/generateTestcaseNode.ts:69:11
67| if (streamResult.isErr()) {
68| // eslint-disable-next-line no-throw-error/no-throw-error -- Requi…
69| throw new Error(
| ^
70| `Failed to generate test case for ${currentRequirement.category}…
71| )
❯ RunnableCallable.invoke ../../../node_modules/.pnpm/@langchain+langgraph@0.4.9_@langchain+core@0.3.75_@opentelemetry[email protected]_@opentelemet_8cec001996aaf22af354b21d9734e3f3/node_modules/@langchain/langgraph/src/utils.ts:85:21
❯ RunnableSequence.invoke ../../../node_modules/.pnpm/@langchain+core@0.3.75_@opentelemetry+api@1.9.0_@opentelemetry+sdk-trace-base@2.1.0_@op_ff334bb79525a10a995a9831d3da877b/node_modules/@langchain/core/dist/runnables/base.js:1308:33
runWithRetry ../../../node_modules/.pnpm/@langchain[email protected]@langchain+core@0.3.75_@opentelemetry[email protected]_@opentelemet_8cec001996aaf22af354b21d9734e3f3/node_modules/@langchain/langgraph/src/pregel/retry.ts:103:16
❯ PregelRunner.executeTasksWithRetry ../../../node_modules/.pnpm/@langchain[email protected]@langchain+core@0.3.75_@opentelemetry[email protected]_@opentelemet_8cec001996aaf22af354b21d9734e3f3/node_modules/@langchain/langgraph/src/pregel/runner.ts:330:27
❯ PregelRunner.tick ../../../node_modules/.pnpm/@langchain+langgraph@0.4.9_@langchain+core@0.3.75_@opentelemetry[email protected]_@opentelemet_8cec001996aaf22af354b21d9734e3f3/node_modules/@langchain/langgraph/src/pregel/runner.ts:138:50
❯ CompiledStateGraph.runLoop ../../../node_modules/.pnpm/@langchain[email protected]@langchain+core@0.3.75_@opentelemetry[email protected]_@opentelemet_8cec001996aaf22af354b21d9734e3f3/node_modules/@langchain/langgraph/src/pregel/index.ts:2233:9
❯ createAndRunLoop ../../../node_modules/.pnpm/@langchain+langgraph@0.4.9_@langchain+core@0.3.75_@opentelemetry[email protected]_@opentelemet_8cec001996aaf22af354b21d9734e3f3/node_modules/@langchain/langgraph/src/pregel/index.ts:2092:9

⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯[1/1]⎯

Test Files 1 failed (1)
Tests 1 failed (1)
Start at 04:36:20
Duration 1252.66s (transform 444ms, setup 0ms, collect 1.39s, tests 1250.98s, environment 0ms, prepare 67ms)

 ELIFECYCLE  Command failed with exit code 1.
/home/runner/work/liam/liam/frontend/internal-packages/agent:
 ERR_PNPM_RECURSIVE_RUN_FIRST_FAIL  @liam-hq/[email protected] execute-deep-modeling: pnpm test:integration src/createGraph.integration.test.ts
Exit status 1
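The code frame above shows the node turning an error result (`streamResult.isErr()`) into a thrown `Error` that names the requirement category. A minimal sketch of that pattern with an explicit per-request deadline follows. It uses a hand-rolled `Result` type and a hypothetical `withTimeout` helper rather than the project's actual result library or any client-side timeout setting, and the message format only imitates the one in the log.

```typescript
// Minimal Result type (stand-in for whatever isErr()-style library the agent uses).
type Result<T> = { ok: true; value: T } | { ok: false; error: Error }

// Race a promise against a deadline; always clear the timer so the process can exit.
async function withTimeout<T>(work: Promise<T>, ms: number): Promise<Result<T>> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error('Request timed out.')), ms)
  })
  try {
    const value = await Promise.race([work, deadline])
    return { ok: true, value }
  } catch (e) {
    return { ok: false, error: e instanceof Error ? e : new Error(String(e)) }
  } finally {
    clearTimeout(timer)
  }
}

// Hypothetical node step: convert an error result into a thrown Error naming the category.
async function generateForCategory(
  category: string,
  call: () => Promise<string>,
  ms: number,
): Promise<string> {
  const result = await withTimeout(call(), ms)
  if (!result.ok) {
    throw new Error(`Failed to generate test case for ${category}: ${result.error.message}`)
  }
  return result.value
}
```

Returning the timeout as a value and letting the caller decide to throw matches the `isErr()` then `throw` shape in the code frame; whether the right response to this failure is a longer deadline, retries, or something else is not settled in this thread.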

Remove reasoning and verbosity parameters that are not applicable to the GPT-5-Codex model. These parameters were specific to GPT-5-Nano and are not needed for the Codex variant.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>