Conversation
**Walkthrough**

Adds empty-submission detection: stores per-handout file hashes from template commits, computes SHA-256 fingerprints for submitted files, flags submissions that match handout fingerprints, and optionally rejects and cleans up submissions when an assignment disallows empty submissions.
**Sequence Diagram(s)**

```mermaid
sequenceDiagram
participant User
participant Autograder as Autograder Service
participant Storage as File Storage
participant DB as Database
User->>Autograder: Submit assignment files
Autograder->>Storage: Read submitted file blobs
Storage-->>Autograder: File contents
Autograder->>Autograder: Compute per-file SHA-256 & combined hash
Autograder->>DB: Query assignment_handout_file_hashes by assignment_id + combined_hash
DB-->>Autograder: Matching handout record? (yes/no)
alt Match found
Autograder->>DB: Set submissions.is_empty_submission = true
alt assignment.permit_empty_submissions = false and user not grader/instructor
Autograder->>DB: Cleanup related reviews/files and remove submission
Autograder-->>User: Reject submission (UserVisibleError)
else
Autograder->>DB: Persist submission (flagged empty)
Autograder-->>User: Accept submission (empty)
end
else No match
Autograder->>DB: Set submissions.is_empty_submission = false
Autograder->>DB: Persist submission
Autograder-->>User: Accept submission
end
```
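A minimal sketch of the lookup-and-flag step above, assuming supabase-js and the table/column names from this PR; the function name is illustrative, and the actual logic lives in `supabase/functions/autograder-create-submission/index.ts`:

```ts
// Sketch: a submission is "empty" if its combined hash matches any stored
// handout fingerprint for this assignment (any template commit).
async function flagIfEmptySubmission(
  adminSupabase: SupabaseClient<Database>,
  assignmentId: number,
  submissionId: number,
  combinedHash: string
): Promise<boolean> {
  const { data: matches, error } = await adminSupabase
    .from("assignment_handout_file_hashes")
    .select("id")
    .eq("assignment_id", assignmentId)
    .eq("combined_hash", combinedHash)
    .limit(1);
  if (error) throw error;

  const isEmpty = (matches ?? []).length > 0;
  const { error: updateError } = await adminSupabase
    .from("submissions")
    .update({ is_empty_submission: isEmpty })
    .eq("id", submissionId);
  if (updateError) throw updateError;
  return isEmpty;
}
```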
```mermaid
sequenceDiagram
participant GitHub
participant Webhook as Webhook Handler
participant Octokit as GitHub API
participant DB as Database
GitHub->>Webhook: Push event (template repo)
Webhook->>Webhook: Determine commit SHA
Webhook->>Octokit: Fetch commit tree & blobs
Octokit-->>Webhook: File blobs
Webhook->>Webhook: Filter expected files, compute per-file & combined hashes
Webhook->>DB: Insert/update assignment_handout_file_hashes (assignment_id, sha, combined_hash, file_hashes, class_id)
Webhook->>DB: Update assignment.latest_template_sha
Webhook-->>GitHub: Acknowledge event
```
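And a sketch of the webhook-side persistence step from the diagram above, under the same assumptions; `onConflict: "assignment_id,sha"` mirrors the migration's unique constraint:

```ts
// Sketch: one fingerprint row per (assignment, template commit).
async function storeHandoutFingerprint(
  adminSupabase: SupabaseClient<Database>,
  row: {
    assignment_id: number;
    class_id: number;
    sha: string;
    combined_hash: string;
    file_hashes: Record<string, string>;
  }
): Promise<void> {
  const { error } = await adminSupabase
    .from("assignment_handout_file_hashes")
    .upsert(row, { onConflict: "assignment_id,sha" });
  if (error) throw error;
}
```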
**Estimated code review effort**

🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 **Pre-merge checks** | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

📜 **Recent review details**

Configuration used: Repository UI
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (1)
🔇 Additional comments (1)
✏️ Tip: You can disable this entire section by setting `review_details` to `false` in your review settings.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
supabase/functions/autograder-create-submission/index.ts (1)
**1309-1391: Blocker: `permit_empty_submissions` is never selected, so rejections won't happen.**

You check `repoData.assignments.permit_empty_submissions === false`, but the repo lookup selects only `assignments(class_id, due_date, allow_not_graded_submissions, autograder(*))`, so `permit_empty_submissions` will be `undefined` at runtime.

Also: consider wrapping the GitHub `updateCheckRun` call in a `try/catch` so a GitHub API failure can't prevent cleanup and the intended rejection.

Proposed fix (include `permit_empty_submissions` in the `repoData` select):

```diff
- .select("*, assignments(class_id, due_date, allow_not_graded_submissions, autograder(*))")
+ .select("*, assignments(class_id, due_date, allow_not_graded_submissions, permit_empty_submissions, autograder(*))")
```
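A minimal sketch of the suggested `try/catch`, as a generic wrapper (the helper name and wiring are hypothetical; `updateCheckRun`'s real signature is in `GitHubWrapper.ts`):

```ts
// Sketch only: run a side-effecting GitHub call without letting its failure
// block the cleanup/rejection path that must follow.
async function bestEffort(label: string, fn: () => Promise<unknown>): Promise<void> {
  try {
    await fn();
  } catch (err) {
    // Record the GitHub API failure and continue; the submission rejection
    // (cleanup + UserVisibleError) still needs to happen.
    console.error(`${label} failed; continuing with rejection`, err);
  }
}
```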
🧹 Nitpick comments (4)
tests/e2e/create-submission.test.tsx (2)
**28-50: Keep hash canonicalization in one place (avoid subtle drift).**

This duplicates the combined-hash algorithm now implemented in the edge functions; consider extracting it to a shared helper to prevent mismatches (delimiter/sorting/encoding) across test + webhook + autograder.
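One possible shape for that shared helper, as a sketch (the module path is hypothetical; the algorithm is the one the edge functions use: sort paths, join `path\0hash\n`, SHA-256 the result):

```ts
// Hypothetical shared module, e.g. supabase/functions/_shared/HandoutHashing.ts,
// importable from the edge functions and reusable by the e2e test helpers.
import { createHash } from "node:crypto";
import { Buffer } from "node:buffer";

export function sha256Hex(buf: Uint8Array): string {
  return createHash("sha256").update(buf).digest("hex");
}

// Canonical combined hash over a { path -> sha256 } map.
export function computeCombinedHashFromFileHashes(fileHashes: Record<string, string>): string {
  const combinedInput = Object.keys(fileHashes)
    .sort()
    .map((name) => `${name}\0${fileHashes[name]}\n`)
    .join("");
  return sha256Hex(Buffer.from(combinedInput, "utf-8"));
}
```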
**210-297: Make handout-hash setup idempotent to avoid rare "duplicate key" flakes.**

These inserts into `assignment_handout_file_hashes` are fine for fresh assignments, but using `upsert(..., { onConflict: "assignment_id,sha" })` (or a unique `sha` per test run) would make reruns more robust.
utils/supabase/SupabaseTypes.d.ts (1)

**696-768: Consider stronger typing for `file_hashes` (shape appears well-known).**

If `file_hashes` is a "path -> hash" map, keeping it as `Json` pushes a lot of validation to runtime. If feasible, consider adding a separate app-level type (e.g., `type HandoutFileHashes = Record<string, string>`) and casting at the boundary, rather than relying on `Json` everywhere.
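A sketch of that boundary validation (only the `HandoutFileHashes` type comes from the comment; the guard function is illustrative):

```ts
type HandoutFileHashes = Record<string, string>;

// Validate the jsonb payload once at the boundary, then use the narrow type everywhere.
function asHandoutFileHashes(value: unknown): HandoutFileHashes {
  if (value === null || typeof value !== "object" || Array.isArray(value)) {
    throw new Error("file_hashes must be a JSON object");
  }
  for (const [key, val] of Object.entries(value)) {
    if (typeof val !== "string") {
      throw new Error(`file_hashes[${key}] must be a hex digest string`);
    }
  }
  return value as HandoutFileHashes;
}
```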
supabase/migrations/20260114180000_empty_submission_detection.sql (1)

**28-38: Constraint existence check could be more precise.**

The `pg_constraint` query only filters by `conname`, which could theoretically match a constraint with the same name on a different table or schema (unlikely but possible). A more robust check would also verify the constraint belongs to the correct table.

♻️ Optional: More precise constraint existence check

```diff
 DO $$
 BEGIN
   IF NOT EXISTS (
     SELECT 1
-    FROM pg_constraint
-    WHERE conname = 'assignment_handout_file_hashes_assignment_id_sha_key'
+    FROM pg_constraint c
+    JOIN pg_class t ON c.conrelid = t.oid
+    JOIN pg_namespace n ON t.relnamespace = n.oid
+    WHERE c.conname = 'assignment_handout_file_hashes_assignment_id_sha_key'
+      AND t.relname = 'assignment_handout_file_hashes'
+      AND n.nspname = 'public'
   ) THEN
     ALTER TABLE public.assignment_handout_file_hashes
       ADD CONSTRAINT assignment_handout_file_hashes_assignment_id_sha_key
       UNIQUE (assignment_id, sha);
   END IF;
 END $$;
```

That said, given the verbose constraint name, a collision is highly improbable in practice.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- supabase/functions/_shared/SupabaseTypes.d.ts
- supabase/functions/autograder-create-submission/index.ts
- supabase/functions/github-repo-webhook/index.ts
- supabase/migrations/20260114180000_empty_submission_detection.sql
- supabase/migrations/20260114193000_permit_empty_submissions.sql
- tests/e2e/TestingUtils.ts
- tests/e2e/create-submission.test.tsx
- utils/supabase/SupabaseTypes.d.ts
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-12-12T05:59:48.513Z
Learnt from: koyohashi
Repo: pawtograder/platform PR: 514
File: supabase/migrations/20251204070101_live-polls.sql:38-38
Timestamp: 2025-12-12T05:59:48.513Z
Learning: In SQL migrations affecting live polls, ensure the UNIQUE constraint on (live_poll_id, public_profile_id) behaves with NULLs as intended: anonymous (public_profile_id NULL) can submit multiple responses, while duplicates are prevented only when require_login is true and public_profile_id IS NOT NULL. Review migration 20251204070101_live-polls.sql to confirm this constraint and related columns (public_profile_id nullable, require_login flag) are implemented accordingly and documented.
Applied to files:
- supabase/migrations/20260114193000_permit_empty_submissions.sql
- supabase/migrations/20260114180000_empty_submission_detection.sql
📚 Learning: 2025-12-12T06:13:23.017Z
Learnt from: koyohashi
Repo: pawtograder/platform PR: 514
File: supabase/migrations/20251204070101_live-polls.sql:0-0
Timestamp: 2025-12-12T06:13:23.017Z
Learning: Migrations that implement RLS should clearly document the intended access boundaries. For live_polls (and similar features), ensure that unrestricted read access (USING (true)) is intentional and that real security checks occur at the response level (e.g., require_login and can_access_poll_response()). In the migration, add a comment explaining why read-level policy is permissive and that the enforcement happens downstream, so future reviewers understand the design choice and potential risks.
Applied to files:
- supabase/migrations/20260114193000_permit_empty_submissions.sql
- supabase/migrations/20260114180000_empty_submission_detection.sql
📚 Learning: 2025-10-19T01:15:20.591Z
Learnt from: jon-bell
Repo: pawtograder/platform PR: 409
File: supabase/migrations/20251017190202_improve_help_request_rls_perf.sql:1-235
Timestamp: 2025-10-19T01:15:20.591Z
Learning: Before suggesting the creation of database indices, always check prior migration files to verify whether the index already exists. Use shell scripts to search for CREATE INDEX statements and PRIMARY KEY constraints in the supabase/migrations directory.
Applied to files:
- supabase/migrations/20260114180000_empty_submission_detection.sql
🧬 Code graph analysis (4)
tests/e2e/create-submission.test.tsx (2)

- utils/supabase/DatabaseTypes.d.ts (1)
  - `Assignment` (42-42)
- tests/e2e/TestingUtils.ts (3)
  - `insertAssignment` (1142-1427)
  - `insertSubmissionViaAPI` (1429-1536)
  - `supabase` (12-12)
supabase/functions/autograder-create-submission/index.ts (4)

- supabase/functions/_shared/SupabaseTypes.d.ts (1)
  - `Database` (3-11377)
- utils/supabase/SupabaseTypes.d.ts (1)
  - `Database` (3-11377)
- supabase/functions/_shared/GitHubWrapper.ts (1)
  - `updateCheckRun` (1569-1580)
- supabase/functions/_shared/HandlerUtils.ts (1)
  - `UserVisibleError` (316-324)
supabase/functions/_shared/SupabaseTypes.d.ts (2)

- utils/supabase/SupabaseTypes.d.ts (1)
  - `Json` (1-1)
- utils/supabase/DatabaseTypes.d.ts (1)
  - `Json` (3-3)
supabase/functions/github-repo-webhook/index.ts (3)

- supabase/functions/_shared/GitHubWrapper.ts (1)
  - `getOctoKit` (244-319)
- supabase/functions/_shared/PawtograderYml.d.ts (1)
  - `PawtograderConfig` (42-49)
- utils/PawtograderYml.d.ts (1)
  - `PawtograderConfig` (42-49)
⏰ Context from checks skipped due to timeout of 180000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: lint (22.x)
- GitHub Check: deploy
- GitHub Check: lint (22.x)
🔇 Additional comments (12)
supabase/functions/_shared/SupabaseTypes.d.ts (3)
**696-768: No action needed; schema is correctly designed.**

The `file_hashes` column is `jsonb NOT NULL DEFAULT '{}'::jsonb` in the migration, which fully supports the TypeScript type contract. Insert/Update fields can safely omit `file_hashes` because the database applies the default; Row correctly shows it as required because the column is never null. The same pattern applies to `is_empty_submission` and `permit_empty_submissions`, both with explicit `NOT NULL DEFAULT` constraints. The schema is sound.

Likely an incorrect or invalid review comment.
**825-826: TypeScript pattern and database constraints confirmed as correct.**

The field is properly typed as required in the `Row` type (line 825) but optional in the `Insert`/`Update` types (lines 860, 895), following the expected "defaulted column" pattern. The migration file correctly applies both `NOT NULL` and `DEFAULT true` constraints, ensuring backwards compatibility with older insert paths and preventing runtime failures. The default value of `true` aligns with the semantic meaning ("allow empty submissions by default"), maintaining safe backwards-compatible behavior.
**8095-8096: `submissions.is_empty_submission` and `is_not_graded`: defaults correctly configured in migrations.**

Both fields have `DEFAULT false` in their migrations (`20260114180000_empty_submission_detection.sql` and `20250712150000_update-due-date-logic-for-lab-sections.sql`, respectively), ensuring existing rows are backfilled and new inserts without explicit values default safely. The TypeScript schema correctly reflects this: Row types are non-nullable while Insert/Update types are optional.

tests/e2e/TestingUtils.ts (1)
**1142-1206: Permit-empty-submissions plumbing looks correct (defaulting to true).**

`permit_empty_submissions: permit_empty_submissions ?? true` matches the DB default and keeps existing call sites stable.

supabase/migrations/20260114193000_permit_empty_submissions.sql (1)
**1-8: Migration is clean and idempotent.**

tests/e2e/create-submission.test.tsx (1)
**84-101: Good coverage: explicit assignments for empty allowed vs prohibited.**
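A sketch of what that two-fixture setup looks like (only `permit_empty_submissions` is shown; `insertAssignment`'s other options live in `tests/e2e/TestingUtils.ts`, per the code graph above):

```ts
import { insertAssignment } from "./TestingUtils";

// Sketch: one assignment per policy. Submitting unmodified handout files should
// be accepted (and flagged) for `permissive`, rejected with a user-visible
// error for `strict`.
const permissive = await insertAssignment({ permit_empty_submissions: true });
const strict = await insertAssignment({ permit_empty_submissions: false });
```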
utils/supabase/SupabaseTypes.d.ts (2)

**8095-8146: Issue verified as resolved: `is_empty_submission` is properly defined with `NOT NULL DEFAULT false` in the migration.**

The migration `20260114180000_empty_submission_detection.sql` correctly adds the column with both the `NOT NULL` constraint and the `DEFAULT false` value, which handles backfilling existing rows automatically. No further action needed.
**825-906: No action needed: `assignments.permit_empty_submissions` already has a NOT NULL constraint with DEFAULT true.**

The column is properly defined in migration `20260114193000_permit_empty_submissions.sql` with `NOT NULL DEFAULT true`, which safely backfills existing rows and prevents runtime surprises. The default value permits empty submissions by default, with the ability to restrict when needed.

supabase/migrations/20260114180000_empty_submission_detection.sql (4)
**5-10: LGTM! Clean column addition with sensible defaults.**

Adding `is_empty_submission` as `NOT NULL DEFAULT false` ensures backward compatibility with existing rows. The comment clearly documents the column's purpose.
**12-23: Table structure looks solid.**

Good choices here:

- `GENERATED BY DEFAULT AS IDENTITY` for the PK is the modern PostgreSQL approach
- `ON DELETE CASCADE` on both FKs prevents orphaned records when assignments or classes are removed
- `jsonb` for `file_hashes` is appropriate for the key-value mapping of paths to hashes

One minor note: the `file_hashes` default is `'{}'::jsonb`, which is an empty object; this aligns well with the intended usage.
**40-45: Index choices are appropriate.**

- The composite index on `(assignment_id, combined_hash)` directly supports the empty-submission lookup pattern described in the comments.
- The `class_id` index will help with RLS policy evaluation, since `authorizeforclassinstructor(class_id)` needs to filter by class.

Based on learnings, I verified these are new indexes being introduced in this migration.
**50-57: RLS policy is correctly designed; reads restricted to instructors/graders, writes via service role.**

The `SELECT` policy appropriately limits read access to instructors and graders for debugging and admin visibility. All writes (upserts in `github-repo-webhook` and `autograder-create-submission`) use `adminSupabase`, which operates as the service role and automatically bypasses RLS. No `INSERT` or `UPDATE` policies are needed.
✏️ Tip: You can disable this entire section by setting `review_details` to `false` in your review settings.
```ts
async function safeCleanupRejectedSubmission(params: {
  adminSupabase: SupabaseClient<Database>;
  submissionId: number;
}) {
  const { adminSupabase, submissionId } = params;
  // Break circular-ish references: submissions.grading_review_id -> submission_reviews
  await adminSupabase.from("submissions").update({ grading_review_id: null, is_active: false }).eq("id", submissionId);

  // Delete any auto-created review row(s) for this submission.
  await adminSupabase.from("submission_reviews").delete().eq("submission_id", submissionId);

  // Remove files (these are the only child rows we definitely created in this path).
  await adminSupabase.from("submission_files").delete().eq("submission_id", submissionId);

  // Finally, remove the submission row itself.
  await adminSupabase.from("submissions").delete().eq("id", submissionId);
}
```
**Cleanup helper silently ignores PostgREST failures (won't clean up in practice).**

`safeCleanupRejectedSubmission` never inspects `{ error }`, so cleanup can fail without throwing or being captured, leaving rejected submissions behind.
Proposed fix

```diff
 async function safeCleanupRejectedSubmission(params: {
   adminSupabase: SupabaseClient<Database>;
   submissionId: number;
 }) {
   const { adminSupabase, submissionId } = params;
   // Break circular-ish references: submissions.grading_review_id -> submission_reviews
-  await adminSupabase.from("submissions").update({ grading_review_id: null, is_active: false }).eq("id", submissionId);
+  {
+    const { error } = await adminSupabase
+      .from("submissions")
+      .update({ grading_review_id: null, is_active: false })
+      .eq("id", submissionId);
+    if (error) throw error;
+  }
   // Delete any auto-created review row(s) for this submission.
-  await adminSupabase.from("submission_reviews").delete().eq("submission_id", submissionId);
+  {
+    const { error } = await adminSupabase.from("submission_reviews").delete().eq("submission_id", submissionId);
+    if (error) throw error;
+  }
   // Remove files (these are the only child rows we definitely created in this path).
-  await adminSupabase.from("submission_files").delete().eq("submission_id", submissionId);
+  {
+    const { error } = await adminSupabase.from("submission_files").delete().eq("submission_id", submissionId);
+    if (error) throw error;
+  }
   // Finally, remove the submission row itself.
-  await adminSupabase.from("submissions").delete().eq("id", submissionId);
+  {
+    const { error } = await adminSupabase.from("submissions").delete().eq("id", submissionId);
+    if (error) throw error;
+  }
 }
```
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```ts
async function safeCleanupRejectedSubmission(params: {
  adminSupabase: SupabaseClient<Database>;
  submissionId: number;
}) {
  const { adminSupabase, submissionId } = params;
  // Break circular-ish references: submissions.grading_review_id -> submission_reviews
  {
    const { error } = await adminSupabase
      .from("submissions")
      .update({ grading_review_id: null, is_active: false })
      .eq("id", submissionId);
    if (error) throw error;
  }
  // Delete any auto-created review row(s) for this submission.
  {
    const { error } = await adminSupabase.from("submission_reviews").delete().eq("submission_id", submissionId);
    if (error) throw error;
  }
  // Remove files (these are the only child rows we definitely created in this path).
  {
    const { error } = await adminSupabase.from("submission_files").delete().eq("submission_id", submissionId);
    if (error) throw error;
  }
  // Finally, remove the submission row itself.
  {
    const { error } = await adminSupabase.from("submissions").delete().eq("id", submissionId);
    if (error) throw error;
  }
}
```
```ts
function sha256Hex(buf: Uint8Array): string {
  const hash = createHash("sha256");
  hash.update(buf);
  return hash.digest("hex");
}

function computeCombinedHashFromFileHashes(file_hashes: Record<string, string>): string {
  const combinedInput = Object.keys(file_hashes)
    .sort()
    .map((name) => `${name}\0${file_hashes[name]}\n`)
    .join("");
  return sha256Hex(Buffer.from(combinedInput, "utf-8"));
}

async function computeHandoutFileHashesForCommit(params: {
  templateRepo: string;
  commitSha: string;
  expectedFiles: string[];
  scope: Sentry.Scope;
}): Promise<{ file_hashes: Record<string, string>; combined_hash: string }> {
  const { templateRepo, commitSha, expectedFiles, scope } = params;
  const octokit = await getOctoKit(templateRepo, scope);
  if (!octokit) {
    throw new Error(`No octokit found for repository ${templateRepo}`);
  }
  const [owner, repo] = templateRepo.split("/");

  // Resolve commit -> tree sha
  const { data: commit } = await octokit.request("GET /repos/{owner}/{repo}/git/commits/{commit_sha}", {
    owner,
    repo,
    commit_sha: commitSha
  });
  const treeSha = commit.tree.sha;

  // List all files in tree
  const { data: tree } = await octokit.request("GET /repos/{owner}/{repo}/git/trees/{tree_sha}", {
    owner,
    repo,
    tree_sha: treeSha,
    recursive: "true"
  });

  const wantedPaths = (tree.tree || [])
    .filter((item) => item.type === "blob" && !!item.path && !!item.sha)
    .map((item) => ({ path: item.path!, sha: item.sha! }))
    .filter(({ path }) => expectedFiles.some((pattern) => micromatch.isMatch(path, pattern)));

  // Hash file contents for all wanted paths
  const file_hashes: Record<string, string> = {};
  for (const { path, sha } of wantedPaths) {
    const { data: blob } = await octokit.request("GET /repos/{owner}/{repo}/git/blobs/{file_sha}", {
      owner,
      repo,
      file_sha: sha
    });
    if (blob.encoding !== "base64") {
      throw new Error(`Unexpected blob encoding for ${path}: ${blob.encoding}`);
    }
    const bytes = Buffer.from(blob.content, "base64");
    file_hashes[path] = sha256Hex(bytes);
  }

  const combined_hash = computeCombinedHashFromFileHashes(file_hashes);
  return { file_hashes, combined_hash };
}
```
🧩 Analysis chain
🌐 Web query:
GitHub REST API git trees recursive parameter format "true" vs "1" octokit
💡 Result:
Both work — GitHub’s git/trees endpoint treats the recursive query as “any value enables recursion.” You can pass recursive=1 or recursive=true (or 0/"false" to disable). Octokit docs likewise note values like 0, 1, "true", and "false" are accepted. (docs.github.com)
🏁 Script executed:
```shell
# Search the codebase to understand how the function is called and what payload.commits looks like
rg -A 5 -B 5 "computeHandoutFileHashesForCommit" /

# Check if there are multiple calls to this function or if it's called once per commit
rg "payload\.commits" /

# Look for how templateRepo is passed and if there's any caching mechanism
rg "assignment_handout_file_hashes" /
```

Repository: pawtograder/platform
Length of output: 7580
🏁 Script executed:
```shell
# Check the Octokit documentation or type definitions in the codebase
fd "octokit" -t f | head -20

# Look for package.json to see what Octokit version is being used
fd "package.json" -t f | xargs grep -l "octokit" 2>/dev/null | head -5
```

Repository: pawtograder/platform
Length of output: 76
🏁 Script executed:
```shell
# View the context around where computeHandoutFileHashesForCommit is called (around line 140-160)
sed -n '135,160p' /home/jailuser/git/supabase/functions/github-repo-webhook/index.ts
```

Repository: pawtograder/platform
Length of output: 956
🏁 Script executed:
```shell
# View the lines 549-599 to see if this pattern repeats elsewhere
sed -n '540,605p' /home/jailuser/git/supabase/functions/github-repo-webhook/index.ts
```

Repository: pawtograder/platform
Length of output: 2267
🏁 Script executed:
```shell
# Also check the overall structure of the main function to understand commit iteration
sed -n '140,250p' /home/jailuser/git/supabase/functions/github-repo-webhook/index.ts
```

Repository: pawtograder/platform
Length of output: 5059
**Handout hashing makes duplicate GitHub API blob calls across assignments; consider caching.**

For each assignment with the same template repo and commit SHA, the function makes separate blob API calls to fetch the same files. With multiple assignments per webhook and broad globs (e.g., `**/*.java`), this quickly compounds: if 10 assignments share a template repo, you fetch the same N blobs 10 times. For large template repos with many matched files, this risks rate limiting or webhook timeouts. Consider caching blob fetches within a single webhook invocation, or enforcing a reasonable cap on matched files per assignment.
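A sketch of the per-invocation cache idea (the helper name and wiring are hypothetical; `sha256Hex` is the function from the snippet above). Git blob SHAs are content-addressed, so a blob's computed hash can safely be reused across assignments within one event:

```ts
import { Octokit } from "octokit";
import { Buffer } from "node:buffer";

// Hypothetical per-webhook-invocation cache: blob sha -> computed file hash.
const blobHashCache = new Map<string, string>();

async function hashBlobCached(
  octokit: Octokit,
  owner: string,
  repo: string,
  blobSha: string
): Promise<string> {
  const cached = blobHashCache.get(blobSha);
  if (cached !== undefined) return cached;
  const { data: blob } = await octokit.request("GET /repos/{owner}/{repo}/git/blobs/{file_sha}", {
    owner,
    repo,
    file_sha: blobSha
  });
  const digest = sha256Hex(Buffer.from(blob.content, "base64"));
  blobHashCache.set(blobSha, digest);
  return digest;
}
```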
The `recursive: "true"` parameter is valid; Octokit accepts it without issue.
Adds automatic detection and flagging of empty submissions by comparing submitted files against stored handout file hashes for all template repo versions.
**Summary by CodeRabbit**

- New Features
- Chores
- Tests
✏️ Tip: You can customize this high-level summary in your review settings.