Iterative cross-model review cuts AI-generated PR rejection rate in half #26397
kimjune01 started this conversation in Show and tell
I tested 27 merged PRs across 9 repos (3 languages) to measure whether iterative LLM review catches slop (code that passes tests but isn't merge-ready). 9 of those trials were on gemini-cli.
The status quo for this repo is a one-shot review from Gemini Code Assist. That baseline comes in at 43% merge-ready, roughly a coin flip. With an adversarial cross-model loop (hunt bugs → fix → rebuild → retest → repeat), merge-readiness rises to 91%. Same code, same spec; the only change is adding the loop.
If you have access to more than one SOTA model, iterating before submitting the PR is a clear win. Maintainer time is the bottleneck: arriving pre-iterated means less back-and-forth and a higher chance of approval on the first human review.
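For concreteness, here is a minimal sketch of what that loop can look like when driven by a script. The model hooks (`review_with_model`, `fix_with_model`), the `make test` command, and the round cap are all placeholders for whatever reviewer model, authoring model, and build/test setup you actually use; this is not the exact harness from the experiment repo.

```python
import subprocess

MAX_ROUNDS = 5  # assumption: cap the loop so it always terminates


def tests_pass() -> bool:
    """Rebuild and retest the branch; returns True when the suite is green."""
    return subprocess.run(["make", "test"]).returncode == 0


def review_with_model(diff: str) -> list[str]:
    """Hunt for bugs/slop with a model *other than* the one that wrote the code.
    Hypothetical hook: wire this to whichever reviewer model you have access to."""
    raise NotImplementedError


def fix_with_model(diff: str, findings: list[str]) -> str:
    """Have the authoring model address the findings and return an updated diff.
    Hypothetical hook: wire this to whichever authoring model you have access to."""
    raise NotImplementedError


def iterate(diff: str) -> str:
    """Hunt bugs -> fix -> rebuild -> retest -> repeat, until nothing is flagged."""
    for _ in range(MAX_ROUNDS):
        findings = review_with_model(diff)     # hunt bugs (cross-model review)
        if not findings and tests_pass():      # nothing flagged and suite is green
            break                              # ready to open the PR
        diff = fix_with_model(diff, findings)  # fix, then loop back to re-review
    return diff
```

The key design choice is simply that the reviewer and the author are different models, and the loop only stops when the reviewer has nothing left to flag and the tests are green.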
Writeup (methodology, per-language breakdowns, caveats): https://june.kim/does-iteration-mitigate-slop-slope
Experiment repo: https://github.com/kimjune01/refactor-equivalence
Forge diffs on this repo: #2, #3, #4, #5