doc: introduce guidelines for AI-generated contributions #3948
flyingImer wants to merge 2 commits into apache:main from
Conversation
dimas-b
left a comment
Thanks for driving this initiative, @flyingImer !
flyrain
left a comment
Thanks @flyingImer for driving this. LGTM.
@jbonofre: wdyt?
* Regardless of how a change is produced, the individual submitting the pull request is considered the author of the contribution and is fully responsible for it.
* The pull request author **must understand the implementation end-to-end** and be able to **explain and justify the design and code** during review.
* Tools, including AI systems, **are not** considered contributors. **Responsibility and authorship remain with the human** submitting the change.
* Contributors are encouraged to **disclose** significant AI assistance in the pull request description for transparency.
There was a problem hiding this comment.
Given that AI-generated code and PRs put a lot of burden on reviewers, should we add something like:
Authors must ensure that AI-assisted pull requests remain focused, well explained, and of a size that can be reasonably reviewed by humans. Submissions that significantly increase reviewer burden may be asked to be reduced in scope or revised before review.
There was a problem hiding this comment.
That applies to all PRs, IMHO 😉
There was a problem hiding this comment.
It does. However, generated code tends to cover more ground than manually written code, which makes the situation worse. An explicit rule also makes it clear that smaller PRs are preferred.
There was a problem hiding this comment.
Something else important to remember is that we can't hold copyright on generated source (AI or other tool).
So the ASF header should not be added to a class where the majority of the code is generated.
There was a problem hiding this comment.
Maybe we should ask to put a header like:
/**
 * This code has been generated by AI Foo Model
 */
There was a problem hiding this comment.
I think the header problem is real, so we need to resolve it and provide clear guidelines for contributors who want to use AI-generated code. Deferring it will make reviewing PRs with known AI-generated code pretty much impossible.
Re: comparison with OpenAPI code generation, from my POV the key distinction is that build-time tools (like OpenAPI generator) produce code that is not meant to be edited after generation. Any updates to sources require re-generating that code. On the other hand, AI-generated code is meant to be committed to the source repo like any human-written code, so it will be subject to editing and copying by (other) people after the initial contribution. Not having a copyright header will make the status of subsequent manual edits obscure. I believe manual edits should be covered by the normal ASF license, so a header would be valuable in these files... WDYT?
I agree that this discussion is growing in scope, but I think it is beneficial to the project even if it takes time to figure this out.
There was a problem hiding this comment.
@dimas-b @jbonofre I agree the header/copyright question is important! My concern is that this feels bigger than a Polaris-local CONTRIBUTING.md rule.
My bias is that Polaris should not try to define project-specific copyright/header treatment for AI-generated code ahead of clearer ASF-level guidance or discussion there.
For this PR, I’d prefer to keep the scope narrow: the submitting human remains accountable, contributors must understand and stand behind the change, and submissions should follow the ASF generative tooling guidance.
If we still feel project-specific header/provenance rules are needed after that, I think that should be handled as a separate follow-up discussion.
I'm inclined to merge as is and bring the discussion to ASF level. WDYT?
There was a problem hiding this comment.
From my POV, if we encourage contributors to disclose AI contributions while we're not sure how those contributions are supposed to be incorporated into the codebase, we're putting reviewers into a logical deadlock.
There was a problem hiding this comment.
The part encouraging disclosure isn't practical, AFAIK. I'd remove it if it causes issues.
There was a problem hiding this comment.
The discussion is already happening at the ASF level.
I'm fine to merge as is, but the question I have is: what are the "core" differences from https://www.apache.org/legal/generative-tooling.html? Is it more practical?
This PR proposes adding a short guideline around AI-assisted contributions. The goal is not to restrict how contributors use development tools, but to clarify contributor accountability.
In particular, the guideline emphasizes that the human submitting a pull request:
This is similar in spirit to recent discussions and updates in other ASF projects (for example Apache Iceberg). This PR is intended as a starting point for discussion and wording refinement.
Checklist
* CHANGELOG.md (if needed)
* site/content/in-dev/unreleased (if needed)