doc: introduce guidelines for AI-generated contributions#3948

Open
flyingImer wants to merge 2 commits into apache:main from flyingImer:doc-ai-guidelines

@flyingImer

This PR proposes adding a short guideline around AI-assisted contributions. The goal is not to restrict how contributors use development tools, but to clarify contributor accountability.

In particular, the guideline emphasizes that the human submitting a pull request:

  • remains the author of the change
  • must understand the implementation end-to-end
  • must be able to justify the design and code during review

This is similar in spirit to recent discussions and updates in other ASF projects (for example Apache Iceberg). This PR is intended as a starting point for discussion and wording refinement.

Checklist

  • 🛡️ Don't disclose security issues! (contact security@apache.org)
  • 🔗 Clearly explained why the changes are needed, or linked related issues: Fixes #
  • 🧪 Added/updated tests with good coverage, or manually tested (and explained how)
  • 💡 Added comments for complex logic
  • 🧾 Updated CHANGELOG.md (if needed)
  • 📚 Updated documentation in site/content/in-dev/unreleased (if needed)

Contributor

@dimas-b dimas-b left a comment

Thanks for driving this initiative, @flyingImer !

@flyingImer flyingImer marked this pull request as ready for review March 12, 2026 00:18
Contributor

@flyrain flyrain left a comment

Thanks @flyingImer for driving this. LGTM.

@github-project-automation github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Mar 12, 2026
@dimas-b
Contributor

dimas-b commented Mar 12, 2026

@jbonofre : wdyt?

* Regardless of how a change is produced, the individual submitting the pull request is considered the author of the contribution and is fully responsible for it.
* The pull request author **must understand the implementation end-to-end** and be able to **explain and justify the design and code** during review.
* Tools, including AI systems, **are not** considered contributors. **Responsibility and authorship remain with the human** submitting the change.
* Contributors are encouraged to **disclose** significant AI assistance in the pull request description for transparency.
Contributor

@flyrain flyrain Mar 12, 2026

Given that AI-generated code and PRs put a lot of burden on reviewers, should we add something like:

Authors must ensure that AI assisted pull requests remain focused, well explained, and of a size that can be reasonably reviewed by humans. Submissions that significantly increase reviewer burden may be asked to be reduced in scope or revised before review.

Contributor

That applies to all PRs, IMHO 😉

Contributor

It does; however, generated code tends to cover more ground than manually written code, which makes the situation worse. Also, an explicit rule makes it clear that keeping PRs smaller is preferred.

Member

Something else important to remember is that we can't have copyright on generated source (from AI or any other tool).
So the ASF header should not be added to a class where the majority of the code is generated.

Member

@jbonofre jbonofre Mar 12, 2026

Maybe we should ask to put a header like:

/**
 * This code has been generated by AI Foo Model
 */

Contributor

I think the header problem is real, so we need to resolve it and provide clear guidelines for contributors who want to use AI-generated code. Deferring it will make reviewing PRs with known AI-generated code pretty much impossible.

Re: comparison with OpenAPI code generation, from my POV the key distinction is that build-time tools (like OpenAPI generator) produce code that is not meant to be edited after generation. Any updates to sources require re-generating that code. On the other hand, AI-generated code is meant to be committed to the source repo like any human-written code, so it will be subject to editing and copying by (other) people after the initial contribution. Not having a copyright header will make the status of subsequent manual edits obscure. I believe manual edits should be covered by the normal ASF license, so a header would be valuable in these files... WDYT?

I agree that this discussion is growing in scope, but I think it is beneficial to the project even if it takes time to figure this out.

Author

@dimas-b @jbonofre I agree the header/copyright question is important! My concern is that this feels bigger than a Polaris-local CONTRIBUTING.md rule.

My bias is that Polaris should not try to define project-specific copyright/header treatment for AI-generated code ahead of clearer ASF-level guidance or discussion there.

For this PR, I’d prefer to keep the scope narrow: the submitting human remains accountable, contributors must understand and stand behind the change, and submissions should follow the ASF generative tooling guidance.

If we still feel project-specific header/provenance rules are needed after that, I think that should be handled as a separate follow-up discussion.

I'm inclined to merge as is and bring the discussion to ASF level. WDYT?

Contributor

From my POV, if we encourage contributors to disclose AI contributions while remaining unsure how those contributions are supposed to be incorporated into the codebase, we're putting reviewers into a logical deadlock.

Contributor

The part encouraging disclosure isn't practical, AFAIK. I'd remove it if it causes issues.

Member

The discussion is already happening at the ASF level.
I'm fine to merge as is, but the question I have is: what are the "core" differences from https://www.apache.org/legal/generative-tooling.html ? Is this more practical?

@flyingImer flyingImer changed the title from "[DRAFT] doc: introduce guidelines for AI-generated contributions" to "doc: introduce guidelines for AI-generated contributions" Mar 12, 2026
4 participants