doc: introduce guidelines for AI-generated contributions by flyingImer · Pull Request #3948 · apache/polaris

flyingImer · 2026-03-06T22:54:50Z

This PR proposes adding a short guideline around AI-assisted contributions. The goal is not to restrict how contributors use development tools, but to clarify contributor accountability.

In particular, the guideline emphasizes that the human submitting a pull request:

remains the author of the change
must understand the implementation end-to-end
must be able to justify the design and code during review

This is similar in spirit to recent discussions and updates in other ASF projects (for example Apache Iceberg). This PR is intended as a starting point for discussion and wording refinement.

Checklist

🛡️ Don't disclose security issues! (contact security@apache.org)
🔗 Clearly explained why the changes are needed, or linked related issues: Fixes #
🧪 Added/updated tests with good coverage, or manually tested (and explained how)
💡 Added comments for complex logic
🧾 Updated CHANGELOG.md (if needed)
📚 Updated documentation in site/content/in-dev/unreleased (if needed)

dimas-b

Thanks for driving this initiative, @flyingImer !

CONTRIBUTING.md

flyrain

Thanks @flyingImer for driving this. LGTM.

dimas-b · 2026-03-12T17:11:55Z

@jbonofre : wdyt?

flyrain · 2026-03-12T17:14:25Z

CONTRIBUTING.md

+* Regardless of how a change is produced, the individual submitting the pull request is considered the author of the contribution and is fully responsible for it.
+* The pull request author **must understand the implementation end-to-end** and be able to **explain and justify the design and code** during review.
+* Tools, including AI systems, **are not** considered contributors. **Responsibility and authorship remain with the human** submitting the change.
+* Contributors are encouraged to **disclose** significant AI assistance in the pull request description for transparency.


Given that AI-generated code and PRs put a lot of burden on reviewers, should we add something like:

Authors must ensure that AI-assisted pull requests remain focused, well explained, and of a size that can be reasonably reviewed by humans. Submissions that significantly increase reviewer burden may be asked to be reduced in scope or revised before review.

That applies to all PRs, IMHO 😉

It does. However, generated code tends to cover more than manually written code, which makes the situation worse. Also, an explicit rule makes it clear that smaller PRs are preferred.

Another important point to remember is that we can't hold copyright on generated source (whether produced by AI or another tool).
So the ASF header should not be added to a class where the majority of the code is generated.

Maybe we should ask to put a header like:

/**
 * This code has been generated by AI Foo Model
 */

The header problem exists, I think. So I think we need to resolve it and provide clear guidelines for contributors who want to use AI-generated code. Deferring it will make reviewing PRs with known AI-generated code pretty much impossible.

Re: comparison with OpenAPI code generation, from my POV the key distinction is that build-time tools (like OpenAPI generator) produce code that is not meant to be edited after generation. Any updates to sources require re-generating that code. On the other hand, AI-generated code is meant to be committed to the source repo like any human-written code, so it will be subject to editing and copying by (other) people after the initial contribution. Not having a copyright header will make the status of subsequent manual edits obscure. I believe manual edits should be covered by the normal ASF license, so a header would be valuable in these files... WDYT?

I agree that this discussion is growing in scope, but I think it is beneficial to the project even if it takes time to figure this out.

@dimas-b @jbonofre I agree the header/copyright question is important! My concern is that this feels bigger than a Polaris-local CONTRIBUTING.md rule.

My bias is that Polaris should not try to define project-specific copyright/header treatment for AI-generated code ahead of clearer ASF-level guidance or discussion there.

For this PR, I’d prefer to keep the scope narrow: the submitting human remains accountable, contributors must understand and stand behind the change, and submissions should follow the ASF generative tooling guidance.

If we still feel project-specific header/provenance rules are needed after that, I think that should be handled as a separate follow-up discussion.

I'm inclined to merge as is and bring the discussion to ASF level. WDYT?

From my POV if we encourage contributors to disclose AI contributions and at the same time we're not sure about how those contributions are supposed to be incorporated into the codebase, we're putting reviewers into a logical deadlock.

The part encouraging disclosure isn't practical AFAIK. I'd remove it if it causes issues.

The discussion is already happening at the ASF level.
I'm fine with merging as is, but my question is: what are the "core" differences from https://www.apache.org/legal/generative-tooling.html? Is it more practical?

doc: introduce guidelines for AI-generated contributions

4e5b80d

github-project-automation bot added this to Basic Kanban Board Mar 6, 2026

github-project-automation bot moved this to PRs In Progress in Basic Kanban Board Mar 6, 2026

jbonofre self-requested a review March 8, 2026 16:53

dimas-b reviewed Mar 9, 2026

Clarify responsibility for AI-generated contributions

b9ea08e

flyingImer marked this pull request as ready for review March 12, 2026 00:18

flyrain approved these changes Mar 12, 2026

github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Mar 12, 2026

dimas-b approved these changes Mar 12, 2026

flyrain reviewed Mar 12, 2026

flyingImer changed the title ~~[DRAFT] doc: introduce guidelines for AI-generated contributions~~ doc: introduce guidelines for AI-generated contributions Mar 12, 2026
