CONTRIBUTION.md: some sentences about LLM #22004
kfessel wants to merge 17 commits into RIOT-OS:master
Conversation
AnnsAnns left a comment
I'm sorry if this looks like a lot but we agreed to look at the details within the PR so I did 😛 Thank you for the PR though.
.github/PULL_REQUEST_TEMPLATE.md
Outdated
| Declaration of AI-Toolusage: | ||
| [] This PR was written using LLM assistance, | ||
| [] the LLM was tasked to produce whole file(s), | ||
| [] the LLM was tasked to produce function(s), | ||
| [] the LLM was tasked to produce codeblocks or snippets, | ||
| [] the LLM was used to check the code, | ||
| [] the LLM was used to produce documentation, | ||
| [] the LLM was used to spellcheck. | ||
| Tools that where used are: |
I'd personally argue that this is a bit too much; the differentiators don't feel significant enough to warrant 7 different options.

If we only had "This PR was written using LLM assistance" and otherwise assumed that no knowledge-base-based AI was used, an LLM spell checker would fall under the same sentence as "I'd like a RIOT driver for IC128, can you produce a PR for me".

What's the consequence of selecting one of these options though?
Do we collect statistics, or is there some action associated with one of those 'levels'?
CONTRIBUTING.md
Outdated
| Large language models are known to sometimes reproduce large amounts of code, | ||
| that was feed into their database as such it might produce a copy and | ||
| the pervious sentence must be applied. |
I don't really understand this sentence:
that was feed into their database as such it might produce a copy and the pervious sentence must be applied.

LLM ... reproduce large amounts of code that was feed into their database.
I think the "," is wrong.
CONTRIBUTING.md
Outdated
| If a PR was written using tooling that is able to produce large of code from a knowlegebase | ||
| (e.g.: large language model assisted tools like copilot, devin or cursor or LLM like GPT, Ollama) |
Isn't Ollama a tool to run LLMs itself?

I wanted the example list to convey that it is about full tools and also about basic tools like a chat or any other LLM, run locally (the reason for mentioning Ollama) or remotely (GPT).
Running locally does not change the rights issue, and having a tool integrated into your editor does not change it either;
nor does the opposite, i.e. running remotely or with less integration.

I'd restructure the sentence to something like:
If a PR was written using tooling that is able to produce code without attribution
(e.g.: LLM tools like Copilot, Devin, Cursor, Claude, ChatGPT or Ollama)
Co-authored-by: Ann🐸 <git@annsann.eu>
.github/PULL_REQUEST_TEMPLATE.md
Outdated
| --> | ||
| Declaration of AI-Toolusage: | ||
| [] This PR was written using LLM assistance, |
This PR was written using LLM assistance to produce whole file(s), function(s), codeblocks or snippets, to check the code, produce documentation, spellcheck.
Thank you for the review; this PR needs input (-> published as draft).
.github/PULL_REQUEST_TEMPLATE.md
Outdated
| This PR was written using LLM assistance: | ||
| - [ ] the LLM was tasked to produce whole file(s), | ||
| - [ ] the LLM was tasked to produce function(s), | ||
| - [ ] the LLM was tasked to produce codeblocks or snippets, | ||
| - [ ] the LLM was used to check the code, | ||
| - [ ] the LLM was used to produce documentation, | ||
| - [ ] the LLM was used to spellcheck. | ||
| - [ ] This PR was written using LLM assistance, | ||
| - [ ] the LLM was tasked to produce whole file(s), | ||
| - [ ] the LLM was tasked to produce function(s), | ||
| - [ ] the LLM was tasked to produce codeblocks or snippets, | ||
| - [ ] the LLM was used to check the code, | ||
| - [ ] the LLM was used to produce documentation, | ||
| - [ ] the LLM was used to spellcheck. |
Suggested change:
| This PR was written using LLM assistance: | |
| - [ ] the LLM was tasked to produce whole file(s), | |
| - [ ] the LLM was tasked to produce function(s), | |
| - [ ] the LLM was tasked to produce codeblocks or snippets, | |
| - [ ] the LLM was used to check the code, | |
| - [ ] the LLM was used to produce documentation, | |
| - [ ] the LLM was used to spellcheck. |
Does it really matter if a regular spellchecker or an LLM was used to spellcheck? If not then I don't think the spellcheck checkbox is needed.
Sorry, just saw #22004 (comment).
Co-authored-by: Elena Frank <elena.frank@proton.me>
Co-authored-by: Michael Richardson <mcr@sandelman.ca>
.github/PULL_REQUEST_TEMPLATE.md
Outdated
| - This PR was written using LLM assistance: | ||
| - [ ] the LLM was tasked to produce whole file(s), | ||
| - [ ] the LLM was tasked to produce function(s), | ||
| - [ ] the LLM was tasked to produce codeblocks or snippets, | ||
| - [ ] the LLM was used to check the code, | ||
| - [ ] the LLM was used to produce documentation, | ||
| - [ ] the LLM was used to spellcheck. | ||
| - No LLM ("AI") was used in any way. |
Suggested change:
| - [ ] An LLM was tasked to produce whole file(s), | |
| - [ ] An LLM was tasked to produce function(s), | |
| - [ ] An LLM was tasked to produce codeblocks or snippets, | |
| - [ ] An LLM was used to check the code, | |
| - [ ] An LLM was used to produce documentation, | |
| - [ ] An LLM was used to spellcheck. | |
| - [ ] No LLM ("AI") was used in any way. |
How about this? I'd argue it is easier to check a checkbox than to remove one or several lines; I already see a lot of PRs with both bullet points still present.
On the other hand, this might clutter the description of every PR a bit too much, even for those that declare "no AI usage".
Btw, why not generalize to "AI" instead of saying LLM?
To reduce the clutter, we could maybe merge the first three points, and people can remove parts of the sentence if it does not apply.
And I agree with @elenaf9 that spellchecking may not be worth declaring, so we could remove that and maybe change the last line to allow spellchecking.

I'd also merge the first three points + remove "check code" & "spellcheck"

I already see a lot of PRs with both bullet points still present.
I think this is a feature indicating that the PR author did not yet check.
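The "both bullet points still present" state mentioned above could also be detected mechanically. The following is only a minimal sketch in Python, assuming the two declaration sentences as quoted in this thread; the function name `declaration_state`, the state labels, and the idea of running such a check at all are hypothetical and not part of this PR.

```python
import re

# Declaration sentences as quoted in this thread. The exact template
# wording is an assumption and may differ in the final version.
LLM_USED = re.compile(r"This PR was written using LLM assistance", re.IGNORECASE)
NO_LLM = re.compile(r'No LLM \("AI"\) was used in any way', re.IGNORECASE)
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def declaration_state(description: str) -> str:
    """Classify a PR description by which declaration sentences remain.

    Returns "undecided" when both sentences are still present (the author
    has not deleted either one), "missing" when neither is present, and
    "llm" or "no-llm" when exactly one remains.
    """
    # Template hints hidden in HTML comments do not count as a declaration.
    visible = HTML_COMMENT.sub("", description)
    used = bool(LLM_USED.search(visible))
    unused = bool(NO_LLM.search(visible))
    if used and unused:
        return "undecided"
    if used:
        return "llm"
    if unused:
        return "no-llm"
    return "missing"
```

With this sketch, a description that still contains both sentences is classified as "undecided", which matches the reading that the PR author did not yet check the declaration.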
If we join the first three, how do we handle the "threshold of originality" (German: Schöpfungshöhe) and concepts like fair use?

Or would we just assume not enough originality if "A LLM was used to produce whole file(s), function(s), codeblocks or snippets."

Btw, why not generalize to "AI" instead of saying LLM?
AI is a meaningless marketing term at this point.
LLM is a very specific technical thing.

To reduce the clutter, we could maybe merge the first three points, and people can remove parts of the sentence if it does not apply.
Or would we just assume not enough originality if "A LLM was used to produce whole file(s), function(s), codeblocks or snippets."
IMHO we should still differentiate between generating whole files vs functions / codeblocks / snippets.
The former definitely violates originality and could be a reason to reject a PR, the latter not necessarily?
Edit: my logic behind this differentiation is that usually (independently of LLM usage) if I just add a function or code snippet to a file, I wouldn't add myself to the authors list in the header, but if I write a whole file I'd have the copyright.
AnnsAnns left a comment
Second review (Some comments from the first review are still open)
.github/PULL_REQUEST_TEMPLATE.md
Outdated
| --> | ||
| ### Declaration of AI-Toolusage: | ||
| <!-- You may delete the either sentence in case. --> |
Suggested change (whitespace only):
| <!-- You may delete the either sentence in case. --> | |
Whitespace (I also don't really understand the sentence itself)

It is a hint to the PR author that they do not need to declare LLM use if they didn't use one.
It is followed by two sentences:
- 'This PR was written using LLM assistance ...' (the sentence goes on to list options of LLM use)
- 'No LLM ("AI") was used in any way.'
Only one of them is true -> the other one may be deleted when creating the PR.
| ## Copyright and AI | ||
| RIOT itself applies the LGPL license, see [LICENSE.md], to most of its code exclusively, |
Suggested change:
| RIOT is licensed under the LGPL license, see [LICENSE.md], unless stated otherwise. |
| ## Copyright and AI | ||
| RIOT itself applies the LGPL license, see [LICENSE.md], to most of its code exclusively, | ||
| authors of PRs are assumed to do so as well (not necessary exclusive). |
Suggested change (delete this line):
| authors of PRs are assumed to do so as well (not necessary exclusive). |
I'd honestly just remove this; the same statement is covered in the next sentence.
CONTRIBUTING.md
Outdated
| Large language models (LLM, "AI") are known to sometimes reproduce large amounts of code | ||
| that was feed into their database as such it might produce a copy and | ||
| the previous sentence must be applied. | ||
| When reworking a PR and copying its code to a new one the author of that code still is the original author | ||
| and their rights to the code must be respected such as naming them and keeping the license. |
Suggested change:
| Large language models (LLM, "AI") are known to sometimes | |
| reproduce large amounts of code that was fed into their database. | |
| As such it might produce copies of copyrighted works | |
| which do not comply with the license of RIOT. | |
| When reworking a PR or copying code to a new one | |
| the author of that code continues to be the original author | |
| and their rights to the code must be respected. | |
| This includes naming them and respecting their license. |
Suggested change:
| Large language models (LLM, "AI") are known to sometimes | |
| reproduce large amounts of code that was fed into their database. | |
| As such it might produce copies of works that are | |
| [original](https://en.wikipedia.org/wiki/Threshold_of_originality) | |
| and already licensed incompatible to RIOT or are not licensed at all. | |
| When reworking a PR or copying code to a new one | |
| the author of that code continues to be the original author | |
| and their rights to the code must be respected. | |
| This includes naming them and respecting their license. |
Read "work" as work that passes the "threshold of originality".
It is about complying with the license of the thing that was copied, not about RIOT's license. For RIOT, the work must either be co- or re-licensable under the LGPL, or keep its license and be linkable (vendor files), so the original license must allow for that.
There is also no such thing as an unprotected work if it is original (in Germany; in the US it is possible to declare your own work public domain).
If a work has no license, it is unlicensed -> its use is unlicensed. A work does not become copyrighted by declaring copyright but by being created (even in the US). If a work is created by an unknown author with no copyright declared, they still hold the copyright (until sold) and the moral rights (until death, in Europe for N years; or, in the US, where they can be sold or placed in the public domain).
Adding a copyright notice helps your case in court (notices are assumed to be true unless proven otherwise), but the author's rights exist even without one (they just need to be proven), and without a license an original work is protected (from many things).
Co-authored-by: mguetschow <mikolai.guetschow@tu-dresden.de>
Co-authored-by: Ann🐸 <git@annsann.eu>
Co-authored-by: Martine Lenders <martine.lenders@tu-dresden.de>
| authors of PRs are assumed to do so as well (not necessary exclusive). | ||
| When a PR is published it must also respect all authorship-rights, copyrights and | ||
| licenses of code it uses. If code is copied the original author usually must be named and | ||
| the original license must be kept, unless the original license states differently. |
I think we should tone it down a bit for the common pattern of copying an existing board / driver / example and starting from there.
We aren't interested in the pedigree of the boilerplate; if it ends up being a complete rewrite, the author of the skeleton is just a distraction.
I've seen this a couple of times in the past and always suggested dropping the old @author when they obviously had nothing to do with the new module (and might get asked to support it).
Co-authored-by: Martine Lenders <martine.lenders@tu-dresden.de>
Contribution description
Testing procedure
testing the declaration
Declaration of AI-Toolusage 1 :
Tools that were used are:
none
Declaration of AI-Toolusage 2 :
Tools that were used are:
none
Declaration of AI-Toolusage 3:
Note
Please delete the sentences that are untrue
Tools that were used are:
none
Declaration of AI-Toolusage short:
Note
Please delete the sentences that are untrue
Tools that were used are:
none
Declaration of AI-Toolusage supershort:
Note
Please remove the parts of sentences that are untrue
A LLM was used to produce whole file(s), function(s), codeblocks, snippets, documentation, check the code and spelling.
AI-Tools / LLMs that were used are:
Declaration of Generative AI-Tool-usage weekly short:
Note
Please delete the sentences and parts of them that are untrue.
Please add information (e.g., name tools or technologies) if needed.
Tools that were used are:
none
Issues/PRs references
meeting-notes