@chadlwilson
Contributor

This makes a few correctness clean-ups to the normalizer rules:

  • synchronises regexes/rules between the two bundles
    • mainly fixes regex style to consistently prefer character classes over (x|a) groups.
    • makes normalizing of EDL > BSD-3-Clause consistent, as originally done in Update SPDX normalizations #336
    • corrects Apache 1.0 to be matched separately from Apache 1.1 in both bundles. These are two different licenses, with two different SPDX IDs, differing in attribution requirements.
  • removes a couple of duplicate/unnecessary regexes where a looser match was later added as a new rule
  • removes Public-Domain from the SPDX normalizer. This is not an SPDX ID, so it is wrong/misleading. IMHO we need proper rules for the various "public domain" variants, of which there are many, independent of the true "public domain" meaning.
  • removes Creative Commons from matching Public Domain in the default ruleset. I can't find why this was added, but it is really not correct. There are multiple Public Domain CC (and non-CC) licenses, and many other CC licenses with and without attribution, but this regex is extremely broad. Arguably the other rules should be removed too since they match overly broadly on content, but I have left them here for now.
  • uses non-deprecated SPDX IDs such as GPL-2.0-only WITH Classpath-Exception-2.0. GPL-2.0 and similar IDs are deprecated. Matching rules are unchanged.
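
As a rough illustration of the regex points above, here is a minimal Python sketch. The patterns, the `apache_rule` helper, and the `normalize` function are hypothetical and are not the rules from either bundle; they only show that a single-character alternation group and a character class match the same input, and that Apache 1.0 and 1.1 can each get their own rule and SPDX ID.

```python
import re
from typing import Optional

# Two spellings of the same separator match. The character class is the
# style preferred over a single-character alternation group.
SEP_GROUP = r"(\s|-|_)"
SEP_CLASS = r"[\s_-]"


def apache_rule(version: str, sep: str) -> "re.Pattern[str]":
    """Build a hypothetical license-name pattern for one Apache version."""
    return re.compile(
        rf"Apache{sep}*(License{sep}*)?(v(ersion)?{sep}*)?{re.escape(version)}",
        re.IGNORECASE,
    )


# One rule per version, each mapped to its own SPDX ID, instead of a single
# rule that lumps both versions together.
RULES = [
    (apache_rule("1.0", SEP_CLASS), "Apache-1.0"),
    (apache_rule("1.1", SEP_CLASS), "Apache-1.1"),
]


def normalize(name: str) -> Optional[str]:
    """Return the SPDX ID of the first rule that fully matches, else None."""
    for pattern, spdx_id in RULES:
        if pattern.fullmatch(name.strip()):
            return spdx_id
    return None
```

With these assumed patterns, `normalize("Apache License 1.0")` and `normalize("Apache-1.1")` resolve to different IDs, while a name such as `"Apache 2.0"` is left unmatched.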

These rules have diverged somewhat. Synchronising them improves consistency between usages.
…sed on content

There are various Creative Commons licenses, and it does not seem correct to normalise them all to the somewhat dubious "PUBLIC DOMAIN" license. SPDX also recognises a number of different public domain licenses, including the CC ones (https://spdx.org/licenses), so this mapping does not seem appropriate.

We should remove these and add appropriate rules on license content as licenses are re-analyzed and matched to CC PD variants.
…tifiers

These identifiers have been deprecated for some time now, as they are ambiguous. Although in practice the Java/Sun/Oracle projects using this license are technically "GPL-2.0-or-later" when you read the license text, it is impossible to know that from the license name, so it is better to just assume GPL-2.0-only (`-or-later` is the more liberal/flexible grant, so `-only` is the conservative choice).
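
The identifier change can be pictured as a simple lookup from deprecated to current SPDX IDs. This is only a sketch: the `DEPRECATED_TO_CURRENT` table and `modernize` function are hypothetical names, not part of the plugin, and mapping bare `GPL-2.0` to `-only` reflects the conservative assumption described above. The exception ID is written in the casing used on the SPDX exceptions list.

```python
# Deprecated SPDX identifiers mapped to their current equivalents.
# Assumption: an unsuffixed "GPL-2.0" is treated as "-only", the
# conservative reading when only the license name is known.
DEPRECATED_TO_CURRENT = {
    "GPL-2.0": "GPL-2.0-only",
    "GPL-2.0+": "GPL-2.0-or-later",
    "GPL-2.0-with-classpath-exception": "GPL-2.0-only WITH Classpath-exception-2.0",
}


def modernize(spdx_id: str) -> str:
    """Return the current SPDX expression, or the input if it is not in the table."""
    return DEPRECATED_TO_CURRENT.get(spdx_id, spdx_id)
```

Identifiers that are not in the table, such as `Apache-2.0`, pass through unchanged.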
@jk1 jk1 merged commit f93ec5b into jk1:master Nov 2, 2025
1 check passed
@jk1
Owner

jk1 commented Nov 2, 2025

Thanks, merged!

@chadlwilson chadlwilson deleted the cleanup-normalizer-rules branch November 2, 2025 19:24