-
Notifications
You must be signed in to change notification settings - Fork 906
[PM-14880] ci: Add automated PR labelling based on file paths and title patterns #6157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Claude finished @vvolkgang's task —— View job Code Review CompleteOverall Assessment: APPROVE This PR introduces a well-structured automated PR labeling system for CI/CD. The implementation is clean, uses appropriate tooling (Python + GitHub CLI), and includes sensible configuration management. The code quality is solid for an infrastructure change. Review Findings: I've identified a few areas for improvement in inline comments: Warnings (
Improvements (🎨):
Questions (💭):
Previous Review Status: PR Description Quality: See inline comments for detailed recommendations. |
|
Great job! No new security vulnerabilities introduced in this pull request |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #6157 +/- ##
=======================================
Coverage 84.97% 84.97%
=======================================
Files 735 735
Lines 53087 53087
Branches 7678 7678
=======================================
Hits 45113 45113
Misses 5291 5291
Partials 2683 2683 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
1ba0d46 to
1e39e26
Compare
| LABEL_PATH_PATTERNS = config["path_patterns"] | ||
|
|
||
| pr_number = args.pr_number | ||
| mode = "replace" if args.replace else "add" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Details
The implementation defaults to "add" when neither flag is provided:
mode = "replace" if args.replace else "add"However, the documentation at lines 11-12 states:
-a, --add: Add labels without removing existing ones (default)
-r, --replace: Replace all existing labelsThis creates confusion about actual vs documented behavior. The workflow at .github/workflows/sdlc-label-pr.yml:68-69 also uses --replace as the default for normal PRs, suggesting replace should be the default.
Recommendation:
Update documentation to reflect actual behavior:
-a, --add: Add labels without removing existing ones
-r, --replace: Replace all existing labels (default for normal PRs)Or update DEFAULT_MODE constant at line 30 to match documentation.
| "network/", | ||
| "ui/", | ||
| "authenticatorbridge/", | ||
| "gradle/" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💭 Pattern overlap: gradle/ appears in both app:shared and t:deps
Details
The gradle/ directory pattern is defined in:
app:shared(line 22): Triggersapp:password-manager+app:authenticatorlabelst:deps(line 43): Triggerst:depslabel
When a PR modifies gradle/ files, it will receive three labels: app:password-manager, app:authenticator, and t:deps.
This might be intentional (gradle changes affect both apps and are dependency-related), but worth confirming this is the desired behavior. If gradle changes should only be labeled as t:deps, remove it from the app:shared list.
| ) | ||
| changed_files = result.stdout.strip().split("\n") | ||
| return list(filter(None, changed_files)) | ||
| except subprocess.CalledProcessError as e: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Details
When gh pr diff fails, the function prints an error message but returns an empty list [], which is indistinguishable from a PR with no changed files. This could mask genuine API failures.
The caller at line 212 doesn't distinguish between:
- API failure (should potentially fail the workflow)
- Legitimately no files changed (should continue with title-based labeling)
Current behavior:
changed_files = gh_get_changed_files(pr_number) # Returns [] on error
print("👀 Changed files:\n" + "\n".join(changed_files) + "\n") # Prints empty listRecommendation:
Either:
- Exit the script on critical failures:
def gh_get_changed_files(pr_number: str) -> list[str]:
try:
# ... existing code ...
except subprocess.CalledProcessError as e:
print(f"::error::Error getting changed files: {e}")
sys.exit(1) # Critical failure- Or add explicit validation after the call:
changed_files = gh_get_changed_files(pr_number)
if changed_files is None: # Return None on error instead of []
print("::error::Failed to retrieve changed files")
sys.exit(1)Same applies to gh_get_pr_title() at line 93.
| GH_TOKEN: ${{ github.token }} | ||
| _PR_NUMBER: ${{ inputs.pr-number || github.event.pull_request.number }} | ||
| _LABEL_MODE: ${{ inputs.mode && format('--{0}', inputs.mode) || steps.label-mode.outputs.label_mode }} | ||
| _DRY_RUN: ${{ inputs.dry-run == true && '--dry-run' || '' }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dry-run is false
Details
When inputs.dry-run is false, the _DRY_RUN variable is set to an empty string '':
_DRY_RUN: ${{ inputs.dry-run == true && '--dry-run' || '' }}This empty string is then passed as a positional argument to the Python script:
python3 .github/scripts/label-pr.py "$_PR_NUMBER" "$_LABEL_MODE" "$_DRY_RUN"When _DRY_RUN="", the shell expands this to:
python3 .github/scripts/label-pr.py "1234" "--replace" ""The Python script receives an empty string as the third argument, which argparse treats as an unknown argument. The script handles this with parse_known_args() at line 191, but this is a workaround for a shell quoting issue.
Recommendation:
Only pass _DRY_RUN if it's not empty:
- name: Label PR based on changed files
env:
GH_TOKEN: ${{ github.token }}
_PR_NUMBER: ${{ inputs.pr-number || github.event.pull_request.number }}
_LABEL_MODE: ${{ inputs.mode && format('--{0}', inputs.mode) || steps.label-mode.outputs.label_mode }}
_DRY_RUN: ${{ inputs.dry-run == true && '--dry-run' || '' }}
run: |
if [ -n "$_DRY_RUN" ]; then
python3 .github/scripts/label-pr.py "$_PR_NUMBER" "$_LABEL_MODE" "$_DRY_RUN"
else
python3 .github/scripts/label-pr.py "$_PR_NUMBER" "$_LABEL_MODE"
fiThis eliminates the need for the parse_known_args() workaround.
| title_labels = label_title(pr_title, LABEL_TITLE_PATTERNS) | ||
| all_labels = set(filepath_labels + title_labels) | ||
|
|
||
| if not any(label.startswith("t:") for label in all_labels): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💭 Catch-all label logic may produce unexpected results
Details
The catch-all label (t:misc) is added only when there are no t: prefixed labels:
if not any(label.startswith("t:") for label in all_labels):
all_labels.add(CATCH_ALL_LABEL)Scenario: A PR that:
- Modifies
app/MainActivity.kt→ getsapp:password-manager - Has title "Update MainActivity layout" → no title pattern match
Result: Only app:password-manager label, no type label. Then t:misc is added.
Question: Should the catch-all only apply when both filepath and title labeling fail to produce type labels, or is the current behavior (adding t:misc whenever no type label exists) correct?
Consider if this edge case needs explicit documentation or if the logic should check:
# Only add catch-all if BOTH filepath and title labeling found no type labels
if not title_labels and not any(label.startswith("t:") for label in filepath_labels):
all_labels.add(CATCH_ALL_LABEL)Current behavior seems reasonable, but worth confirming intent.
| for label, patterns in title_patterns.items(): | ||
| for pattern in patterns: | ||
| # Check for pattern with : or ( suffix (conventional commits format) | ||
| if f"{pattern}:" in title_lower or f"{pattern}(" in title_lower: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎨 Title pattern matching could be more precise
Details
Current logic checks if pattern appears anywhere in the title with : or ( suffix:
if f"{pattern}:" in title_lower or f"{pattern}(" in title_lower:This could produce false positives. Examples:
- Pattern
"fix"matches title"Add prefix: bug"(contains"fix:"in"prefix:") - Pattern
"feat"matches title"defeat: the enemy"(contains"feat:")
Recommendation:
Use more precise matching to ensure pattern is at start of title or after whitespace:
# Check for pattern at beginning or after whitespace
import re
pattern_regex = rf"(^|\s){re.escape(pattern)}[:(]"
if re.search(pattern_regex, title_lower):
print(f"📝 Title matches pattern '{pattern}' for label '{label}'")
labels_to_apply.add(label)
breakOr simpler string-based approach:
# Check pattern is at start or preceded by space
if title_lower.startswith(f"{pattern}:") or title_lower.startswith(f"{pattern}(") or \
f" {pattern}:" in title_lower or f" {pattern}(" in title_lower:Current implementation works for well-formatted conventional commits, but could be more robust.
| name: SDLC / Label PR by Files | ||
|
|
||
| on: | ||
| workflow_dispatch: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💭 Workflow only supports manual dispatch - PR events not yet configured
Details
The workflow currently only triggers via workflow_dispatch, requiring manual execution. The PR description mentions:
pull_requestworkflow target will be added in a future PR after testing withworkflow_dispatch.
This is fine for initial testing, but worth noting that automated labeling won't occur until the trigger is updated to something like:
on:
pull_request:
types: [opened, synchronize, reopened]
pull_request_target: # For fork PRs
types: [opened, synchronize, reopened]
workflow_dispatch:
# ... keep existing inputs for manual testingSecurity consideration: When adding pull_request_target, ensure the workflow checks out the base branch (not PR branch) before running untrusted code, or limit to trusted users only.
Current implementation is safe for testing phase. Just flagging for future work.
| import subprocess | ||
| import sys | ||
|
|
||
| DEFAULT_MODE = "add" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Details
The constant DEFAULT_MODE = "add" and documentation at lines 11-12 state that --add is the default:
-a, --add: Add labels without removing existing ones (default)
-r, --replace: Replace all existing labelsHowever, the implementation at line 202 defaults to "add" only when --add is explicitly provided, otherwise it uses "replace":
mode = "replace" if args.replace else "add"This means when neither flag is provided, the mode is "add", which actually matches the documentation. However, the workflow at .github/workflows/sdlc-label-pr.yml:68-69 uses --replace as the default for normal PRs, creating confusion about intended behavior.
Recommendation: Clarify whether the default should be "add" or "replace" and update either the documentation or the workflow accordingly for consistency.
| ) | ||
| changed_files = result.stdout.strip().split("\n") | ||
| return list(filter(None, changed_files)) | ||
| except subprocess.CalledProcessError as e: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Details
When gh pr diff fails, the function prints an error but returns an empty list [], which is indistinguishable from a PR with no changed files. This could mask genuine API failures.
The caller at line 212 cannot distinguish between:
- API failure (should potentially fail the workflow)
- Legitimately no files changed (should continue with title-based labeling)
Current behavior:
changed_files = gh_get_changed_files(pr_number) # Returns [] on error
print("👀 Changed files:\n" + "\n".join(changed_files) + "\n") # Prints empty listRecommendation: Either exit on critical failures or return None on error and handle it explicitly:
def gh_get_changed_files(pr_number: str) -> list[str] | None:
try:
# ... existing code ...
except subprocess.CalledProcessError as e:
print(f"::error::Error getting changed files: {e}")
return None # Signal error explicitly
# At call site:
changed_files = gh_get_changed_files(pr_number)
if changed_files is None:
print("::error::Failed to retrieve changed files")
sys.exit(1)Same issue exists in gh_get_pr_title() at line 92.
| for label, patterns in title_patterns.items(): | ||
| for pattern in patterns: | ||
| # Check for pattern with : or ( suffix (conventional commits format) | ||
| if f"{pattern}:" in title_lower or f"{pattern}(" in title_lower: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎨 Title pattern matching could produce false positives.
Details
Current logic checks if pattern appears anywhere in the title:
if f"{pattern}:" in title_lower or f"{pattern}(" in title_lower:This could match incorrectly. Examples:
- Pattern
"fix"matches title"Add prefix: bug"(contains"fix:"within"prefix:") - Pattern
"feat"matches title"defeat: the enemy"(contains"feat:"within"defeat:")
Recommendation: Use more precise matching to ensure pattern is at word boundary:
import re
# At function level, check pattern at start or after whitespace
pattern_regex = rf"(^|\s){re.escape(pattern)}[:(]"
if re.search(pattern_regex, title_lower):
print(f"📝 Title matches pattern '{pattern}' for label '{label}'")
labels_to_apply.add(label)
breakOr simpler string-based approach:
# Check pattern is at start or preceded by space
if title_lower.startswith(f"{pattern}:") or title_lower.startswith(f"{pattern}(") or \
f" {pattern}:" in title_lower or f" {pattern}(" in title_lower:Current implementation works for well-formatted conventional commits, but could be more robust.
| default=DEFAULT_CONFIG_PATH, | ||
| help=f"Path to JSON config file (default: {DEFAULT_CONFIG_PATH})" | ||
| ) | ||
| args, unknown = parser.parse_known_args() # required to handle --dry-run passed as an empty string ("") by the workflow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💭 parse_known_args() is a workaround for shell quoting issue in workflow.
Details
The comment states this is "required to handle --dry-run passed as an empty string".
This happens because the workflow at .github/workflows/sdlc-label-pr.yml:76-79 passes $_DRY_RUN which can be an empty string when dry-run is false:
_DRY_RUN: ${{ inputs.dry-run == true && '--dry-run' || '' }}When _DRY_RUN="", the shell command becomes:
python3 .github/scripts/label-pr.py "$_PR_NUMBER" "$_LABEL_MODE" ""The empty string is passed as a positional argument, which argparse treats as unknown.
Recommendation: Fix at the workflow level instead of working around in Python:
run: |
if [ -n "$_DRY_RUN" ]; then
python3 .github/scripts/label-pr.py "$_PR_NUMBER" "$_LABEL_MODE" "$_DRY_RUN"
else
python3 .github/scripts/label-pr.py "$_PR_NUMBER" "$_LABEL_MODE"
fiThis eliminates the need for parse_known_args() and makes the interface cleaner.
| "network/", | ||
| "ui/", | ||
| "authenticatorbridge/", | ||
| "gradle/" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💭 Pattern overlap: gradle/ appears in both app:shared and t:deps.
Details
The gradle/ directory pattern is defined in:
app:shared(line 22): Triggers expansion toapp:password-manager+app:authenticatorlabelst:deps(line 43): Triggerst:depslabel
When a PR modifies gradle/ files, it will receive three labels: app:password-manager, app:authenticator, and t:deps.
This might be intentional (gradle changes affect both apps and are dependency-related), but worth confirming this is the desired behavior. If gradle changes should only be labeled as t:deps, remove it from the app:shared list.
| @@ -0,0 +1,80 @@ | |||
| name: SDLC / Label PR by Files | |||
|
|
|||
| on: | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💭 Workflow only supports manual dispatch - automated triggers not yet configured.
Details
The workflow currently only triggers via workflow_dispatch, requiring manual execution. The PR description mentions:
pull_requestworkflow target will be added in a future PR after testing withworkflow_dispatch.
This is appropriate for initial testing. When adding automated triggers, consider:
on:
pull_request:
types: [opened, synchronize, reopened]
pull_request_target: # For fork PRs with write permissions
types: [opened, synchronize, reopened]
workflow_dispatch:
# ... keep existing inputs for manual testingSecurity consideration: When adding pull_request_target, ensure the workflow checks out the base branch (not PR branch) before running any code from the PR, or validate that the Python script execution is safe for untrusted PRs.
Current implementation is safe for testing phase.

🎟️ Tracking
PM-14880
📔 Objective
Automate pull request labelling based on changed file paths and PR title patterns following conventional commits format, as a first step towards release notes with automated ✨ categories ✨ !
.github/label-pr.json- planned using yaml initially but it would require an external dependency, introducing the need for python environment and dependency management. Python supports json deserialization out of the box. File based config will allow us to centralise these scripts in our gh-actions repo later down the line.Note
pull_requestworkflow target will be added in a future PR after testing withworkflow_dispatch.t:miscafter much deliberation, due to (1) conciseness and (2) it's just a fallback for unhandled exceptions which we should monitor and address, not expecting us to use it much.Labels
App Labels (from file paths):
app:password-manager- Password manager app changesapp:authenticator- Authenticator app changesChange Type Labels:
t:new-feature- New product featurest:enhancement- Existing product feature enhancementst:bug- Bug fixest:tech-debt- Technical debt, refactoring, cleanupt:docs- Documentation changest:ci- CI/CD changest:deps- Dependency updatest:breaking-change- Breaking changest:misc- Miscellaneous (catch-all)⏰ Reminders before review
🦮 Reviewer guidelines
:+1:) or similar for great changes:memo:) or ℹ️ (:information_source:) for notes or general info:question:) for questions:thinking:) or 💭 (:thought_balloon:) for more open inquiry that's not quite a confirmed issue and could potentially benefit from discussion:art:) for suggestions / improvements:x:) or:warning:) for more significant problems or concerns needing attention:seedling:) or ♻️ (:recycle:) for future improvements or indications of technical debt:pick:) for minor or nitpick changes